Lillygo T5-4.7: task_wdt: Task watchdog got triggered. #387
Comments
Try putting a:
inside that while loop just to see if it makes any difference. |
I subsequently added the following to aid additional debugging:
and with that, just before the watchdog, I get:
The tx_start bit is clear, interrupt is enabled, and no interrupt in the status bit. Reading the ESP32 documentation, it isn't clear under what circumstances the tx_start bit clears, but I'm guessing it clears when the RMT returns to idle state. If that is a correct assumption, then it looks like there could be a problem with interrupt delivery. With a bit of further investigation, I noticed that |
... and unfortunately it still failed at 37 minutes. |
Also add to the debugging output an int showing how much RAM you have left, just to rule out that the program has a leak and you are at the limit of available RAM. |
Printing the esp_get_free_heap_size() value reports just under 3MiB. |
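For reference, a minimal sketch of that kind of heap check. `esp_get_free_heap_size()` is the real ESP-IDF call; the helper `format_free_heap()` and the host-side stub (with its placeholder return value) are assumptions added so the snippet also compiles off-target, and are not taken from the thread.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#ifdef ESP_PLATFORM
#include "esp_system.h" /* declares esp_get_free_heap_size() in ESP-IDF */
#else
/* Host-only stub so the sketch builds off-target; the value is a placeholder. */
static uint32_t esp_get_free_heap_size(void) { return 3000000u; }
#endif

/* Hypothetical helper: formats the free-heap figure into buf and
 * returns the raw byte count so the caller can also track a minimum. */
static uint32_t format_free_heap(char *buf, size_t len) {
    uint32_t free_bytes = esp_get_free_heap_size();
    snprintf(buf, len, "free heap: %" PRIu32 " bytes", free_bytes);
    return free_bytes;
}
```

Logging this once per display update and watching whether the figure trends downward would be enough to spot a leak; a steady value, as reported here, points away from memory exhaustion.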
I think I've found the issue:
If we set rmt_tx_done true before clearing the status, and this occurs on CPU 0, surely we are racing with CPU 1 executing pulse_ckv_ticks(), potentially triggering the next RMT pulse before clearing the interrupt has taken effect - especially as the above gets assembled to:
Note that the write to int_clr has no memw after it. If the Xtensa CPUs use weak ordering (which I guess they do, as they have the memw instruction to ensure that preceding writes don't pass following writes) and this is an interrupt handler, it means that it's indeterminate when that write is going to take effect - and I believe that in the failing case, we get the following order:

With the order in rmt_interrupt_handler() reversed (writing int_clr then rmt_tx_done) but with some additional debug code, the platform at this exact moment has been running without this issue for more than 1h40. The additional debugging (incrementing a count of the number of RMT interrupts received) adds an additional memw after the store to rmt_tx_done in the interrupt handler, and without that it could be that CPU 1 still watchdogs because the write to rmt_tx_done is not observable by CPU 1 quickly enough. I'll update when I have a chance to do further testing.

Clearly, "volatile" doesn't have the desired effect here - it doesn't guarantee that the write to the volatile variable (whether that's RMT.int_clr or rmt_tx_done, whichever is last in

I'm now testing this fix (annoyingly, GitHub doesn't support attaching patches/diffs to comments, so this is probably whitespace-damaged!)
With this,
|
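The patch itself did not survive in this copy of the thread. As a rough illustration only of the ordering discussed above (acknowledge the interrupt first, publish rmt_tx_done last, with a barrier after each store), a host-compilable sketch might look like the following. `fake_rmt_t`, `FAKE_RMT`, `fake_rmt_int_clr()`, `full_barrier()` and the `*_sketch` functions are hypothetical stand-ins, not epdiy or ESP-IDF code; only the `memw` instruction and the `rmt_tx_done` name come from the discussion.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the RMT interrupt status; not the ESP-IDF layout. */
typedef struct {
    volatile uint32_t int_st; /* pending interrupt status bits */
} fake_rmt_t;

static fake_rmt_t FAKE_RMT;

/* Flag polled by the task that starts the next pulse (CPU 1 in the thread). */
static atomic_bool rmt_tx_done = true;

/* Full barrier: memw on Xtensa, a C11 fence elsewhere so this builds on a host. */
static inline void full_barrier(void) {
#if defined(__XTENSA__)
    __asm__ __volatile__("memw" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Models a write-1-to-clear register: clears the given status bits. */
static inline void fake_rmt_int_clr(uint32_t mask) {
    FAKE_RMT.int_st &= ~mask;
}

/* Handler in the reversed order: clear the interrupt, then set the flag. */
static void rmt_interrupt_handler_sketch(void) {
    fake_rmt_int_clr(FAKE_RMT.int_st);
    full_barrier(); /* the clear is ordered before the flag store */
    atomic_store(&rmt_tx_done, true);
    full_barrier(); /* mirrors the extra memw after the flag store */
}

/* Models starting a pulse: mark busy, then pretend the hardware finished. */
static void start_pulse_sketch(void) {
    atomic_store(&rmt_tx_done, false);
    full_barrier();
    FAKE_RMT.int_st |= 1u; /* simulated "transmit done" interrupt, bit 0 */
}
```

The point of the ordering is that the polling CPU cannot observe rmt_tx_done as true, and so cannot start the next pulse, until the interrupt clear has been issued. Whether a barrier alone is sufficient on real hardware is what the longer test run is meant to confirm.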
It looks like it fixed the issue. When you are completely sure it does not trigger back in, can you make a Pull Request to suggest this change? |
When using epdiy (master branch) with esp-idf 5.4, I'm running into this after the platform has been up for a while:
The display gets updated about once a minute, and the above seems to trigger in about 20 minutes.
The problem appears to be that rmt_tx_busy is false on entry to pulse_ckv_ticks().
Building with xtensa-esp-elf-cc (crosstool-NG esp-13.2.0_20230928) 13.2.0