Intermittent TMC driver crash - Unable to read tmc uart registers

Hi guys, starting a new topic for this as requested.

Since updating Klipper I have been having issues with crashing due to failed TMC register reads when homing. Error messages logged to the terminal are of the format:

Transition to shutdown state: Unable to read tmc uart ‘extruder’ register DRV_STATUS

It’s not always the same register and I have seen the errors from all four steppers on the printer. I have attached a klippy log with the errors from my most recent testing post-reboot just now.

No improvement when explicitly enabling the steppers with ‘SET_STEPPER_ENABLE’ commands as suggested, or when setting ‘SENDDELAY=2’ or ‘SENDDELAY=0’ (which I believe was the old value).

Steps to reproduce: home the printer at any time. This is highly intermittent; as you will see in the log, sometimes it will home fine a few times and then crash on the final attempt.

This seems to be related to the new checking code which was recently introduced. The errors only pop up when homing, so typically a print will complete then the crash will occur on homing at the end of the print.
I have checked all the UART wiring and it’s all still as it was the day I installed it. It was all working fine up until I brought klipper up to date. I had been trying the input shaping and thought it could have been related but it still happens even with input shaping disabled.

Any help would be greatly appreciated!

klippy.log (188.0 KB)

The UART pins should be interrupt capable. Those with PCINT in the table…

There are not many, but you can also address the TMCs from 0 to 3 with the jumpers beneath them.

Since the printer is already homed at the start of the print, at the end I would just do a G1 X0 Y0.

Oh interesting. Is the interrupt pin requirement mentioned in the documentation anywhere?

I’ll give this a go later and report back.

Appreciate the help!

Just given this a try using digital pins 50-53 but it’s still throwing the same errors, still seeing errors for all four drivers (extruder included even though it’s not involved in homing) despite changing the pins for all of them.

There must be more to it than this, there are now a few people describing similar problems since updating on here…

Any other ideas are welcomed! Appreciate you taking the time.

There is no requirement to use an interrupt capable pin in Klipper. See the FAQ: Frequently Asked Questions - Klipper documentation.

-Kevin

Oops - seems I missed that :anguished:

I think I understand what is occurring here. Unfortunately, I don’t have a good solution right now.

What I think is occurring is that the homing speed is resonating with the TMC UART transmission speed on AVR micro-controllers.

On this delta printer a homing speed of 35mm/s results in 2800 steps/s (a step every ~357us). Because of the way deltas work, every step is actually three steps (one for each tower) scheduled at the same time. Similarly, there are 2800 endstop checks per second (and again, the three endstop checks are all scheduled at the same time). This means the (relatively slow) atmega2560 mcu has a lot of work to do in bursts that occur 2800 times per second.

On the atmega, the TMC UART code sends/reads a bit 9000 times per second (~111us). I fear some of the TMC bit transmissions sometimes get scheduled near the homing operation bursts, and that is resulting in scheduling jitter that corrupts the transmission. Although Klipper retries on a failure of this kind, it seems this issue could occur with sufficient frequency that sometimes even multiple back-to-back retransmits all fail.
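As a rough check of the timing figures above, here is a minimal Python sketch that recomputes the two intervals from the quoted rates. The steps/mm value is not stated in the thread; it is only implied by the 35mm/s and 2800 steps/s figures, and the variable names are purely illustrative.

```python
# Figures quoted in the post above.
HOMING_SPEED_MM_S = 35.0
STEP_RATE_HZ = 2800.0       # step bursts per second while homing at 35 mm/s
UART_BIT_RATE_HZ = 9000.0   # TMC UART bit events per second on the atmega


def period_us(rate_hz):
    """Return the interval between events, in microseconds."""
    return 1_000_000.0 / rate_hz


# Implied by the two quoted figures; an inference, not a value from the config.
steps_per_mm = STEP_RATE_HZ / HOMING_SPEED_MM_S
step_period = period_us(STEP_RATE_HZ)
bit_period = period_us(UART_BIT_RATE_HZ)

print(f"implied steps/mm: {steps_per_mm:.0f}")
print(f"step burst every {step_period:.0f} us")
print(f"UART bit every {bit_period:.0f} us")
print(f"UART bit slots per step burst: {step_period / bit_period:.2f}")
```

With these numbers there are only about three UART bit slots between consecutive step bursts, which gives a feel for how often a bit event could land close to a burst.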

If I’m correct, this issue is likely limited to AVR MCUs controlling delta printers (or printers with 3+ synchronized Z axes) using TMC UART drivers.

To test if this theory is correct, you could try changing your homing speed to see if that makes the issue more or less likely to occur. I also suspect that issuing explicit SET_STEPPER_ENABLE commands will make the issue less likely (as that reduces the number of TMC transmissions that occur during homing).

-Kevin

I should also add that it’s possible the UART lines on this particular printer are just noisy in general. That is, the root issue may not be software, but electronic noise. It is difficult to determine from the logs alone what the root cause of the corrupt transmissions is. Additional testing could help make that more clear.

-Kevin

Many thanks for that detailed writeup.

I have reduced the homing speed to 20 and, while it’s pretty slow, I haven’t managed to reproduce the errors in my short testing.

Whether it’s a noise issue or a processing speed issue, reducing the homing speed seems to have sorted the issue for now.

Thanks for your help.

Interesting. If you try 40mm/s, 37mm/s, or similar - does that make a change?

-Kevin

Just given those both a go and plenty of errors on both speeds. Don’t know if it’s significant given the short testing but 37 seems worse if anything. 25 also produces errors.

Switched back to 20 and it’s back to being error free as far as I can tell.

Out of curiosity I also reduced microstepping to 8 (from 16) and ran the tests again. Error free when homing at 40, 45 and 50 mm/s, errors start again at 55mm/s. I don’t know if that points more toward it being a processing issue or if the noise produced would be significantly different at lower microstepping rates.
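To compare the two microstepping settings on equal terms, the reported speeds can be converted to step rates. This is only a sketch: it assumes 80 steps/mm at 16 microsteps (implied by the 35mm/s and 2800 steps/s figures earlier in the thread, not read from the config) and that halving the microstepping halves the steps/mm. The `step_rate` helper is hypothetical.

```python
# Assumption: 80 steps/mm at 16 microsteps, implied by the 35 mm/s -> 2800 steps/s
# figure quoted earlier in the thread; steps/mm is taken to scale with microsteps.
STEPS_PER_MM_AT_16 = 80.0


def step_rate(speed_mm_s, microsteps):
    """Return steps per second for a homing speed and microstep setting."""
    steps_per_mm = STEPS_PER_MM_AT_16 * microsteps / 16
    return speed_mm_s * steps_per_mm


# (microsteps, homing speed in mm/s, outcome reported in the thread)
reports = [
    (16, 20, "ok"),
    (16, 25, "errors"),
    (16, 37, "errors"),
    (16, 40, "errors"),
    (8, 40, "ok"),
    (8, 45, "ok"),
    (8, 50, "ok"),
    (8, 55, "errors"),
]

for microsteps, speed, outcome in reports:
    rate = step_rate(speed, microsteps)
    print(f"{microsteps:>2} microsteps @ {speed:>2} mm/s -> {rate:>4.0f} steps/s ({outcome})")
```

Under those assumptions the errors start at about 2000 steps/s with 16 microsteps and somewhere between 2000 and 2200 steps/s with 8 microsteps, so the failures seem to track step rate rather than travel speed. The tests were short, so this is suggestive at best.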

Interesting. That sounds more like general mcu load than a “scheduling resonance”.

If you comment out your display section, does that change anything?

-Kevin

Doesn’t seem to have any effect, still getting errors at 25 and above.

I’ve put some experimental code up on github in a new work-schedcache-20210916 branch. If you are up for an experiment, you could try that and see if it improves results for you. This code is highly experimental, so extra care should be taken when running it.

cd ~/klipper ; git fetch ; git checkout origin/work-schedcache-20210916 ; sudo service klipper stop ; make ; make flash ; sudo service klipper start

-Kevin

Applying the branch solved the problem.
G28 → ok
G0Z10F5000 ; M84 ; G28 → ok
klippy.log (57.1 KB)

Yep I can confirm this fixes the issue. Homing error free even at 100mm/s.

Thank you for your help!

Should I be safe to continue using this code?

Interesting. Thanks.

The code lacks widespread testing. Otherwise, I don’t know of any issues with it.

-Kevin

FYI, the code on the work-schedcache-20210916 branch has been merged into the Klipper master branch (as part of Klipper3d/klipper PR #4832, “Repeating mcu schedule insert position optimization”).

-Kevin

Unfortunately, there was a serious error in the work-schedcache-20210916 code. Anyone that was using that branch should update to the latest code and reflash their micro-controller. See the announcement at “Flashing of micro-controller code needed”.

-Kevin