Intermittent TMC UART weirdness

I am thrilled to report that I appear to have found a simple workaround and, indirectly, a confirmation of the root cause.

The issue is definitely related to timing on the TMC UART. While testing the work-tmc-20210715 branch, I decided to tweak some TMC2209 settings that I had read about in the data sheet. Since I suspected a timing issue, the SLAVECONF register was of particular interest. Changing the SENDDELAY value from the default of 8 to 6 or lower completely eliminates all TMC read faults. In many hours of testing with both my Duet Mini and the SKR Mini, I was not able to trigger even a single retransmit! This was done with baud rates of 9,000, 40,000 and 57,600, and with the work-tmc-20210715 branch as well as the main (trunk) branch of Klipper. It looks like Klipper expects the response from the drivers a little sooner than it receives it with the default SENDDELAY setting of 8.

Hopefully this is easy to fix, but for now I am using the following macro as a workaround:

[gcode_macro TMC_SENDDELAY]
gcode:
  SET_TMC_FIELD STEPPER=stepper_x FIELD=SENDDELAY VALUE=2
  SET_TMC_FIELD STEPPER=stepper_y FIELD=SENDDELAY VALUE=2
  SET_TMC_FIELD STEPPER=stepper_z FIELD=SENDDELAY VALUE=2
  SET_TMC_FIELD STEPPER=stepper_z1 FIELD=SENDDELAY VALUE=2
  SET_TMC_FIELD STEPPER=extruder FIELD=SENDDELAY VALUE=2

Peter

EDIT: Upon another look at the (very confusing) TMC data sheet I realized that the default value for SENDDELAY is actually “8 bit times” which I believe is equivalent to SENDDELAY VALUE=0. I am therefore retesting with SENDDELAY VALUE=2 (one step increase above default). This, however, does not invalidate my previous results that show flawless operation with SENDDELAY VALUE=6. But it does make some of my statements above invalid, such as Klipper expecting the response sooner - it seems that the response comes too soon instead and must be delayed.
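For anyone else confused by the data sheet, here is a minimal Python sketch of how I read the SENDDELAY table: values 0 and 1 give 8 bit times, 2 and 3 give 3*8, and so on up to 15*8 for 14 and 15. The function names are made up for illustration and are not part of Klipper or the TMC-API.

```python
def senddelay_bit_times(value: int) -> int:
    """Reply delay in bit times for a 4-bit SENDDELAY value (0..15)."""
    if not 0 <= value <= 15:
        raise ValueError("SENDDELAY is a 4-bit field (0..15)")
    # Data sheet table: 0,1 -> 8; 2,3 -> 3*8; 4,5 -> 5*8; ... 14,15 -> 15*8.
    # Setting the lowest bit maps each pair to its odd multiplier.
    return 8 * (value | 1)


def senddelay_microseconds(value: int, baud: int) -> float:
    """The same delay expressed as time on the wire at a given baud rate."""
    return senddelay_bit_times(value) * 1_000_000 / baud


if __name__ == "__main__":
    for value in (0, 2, 6):
        bits = senddelay_bit_times(value)
        micros = senddelay_microseconds(value, 40000)
        print(f"SENDDELAY={value}: {bits} bit times, {micros:.0f} us at 40000 baud")
```

Under this reading, VALUE=0 is the power-on default of 8 bit times, and VALUE=2 is the first setting that actually lengthens the delay.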

EDIT2: I can confirm that there are no TMC UART retransmits with SENDDELAY VALUE=2. I have revised the above macro accordingly. I would consider this part of the investigation closed.

Interesting!

Changing SENDDELAY is not hard. I’m pretty sure this isn’t a Klipper issue, but it may be an erratum in the tmc2209 chips. (In particular, I wonder if the chips are being confused by each other’s responses when a single uart line is shared between multiple chips.)

Would you be able to confirm that explicitly setting SENDDELAY=0 still causes issues? (That is, can you confirm this isn’t a misunderstanding of the default tmc2209 setting for SENDDELAY.)

-Kevin

Indeed quite interesting!

I could swear that I read somewhere that SENDDELAY must be set to a non-zero value when using multiple slaves, but I am unable to find it in the data sheets, so I may be confusing it with something else. RRF also explicitly sets the value to 0, with the comment: // we don't need any delay between transmission and reception.

I can also confirm that explicitly setting SENDDELAY=0 causes the retransmissions to return.

Does this help? TMC-API/TMC2209_Fields.h at 558113493a8cde0eb68a3794b77752622e9ed39e · trinamic/TMC-API · GitHub

Thank you for the link, I was not aware that TMC had a Github repository.

I am not sure how Kevin would like to proceed from here. Setting SENDDELAY=2 as default in Klipper would be extremely simple, I imagine. But the question remains why the default value results in missed responses while it (maybe) works fine with RRF, etc. If there is nothing obvious in Klipper code then I would imagine a question to Trinamic engineering would perhaps be appropriate.

I think we should do two things: 1) set SENDDELAY=2 on tmc2209 drivers and 2) increase the UART speed to 40000 on non-AVR micro-controllers.
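To put rough numbers on the two changes, here is a back-of-the-envelope Python sketch of how long one register read occupies the wire. It assumes the data sheet's datagram layout (a 4-byte read request and an 8-byte reply, each byte framed with a start and a stop bit) and ignores any processing time on the host or micro-controller. The constant and function names are hypothetical.

```python
BITS_PER_BYTE_ON_WIRE = 10  # 8 data bits plus one start and one stop bit
READ_REQUEST_BYTES = 4      # sync, slave address, register address, CRC
READ_REPLY_BYTES = 8        # sync, master address, register, 4 data bytes, CRC


def read_transaction_ms(baud: int, reply_delay_bit_times: int = 24) -> float:
    """Approximate wire time of one register read, in milliseconds.

    reply_delay_bit_times defaults to 24, the delay the data sheet
    lists for SENDDELAY=2 (an assumption taken from that table).
    """
    total_bits = (
        (READ_REQUEST_BYTES + READ_REPLY_BYTES) * BITS_PER_BYTE_ON_WIRE
        + reply_delay_bit_times
    )
    return total_bits * 1000 / baud


if __name__ == "__main__":
    for baud in (9000, 40000):
        print(f"{baud} baud: {read_transaction_ms(baud):.1f} ms per read")
```

By this estimate, the extra 16 bit times from SENDDELAY=2 cost far less than what the higher baud rate gains back.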

I hope to get a PR up with the above sometime this week.

-Kevin


That’s perfect for my installation :)

Thank you very much,
Peter.

I am posting this for the sake of documenting some additional information.

I was chatting with Desuuuu who is using an SKR board with TMC2209 drivers. His board has independent UART interfaces for each driver. He did some testing with the additional debug logging patch and had 0 failed reads at both 9kbps and 40kbps. At 200kbps he gets a few failed reads per minute. The results were exactly the same with SENDDELAY set to 2.

This at least confirms that the issue is isolated to TMC drivers that share a single physical UART.

Peter.

Is there a reason SENDDELAY=2 is only set for the 2209 and not for the 2208?
I just updated my Klipper installation and started to see similar issues with TMC2208s on my Trigorilla board.

Our testing showed that even TMC2209 was not affected by this issue in configurations where each driver has a separate UART interface. Only installations with TMC2209 where multiple drivers (up to 4) are using a single physical UART interface with soft slave addressing were showing this issue. TMC2208 does not support multiple slave addressing and each driver has to use a separate dedicated UART and therefore is not impacted.

I am also not personally aware of any prior reports of UART retransmit errors on 2208.

Interesting … I just ran into the issues again (every time, it fails for a different stepper and/or register). I restarted the Klipper host multiple times, and every time I tried homing, it failed again. Then I added

self.fields.set_field("SENDDELAY", 2)

to tmc2208.py and restarted the host. Everything worked fine afterwards. So while I might have a different problem, the fix might still be the same.
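For context, here is a minimal Python sketch of what setting that field amounts to at the register level, assuming the layout in the data sheet and the TMC-API headers: SENDDELAY occupies bits 8 to 11 of SLAVECONF (register 0x03). The helper names are made up for illustration and are not Klipper's actual field code.

```python
SLAVECONF_ADDR = 0x03        # register address per the data sheet
SENDDELAY_SHIFT = 8          # SENDDELAY starts at bit 8
SENDDELAY_MASK = 0x0F << SENDDELAY_SHIFT  # 4-bit field, bits 8..11


def set_senddelay(reg_value: int, senddelay: int) -> int:
    """Return reg_value with the SENDDELAY bits replaced."""
    if not 0 <= senddelay <= 0x0F:
        raise ValueError("SENDDELAY must fit in 4 bits")
    # Clear the old field, then OR in the new value at its position.
    return (reg_value & ~SENDDELAY_MASK) | (senddelay << SENDDELAY_SHIFT)


def get_senddelay(reg_value: int) -> int:
    """Extract the SENDDELAY bits from a SLAVECONF register value."""
    return (reg_value & SENDDELAY_MASK) >> SENDDELAY_SHIFT


if __name__ == "__main__":
    value = set_senddelay(0, 2)
    print(f"SLAVECONF (0x{SLAVECONF_ADDR:02x}) = 0x{value:08x}")
```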

@Nitek May I ask, if the change solved your problems permanently?

I also have a Trigorilla (1.0) board and installed the TMC2208 v3 a few days ago. I notice the same strange behaviour (a different stepper fails each time, always repeatable with a G28/M84/G28 combo, etc.) and tried to solve it by sending a different SENDDELAY value, but that didn’t solve the problem. I then have to turn off the printer, wait some time, turn it back on again, and only send a G28 at the beginning of a print, not manually…

Generally, if I issue the G28, the printer starts to home, but the steppers stop at random and the MCU shuts down. So sadly, every time the stepper communication fails, I am unable to execute a DUMP_TMC, so I cannot get further information.

What I find really strange is that I was able to successfully finish a 3-hour print, BUT my end G-code contains a G28, and it failed again.

EDIT: Oh, I see, it seems to be same as in: TMC2208: periodically gets errors while homing - #7 by massild

Reverted to my old stepper drivers for now.

Well, yes and no. It doesn’t happen after every print anymore, but I noticed it again after a series of prints, just before shutting down the printer. So for me the situation seems to have at least improved.

I also appear to be having these issues since updating Klipper to the latest release yesterday. I had previously been using TMC2209 drivers over UART (each with its own UART pin) for months without issue. I suppose this is likely related to the new change to periodically check the status of the drivers.

The issue only appears to occur during homing: printing is fine, and then the error is thrown during homing at the end.

Given it was previously working perfectly, is there any way to disable the new functionality or another workaround to prevent these crashes?

There is no option to disable the periodic driver checks.

If you’re having an issue with the latest code, best would be to open a new topic here on Discourse with the full Klipper log file and a description of the steps necessary to produce the error. Hopefully, someone will be able to help identify the problem.

Separately, a few people have reported that issuing explicit SET_STEPPER_ENABLE commands for the steppers prior to homing them has improved stability. It is unknown why this is the case.

-Kevin

I’m just catching up on this after someone mentioned this issue on Discord and it got me thinking… any reason why I can’t or shouldn’t just add this to my TMC2209 powered printer?

[delayed_gcode TMC_SENDDELAY]
initial_duration: 5.
gcode:
  M117 Running startup macro...
  SET_TMC_FIELD STEPPER=stepper_x FIELD=SENDDELAY VALUE=2
  SET_TMC_FIELD STEPPER=stepper_y FIELD=SENDDELAY VALUE=2
  SET_TMC_FIELD STEPPER=stepper_z FIELD=SENDDELAY VALUE=2
  SET_TMC_FIELD STEPPER=extruder FIELD=SENDDELAY VALUE=2
  M117 Startup macro done!

The latest Klipper code automatically sets SENDDELAY to 2 on tmc2209 drivers. So, no manual intervention is necessary.

-Kevin
