Issues with stepper drift on latest Klipper

Basic Information:

Printer Model: Voron 2.4
MCU / Printerboard: BTT Manta M8P V2.0 + SB2209 RP2040
Host / SBC: Raspberry Pi CM4
klippy.log: klippy.zip (4.5 MB)

Fill out above information and in all cases attach your klippy.log file (use zip to compress it, if too big). Pasting your printer.cfg is not needed
Be sure to check our “Knowledge Base” Category first. Most relevant items, e.g. error messages, are covered there

Describe your issue:

I’ve recently upgraded from Klipper v0.13.0-33-g413ff19ea to v0.13.0-78-g1cc639807 and noticed that my prints began drifting gradually to a side on the X axis to the point that they were leaning a few mm to that side after 10mm height or so. Note, this is not a singular layer shift, but a gradual change.

First, I thought it’s a mechanical issue and checked all the belt paths, tension, tightened screws, etc. but nothing helped. Then, I thought the new 520MHz STM32H723 MCU clock change could have something to do with the TMC Autotune plugin or microsteps interpolation and disabled those — didn’t help either.

Eventually I rolled back to v0.13.0-33-g413ff19ea (including MCU firmware) and nothing drifts anymore, dimensions are all as expected again.

Additional observation: while nothing seems to be visibly wrong with the A/B drives and the toolhead motion, I inspected the height of the X gantry corners and all 4 Z drive joints after the print and some of them ended up being much higher than others (1-2mm depending on how high the print was terminated at) despite running QGL without issues. Running QGL manually after the failed prints would end up perfectly aligned on all 4 joints and then drift away during the print again. This is not happening after the firmware rollback anymore. I would imagine A/B steppers might a have similar issue with one drifting away from another, but I didn’t have an easy way of verifying this.

2 Likes

So with

Git version: 'v0.13.0-33-g413ff19ea-dirty'
Untracked files: klippy/extras/autotune_tmc.py, klippy/extras/gcode_shell_command.py, klippy/extras/motor_constants.py, klippy/extras/cartographer.py, klippy/extras/idm.py, klippy/extras/scanner.py

and

Git version: 'v0.13.0-78-g1cc639807-dirty'
Untracked files: klippy/extras/autotune_tmc.py, klippy/extras/gcode_shell_command.py, klippy/extras/motor_constants.py, klippy/extras/cartographer.py, klippy/extras/idm.py, klippy/extras/scanner.py

You may try a prestine setup: The "dirty" Flag and the Team's Position

Thanks for pointing me to it! Unfortunately, I can’t do a completely pristine setup since one of those plugins is for the Cartographer 3D probe, which won’t work otherwise and render the printer unable to actually do a successful print in order to replicate this gradual drift behaviour.

I understand the position and that this might make this issue ignored by the team, but I think it’s still worth keeping this report as there’s at least one significant change between those two versions that affects stepper drivers, which other people might encounter too. Perhaps someone could replicate it with a pristine setup.

Stepper stepper_x phase change (was 123 now 77)
Stepper stepper_y phase change (was 123 now 9)

Generally, on TMC, if there is something with steps, we will see the messages above.
They are present in the log.

I would explicitly validate the timings on my Octopus Pro board for TMC.
But I would not expected that they are screwed, not I expect that on your board something like this could happen in way where MCU execute them too fast.

You could try increasing the pulse duration for your board, something like this should be harmless:

[stepper_x]
step_pulse_duration: 0.000000500

And probably work around the issue if it is present.


allocate_oids count=1
config_stepper oid=0 step_pin=PF9 dir_pin=PF10 invert_step=-1 step_pulse_ticks=52
finalize_config crc=0

SET start_clock {clock+freq}
SET ticks 70

reset_step_clock oid=0 clock={start_clock}
set_next_step_dir oid=0 dir=0
queue_step oid=0 interval={ticks} count=60000 add=0
set_next_step_dir oid=0 dir=1
queue_step oid=0 interval=3000 count=1 add=0

ECHO Test result is: {"%.0fK" % (1. * freq / ticks / 1000.)}
...
Eval: SET start_clock 298895843119
Eval: reset_step_clock oid=0 clock=298895843119
Eval: queue_step oid=0 interval=70 count=60000 add=0
Eval: ECHO Test result is: 7429K

So, we would expect to see:
1 / 7429000 = 0.000_000_135

From timer perspective everything is okay.

From the scope, the same


~150 pulse width, well in spec of TMC (>=100ns).
If I roughly recalculate time from trigger 486/4 = 121.5ns
(we step on the edge, not pulse)

So, at least from firmware perspective everything should work as expected.

Your report seems strange. FWIW, the cartographer and similar probe modifications are quite “invasive” and might potentially cause unwanted effects. Whether this is the case here, I cannot determine.

The TMC tuning has been extensively tested by some knowledgeable Klipper experts, and the consensus is that it offers no real-world benefit and may even make things worse. Perhaps you could try at least without this modification.

Otherwise, I’m not sure how to debug this further. As @nefelim4ag noted, the phase changes are at least suspicious. Unfortunately, as far as I know, there is not much more information in the log to provide additional insight. Maybe try the approach detailed here: Frequently Asked Questions - Klipper documentation


Datasheet: https://dfsimg1.hqewimg.com/group5/M00/01/F7/wKhk3mK-nbqAXix7AAU8re8yaTM190.pdf

Hmmm…


idk, someone who knows how to estimate the MOSFET performance, could probably say, if they are capable of doing 5MHz peak pulse rate.

Your description of the problem and the Stepper stepper_x phase change (was 123 now 77) type messages indicate that there are “lost step pulses” between MCU and TMC driver. So something like, for example, the MCU told the stepper motor driver to take 100 steps, but the stepper motor driver only thought it was told to move 96 steps.

I have similar comments to others here, if you can’t reproduce on the pristine code, you could try reproducing with as few changes as possible. I’d definitely try removing the TMC autotune stuff as the issue appears tightly related to the stepper motor drivers.

As also indicated previously, you could try setting a higher step_pulse_duration for the motor drivers. Though it certainly seems odd that would be a culprit as the defaults in practice should now be larger than they were before.

-Kevin

Hi folks,
I can confirm the problem too with version 0.13.0-78-g1cc639807.
My model in FreeCAD is 6mm high, in the slicer (OrcaSlicer) too.
But when I print it on my Voron 2.4r2 350, it gets 7mm tall in the left rear and front right ( steppers z1 and z3). In the other corners (z0 and z2) the dimension seams to be okay.

Switching to an older branch (for example 0.13.0-33-g413ff19ea) and dimensions are all as expected again.

Please share the klippy.log file from this version. It would be interesting to see if there are any commonalities with the above case.

No problem :smile:
Thanks for your support.
klippy.log.2025-05-04.zip (3.6 MB)
klippy.log.2025-05-05.zip (2.8 MB)

1 Like

Thanks for reporting. Very strange. It would help if we could get some additional details.

What mcu board are you using?

Your log shows the signs of lost steps between MCU and motor driver. This problem shows up in the log as Stepper stepper_z2 phase change (was 477 now 475) type messages during probing and homing. It seems like the log is reporting this frequently in your setup.

Would you be able to run some tests. The first test would be to add step_pulse_duration: 0.000000200 to all of your [stepper_?] config sections. Then restart the host software (RESTART command) and run a bunch of probe attempts. Does the change fix the problem (the log no longer shows “phase change” messages)?

The second test is a bit more involved, but would help a lot. To run this test, return the config file to how you originally had it (remove the “step_pulse_duration” changes). The test involves performing a “git bisect” to determine the exact commit that causes the change in behavior. There are resources online going over how to use “git bisect”, but in short it involves ssh’ing into the machine and running some commands. Something like cd ~/klipper, git bisect good 413ff19ea, git bisect bad 1cc639807. This will chechout a new revision of the code, you’ll need to compile and flash the mcu (make menuconfig, make clean, make, make flash, sudo service klipper restart - don’t skip any of these steps including running “make menuconfig” and saving on every new build). Then run the code and run the probe sequence a few times and look for “phase change” messages in the log. If the log looks good, you’d issue a git bisect good and if the log shows “phase change” messages you’d issue a git bisect bad. After doing this, you’ll get a new commit, you’ll need to recompile, reflash, etc. and try again. Within a few iterations you should be able to identify the exact commit that introduced the change in behavior. When you are completely done, you should issue git bisect abort.

Let us know your results (success or failure).

-Kevin

Hi,
my Voron 2.4r2 350 is driven by BTT M8PV2 (with TMC 2209 drivers) + BTT CB1 + EBB SB2209 CAN Toolhead.

git bisect good 413ff19ea, git bisect bad 1cc639807 :
still phase change in klippy.log

with git bisect bad:
413ff19ea8e0e5e09c0233225ffa8ad07b1233d6 war sowohl good als auch bad

Version: v0.13.0-33-g413ff19ea :
no phase change in klippy(2).log
klippy.log (1.5 MB)
klippy(2).log (812.8 KB)

so this seams the last commit without problems.

With step_pulse_duration: 0.000000200 to all steppers the Stepper stepper_z phase change and Stepper stepper_z2 phase change still exists.
klippy(2).log (199.1 KB)

This matches the first report:

Actually, this issue dates back to a commit that is already 3 weeks old. It would mean that the offending commit is toolhead: Avoid LookAheadQueue calling back into toolhead class · Klipper3d/klipper@6202a0f · GitHub.

Could you verify the finding by doing:

sudo service klipper stop
git reset --hard 6202a0f
make clean
make menuconfig
make
make flash FLASH_DEVICE=....

Then, please provide the corresponding klippy.log. Thanks!

No Problem.
With git reset --hard 6202a0f there is no more phase change in the klippy.log
klippy.log (168.8 KB)

Alas, something didn’t go right with the “git bisect”. Make sure you run git bisect start then git bisect good 413ff19ea then git bisect bad 1cc639807. This will checkout a new commit (the first commit you’ll try is cc919a5f). You’ll need to completely reflash and restart (make sure you run “make menuconfig” on each new commit that you test, and make sure you run “sudo service klipper restart”). After each of these tests you’ll run either git bisect good or git bisect bad. This process will involve testing at least 5-6 commits and should take at least about an hour to run all these tests. There are guides online explaining the process (for example Git bisect | GeeksforGeeks ).

Thanks for helping to track this down.
-Kevin

So here are the results:

v0.13.0-55-gcc919a5f: good
v0.13.0-66-g9c37a918: good
v0.13.0-72-g7b697105: bad
v0.13.0-69-g5b2f8104: good
v0.13.0-71-g3cf8899a: good


7b697105b32cf48dfe677e047935d413f3324775 is the first bad commit
commit 7b697105b32cf48dfe677e047935d413f3324775
Author: Kevin O'Connor <kevin@koconnor.net>
Date:   Mon Apr 28 20:05:59 2025 -0400

    stm32: Use 12Mhz nominal internal clock in stm32f0_i2c.c
    
    Increase the internal nominal clock from 8Mhz to 12Mhz - this improves
    support for higher chip frequencies.
    
    Signed-off-by: Kevin O'Connor <kevin@koconnor.net>

 src/stm32/stm32f0_i2c.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

klippy_logs.zip (68.2 KB)

Sorry, v0.13.0-71-g3cf8899a seams bad too, if I reset the klipper install to that commit and flash it normally.
However v0.13.0-69-g5b2f8104 work by resetting klipper to it and flash it normally.
I will test it from 69 on to 71 with git bisect.

So here is the second result:

b7c243db198c72170dad05c0f3da0aedac8ca5ff is the first bad commit
commit b7c243db198c72170dad05c0f3da0aedac8ca5ff
Author: Kevin O'Connor <kevin@koconnor.net>
Date:   Tue Apr 29 13:02:34 2025 -0400

    docs: Note functioning canbus required even in bridge mode in CANBUS.md
    
    Signed-off-by: Kevin O'Connor <kevin@koconnor.net>

 docs/CANBUS.md | 9 +++++++++
 1 file changed, 9 insertions(+)

klippy.log (185.0 KB)

This result seems very strange because this merge only affects documentation and includes no functional changes whatsoever.