Timer Too Close error after M600 pause in recent Klipper builds

Basic Information:

Printer Model: Ender3V2
MCU / Printerboard: Creality 4.2.2
Host / SBC: RPi0 2W
klippy.log (6.7 MB)
color.cfg (4.2 KB)

Hi all,

I’ve been testing my filament change (M600) macros on the latest Klipper builds and discovered a new issue that did not exist in earlier versions.

What happens:

  • During a filament change, the printer pauses and unloads the filament normally.

  • If I immediately press Load → Continue, everything works fine.

  • But if I wait ~30 seconds (or longer) before pressing Load → Continue, the printer crashes with a timer too close error.

Findings:

  • This behavior started after commit 7ea5f5d (v0.13.0-261)

  • The issue only appears after a longer pause, not when continuing immediately.

  • My macros(in color.cfg) use:

    • SAVE_GCODE_STATE / RESTORE_GCODE_STATE

    • UPDATE_DELAYED_GCODE (for buzzer)

    • Extruder moves via G1 E…

  • Interestingly, the buzzer (delayed gcode) still works, but extruder moves do not execute, and the system throws the timer error.

Hypothesis:
This may be related to the new timer-based step generation. When pause + delayed gcode are combined, the motion queue seems to run out of safe timing margin, leading to the error.

Question:

  • Has anyone else observed this issue with M600 or pause/resume macros?

  • Is there a recommended way to adapt macros to the new timer queuing system?

Note:
I’ve tested by rolling back to the commit right before 7ea5f5d(v0.13.0-260), and with that version my macros work normally without any errors.

Thanks in advance!

1 Like

(I’ve tried to reproduce it on my machine, and I’m unable to. I’m probably having bad reproducing capabilities of performance issues on RPI5.)

Alas, too many TTCs.
Too many TTC sources and places, I do not even know where to start.
Generally, there is a large lag between when things should happen and when they actually happen.
The strangest thing to see for me is when it happens during homing.

The thing is, new code, with threading, should generally perform better than the old one; well, it is simply faster.

Some errors happen while there is a spike in sysload, so probably something is overloading your host. Maybe it is Obico.

Some errors happen when there is virtually low load on the host. Which is confusing.
But the klippy CPU time is ~50ms.
I would guess that computing of next 50ms of moves takes the 50ms, so even if it seems like 5% load, it is actually 100% for some reason.

It is possible, though, that recent code slightly changed the timings, but I would suggest investigating the host CPU usage with some htop or something. Because most of what I see seems to be triggered by load.

There is also some retransmission things, but I would guess it is related to the shutdown moment, where the MCU reloads itself into the shutdown state.

I could also guess that maybe it is the thermal issue/CPU governor/voltage issue.
I would suggest monitoring the CPU usage/Temperature/Frequency during printing.

Hope that helps.

It looks like you have modified the code (by adding code to the klippy/extras/ directory and possibly other changes). See if you can reproduce the problem with the pristine unmodified Klipper code.

If you can reproduce it, please upload the full Klipper log file from the event (and ideally clean up the log before hand so that that the log contains just one shutdown report).

-Kevin

Hi Kevin,

As a developer myself, I fully understand your concerns regarding modified code. I can assure you that, apart from adding the autospeed plugin, I have not made any other changes to the Klipper codebase — including the klippy/extras/ directory. The autospeed extension is purely a calibration tool, not active during normal printing, and I do not use any macros related to it in my M600 macro set.

Following your advice, I updated to the latest commit (v0.13.0-288), reset the logs, and reproduced the problem. I managed to trigger the shutdown twice, and the attached log(klippy.log) now contains only these two sessions.

I also recorded a short screen capture to illustrate exactly when and how the issue occurs: [Video link].
To keep the video concise, I trimmed out some waiting moments (such as bed heating and non-essential idle movements). Nothing else was done during these cuts — just waiting for the printer.

As an additional note, I want to clarify that this issue does not occur if I confirm the Mainsail prompt and continue printing immediately after the M600 macro is triggered.
However, in practice, this is not really possible, since I need to perform a manual filament change during that time.

Thanks for your support,
Sezgin

There may be something to this.

Someone i was helping was getting a timer too close after end print.
It seemed to be getting caused by not disabling the filament sensor before unloading filament at the end of a multifilament print thus calling a pause, so fits with what you are seeing.

Unfortunately i don’t have logs from this.

As you have a runout sensor and are doing changes, i wonder if its something where you wind up in pause state but are actually still printing - often seen as then trying to pause and getting a printer already paused error.

Hi mjfsch,
In my M600 macro, I disable the filament sensor before calling the PAUSE and UNLOAD macros.
Then, in the end_color_change macro, I re-enable the filament sensor before calling the RESUME macro.

maybe something else then, but does seem a bit of a coincidence, out of interest are you using any [extruder_stepper]?

No, I don’t have any extra extruder motors and I am not using [extruder_stepper].

If it would be helpful, I can share my full stepper configurations in steppers.cfg and steppers_tmc.cfg .

As I mentioned in my previous message, I’m using a Creality V4.2.2 board. Normally, this board does not have UART connections for TMC drivers, but I modified the board slightly to enable these connections;

So, the presence of [tmc2208] sections is not an error and should not be misleading.

I noticed that if I trigger the M600 macro manually from the console while the printer is idle (i.e., no job is running), the issue does not occur, even if I leave the printer in pause for a long time.

This suggests that the problem may be related to resuming M600 prompts during an active print, possibly in conjunction with the printer performing immediate G1 or G0 movements right after resuming.

Random idea here, but what if you put an M400 at the very top of your M600 macro? That should complete any remaining moves before continuing with the pause.

I would expect the G-code parser to handle this when the PAUSE macro is called, but I’ll try it out and share the results.

Edit after trying: No effect.

1 Like

Possible relation to memory usage increase

After additional testing, I believe this issue may be indirectly related to increased RAM usage introduced after the commit.

Here are my findings:

  • On the version right before the problematic commit, RAM usage during printing is around 62% on my hardware.

  • On the version with the problematic commit, using the same configuration and the same printing, RAM usage rises to around 81%.

  • I repeated this test several times, switching back and forth between the two versions, and consistently observed higher memory usage on the newer version.

My hardware is already running close to its memory limits, so the increased RAM usage seems to indirectly trigger the “Timer too close” issue in my case.

However, I think the higher RAM consumption itself may also be a problem, even if it does not trigger errors on other setups — since it suggests increased resource usage compared to previous versions.

This may explain why I only see the problem on my setup, while others may not be able to reproduce it.
With this update, it seems that the Raspberry Pi Zero 2 W is no longer on the list of recommended hardware. :frowning:

From what I see, you can check your values against mine:

  • Klippy (Klipper’s host side) uses about 50Mb of RSS generally, and about 20Mb of shared memory.


    (regardless of do I print something or not)

  • Moonraker 50Mb of RSS and 10Mb of shared memory.

Both seem okay to me, and well, within the specs of RPI Zero 2W (512Mb).
Both were like that before and seems like that now.
It would be hard to really decrease the memory usage further (I’ve tried).

About the probable root cause, I already suggested looking at what happens with Htop.
Additionally, I’ve invested some time to gather the CPU/Memory stall information, 2 patches available here: GitHub - nefelim4ag/klipper at statistics-track-stall

If I try to reproduce the memory stall, it would look like so:

Stats 76698.3: gcodein=0 ... sysload=0.45 cputime=2.506 memavail=1370224 cpustall=0.170 memstall=0.000
Stats 76699.3: gcodein=0 ... sysload=0.45 cputime=2.531 memavail=1366372 cpustall=0.177 memstall=0.000
Stats 76705.5: gcodein=0 ... sysload=1.62 cputime=2.695 memavail=1483960 cpustall=0.256 memstall=2.350

If I try to reproduce the CPU stall, I’m not very successful, but the cpustall counter would increase really fast:

Stats 77346.5: gcodein=0 ... sysload=20.75 cputime=5.346 memavail=2901784 cpustall=9.253 memstall=0.016
Stats 77347.6: gcodein=0 ... sysload=27.42 cputime=5.370 memavail=2898312 cpustall=9.575 memstall=0.016
Stats 77348.6: gcodein=0 ... sysload=27.42 cputime=5.391 memavail=2895996 cpustall=9.824 memstall=0.016

Hope that helps.


By your usage of Python 3.11, I assume your system is recent enough to provide those counters.
For the sake of comparison, I use the latest Raspbian and Python 3.13:

$ python3 --version
Python 3.13.5
$ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 13 (trixie)
1 Like

Yes, now when I monitor only the Klipper service via htop, I see that in both cases (new version and old version) the memory usage is below the values you provided. Previously, I was basing my observations on the system’s total memory usage, which misled me.

As you said, the memory usage of Klipper is roughly similar in both versions.
So currently, it seems there is no memory-related issue in Klipper.
But I guess this is even worse :sweat_smile: — at least before we had a potential cause to investigate, now we have nothing at all.

I didn’t include separate screenshots for the old and new versions because the memory usage numbers are really close.

If you can reliably reproduce an issue, it should be possible to track it down. It’s a little time consuming, but generally possible.

The steps are:

  1. Make sure you’re running the unmodified code. (Placing code into the klippy/extras/ directory is a code modification.)
  2. Reproduce the issue on the latest upstream code.
  3. Find a recent version of the upstream code that does not show the issue.
  4. Use git bisect to track down which commit first introduced the regression - Klipper Versioning, Downgrading, and Bisecting .

Once the individual commit that caused the change in behavior has been identified then it’s often possible to figure out why there is a change in behavior.

-Kevin

Hi Kevin,

I didn’t need to run a full bisect; I tracked the regression by stepping forward from a known-good commit.

Since opening this topic, I’ve been running on the good commit (b60804bb) for 5 full days without a single issue. The system is stable and all prints finish successfully.
However, whenever I switch back to the bad commit (7ea5f5d2), the shutdown problem reappears consistently.

To rule out any external factors, I also removed all third-party plugins from klippy/extras/ and tested again with a clean repo. The same issue still occurs starting from commit 7ea5f5d2, which confirms that the regression is not caused by custom extensions.

Reproduction steps:
Trigger M600 (manual filament change). If I do not confirm the Mainsail prompt within roughly 30 seconds (the time it takes to manually swap filament), the host shuts down. I reproduced this multiple times.

  • If I confirm the prompt immediately (within a few seconds), the issue does not occur.

  • If I confirm the prompt but continue with the same filament (no actual swap), the issue also does not occur.

Thanks for looking into this!

— Sezgin

Ah, okay. Could you attach the full Klipper log file from the event while using the pristine code. All of your previous logs have shown code modifications. (FYI, Klipper does not have and has never had a plugin system - any time someone places code in the Klipper code directory it is a code modification.)

It may also help if you could check if the issue is still present on the very latest pristine code (eg, commit af17c8c2) to see if the behavior changes. It’d be great to see that log as well.

Another thing that may help would be if you could attach the full log from the same commands while using the last known working pristine code. Comparing good to bad may reveal something.

Cheers,
-Kevin

Hi Kevin,

Links to logs and videos:

I’ve prepared separate log files and videos for the two commits in question.

  • One of them corresponds to the last good commit I’ve marked, where everything works correctly — no need to explain further, as it runs perfectly.

  • The other is the current latest commit (af17c8c2). I’ve also recorded a video for the same macro behavior on both commits, and I shared the links above.

Observations:

  • On the good commit, after triggering the M600 macro, I wait roughly 30 seconds (the duration of a manual filament change) before pressing the Continue button. I then point the camera at the extruder gears, and you can see that the G1 moves inside the LOAD_FILAMENT macro are executed successfully.

  • On the af17c8c2 commit, in the video I also press Continue and immediately point the camera at the gears, but they do not turn — meaning the G1 commands inside LOAD_FILAMENT are not executed. However, the buzzer is silenced, indicating that the UPDATE_DELAYED_GCODE ID=M600_BUZZER DURATION=0 line inside LOAD_FILAMENT has run successfully. So, the macro is called correctly, but the G1 moves fail.

  • At this stage, Klipper does not crash immediately, but when the second prompt appears and I press Continue to resume printing, the TTC crash occurs, which you can see in the video.

I didn’t feel the need to repeat this test on my first bad commit (7ea5f5d2) because the behavior is identical to what happens on the latest commit.

I understand you want the Klipper repo to remain clean (no modifications), but in the pristine repo the SPI flash system does not work fully, which makes switching between versions cumbersome for me. I have a local solution for this, and even a PR in the Klipper repo, but applying it triggers the dirty flag, so I can’t use it during tests and freely navigate commits.

If required, I can run the test on the first bad commit and provide logs and video; however, as mentioned, the behavior on that first bad commit is exactly the same.

Thanks,
— Sezgin

Thank you for this. I’ve been facing the same issue, but very randomly, mostly when I start the bed_mesh_calibrate macro but also at other random times.

Rolling back to this commit has solved the issue

2 Likes

Thanks for sharing your findings! I really appreciate you taking the time to report this—it helps confirm that the issue is related to that commit.