"Timer too close", always at the same point in gcode

Basic Information:

Printer Model: DuelingZero
MCU / Printerboard: MKS SKIPR (UART), CANable, EBB36 (2 pcs.)
Host / SBC: MKS SKIPR
klippy.log (6.0 MB)

Describe your issue:

I’m at a loss. I’m already at countless hours debugging. I hope someone can help me.
My Printer stalls with the far too well known error msg: Timer too close. Sometimes due to MCU, sometimes due to CAN0.
It’s always at more or less the same exact spot in the gcode. The Print failed now more than 10x at the same point in the print.

I did check the this post and it doesn’t really help: Timer too close

What I tried and doesn’t work:

  • triple checked all the settings in Superslicer
  • updated Klipper Host and MCU’s
  • reinstalled Klipper (kiauh)
  • resliced gcode with superslicer

What worked:

  • limit acceleration to <100mm/s (default 13800)
  • different part/file

The gcode that throws the error:
D0_bearing_based_spirographic_WDT_tool_2024_11.zip (4.8 MB)

The .3mf that throws the error:
D0_bearing_based_spirographic_WDT_tool_2024_11.zip (178.6 KB)

Load-Graphs:





1 Like
  • It seems, none of the items listed in referenced post about “Timer too close” have been checked
  • Most (all) items you list are typically not related to this error
  • You are using an unofficial modification that in fact could be causing it
  • None of the graphs shows anything suspicious

As I have written, I checked the post and it did not help.

Here you go, detailed answer to every point:

  • High system load of the host
    → not the case (see graph)

  • High disk activity of the host
    → not the case (see graph)

  • Swapping due to low free memory
    → not the case (see graph)

  • Disk errors / dying SD card
    → replaced it, no change

  • Unstable voltage
    → not the case (no Under-voltage detected! errors)

  • Other hardware hogging the USB bus or other system resources
    → not the case (no additional HW)

  • Running in a Virtual Machine
    → not the case (it’s not in a vm)

  • USB, UART or CANBUS wiring faults leading to extremely delayed messages
    → no communication issues so far

  • Electro-Magnetic Interference (EMI) affecting proper signal transmission or leading to high resend rates
    → no communication issues so far

  • Wrong Clock Reference used during building the board’s firmware (make menuconfig), confusing the timing between host and board
    → clock reference is correct, triple checked, even physically

  • Check for other resource-intensive processes running in parallel
    → see graph

  • Remove additional hardware, especially web cams and displays
    → nothing attached

  • Generally try to simplify your setup as much as possible
    –>… well nothing there to remove without breaking the print-function

  • Check SD card for errors / replace SD card
    → done

  • Check for Under-voltage detected! errors / make sure to use a good and adequate power supply
    → no errors found

  • Do not run data lines (USB, UART, CANBUS) close to and in parallel to high current lines like heaters or steppers. Use high quality cables (shielded / ferrite core). Keep such cables as short as possible
    → cables are seperated, STP and as short as possible.

  • Ensure sufficient cooling of the host’s SBC to avoid CPU throttling due to overheating
    → all CPU’s have coolers attached, no overheating in logs / graps

  • Also see Advanced Trouble-Shooting / Graphing Klipper
    → done

  • Verify that the correct Clock Reference during building the board’s firmware is used. Mind that certain boards have various revisions / flavors that might require different clock settings
    → done

As previously stated: You are running an unofficial modification. It is of unknown quality and apparently two years old. This does not need to cause issues, but still a valid option.

For the rest of your points:

  • The stats for the graphs are generated once a second and are rather an indication than a tool that catch each CPU spike or other events, like communication drops (unless gradually building and longer lasting)
  • No bugs causing this error are known in main-line Klipper
  • Out of experience, around 99% of these errors are due to the points listed in the post, so you might recheck and again do not rely purely on the graphs. They are a great help, but may not be the final truth.

Apart from these points, there is not much guidance for us to offer.

Hey thx for the response.

To rule out dualgantry_corexy.py or something wonky in the os, I now installed klipper on a fresh memory card, on the latest stable armbian bookworm with latest Klipper and used the stock corexy kinematics / mainline klipper.

The issue persists:
klippy.zip (1.3 MB)

I’m really at a loss here. How can I debug this further?

The only thing that I can consistently reproduce, is that it always fails at the same point in the print, suggesting it has to do with something with the gcode or the corresponding movement klipper generates out of the gcode commands.

Hey
As you did dig everything what you could
I did try to analyze your logs and try to find at least something.

I see that you have TTC error (Timer Too Close) on Can0, it seems weird because I didn’t spot anything unusual before you get that error, seems everything was fine and spontaneously you did get that error.

As you know TTC error indicate that some issue appeared when communicating with MCU, so I start looking at DUMPs of can0 MCU.

There I found Send logs which indicate that klipper was not sending any data via Can0 in some period but was receiving data from can0 MCU

Sent 95 4861.392973 4861.392668 21: seq: 13, .......
Sent 96 4863.072633 4863.072415 16: seq: 14, .......

As you can see there is a 1.68 second gap - which is weird during print sequence, so I checked dump of main MCU - there I found similar 1.5s time gap in same period in Sent log.

This data is telling me that klipper was doing something and didn’t send data at all and after that registered TTC error.

To understand what Klipper was doing I did analyze dump of Virtual sdcard
which contain latest G-Code commands.

There I found very weird G-Code sequence, basically there is extrusion path - which is normal, but additionally G-Code contains 73 calls of M107 command.

I think it’s overkill, your G-Code is requesting to turn off a FAN 73 times while it extruded just 0.11mm of a filament.

Also your config have custom M107 command which will call a M106 command.
M106 command is also custom and calling SET_FAN_SPEED in the end,
but if you have “Dual_cariage” enabled - your M106 command will call SET_FAN_SPEED twice - so in this case it will be 146 calls to a FAN speed.

Why it’s bad idea to call a FAN commands so often: on each call of SET_FAN_SPEED command it will insert a new additional callback in lookahead chain for processing a fan speed, and of course that callback have some logic, if we push too much of code there - then most probably it will create huge delays.

Why you have there so much M107 commands - I don’t know, must probably your Slicer is generating it, I think it’s enough to call M107 only once when we wish to stop a Fan.

Hope this will help you, if I did identify issue correctly - don’t forget to include here how you did resolve issue with a slicer, so others can find a solution.

8 Likes

Hey gaolst,

Thank you very much for your analyzation. Can I spend you a beer :beer: ? Send me a DM.
I was able to fix the issue by completely removing all M107 commands from the gcode.

There seems to be a bug in superslicer generating far too many M107 commands. Even when i deactivate all FAN control, there are hundred-thousands of M107 commands still in there despite having absolutely zero M106 commands in the gcode.

I’ll open a bug report there. I checked it, the issue starts appearing first in Version 2.5.59.6

Though I do need to say, this could be handled better by klipper software itself.

1 Like

I don’t understand your statement

You think it’s caused by a superslicer bug

How do you expect Klipper to handle faulty gcode?

1 Like

@gaolst
Excellent analysis :+1:

@madejackson
I’m I correct in my understanding that it finally was an overloaded host CPU that lead to the TTC error (caused by an overload of M107 commands)?

If so, how did you diagnose it?

My editor counted in your attached D0_bearing_based_spirographic_WDT_tool_2024_11.zip (4.8 MB) gcode file:

M107: 819629 times
M106: 819629 times

1 Like

What editor did you use? I’m only asking because your numbers above don’t match the sniff test and they don’t match what I’m seeing.

I unzipped the file and it expands to 14.960MB.

Using Notepad++, there are:
M106: 1,973 instances
M107: 230,707 instances

Now, that is still an unreasonable number of Fan Speed/Off Commands but your original numbers (800k+ for each gcode command) would have them make up almost 50% of the gcode file (7,037,032 bytes out of 14.960MB).

Using my counts, only 6% of the file is M106/M107 commands.

Still a problem but a bit more subtle than what your suggesting.

Almost. It’s caused by the result of a superslicer bug. The bug of superslicer itself does not kill klipper, it just produced a lot lof useless M107 commands.

I think your gcode-viewer is lying to you. Up until now I analyzed more than 20x different gcode files with slightly different settings. I posted the screenshot of one of the error throwing ones here:
Issue #4277
In this case it was 1904x M106 vs. 248774x M107, though I had files with 0x M106 vs. 490’000x M107

As described above the gcode isn’t faulty, it’s just not very clean. Klipper doesn’t like the gcode, gives up and throws an arbitrary error without reporting why or where it failed exactly. Klipper could for example try to recover, or at least give more infos about the point where it failed so it would be easier finding the issue.

I’m sorry, it seems I was too unclear in my statement. I don’t think it’s overloading the CPU or MCU, It’s probably more of a timeout issue while waiting for commands. But actually I’m just guessing, based on my understanding from gaolst description and my own observations. Klipper takes too long to process all these M107 commands, resulting in taking too long to respond to other commands, resulting in TTC-Error hence stopping the klipper-process alltogether.

Interestingly, I had the issue today with another part as well and I was able to delay the TTC-Error by simplifying the FAN-script I’m using. This is congruent with the fact, that reducing Acceleration to 100mm2/s also delays or omits the issue. Increasing the amount of time the MCU needs to fulfill a movement reduces the idle time hence increasing the amount of time available to process the M107 command. Reducing the script running time in turn reduces the time needed to process a single M107 command.

These are my scripts for M106/M107:
(M107 was previously calling M106 S0)

[gcode_macro M106]
description: Override "M106" to allow multiple extruders.
gcode:
    {% set dcvars = printer["gcode_macro DC_VARS"] %}
    {% set raw_speed = params.S|default(0)|float %}
    {% set fan_speed = (raw_speed / 255.0)|round(2) %}
    {% if (params.P) is defined %}
        {% set P_fan = dcvars.fan0 if (params.P)|int == 0 else dcvars.fan1 %}
        SET_FAN_SPEED FAN={P_fan} SPEED={fan_speed}
#    {% elif printer.dual_carriage is defined %}
#        _SET_PRINT_FANS_SPEED SPEED={fan_speed}
    {% elif printer.configfile.config.printer.kinematics == "dualgantry_corexy" %}
        _SET_PRINT_FANS_SPEED SPEED={fan_speed}
    {% endif %}
    
[gcode_macro M107]
description: Override "M107" to allow multiple extruders.
gcode:
# Previous command
#    M106 S0

    {% set dcvars = printer["gcode_macro DC_VARS"] %}
    SET_FAN_SPEED FAN={dcvars.fan0} SPEED={fan_speed}
    {% if printer.configfile.config.printer.kinematics == "dualgantry_corexy" %}
      SET_FAN_SPEED FAN={dcvars.fan1} SPEED={fan_speed}
    {% endif %}

[gcode_macro _SET_PRINT_FANS_SPEED]
gcode:
    {% set dcvars = printer["gcode_macro DC_VARS"] %}
    {% if params.SPEED is defined %}
        {% set fan_speed = params.SPEED|float %}
    {% else %}
        # read print fan speed from active extruder
        {% set fan0_speed = printer["fan_generic " + dcvars.fan0].speed|float %}
        {% set fan1_speed = printer["fan_generic " + dcvars.fan1].speed|float %}
        {% set fan_speed = [fan0_speed, fan1_speed]|max %}
    {% endif %}

    SET_FAN_SPEED FAN={dcvars.fan0} SPEED={fan_speed}
    {% if printer.configfile.config.printer.kinematics == "dualgantry_corexy" %}
      SET_FAN_SPEED FAN={dcvars.fan1} SPEED={fan_speed}
    {% endif %}

Thx. BTW, Orca had a similar issue Klipper "timer too close" error on specific model, but Cura works fine .. · Issue #804 · SoftFever/OrcaSlicer · GitHub

Check the scrips to make sure there isn’t a condition adding the M107 to the Gcode file.

The issue with this is, It assumes Klipper is going to parse the GCode in a way that will inform you when something looks wrong. That’s a whole other set of complexity. Klipper assumes the GCode is right (As long as it gives correct GCode commands).

So your M106/M107 dragged down the MCU resources until it couldn’t keep up proper timing and you got a timer too close issue. Which is what Klipper reported, as to the WHY it happened… There could be a million reasons.

What if, instead, it was M104 or M140 (set Extruder/Bed temp) and it tried to change the temperature up and down by 1 degree C hundreds of times. Most likely the same thing would have resulted.

What if it’s because there is a bad cable and the synchronization between Klipper and the board gets skewed? Can result in the same thing.

Klipper only knows WHAT happened, which it told you, knowing WHY is much harder and it’s essentially impossible to code for every possible combination of reasons. Or even single reasons.

2 Likes

not really, no. You’re assuming that.

Just from the top of my head. Klipper could simply…
… NOT halt. Instead klipper could just throw a warning and proceed. I am fairly certain, that klipper destroyed my print because it halted and not because it took too long to process a gcode. If it just proceeded, the part would’ve come out fine.
… slow down and throw a warning when the timer gets close
… report the gcode location where it failed or which cmnd took too long

  1. As @TheFuzzyGiggler correctly pointed out: Klipper only realizes that something is wrong and not WHY it is wrong
  2. There is an abundance of constellations in a 3D printer that could potentially lead to safety implications
  3. It is Klipper’s clear philosophy for good reason: If something is going wrong then we assume an issue and safely shut down the system

It is my strong hope that this philosophy does never change as I do not think there is any room for interpretation and second guessing in the sense of: A bit bad? A bit worse? A bit catastrophic?

2 Likes
  1. Pretty sure Klipper has more info than just “Timer too close”. At the very least it knows which timer was too close and how close it was. All I’m saying is there is more info than that is reported in the error which would be handy for debugging.
  2. Safety first, I agree. Doesn’t mean this error handling can’t be improved.
  3. “Close” already means some margin, right? Somebody had to define a limit to the timer, so klipper is already letting pass some margin of error / inaccuracy. This means the error msg already doesn’t align with that philosophy. Otherwise It should read smthng like “Timer too late”.

I’m not 100% sure, but most probably it’s a single timer which control that we want to deliver a message to MCU which is already expired or very close to it.

As I remember klipper have 2 different threads which runs in parallel, first is calculating what should be happening and when and putting results in a shared buffer, second one is fetching data from that buffer and pushing to MCU.
With this architecture it’s very hard to derive some conclusion what caused an event which become expired or become expired while in flight to MCU.

Totally agree with you on this one, it’s very frustrating to get TTC error which in most cases points to communication/hardware issues but actually it was other issue (software delay), I already stumble on 2 cases in 2 days with same symptoms.
But I don’t see how this kind of errors could be differentiated from others in current klipper architecture, if you have some good valid idea - then feel free to implement it and propose to push it to mainline.

That timer is not something static, amount of a margin is derived in runtime from current state of a different buffers and computed communication delays. Klipper in most cases will catch that this message is still valid, but if we try to deliver it - it could expire - so “Timer too Close” better describe what is actually happening.

As I understand, “timer to close”
Klipper outputs a group of execution commands, including the time to execute and time sync to each MCU. When one reports back that there is not enough time between the group just sent and the first command in that group.

This could be; Klipper had a communications issue that delayed sending the commands to the next MCU therefore this MCU didn’t have enough free cycles to execute the next command at the time required. (resenting failed CAN packets), Kevin has recommended keeping txqueuelen (transmit buffer) low at 128 instead of a higher number.

Also, if additional software is installed on the RPI can slow down Klipper causing this same error.

Also, high number of Gcode commands that have no use or no effect on the print, Klipper is still creating timed events each MCU has to execute, this could overload an MCU consuming processing cycles until at a point where it can’t execute the next one due to lack of free cycles.