"Timer Too Close" at a consistent time in file

Basic Information:

Printer Model: Voron Trident
MCU / Printerboard: Octopus MCU, Mellow Fly SB2040 toolhead board, Pi 4
klippy.log klippy.log (450.0 KB)

Describe your issue:

On certain gcode files, my printer consistently throws a “Timer Too Close” error. I can consistently reproduce this failure on specific sliced files. It always seems to happen at the same point in the file (assuming the same file is used, of course). I have tried: Different SD card, different pi, different CANBUS adapter, different CANBUS adapter firmware, different base OS distributions on the Pi (MainsailOS vs RPiOS Lite), eliminating “extra” services (KlipperScreen, camera streaming, etc), increased bandwidth on can0 (500000->1000000), txqueuelen of 128 and 1024.

And yet, the issue persists. The error always indicates that the toolhead board is at fault (“MCU ‘th1-can’ shutdown: Timer too close”). The load graphs generated by graphstats.py look pretty normal for “overall” (no parameters), but generating the th1-can graph reveals an instantaneous spike in bandwidth to 100% at the moment of the failure.

Overall:

TH1:

For your reference, I’ve attached a few debug files that my research indicated may be useful:
CAN log: mycanlog.txt (1.2 MB)
Impacted GCODE file: TimerTooClose.gcode (69.3 KB) (Note: This is essesntially a “crop” of a file sliced by PrusaSlicer, simplified down to near the minimum required to repro the issue - I was tired of waiting an hour for the failure to occur!)

Maybe you can try lowering the extruder microsteps at least for testing purpose.

Otherwise I think you already read through here:

You read my mind. I finished a test print with microstepping set to 64 just before I saw your message. No errors.

I think this would suggest that something at this point in the print is causing a ton of extruder movement (I thought it might be PA, but adjusting it to 0 had no impact). I don’t quite understand what - it looks to my eye like every other part of the gcode.

I’m going to do a more extended test to see if the issue stays gone, but it’s definitely looking promising.

1 Like

Interesting case…, log is showing that MCU th1-can klipper did overflow can bus by sending too much data in last 3 seconds
klippy.log, Line #2316 “…upcoming_bytes=22550”

Interesting part that dumped send queue contain only commands for single stepper (most probably for extruder)
klipper did generate more than 40kb of commands (in uncompressed state) in 0.1sec and did try to push it.

The problem lines (as you did describe)

G1 X173.008 Y181.296 E3.09602
G1 X173.064 Y180.794 E3.12257
G1 X173.064 Y178.949 E3.21954
G1 X174.064 Y178.949 E3.2721 ; This is the problem line.
G1 X175.664 Y183.329 E3.51718
G1 X173.093 Y183.195 E3.65249

That command means you did ask klipper to do 1mm step in X and do 0.06mm In E and this is a corner move and they are on different MCU and you did increase presure_advance to value 0.035, 30 seconds before that, and your stepper have 128 000 steps for 1 revolution
:slight_smile:

It’s not exactly that command, it’s a sequence of them,
in those conditions it did make klipper to do micromanagement of extruder to sync it’s motion with X stepper but your CanBus can pass only 7.5k per second, your MCU can handle much more steps than that - that’s why disabling PA did not change anything.

You did reduce microsteps, but I think you can reduce it even more because you still can overflow the CanBus.
Try setting it to x16 then you would still have 0.0014mm control of filament
with x128 it was 0.000175mm

Or as alternative you can play with speeds.

2 Likes

Thank you for the detailed analysis!

In further testing, I did find that 64x microstepping could also experience the same issue (as your analysis showed), so I’ve adjusted to 16x.

I think reducing microstepping is an adequate solution for my problem, but I wonder what was special about this one right-angle. The gcode snippet was from a spot ~55 minutes into a print which had plenty of other right angles, and the load graph for those showed that the only time the bandwidth exceeded 20% was the moment of the failure.

Your case is related to combination of timings, StepRates, Accel/Decel, PA internals, Velocities and hardware + software restrictions and limits, and probably some bugs.

You really want to go down in that “rabbit hole” ?
This page is just a surface stuff - the hole is much deeper :slight_smile:

The Klipper host software calculates all the step times, compresses them, and sends that compressed list to the mcu. In practice, certain motor speeds compress better than others. It looks like you hit a motor speed that does not compress well and it put significant overhead on the host processing and host-to-mcu bandwidth.

At the time of the fault, the extruder was moving at around 600,000 steps per second. That’s a challenging requirement to meet. As already discussed, reducing the microstep rate is a good way to reduce that number, and avoid these kind of worst case situations.

More specifically, the rp2040 uses a scheduling timer of 12Mhz (the rp2040 cpu runs at 125Mhz, but klipper schedules actions using a 12Mhz timer on the rp2040). At the time of the fault the extruder was moving at around 603K steps per second. That’s a step every ~19.90 timing intervals. We can’t schedule steps at fractional intervals though, so Klipper “dithers” the interval - alternating between an interval of 19 and 20. This doesn’t compress as well as other speeds.

queue_step oid=6 interval=20 count=107 add=0
queue_step oid=6 interval=19 count=12 add=0
queue_step oid=6 interval=20 count=106 add=0
queue_step oid=6 interval=19 count=12 add=0
queue_step oid=6 interval=20 count=107 add=0
queue_step oid=6 interval=19 count=12 add=0
...

-Kevin

3 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.