MCU shutdown: move queue overflow

After many months of flawless performance, my printer just had a hiccup: an MCU shutdown due to a move queue overflow.

MCU move queue overflow

It finished a 6+ hour print with no issues and then shut down about 20 minutes into another print. I have never seen this before and there seem to be almost no reports of similar errors.

I wonder if anyone more experienced could interpret the attached log, as I could not identify anything obvious myself. Could this be related to my fairly recent experiments with very high microstepping and a reduced step pulse duration?

microsteps: 256
step_pulse_duration: 0.000000050        # 50 ns vs. 100 ns default

MCU move queue overflow - klippy.zip (1.8 MB)

Thank you.

It looks to me like the micro-controller got temporarily overloaded and it manifested as a “move queue overflow” error. (Specifically, it seems the micro-controller was unable to clear past moves fast enough to have sufficient space for new moves.)

It doesn’t look like you were pushing that many steps per second, relative to the maximum of the mcu, at the time of the failure. However, it seems the particular high-speed diagonal move at the time (~168mm/s on stepper_x) was at a speed that required more cpu time to process (the duration between steps was small and wasn’t an even multiple of the step clock).
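To put rough numbers on the step rate discussed above, here is a minimal arithmetic sketch. Only the 168 mm/s speed and `microsteps: 256` come from this thread. The `rotation_distance` of 40 mm, the 200 full steps per rotation, and the 72 MHz MCU clock are assumptions chosen for illustration, so the real figures for this printer may differ.

```python
def step_rate(speed_mm_s, microsteps, rotation_distance_mm,
              full_steps_per_rotation=200):
    """Return the step rate (steps per second) for one stepper."""
    steps_per_mm = full_steps_per_rotation * microsteps / rotation_distance_mm
    return speed_mm_s * steps_per_mm


def ticks_between_steps(rate_hz, mcu_clock_hz):
    """Return the average number of MCU clock ticks between two steps."""
    return mcu_clock_hz / rate_hz


# 168 mm/s and 256 microsteps are from the thread.
# rotation_distance = 40 mm is an ASSUMED value (not stated in the thread).
rate = step_rate(168.0, 256, 40.0)

# 72 MHz is an ASSUMED clock; substitute the real MCU frequency.
ticks = ticks_between_steps(rate, 72_000_000)

print(f"{rate:.0f} steps/s, {ticks:.2f} clock ticks between steps")
```

Under these assumptions the interval between steps is not a whole number of clock ticks, which is the kind of situation the reply above describes as needing more CPU time to process.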

Likely the simplest solution is to not push the micro-controller to such a high step rate.

Separately, setting a step_pulse_duration of less than 100ns has no value as none of the micro-controllers can actually step that fast. If they could, it wouldn’t be valid, because the TMC drivers are only rated to 100ns.
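As a quick way to spot settings like this, the following is a small sketch that scans a config fragment for `step_pulse_duration` values below the 100 ns figure quoted above. It uses Python's standard `configparser`, not Klipper's own config loader, and the `[stepper_x]` section name and the helper `short_pulse_sections` are hypothetical names chosen for illustration.

```python
import configparser

# 100 ns, the TMC driver rating quoted in the reply above.
TMC_MIN_PULSE_S = 100e-9


def short_pulse_sections(cfg_text, minimum_s=TMC_MIN_PULSE_S):
    """Return (section, value) pairs whose step_pulse_duration is below minimum_s."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#", ";"),  # strip trailing "# ..." comments
        interpolation=None,                  # treat "%" literally
        strict=False,                        # tolerate repeated sections
    )
    parser.read_string(cfg_text)
    flagged = []
    for section in parser.sections():
        if parser.has_option(section, "step_pulse_duration"):
            value = parser.getfloat(section, "step_pulse_duration")
            if value < minimum_s:
                flagged.append((section, value))
    return flagged


# The section name is an assumption; the two settings are from the first post.
example = """
[stepper_x]
microsteps: 256
step_pulse_duration: 0.000000050        # 50 ns vs. 100 ns default
"""

flagged = short_pulse_sections(example)
for section, value in flagged:
    print(f"[{section}] step_pulse_duration = {value * 1e9:.0f} ns (below 100 ns)")
```

This only flags values for review. It does not model how Klipper itself interprets the setting.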

-Kevin

After another look at this, I don’t think it could be explained by micro-controller load. (In order for load to have been the root cause of the issue in this particular case, the mcu would have to have fallen behind by 8+ milliseconds, and were that the case then other errors would have been raised.)

Unfortunately, I don’t have any guesses as to what could have resulted in this error. I suppose it’s possible it is a fluke of some kind. Best might be to monitor it and report any further errors.

-Kevin

On the topic of step pulse duration, can a TMC driver be configured in Klipper to step on both the rising and falling edge? I searched but could not find anything on it.

Already is. See https://github.com/Klipper3d/klipper/pull/4851

Thanks very much for your time looking at this Kevin!

Interestingly, I got another MCU shutdown the following day, but for a completely different reason: “Missed scheduling of next digital out event”:

MCU missed scheduling digital out event

This happened almost immediately after I started heating up the printer with no motion at all. Intermittent issues drive me crazy, particularly when they seem to occur randomly :grinning:.

Since then, I have reseated all Pi and MCU cable connectors, pulled fresh Klipper from GitHub, and recompiled and flashed the MCU with it (just in case). So far there have been no more strange issues in four days. If anything else happens I will start suspecting the power supply. The CR-10S Pro power supply gets hit really hard by the ~400W DC bed heater, and even though it’s a MW, perhaps it’s starting to deteriorate due to the ripple currents.

In any case, I have attached the log from the second event.

Thank you again.

MCU missed scheduling digital out event - klippy.zip (49.8 KB)

That log indicates something caused the host software to restart (e.g., via sudo service klipper restart). The error message was a natural result of that (the micro-controller is reporting that it had to disable a heater because the host stopped sending updates). So, this log doesn’t seem related to your earlier log.

-Kevin

Thank you for looking! I’m sorry that I did not have a chance to look at that second log myself and instead just shot-gunned my installation; I have been swamped with work lately. It’s curious that the host software restarted, as it certainly was not my doing. I will continue to keep an eye on the printer and will report if anything else untoward happens.

Peter.

I am happy to report absolutely no more misbehaviour with the printer here despite lots of print time. It’s a mystery.

Hi,
Here is some more data: I just had that same error!
My config is quite different from yours, but I don’t have the knowledge to identify the root cause by myself…
It seems you didn’t run into that issue again?
For now, I’ll just keep my fingers crossed ^^

klippy (9).log (3.5 MB)

No, the issue has not repeated itself in almost a year now.

I tried running the same G-code a second time, and it crashed again, not at exactly the same place but nearly: in the first layer of gyroid infill.
It might be related to my recent addition of an SB2040 CAN bus toolhead.
Hopefully this helps you help me ^^

Unfortunately I cannot help you with this. Someone like Kevin would need to have a look at your log to see if anything can be deduced from it…