When I print a file, the printer randomly shuts down with “MCU ‘sb2040’ shutdown: Timer too close” after printing for 1 to 3 hours.
I tried to troubleshoot using graphstats.py, but it didn’t help. The CAN bus, host load, temperature and buffers all look normal.
I also tried the following, but it didn’t change the result:
Reduced the microsteps of the extruder stepper (which is on the sb2040) from 16 to 8
Check your wiring; there is a communication issue somewhere. You’ve got bytes building up in the serial queue over time until it can’t keep up.
Ideally upcoming_bytes should be 0; those are stalled messages in the queue.
It’s clearing them out 17 bytes at a time (via ready_bytes), but it looks like they’re building up faster than it can clear them.
You mean there is a traffic jam on the serial port (CAN), right? And this is possibly caused by unstable hardware wiring?
I’ll swap in another CAN cable and try again tomorrow.
Yes. It means the CAN bus isn’t communicating cleanly with your sb2040 for whatever reason. When Klipper talks to the mcu, it expects an acknowledgement from the mcu that it received the message successfully.
When it doesn’t get that it queues it up to resend it at the next possible time.
If it gets too bad, the timing of the functions gets off (Klipper says “do X at time Y”, the mcu says “I did X at time Z”), and if they’re far enough apart you get a timer error.
It looks like your queue is filling up, but probably not fast enough to trigger the error immediately, since it clears the queue out as fast as it can. I didn’t go back far enough to see whether it got extremely bad after a certain point.
The underlying issue is usually a flaky connection somewhere.
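The acknowledge-and-resend explanation above can be illustrated with a toy model. This is a simplification, not Klipper’s actual scheduling code: it assumes each lost acknowledgement delays delivery by one retransmit timeout (`rto`), and it uses a hypothetical minimum lead time (`MIN_LEAD_TIME`) below which a command counts as “Timer too close”. The class and function names are made up for illustration.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative minimum lead time only; Klipper's real margins
# differ and depend on the mcu and configuration.
MIN_LEAD_TIME = 0.025  # seconds


@dataclass
class Command:
    name: str
    scheduled_at: float  # host says "do X at time Y": this is Y, in seconds
    sent_at: float       # when the host first put the command on the bus


def arrival_time(cmd: Command, retransmits: int, rto: float) -> float:
    """Estimated arrival if every lost acknowledgement costs one retransmit timeout."""
    return cmd.sent_at + retransmits * rto


def timer_too_close(cmd: Command, retransmits: int, rto: float) -> bool:
    """True when the command would reach the mcu with too little time to spare."""
    return cmd.scheduled_at - arrival_time(cmd, retransmits, rto) < MIN_LEAD_TIME


if __name__ == "__main__":
    cmd = Command("queue_step", scheduled_at=10.100, sent_at=10.000)
    for n in range(5):
        verdict = "Timer too close" if timer_too_close(cmd, n, 0.03) else "ok"
        print(n, "retransmits ->", verdict)
```

In this model a command that started with 100 ms of headroom survives a couple of resends, but a few more push it past the margin, which matches the “builds up, then suddenly fails” pattern described in the thread.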
You got the “Timer too close” error and restart around the 30000 timestamp mark.
So it did build up rather suddenly, but you can see a small blip earlier on too.
Crystal clear!
After temporarily switching the sb2040 to another CAN cable, the “Timer too close” error is gone. I checked, and upcoming_bytes stays below 10 the whole time.
BTW, how do you generate the upcoming_bytes diagram? Is there a command?
Then navigate to wherever you extracted the file and run it like this:
python log.py <path-to-log-file> <mcu-name>
as in
python log.py ~/printer_data/klippy.log mcu
or
python log.py ~/tmp/klippy.log mcu
I didn’t want to jack with the regex more than I already had, so it just uses the part of the mcu name before any “_”. In other words, your ercf_mcu would just be “ercf”.
If it can’t find the mcu it will tell you which ones it can find so you can pick from that.
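The log.py script itself isn’t included in the thread, so the following is only a rough sketch of the same idea, not that script. It assumes klippy.log contains periodic `Stats <time>:` lines in which each mcu’s counters appear as a `name:` token followed by `key=value` pairs (for example `sb2040: ... ready_bytes=0 upcoming_bytes=17`). The function names `parse_stats_line` and `collect` are hypothetical. Unlike the script described above, it matches the full mcu name, so `ercf_mcu` stays `ercf_mcu`.

```python
import re
import sys
from collections import defaultdict

# ASSUMPTION: stats lines look like "Stats 123.4: gcodein=0  mcu: ... sb2040: ..."
STATS_RE = re.compile(r"^Stats (\d+(?:\.\d+)?): (.*)$")


def parse_stats_line(line):
    """Return (timestamp, {section: {key: value}}), or None if not a stats line."""
    m = STATS_RE.match(line.strip())
    if not m:
        return None
    sections = defaultdict(dict)
    current = "host"  # hypothetical label for fields before the first "name:" token
    for token in m.group(2).split():
        if token.endswith(":") and "=" not in token:
            current = token[:-1]  # start of a new section, e.g. "sb2040:"
        elif "=" in token:
            key, _, value = token.partition("=")
            sections[current][key] = value
    return float(m.group(1)), dict(sections)


def collect(path, mcu):
    """Return ([(time, upcoming_bytes, ready_bytes), ...], names of mcus seen)."""
    samples, seen = [], set()
    with open(path, errors="replace") as f:
        for line in f:
            parsed = parse_stats_line(line)
            if parsed is None:
                continue
            ts, sections = parsed
            # Any section carrying ready_bytes is treated as an mcu.
            seen.update(n for n, kv in sections.items() if "ready_bytes" in kv)
            kv = sections.get(mcu, {})
            if "ready_bytes" in kv and "upcoming_bytes" in kv:
                samples.append((ts, int(kv["upcoming_bytes"]), int(kv["ready_bytes"])))
    return samples, sorted(seen)


# Only runs when given "<path-to-log-file> <mcu-name>" on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    samples, found = collect(sys.argv[1], sys.argv[2])
    if not samples:
        sys.exit(f"mcu '{sys.argv[2]}' not found; available: {', '.join(found)}")
    for ts, upcoming, ready in samples:
        print(f"{ts:.1f}\tupcoming_bytes={upcoming}\tready_bytes={ready}")
```

Saved under a hypothetical name such as stats_sketch.py, it would be called the same way, e.g. `python stats_sketch.py ~/printer_data/klippy.log sb2040`. It prints one line per sample and leaves plotting to whatever tool you prefer.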
One of these days I’ll get around to making a website where people can upload their klippy.log and it gives them all the info from it. I’ve been thinking about that for a while.
The upcoming_bytes counter indicates data that the Klipper host knows it will need to eventually send to the mcu, but also knows it shouldn’t yet send to the mcu. It is not an indication of an error; it is normal for it to be non-zero. In contrast, a very high (e.g., more than 64) value in ready_bytes indicates an issue with communications: this is the counter of bytes that the host would like to send to the mcu, but is unable to due to delays on the communication line.
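Following that distinction, a check on these counters should key on ready_bytes and ignore upcoming_bytes. A minimal sketch, assuming the samples are already available as (timestamp, upcoming_bytes, ready_bytes) tuples; the 64-byte limit comes from the post above, while the function names and demo values are hypothetical:

```python
# Threshold taken from the post above: ready_bytes "very high (eg, more than 64)".
READY_BYTES_LIMIT = 64


def flag_comm_issues(samples, limit=READY_BYTES_LIMIT):
    """Return (timestamp, ready_bytes) for samples whose ready_bytes exceed limit.

    samples: iterable of (timestamp, upcoming_bytes, ready_bytes) tuples.
    upcoming_bytes is ignored on purpose: a non-zero value is normal.
    """
    return [(ts, ready) for ts, _upcoming, ready in samples if ready > limit]


def summarize(samples, limit=READY_BYTES_LIMIT):
    """Small summary: sample count, flagged count and peak ready_bytes."""
    samples = list(samples)
    flagged = flag_comm_issues(samples, limit)
    peak = max((ready for _ts, _upcoming, ready in samples), default=0)
    return {"samples": len(samples), "flagged": len(flagged), "peak_ready_bytes": peak}


if __name__ == "__main__":
    # Made-up values for illustration, not taken from the attached logs.
    demo = [(1.0, 1200, 0), (2.0, 0, 17), (3.0, 300, 65), (4.0, 5000, 800)]
    print(flag_comm_issues(demo))
    print(summarize(demo))
```

Note that the first demo sample, with a large upcoming_bytes but zero ready_bytes, is deliberately not flagged.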
As to the error reported here - the Klipper host code has been heavily modified (with various ercf and other modifications), so it’s difficult to guess what the host code was attempting at the time of the failure. It does appear the host got overloaded, but why is not clear.
Thanks for the clarification, I was mostly going off of the comparison to the stats from my log and a cursory look at the serialhdl file. I must have gotten the ready_bytes and upcoming bytes mixed up.
klippy.log.20240610.zip (144.3 KB)
This error happened again.
It’s strange, since I switched to another brand-new CAN wire and a new CAN setup.
CAN speed: 1M
USB2CAN: Spider v2.2 with a TJA1050 CAN transceiver, working in USB-CAN bridge mode.
Here are my modified files relative to a clean checkout of the latest Klipper repo: modified_files.zip (56.2 KB)
I also attached my gcode file: Dragon-Z-flat.zip (5.2 MB) Dragon-Z-flat.z01.zip (7.5 MB)
Funny thing: even while the print is paused, the part fan keeps running at a changing speed. It looks like the big queued command is the dynamic fan control.
This is the diagram I got while the print was paused.
Well, I reworked all of my CAN wires and made sure each CAN H/L pair is twisted together.
I’ll print a large file again to check whether the communication jam happens again.
After reworking all the CAN wires, upcoming_bytes is usually below 1000 and ready_bytes usually under 800. The “Timer too close” and “Lost communication with MCU” errors disappeared.
But if I enable “dynamic overhang fan”, the issue comes back.
Anyway, I’m happy that the wiring issue is gone. It looks like too many M106 commands can cause the sb2040 to get stuck.
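To put a number on “too many M106”, one could count the fan commands in the sliced file. This is a rough sketch under stated assumptions: it only looks at lines whose first word is M106, strips `;` comments, treats a missing P parameter as fan 0, and counts a command as redundant when it repeats the previous S value for the same fan. The function names are hypothetical, and the counts alone don’t prove that the fan commands are what stalls the sb2040.

```python
import sys


def parse_m106(line):
    """Return (fan_index, speed) for an M106 line, or None for anything else."""
    words = line.split(";", 1)[0].split()  # drop ";" comments, then split into words
    if not words or words[0].upper() != "M106":
        return None
    params = {w[0].upper(): w[1:] for w in words[1:] if len(w) > 1}
    fan = params.get("P", "0")  # ASSUMPTION: no P parameter means fan 0
    try:
        speed = float(params["S"]) if "S" in params else None
    except ValueError:
        speed = None  # unparsable S value, e.g. a macro expression
    return fan, speed


def count_fan_commands(lines):
    """Return (total M106 commands, commands repeating the fan's previous speed)."""
    total = redundant = 0
    last_speed = {}  # fan index -> last S value seen
    for line in lines:
        parsed = parse_m106(line)
        if parsed is None:
            continue
        total += 1
        fan, speed = parsed
        if speed is not None and last_speed.get(fan) == speed:
            redundant += 1
        last_speed[fan] = speed
    return total, redundant


# Only runs when given a gcode path on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], errors="replace") as f:
        total, redundant = count_fan_commands(f)
    print(f"M106 commands: {total}, repeating the previous speed: {redundant}")
```

Running it on the gcode sliced with and without the dynamic fan option would show how many extra fan updates that setting adds.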