MCU ‘sb2040’ shutdown: Timer too close

Basic Information:

Printer Model: Voron Trident 350
MCU / Printerboard: Spider2.2/Fly sb2040 pro/Fly-ercf-can
Host / SBC: Raspberry Pi 3B+
klippy.log

Describe your issue:

When I print a file, the printer ramdomly shutdown as “MCU ‘sb2040’ shutdown: Timer too close” after printing 1~3 hrs.
I try to troubleshoot by using graphstats.py, but it didn’t help. The can bus, host loading, temperature and buffers are all looks normal.
Also I try following actions but didn’t change the result:

  1. Limit the microstep of extruder stepper(which is on sb2040) from 16 to 8
  2. Stop KlipperScreen

klippy.log.zip (3.8 MB)

Check your wiring, there is a communication issue somewhere. You’ve got bytes that are building up in the serial queue over time until it can’t keep up.

sb2040: 
mcu_awake=0.009 mcu_task_avg=0.000013 
mcu_task_stddev=0.000021 bytes_write=18161615 
bytes_read=6470898 bytes_retransmit=7240 
bytes_invalid=0 send_seq=456818 
receive_seq=456818 retransmit_seq=432300 
srtt=0.002 rttvar=0.001 rto=0.025 
ready_bytes=17 
**upcoming_bytes=187798** 
freq=11999976 adj=11999623 
sd_pos=12701115 

upcoming_bytes should be 0 ideally, those are stalled messages in the queue.
It’s clearing them out 17 bytes at a time (via the ready_bytes) but it looks like it’s building up faster than it can clear them out.

You mean there are trafic jam in serial port (CAN),right? And this is possiblly caused by hw wiring not stable?
I’ll change another CAN cable and try again tomorrow.

Yes, Means the CAN bus isn’t having clear communications with your sb2040 for whatever reason. When Klipper talks to the mcu it expects an acknowledgement from the mcu that it recieved it successfully.

When it doesn’t get that it queues it up to resend it at the next possible time.

If it gets too bad the timing of the functions gets off (Klipper says do X at Y time, mcu says I did X at Z time) and if they’re way off you get a timer error.

It looks like your queue is filling up, but probably not at a fast enough rate to immediately trigger the error as it clears it out as it can. I didn’t go back far enough to see if it started getting extremely bad after a certain point.

Underlying issue usually is a flaky connection somewhere.

Okay I just graphed it instead.

You got the “Timer too close” and restart around the 30000 timestamp mark.
So it did build up rather suddenly but you can see a small blip earlier on too.

Once you had a restart it was fine.

Crystal clear!
After temporarily change another CAN for sb2040, the “Timer Too Close” gone. I checked the Upcoming bytes back to less than 10 all the time.
BTW, how to generate the Upcoming Bytes diagram? any command?

I just made a Python script. Here you go, you’ll just need to drop it on your Pi somewhere.

log.zip (1.0 KB)

You’ll need to install matplotlib

pip install matplotlib

Then navigate to wherever you extracted the file and you use it like

python log.py <path-to-log-file> <mcu-name>

as in

python log.py ~/printer_data/klippy.log mcu
or
python log.py ~/tmp/klippy.log mcu

I didn’t want to jack with the Regex more than it was so it just uses the first part of the mcu name before any “_”, in other words, your ercf_mcu would just be “ercf”.

If it can’t find the mcu it will tell you which ones it can find so you can pick from that.

One of these days I’ll get around to making a website where people can upload their klippy.log and it gives them all the info from it. I’ve been thinking about that for a while.

The upcoming_bytes counter indicates data that the Klipper host knows it will need to eventually send to the mcu, but also knows it shouldn’t yet send to the mcu. It is not an indication of an error; it is normal for it to be non-zero. In contrast, a very high (eg, more than 64) value in ready_bytes indicates an issue with communications - this is the counter of bytes that the host would like to send to the mcu, but is unable to due to delays on the communication line.

As to the error reported here - the Klipper host code has been heavily modified (with various ercf and other modifications), so it’s difficult to guess what the host code was attempting at the time of the failure. It does appear the host got overloaded, but why is not clear.

Cheers,
-Kevin

Thanks for the clarification, I was mostly going off of the comparison to the stats from my log and a cursory look at the serialhdl file. I must have gotten the ready_bytes and upcoming bytes mixed up.

From ready_bytes, it looks normal:


From upcoming_bytes, it indicate abnormal:

After I change the CAN Cable, everything back to normal:


1 Like

Here, I updated the script and renamed it to something more obvious.

loggraph.zip (1.3 KB)

It now takes 3 parameters as defined below.
You can run

python loggraph.py -?

To get this same output in the future

Usage: python script.py <log_file_path> <mcu_name> <data_field>
Available data fields: mcu_awake, mcu_task_avg, mcu_task_stddev, bytes_write, bytes_read, bytes_retransmit, bytes_invalid, send_seq, receive_seq, retransmit_seq, srtt, rttvar, rto, ready_bytes, upcoming_bytes, freq

It will now also automatically label the chart right based on what you choose.

@Sineos - You might like this as well. I have some future ideas regarding something similar.

It needs Matplotlib though

pip install matplotlib

perfect!
I’ll try this new one.
Thank you so much

1 Like