MCU ‘sb2040’ shutdown: Timer too close

StarDoctor · May 12, 2024, 2:34am

Basic Information:

Printer Model: Voron Trident 350
MCU / Printerboard: Spider2.2/Fly sb2040 pro/Fly-ercf-can
Host / SBC: Raspberry Pi 3B+
klippy.log

Describe your issue:

When I print a file, the printer ramdomly shutdown as “MCU ‘sb2040’ shutdown: Timer too close” after printing 1~3 hrs.
I try to troubleshoot by using graphstats.py, but it didn’t help. The can bus, host loading, temperature and buffers are all looks normal.
Also I try following actions but didn’t change the result:

Limit the microstep of extruder stepper(which is on sb2040) from 16 to 8
Stop KlipperScreen

klippy.log.zip (3.8 MB)

TheFuzzyGiggler · May 12, 2024, 7:13am

Check your wiring, there is a communication issue somewhere. You’ve got bytes that are building up in the serial queue over time until it can’t keep up.

sb2040: 
mcu_awake=0.009 mcu_task_avg=0.000013 
mcu_task_stddev=0.000021 bytes_write=18161615 
bytes_read=6470898 bytes_retransmit=7240 
bytes_invalid=0 send_seq=456818 
receive_seq=456818 retransmit_seq=432300 
srtt=0.002 rttvar=0.001 rto=0.025 
ready_bytes=17 
**upcoming_bytes=187798** 
freq=11999976 adj=11999623 
sd_pos=12701115

upcoming_bytes should be 0 ideally, those are stalled messages in the queue.
It’s clearing them out 17 bytes at a time (via the ready_bytes) but it looks like it’s building up faster than it can clear them out.

StarDoctor · May 12, 2024, 2:45pm

You mean there are trafic jam in serial port (CAN),right? And this is possiblly caused by hw wiring not stable?
I’ll change another CAN cable and try again tomorrow.

TheFuzzyGiggler · May 12, 2024, 4:10pm

Yes, Means the CAN bus isn’t having clear communications with your sb2040 for whatever reason. When Klipper talks to the mcu it expects an acknowledgement from the mcu that it recieved it successfully.

When it doesn’t get that it queues it up to resend it at the next possible time.

If it gets too bad the timing of the functions gets off (Klipper says do X at Y time, mcu says I did X at Z time) and if they’re way off you get a timer error.

It looks like your queue is filling up, but probably not at a fast enough rate to immediately trigger the error as it clears it out as it can. I didn’t go back far enough to see if it started getting extremely bad after a certain point.

Underlying issue usually is a flaky connection somewhere.

TheFuzzyGiggler · May 12, 2024, 4:37pm

Okay I just graphed it instead.

You got the “Timer too close” and restart around the 30000 timestamp mark.
So it did build up rather suddenly but you can see a small blip earlier on too.

Once you had a restart it was fine.

StarDoctor · May 13, 2024, 10:38am

Crystal clear!
After temporarily change another CAN for sb2040, the “Timer Too Close” gone. I checked the Upcoming bytes back to less than 10 all the time.
BTW, how to generate the Upcoming Bytes diagram? any command?

TheFuzzyGiggler · May 13, 2024, 11:25am

I just made a Python script. Here you go, you’ll just need to drop it on your Pi somewhere.

log.zip (1.0 KB)

You’ll need to install matplotlib

pip install matplotlib

Then navigate to wherever you extracted the file and you use it like

python log.py <path-to-log-file> <mcu-name>

as in

python log.py ~/printer_data/klippy.log mcu
or
python log.py ~/tmp/klippy.log mcu

I didn’t want to jack with the Regex more than it was so it just uses the first part of the mcu name before any “_”, in other words, your ercf_mcu would just be “ercf”.

If it can’t find the mcu it will tell you which ones it can find so you can pick from that.

One of these days I’ll get around to making a website where people can upload their klippy.log and it gives them all the info from it. I’ve been thinking about that for a while.

koconnor · May 14, 2024, 1:00am

The upcoming_bytes counter indicates data that the Klipper host knows it will need to eventually send to the mcu, but also knows it shouldn’t yet send to the mcu. It is not an indication of an error; it is normal for it to be non-zero. In contrast, a very high (eg, more than 64) value in ready_bytes indicates an issue with communications - this is the counter of bytes that the host would like to send to the mcu, but is unable to due to delays on the communication line.

As to the error reported here - the Klipper host code has been heavily modified (with various ercf and other modifications), so it’s difficult to guess what the host code was attempting at the time of the failure. It does appear the host got overloaded, but why is not clear.

Cheers,
-Kevin

TheFuzzyGiggler · May 14, 2024, 8:53am

Thanks for the clarification, I was mostly going off of the comparison to the stats from my log and a cursory look at the serialhdl file. I must have gotten the ready_bytes and upcoming bytes mixed up.

StarDoctor · May 15, 2024, 2:30pm

From ready_bytes, it looks normal:

From upcoming_bytes, it indicate abnormal:

After I change the CAN Cable, everything back to normal:

TheFuzzyGiggler · May 15, 2024, 3:24pm

Here, I updated the script and renamed it to something more obvious.

loggraph.zip (1.3 KB)

It now takes 3 parameters as defined below.
You can run

python loggraph.py -?

To get this same output in the future

Usage: python script.py <log_file_path> <mcu_name> <data_field>
Available data fields: mcu_awake, mcu_task_avg, mcu_task_stddev, bytes_write, bytes_read, bytes_retransmit, bytes_invalid, send_seq, receive_seq, retransmit_seq, srtt, rttvar, rto, ready_bytes, upcoming_bytes, freq

It will now also automatically label the chart right based on what you choose.

@Sineos - You might like this as well. I have some future ideas regarding something similar.

It needs Matplotlib though

pip install matplotlib

StarDoctor · May 17, 2024, 4:15am

perfect!
I’ll try this new one.
Thank you so much

StarDoctor · June 11, 2024, 7:40am

klippy.log.20240610.zip (144.3 KB)
This error happen again.
It’s strange as I change another brand new CAN wire and use new CAN solution.
CAN speed: 1M
USB2CAN: Spider v2.2 with TJA1050 CAN transmitter, work in USB CAN bridge mode.
Here I attach my modified files from clean klipper repo which is the latest.
modified_files.zip (56.2 KB)
Also I attached my gcode file.
Dragon-Z-flat.zip (5.2 MB)
Dragon-Z-flat.z01.zip (7.5 MB)

StarDoctor · June 11, 2024, 1:31pm

I monitor the whole print progress and extract the klipper.log every 10 mins. Now I find the Upcoming Bytes raise up, I immediately pause the print.

Funny thing is happen: Even the print is paused, the PART FAN is still work with dynamic speed. It looks like the BIG QUEUED command is dynamic fan control.
Now I got this diagram while print pause.

After a while, the dynamic PART FAN stopped.
I got the normal Upcoming_Bytes:

StarDoctor · June 11, 2024, 1:33pm

During the whole things happen, I got nothing abnormal on CAN bus:

For sure, the CAN works normally.

TheFuzzyGiggler · June 11, 2024, 11:48pm

What model USB2CAN and how is it wired to your Spider board, and how is your CAN bus wired to the SB2040?

If you’re using the wiring supplied with the SB2040, did you twist the yellow and white wires at roughly even intervals when you ran the wiring?

StarDoctor · June 15, 2024, 2:36pm

Well, I rework all of my CAN wires and make sure each H/L are twist together.
I’ll print a large file again to check if any communication jam happen again.

StarDoctor · June 17, 2024, 12:11pm

After rework all CAN wires, the upcoming_bytes usually below 1000 and ready_bytes usually under 800. The “Timer Too Closed” and “MCU Lost communication” disappeared.
But, if I enable “dynamic hangout fan”, the issue come out again.
anyway, I’m happy that the wiring issue was gone. Looks like too many M106 would cause sb2040 become stucked.

pterodaktil · June 20, 2024, 11:42am

Same problem

I prepaired simple script to repit problem without printing
Head SB2040 V2

n=10000000
for i in $(seq 1 $n); do 
               echo  "M106 S$[ $RANDOM % 255 + 100 ]";       
                echo  "G4 P$[ $RANDOM % 50  ]" ;    
 done >test.gcode

upcoming_bytes grow very fast when it try quickly change speed of fan

And it fail to timer to close at the end
Is there any timer with 0,1 sec step?
It seems that it fail with 16 bit counter

StarDoctor · June 20, 2024, 12:20pm

aha, you really reproduce this issue in a simple & quick way.

Topic		Replies	Views
MCU 'sb2040' shutdown: Timer too close General Discussion	15	1058	May 4, 2024
MCU 'ercf' shutdown: Timer too close General Discussion	14	141	March 1, 2025
SB2040 Canbus issue General Discussion	4	1039	March 19, 2023
Timer too close with SB2040 General Discussion	13	512	February 10, 2024
Another MCU 'mcu' shutdown: Timer too close General Discussion	5	489	March 11, 2024

MCU ‘sb2040’ shutdown: Timer too close

Basic Information:

Describe your issue:

Related topics