MCU 'ercf' shutdown: Timer too close

Basic Information:

Printer Model: Voron Trident 350
MCU / Printerboard: Spider2.2/Fly sb2040 pro/Fly-ercf-can
Host / SBC: Raspberry Pi 3B+
klippy.log: klippy (29).zip (447.0 KB)

MCU ‘ercf’ shutdown: Timer too close
This often indicates the host computer is overloaded. Check
for other processes consuming excessive CPU time, high swap
usage, disk errors, overheating, unstable voltage, or
similar system problems on the host computer.
Once the underlying issue is corrected, use the
“FIRMWARE_RESTART” command to reset the firmware, reload the
config, and restart the host software.
Printer is shutdown

Describe your issue:

My printer used to work very well, but one day this error started to appear. I tried re-connecting the ERCF's CAN wiring and changing the 120 ohm termination resistor on the CAN bus, but neither helped.
The error happens right at the very beginning.
I also tried adding a dwell in _servo_up and _servo_down, but that did not help either.

I would suggest:

  • Upgrade Klipper; there are some recent changes that help with overloaded hosts.
  • I see a high number of bytes_invalid, so make sure your Pi is running at least a 6.6 kernel, which matters for CAN bus and USB2CAN bridges.

Hope it helps.

Linux version 6.1.21-v7+ (dom@buildbot) (arm-linux-gnueabihf-gcc-8 (Ubuntu/Linaro 8.4.0-3ubuntu1) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1642 SMP Mon Apr 3 17:20:52 BST 2023

But I have used this system for a long time, and there was no such error before.
Could an old kernel really be the cause?

But I have used this system for a long time, and there was no such error before.

I cannot possibly know what your setup was or what has changed, nor should I have to.
If the behavior changed, then something changed.
You are free to guess, if you like, or you can start monitoring everything to find what is wrong.
General information about this error is available here: Timer too close

Could an old kernel really be the cause?

There is a recent documentation update:
https://www.klipper3d.org/CANBUS_Troubleshooting.html#check-for-incrementing-bytes_invalid-counter

It clarifies bytes_invalid, so I can only suggest following the general guide: if CAN is not expected to work reliably on an old kernel and is expected to work reliably on a new one, then it makes sense to use a new one if possible.
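If it helps, here is a minimal sketch for checking a kernel banner against the 6.6 threshold mentioned above. It assumes the banner has the usual `Linux version X.Y...` form, like the line posted earlier in the thread; the function name is just something I made up for illustration.

```python
import re


def kernel_at_least(version_banner: str, minimum=(6, 6)) -> bool:
    """Return True if the kernel named in a 'Linux version ...' banner is >= minimum."""
    match = re.search(r"Linux version (\d+)\.(\d+)", version_banner)
    if match is None:
        raise ValueError("no kernel version found in banner")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuples compare element by element, so (6, 12) >= (6, 6) holds.
    return (major, minor) >= minimum


banner = "Linux version 6.1.21-v7+ (dom@buildbot)"
print(kernel_at_least(banner))  # False: 6.1 is older than 6.6
```

Comparing integer tuples avoids the classic mistake of comparing version strings, where "6.12" would sort before "6.6".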

It is hard to say what specifically caused the error you see, since you have a lot of modifications. But as I mentioned above, there are two suspicious things I can come up with: host load and CAN bytes_invalid.

It is possible that after you upgrade the distribution and Klipper, the problem will reproduce differently, and then we would need another round of log analysis.

So, right now I suggest doing the above to rule out two known issues: slow GC on underpowered hosts and kernel bugs that can cause problems with CAN.

OK, I’ll try to upgrade the Pi OS kernel. But it’s a big task.

Where did you see high bytes_invalid? bytes_invalid=0 throughout his whole log file.
There is no significant load in his log either.

If it was working and you haven’t changed anything, then what’s the point of updating? With identical printers in factory configuration, some people print successfully, while those who print many parts at once run into the issue.

Stop any unnecessary services, if there are any. You could try increasing the scheduling priority (using nice) of the klipper process.

If it occurs consistently for you, try to reproduce it. Once you can reproduce it reliably, you could, for example, unload the bed mesh and repeat.

By the way, you have 20 points for the bed mesh and PPS 2. If your bed is so warped that you need to measure every 1.5 cm, then you should fix the bed instead of taking measurements. There’s an error in the bed mesh script that causes “timer too close” under certain conditions. In short, simplifying your configuration might be the solution.
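For reference, a rough sketch of where the "every 1.5 cm" figure comes from. The 300 mm mesh span is my assumption for a 350 mm bed (it is not taken from the posted config), and I am reading "20 points" as 20 per axis.

```python
def probe_spacing_mm(span_mm: float, points_per_axis: int) -> float:
    """Distance between neighbouring probe points along one axis."""
    if points_per_axis < 2:
        raise ValueError("need at least two points per axis")
    # N points divide the span into N - 1 equal gaps.
    return span_mm / (points_per_axis - 1)


# Hypothetical mesh span: 300 mm of a 350 mm bed (assumed, not from the config).
SPAN_MM = 300.0
POINTS = 20  # "20 points" from the post, assumed to mean 20 per axis

spacing = probe_spacing_mm(SPAN_MM, POINTS)
total_probes = POINTS * POINTS
print(f"spacing: {spacing:.1f} mm, probes per mesh: {total_probes}")
# spacing: 15.8 mm, probes per mesh: 400
```

Under these assumptions the mesh needs 400 probe moves, which is one reason a smaller mesh is worth trying first.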

Where did you see high bytes_invalid? bytes_invalid=0 throughout his whole log file.

I’m baked. My bad.

This is a retransmit spike, not bytes invalid.
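To avoid mixing the two counters up again, here is a small sketch that pulls bytes_retransmit and bytes_invalid per MCU out of a klippy.log "Stats" line. It assumes each MCU section starts with `<name>: mcu_awake=`; the sample line is invented for illustration and is not from the attached log.

```python
import re

COUNTERS = ("bytes_retransmit", "bytes_invalid")


def mcu_counters(stats_line: str) -> dict:
    """Map each MCU name in a 'Stats' line to its retransmit/invalid counters."""
    result = {}
    # Split at "<name>: " wherever it is followed by "mcu_awake=".
    # With a capture group, re.split yields [prefix, name1, body1, name2, body2, ...].
    sections = re.split(r"(\w+): (?=mcu_awake=)", stats_line)
    for name, body in zip(sections[1::2], sections[2::2]):
        values = dict(re.findall(r"(\w+)=(\S+)", body))
        result[name] = {key: int(values[key]) for key in COUNTERS if key in values}
    return result


# Invented sample line (assumed layout, shortened).
sample = (
    "Stats 100.0: gcodein=0 "
    "mcu: mcu_awake=0.002 bytes_retransmit=9 bytes_invalid=0 send_seq=100 "
    "ercf: mcu_awake=0.001 bytes_retransmit=1200 bytes_invalid=0 send_seq=50"
)
print(mcu_counters(sample))
```

Both counters are cumulative, so a spike shows up as a jump between two consecutive Stats lines rather than as a large absolute value.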

There is no significant load in his log either.

High/low depends on the point of view.
I’m not familiar with the ERCF code, or any MMU, but there are no heaters, so most probably the TTC (timer too close) on the ERCF MCU happens during load/unload (according to the logs, this is where it happens).
And most probably, as with other MMUs, this is done in drip mode, where everything is time-tight and should happen in less than 100 ms (like in homing).

I can’t tell from the logs which mode is used, so this is an assumption on my part.

So, if drip mode is used and the available time window is 100 ms, then technically any klippy load above 10% is high, because 10% of a one-second interval is already 100 ms.
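The arithmetic behind that, as a minimal sketch. I am assuming the load figure is a fraction of a one-second interval; the names here are my own and not anything from Klipper.

```python
WINDOW_MS = 100.0  # time window assumed for drip mode in this thread


def busy_ms(load_fraction: float, interval_s: float = 1.0) -> float:
    """CPU time in milliseconds consumed at a given load over one interval."""
    return load_fraction * interval_s * 1000.0


for load in (0.05, 0.10, 0.25):
    used = busy_ms(load)
    verdict = "fits inside" if used < WINDOW_MS else "can fill or exceed"
    print(f"load {load:.0%}: {used:.0f} ms busy, {verdict} the {WINDOW_MS:.0f} ms window")
```

This is a worst-case view: it treats all the busy time as one uninterrupted stall, which is the situation that matters for a tight deadline.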

This is what has been discussed in the GC Freezy pull request.

Hope it helps.


You could try increasing the scheduling priority (using nice) of the klipper process.

Btw, nice mostly has not worked for 15 years already.

Oh, does that mean it is not caused by the kernel CAN bug?
I’m updating Klipper and the whole system now and fixing the upgrade issues as I go. I’ll try to print something once the system upgrade is finished.

Really? You can check it, but on your own :wink:

Oh, does that mean it is not caused by the kernel CAN bug?

According to the docs, bytes_invalid can be caused by the kernel.
But I misread bytes_invalid/bytes_retransmit in the log,
so the kernel upgrade probably wasn’t necessary.

I’m updating Klipper and the whole system now and fixing the upgrade issues as I go. I’ll try to print something once the system upgrade is finished.

I think the Klipper upgrade will do the most here; I have high hopes for the GC patch.

I have now finished upgrading the whole system:
Pi OS is up to date (with kernel 6.12.15).
The Klipper firmware on the ERCF, SB2040, and main MCU is v0.12.0-439.
I’m printing a model, and so far so good.
I’ll keep watching.