CANBUS Missed scheduling of next digital out event

Basic Information:

Printer Model: Ender 5
MCU / Printerboard: lpc1769 MKS SGEN V1, Fysetc UCAN, BTT EBB42 V1.2. [v0.12.0-418-g0114d72a6]
Host / SBC: Raspberry Pi CM3+ [v0.12.0-418-g0114d72a6]
klippy(32).log (925.8 KB)

Klipper CANBUS logs.zip (938.4 KB)

can0: flags=193<UP,RUNNING,NOARP>  mtu 16
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 128  (UNSPEC)
        RX packets 39837  bytes 261446 (255.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 85662  bytes 650474 (635.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Describe your issue:

Eternal battle of the CANBUS…

-I’ve tried flashing different firmware to the USB to CAN board. Now I’m running Katapult/Klipper with the latest version from the git [v0.12.0-418-g0114d72a6].
-I’ve tried changing the TRSYNC_TIMEOUT from 0.025 to 0.05 and back again.
-Wires are properly crimped (had a job that was mainly just crimping xD).
-Twisted cables from UCAN to EBB board, pulled, pushed, shaken, flicked during printing. No issues.
-120 Ohm terminations in place.

bytes_invalid counter is only 0 for the printer mainboard MCU (USB).
For the USB to CAN adapter it maxed out at 22, but the EBB42 board continuously increases…

mcu: 	mcu_awake=0.002 mcu_task_avg=0.000008 mcu_task_stddev=0.000008 bytes_write=1093776 	bytes_read=193878 	bytes_retransmit=9 		bytes_invalid=0 	send_seq=21248 	receive_seq=21248 	retransmit_seq=2 	srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 	freq=120004486 
UCAN: 	mcu_awake=0.002 mcu_task_avg=0.000016 mcu_task_stddev=0.000006 bytes_write=4967 	bytes_read=17464 	bytes_retransmit=221 	bytes_invalid=22 	send_seq=795 	receive_seq=783 	retransmit_seq=795 	srtt=0.005 rttvar=0.006 rto=5.000 ready_bytes=0 upcoming_bytes=9 	freq=47996332 adj=47994498 
EBBCan: mcu_awake=0.001 mcu_task_avg=0.000013 mcu_task_stddev=0.000009 bytes_write=486137 	bytes_read=112769 	bytes_retransmit=2620 	bytes_invalid=506 	send_seq=9463 	receive_seq=9460 	retransmit_seq=9463 srtt=0.022 rttvar=0.012 rto=5.000 ready_bytes=9 upcoming_bytes=2775 freq=63999529 adj=63997011
ambient: temp=-99.6 heater_bed: target=60 temp=58.8 pwm=0.000 sysload=0.41 cputime=54.599 memavail=627072 print_time=692.365 buffer_time=0.000 print_stall=0 extruder: target=205 temp=201.3 pwm=0.000

I’m REALLY enjoying Klipper and the good stuff with it, but I would love to be able to trust it on prints not crashing!

Thanks for any input!!!

Just finished a 40 minute print.

I have grounded the carriage (hot end, extruder…) to see if it helped.
I still see bytes invalid for the UCAN at nearly 700 and 1500 for the EBB42…
klippy(33).log (4.6 MB)
mycanlog.zip (4.7 MB)

Successful print at least hahaha

I can suggest taking a look here, maybe those changes will add some useful insight.

Hiya! Thanks for that, I’ll have a look into it!
I believe this is a fork/pr? I’ll have to find out how to load this onto my machine hahaha

$ cd klipper/
~/klipper $ git fetch --all
~/klipper $ git checkout work-canstats-20250114

Then you need to do something like:

sudo systemctl restart klipper

And recompile/re-flash your CAN boards.


Thanks for that!

Another successful print, yet bytes_invalid keeps increasing…
klippy(34).log (3.8 MB)
mycanlog.zip (4.8 MB)

I see nothing in the new log fields; they are all zero or active (rx_error=0 tx_error=0 bus_state=active).

Guess I’ll have to continue printing and see!

Edit: version changed to v0.12.0-425-ga849b143d

According to your old logs, you always get
b’Got error -1 in can write: (105)No buffer space available

Oh, it is already reflashed.

Btw, there should be a common ground (GND) between the EBB and the UCAN, and ultimately with the RPi as well.

But it looks like the main MCU and the UCAN are connected by USB, so the MCU should share ground with the power supply that feeds the EBB (I hope it is the same power supply).

klippy(37).zip (690.3 KB)

Aight… I’ve grounded the carriage, added a ground wire from the UCAN to the EBB (all PSU’s share a ground point in the electronics enclosure) and still getting bytes_invalid errors…

At least it seems to be printing hahaha

The canbus statistics PR was merged into mainline Klipper earlier today. However, those statistics won’t help, as you have the “increasing bytes_invalid” problem - see: CANBUS Troubleshooting - Klipper documentation

You’ll have to find the fix for this and get the connection to a point where bytes_invalid no longer increases - nothing will solve your problems until you’re able to do that. Unfortunately it’s difficult to give advice on exactly how to fix your particular setup. A lot of people are reporting that switching Linux kernels (eg, going to 32bit kernel if one is available) can fix the issue, but that’ll depend on the hardware that you are running.

Hope that helps a little,
-Kevin

EDIT: Just to be clear, an increasing bytes_invalid is only known to occur due to Linux kernel bugs and/or canbus adapter firmware bugs. Wiring issues would not cause increasing bytes_invalid and thus this issue can not be fixed by changing wiring.
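
For anyone who wants to check their own klippy.log for this, here is a minimal sketch (not part of Klipper) that scans "Stats" lines and sums how much bytes_invalid grew per MCU. It assumes each Stats entry sits on a single line and that every section starts with a token ending in ":" (for example "mcu:" or "EBBCan:"), as in the logs posted in this thread. The function names and the two shortened sample lines are made up for illustration.

```python
def bytes_invalid_by_mcu(stats_line):
    """Return {section_name: bytes_invalid} for one klippy.log 'Stats' line."""
    counts = {}
    section = None
    for token in stats_line.split():
        if token.endswith(":"):
            # A token such as "mcu:" or "EBBCan:" opens a new section.
            section = token[:-1]
        elif token.startswith("bytes_invalid=") and section is not None:
            counts[section] = int(token.split("=", 1)[1])
    return counts


def invalid_growth(log_lines):
    """Sum the increases of bytes_invalid between consecutive Stats lines."""
    previous, growth = {}, {}
    for line in log_lines:
        if not line.startswith("Stats "):
            continue
        for mcu, count in bytes_invalid_by_mcu(line).items():
            # Only count increases; a lower value means the counter was reset.
            if mcu in previous and count > previous[mcu]:
                growth[mcu] = growth.get(mcu, 0) + count - previous[mcu]
            previous[mcu] = count
    return growth


# Shortened, made-up sample lines in the klippy.log format (illustration only).
SAMPLE = [
    "Stats 10.0: gcodein=0 mcu: bytes_retransmit=0 bytes_invalid=0 send_seq=1 "
    "EBBCan: bytes_retransmit=0 bytes_invalid=5 send_seq=1",
    "Stats 11.0: gcodein=0 mcu: bytes_retransmit=0 bytes_invalid=0 send_seq=2 "
    "EBBCan: bytes_retransmit=0 bytes_invalid=9 send_seq=2",
]

print(invalid_growth(SAMPLE))
```

To run it on a real log, pass the open file object instead of SAMPLE, e.g. `invalid_growth(open("klippy.log"))`, where the path is whatever your installation uses. Any MCU that appears in the result had an increasing counter during the logged session.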

Thanx Kevin,

This is useful, especially:

I’ve always assumed that bytes_invalid increasing could be caused by wiring problems/intermittent connections. I have definitely seen this with users that have U2C adapters (but none with main controller boards running the USB to CAN bridge software).

Are there any Linux Kernels which have been observed to be problematic in this regard?

@Sineos , could information be included in your (excellent) Knowledge Base posts? I’m not really sure what is the right topic.

Thanx!

Heh, just went through the manual install steps of Mainsail with the 32bit lite OS and it would not start… Just saw they have a prebuilt image in the git. Going to try that :woman_shrugging:

Host is running version: v0.12.0-429-g01b0e98ab
MCU’s and CAN version: v0.12.0-425-ga849b143d
It seems better. Going to update all the MCU’s and try again.
It does seem like 64b vs 32b does affect it slightly?
klippy(42).log (4.0 MB)

Stats 3379.5: gcodein=0  mcu: mcu_awake=0.002 mcu_task_avg=0.000009 mcu_task_stddev=0.000013 bytes_write=5685022 bytes_read=854328 bytes_retransmit=128 bytes_invalid=0 send_seq=104808 receive_seq=104808 retransmit_seq=25306 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=120004693 
canstat_UCAN: bus_state=active rx_error=0 tx_error=0 tx_retries=0 UCAN: mcu_awake=0.005 mcu_task_avg=0.000019 mcu_task_stddev=0.000010 bytes_write=34583 bytes_read=100332 bytes_retransmit=0 bytes_invalid=103 send_seq=5734 receive_seq=5734 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47996460 adj=47994546 
canstat_EBBCan: bus_state=active rx_error=0 tx_error=0 tx_retries=0 EBBCan: mcu_awake=0.002 mcu_task_avg=0.000014 mcu_task_stddev=0.000010 bytes_write=2584487 bytes_read=543960 bytes_retransmit=0 bytes_invalid=632 send_seq=50566 receive_seq=50566 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63999502 adj=63997100  
ambient: temp=-99.6 heater_bed: target=0 temp=52.8 pwm=0.000 sysload=0.32 cputime=278.759 memavail=684128 print_time=4447.467 buffer_time=0.000 print_stall=1 extruder: target=0 temp=154.0 pwm=0.000

Update:
Host is running version: v0.12.0-429-g01b0e98ab-dirty
MCU’s and CAN version: v0.12.0-429-g01b0e98ab
UCAN got 128 bytes_invalid and EBB42 498.

Should I try increasing the txqueuelen? Or try again with TRSYNC_TIMEOUT?

klippy(43).zip (744.4 KB)

Stats 8680.9: gcodein=0  mcu: mcu_awake=0.002 mcu_task_avg=0.000009 mcu_task_stddev=0.000011 bytes_write=5153925 bytes_read=859740 bytes_retransmit=23 bytes_invalid=0 send_seq=96322 receive_seq=96322 retransmit_seq=1539 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=120005034 
canstat_UCAN: bus_state=active rx_error=0 tx_error=0 tx_retries=0 UCAN: mcu_awake=0.005 mcu_task_avg=0.000019 mcu_task_stddev=0.000010 bytes_write=40145 bytes_read=115856 bytes_retransmit=0 bytes_invalid=128 send_seq=6664 receive_seq=6664 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47996493 adj=47994468 
canstat_EBBCan: bus_state=active rx_error=0 tx_error=0 tx_retries=0 EBBCan: mcu_awake=0.002 mcu_task_avg=0.000014 mcu_task_stddev=0.000011 bytes_write=2442123 bytes_read=564750 bytes_retransmit=405 bytes_invalid=498 send_seq=48918 receive_seq=48918 retransmit_seq=20292 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63999716 adj=63997039  
ambient: temp=-72.5 heater_bed: target=0 temp=30.4 pwm=0.000 sysload=0.19 cputime=274.755 memavail=687360 print_time=3198.635 buffer_time=0.000 print_stall=0 extruder: target=0 temp=40.0 pwm=0.000

You have the incrementing bytes_invalid issue. The only known cause is canbus packets being reordered by either a buggy USB-to-CANBUS firmware or a buggy Linux kernel. In your case it looks like you’ve got a buggy Linux kernel. You’ll need to somehow find a Linux kernel that is not buggy, or find some kernel settings (eg, irq routing or power management) that avoids the issue within the Linux kernel.
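
To illustrate why reordered packets show up as invalid bytes, here is a toy sketch: a message protected by a CRC is split into 8-byte CAN-sized frames, and the reassembled stream only passes the CRC check when the frames arrive in order. This is not Klipper's actual framing code; `binascii.crc_hqx` is used as a stand-in 16-bit CRC, and the helper names and the 20-byte payload are made up for the example.

```python
import binascii

FRAME_SIZE = 8  # a classic CAN frame carries at most 8 data bytes


def add_crc(payload):
    """Append a 16-bit CRC (stand-in for a real protocol checksum)."""
    crc = binascii.crc_hqx(payload, 0xFFFF)
    return payload + crc.to_bytes(2, "big")


def crc_ok(message):
    """Check the trailing 16-bit CRC of a reassembled message."""
    payload, tail = message[:-2], message[-2:]
    return binascii.crc_hqx(payload, 0xFFFF).to_bytes(2, "big") == tail


def to_frames(message):
    """Split a message into chunks of at most FRAME_SIZE bytes."""
    return [message[i:i + FRAME_SIZE] for i in range(0, len(message), FRAME_SIZE)]


message = add_crc(bytes(range(20)))  # 22 bytes, so three frames
frames = to_frames(message)

in_order = b"".join(frames)
swapped = b"".join([frames[1], frames[0], frames[2]])  # first two frames reordered

print("in order:", crc_ok(in_order))
print("swapped: ", crc_ok(swapped))
```

In this sketch the bytes themselves are all delivered intact, which matches the situation described above: nothing is lost or corrupted on the wire, yet the receiver still has to discard the data because it no longer forms a valid message.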

It may help if you share what OS you installed on the RPi, where you got it, and what kernel version you are currently running (uname -a ; getconf LONG_BIT).
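
If it is easier to collect this from a script, a rough Python equivalent might look like the sketch below. It uses only the standard library; `host_summary` is a made-up helper name. It also assumes the pointer size of the running Python interpreter matches what `getconf LONG_BIT` reports, which is usually but not necessarily the case. Kernel architecture and userland word size are reported separately because a 64-bit kernel can run a 32-bit userland.

```python
import platform
import struct


def host_summary():
    """Collect kernel and userland details relevant to this issue."""
    return {
        "kernel_release": platform.release(),       # e.g. the version part of `uname -a`
        "kernel_machine": platform.machine(),       # architecture reported by the kernel
        "userland_bits": struct.calcsize("P") * 8,  # pointer size of this Python build
    }


for key, value in host_summary().items():
    print(f"{key}: {value}")
```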

None of my machines have this issue, so unfortunately I can’t give specific advice on how to fix it.

-Kevin

I don’t know which particular kernels are impacted. My speculation is that an error was introduced a couple of years ago, then fixed with can: gs_usb: convert to NAPI/rx-offload to avoid OoO reception · torvalds/linux@24bc41b · GitHub . Then over the last couple of years, various embedded Linux distributions have pulled in the buggy code and/or pulled in the fixed code. I also get the impression that even a buggy kernel can avoid the issue depending on how many cores are active and the routing of usb interrupts to those cores.

The above is speculation though - I’ve done enough investigations of this issue to know what’s going wrong, but I don’t know the exact issue within the Linux kernel nor the exact requirements to avoid it. A large number of people (like myself) don’t have this issue, so it’s definitely possible to fix it.

Hope that helps a little,
-Kevin


I was originally running the 64bit image from the Raspberry Imager tool:

$ uname -a && getconf LONG_BIT
Linux pi3d 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
64

Now I’m running the 32bit version from the git MainsailOS/releases

$ uname -a && getconf LONG_BIT
Linux mainsail 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
32

I will try changing the USB hub from an MTT one to an STT one and see if that has any effect on bytes_invalid. I originally changed to an MTT hub because of the issues…

As for the Fysetc UCAN adapter, I’ve flashed it with Katapult and the Klipper USB to CAN bridge. I’ve tried it with what I believe is candlelight before, but from my understanding the version in Klipper is a modified/fixed version from BigTreeTech?
So if there is any particular FW to flash to the UCAN please let me know, it’s a STM32F072 based adapter.

Thanks a lot for the support from everyone so far! :green_heart:

Have you tried just using the 32bit Raspberry Pi OS Lite and not the Mainsail version?

I’ve tried the Mainsail version and found it to be a bit buggy when working with CAN (I believe I was using a U2C adapter and not a main controller board with CAN to USB bridge hardware built in).

After SSH’ing into the Raspberry Pi, add Klipper/Moonraker/Mainsail using KIAUH:

I’ll try again today, I failed with the manual steps. Hopefully KIAUH will save me from my mistake hahahaha


Other than Moonraker being very unhappy about some path stuff (well I am too, took me 3 tries to get Klipper, Moonraker and Mainsail on the 8GB CM3 :sob:) and struggling with the Pi not finding the printer mainboard under /dev/serial/by-id/ (yeah yeah I should have listened to my own words) it’s printing…

bytes_invalid is zero across the board!!!

Stats 1212.6: gcodein=0  mcu: mcu_awake=0.021 mcu_task_avg=0.000026 mcu_task_stddev=0.000054 bytes_write=1667043 bytes_read=301243 bytes_retransmit=74 bytes_invalid=0 send_seq=31858 receive_seq=31858 retransmit_seq=29269 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=120005349 
canstat_UCAN: bus_state=active rx_error=0 tx_error=0 tx_retries=0 UCAN: mcu_awake=0.050 mcu_task_avg=0.000023 mcu_task_stddev=0.000005 bytes_write=15142 bytes_read=45109 bytes_retransmit=0 bytes_invalid=0 send_seq=2496 receive_seq=2496 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47996658 adj=47994564 
canstat_EBBCan: bus_state=active rx_error=0 tx_error=0 tx_retries=0 EBBCan: mcu_awake=0.017 mcu_task_avg=0.000017 mcu_task_stddev=0.000022 bytes_write=714827 bytes_read=189167 bytes_retransmit=391 bytes_invalid=0 send_seq=14822 receive_seq=14822 retransmit_seq=13530 srtt=0.002 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63999826 adj=63996962 
sd_pos=648484 ambient: temp=-99.6 heater_bed: target=60 temp=60.1 pwm=0.173 sysload=0.50 cputime=56.751 memavail=690356 print_time=1221.933 buffer_time=2.121 print_stall=0 extruder: target=205 temp=205.0 pwm=0.448

and yes… it seems like the Raspberry Pi Imager MainsailOS 64/32bit has some older kernel?

Raspberry PI OS LITE 32bit (Bookworm):

pi@pi:~ $ uname -a && getconf LONG_BIT
Linux pi 6.6.74+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.74-1+rpt1 (2025-01-27) aarch64 GNU/Linux
32

Mainsail OS 1.3.2 is Bullseye based I guess?

(Any tips on how to proceed with the Moonraker issue would also be greatly appreciated)

/=======================================================\
| a) [Update all]        |               |              |
|                        | Installed:    | Latest:      |
| Klipper & API:         |---------------|--------------|
|  1) [Klipper]          | v0.12.0-429   | v0.12.0-429  |
|  2) [Moonraker]        | v0.9.3-29     | v0.9.3-29    |
|                        |               |              |
| Klipper Webinterface:  |---------------|--------------|
|  3) [Mainsail]         | v2.13.2       | v2.13.2      |
|  4) [Fluidd]           |               | v1.31.4      |
|                        |               |              |
| Touchscreen GUI:       |---------------|--------------|
|  5) [KlipperScreen]    | v0.4.5-40     | v0.4.5-40    |
|                        |               |              |
| Other:                 |---------------|--------------|
|                        |------------------------------|
| 14) [System]           |  System up to date!          |
|-------------------------------------------------------|
|                       B) « Back                       |
\=======================================================/

Could you describe your host hardware in more detail?

You originally wrote that you have a “CM3+”. I’m guessing that it has 8GB eMMC and is mounted on an IO board?

As you’ve discovered, that’s pretty marginal. Trying to cut things down will lead to all kinds of problems that will take a long time to track down and fix.

My suggestions:

  1. Run the CM3 only from an SD Card and eschew the on board eMMC? That way you have much more Flash available (I tell people to use at least 64GB) and the performance hit is very minimal.
  2. Buy a Raspberry Pi 4B as the CM3 is going EOL shortly and take advantage of more Flash space and a higher performance CPU.