CANBUS Communication Fails During Print

Basic Information:

Printer Model: Voron 2.4r2
MCU / Printerboard: U2C 2.1 USB Canbus adapter / EBB SB2209 CAN Toolboard / SKR Turbo 1.4 Mainboard
Host / SBC: Raspi 3B+
klippy.log

klippy.zip (2.3 MB)

Describe your issue:

I think this is related to this other post. There’s potentially another related thread (unresolved) by VoltexRB as well, but I ran out of allowed links and couldn’t link it.

  • CANBUS works perfectly well at idle
  • Homing, print start works without issue
  • Sometimes (typically a couple of hours into a print), the print fails due to a CANBUS issue
  • bytes_invalid always seems to remain at 0
  • However, retransmit_seq and TX Retries seem to increment a LOT
  • Bus is set to run at 1Mbps per the setup instructions

Linux ip command doesn’t report any issues/errors. The txqueuelen parameter has been either 128 or 1024 during test prints; either of which is theoretically sufficient according to the wiki.
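For anyone who wants to track the same numbers without reading `ip` output by eye, here is a minimal sketch that reads the queue length and error counters straight from sysfs. The interface name `can0` is an assumption (it is not stated in this post), and the `sysfs_root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_can_counters(iface="can0", sysfs_root="/sys/class/net"):
    """Return tx_queue_len and basic error counters for one interface.

    "can0" is an assumed interface name; adjust it to match your setup.
    """
    base = Path(sysfs_root) / iface
    counters = {"tx_queue_len": int((base / "tx_queue_len").read_text())}
    # Standard per-interface statistics files exposed by the Linux kernel.
    for name in ("tx_errors", "tx_dropped", "rx_errors", "rx_dropped"):
        counters[name] = int((base / "statistics" / name).read_text())
    return counters
```

Calling `read_can_counters()` on the Pi before and after a print and comparing the two dictionaries should show whether the kernel itself is counting errors or drops on the interface.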

The Pi was recently flashed with MainsailOS, so it should be up to date; it is running 64-bit ARM. I noticed some people were saying to switch to 32-bit due to a Linux bug, but it seems that should have been resolved by now, and switching did not solve the issue in the posts linked above.

Both the 2209 toolhead board and the U2C have had their firmware flashed recently. The 120R jumpers are in place, but I haven’t checked the actual bus resistance with a multimeter yet. Seems like the bus is fine though, since it works for hours without issue before failing.
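As a reference for that multimeter check: with both 120R terminators in place, the two resistors sit in parallel across CAN H and CAN L, so a powered-off bus should read roughly 60 ohms. The sketch below just does that arithmetic; the 10% tolerance and the wording of the results are illustrative assumptions, not values from any guide.

```python
def parallel_resistance(*resistors_ohm):
    """Equivalent resistance of resistors wired in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


def interpret_bus_resistance(measured_ohm, tolerance=0.10):
    """Classify a CAN H to CAN L reading taken with the printer powered off.

    The tolerance is an arbitrary illustrative value.
    """
    both = parallel_resistance(120.0, 120.0)  # two terminators, about 60 ohms
    one = 120.0                               # only one terminator on the bus
    if abs(measured_ohm - both) <= both * tolerance:
        return "both terminators present"
    if abs(measured_ohm - one) <= one * tolerance:
        return "only one terminator present"
    return "unexpected value; check wiring and jumpers"
```

A reading near 120 ohms would suggest one of the jumpers is missing or not making contact.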

Log Errors

Seems to be three different (related?) errors around each shutdown.

  • MCU 'EBBCan' shutdown: Missed scheduling of next digital out event
  • b'Got error -1 in can write: (105)No buffer space available'
  • b'Halting reads due to CAN write errors.'
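To see how often each of these three messages shows up across a long log, a small scan like the one below can help. This is only a sketch: the marker strings are copied from the errors listed above, while the short labels are made-up names for illustration.

```python
from collections import Counter

# Substrings taken from the three errors listed above; the dictionary
# keys are hypothetical labels chosen for this example.
ERROR_MARKERS = {
    "missed_scheduling": "Missed scheduling of next digital out event",
    "can_write_no_buffer": "No buffer space available",
    "halting_reads": "Halting reads due to CAN write errors",
}


def count_can_errors(lines):
    """Count how many lines contain each known error marker."""
    counts = Counter()
    for line in lines:
        for label, marker in ERROR_MARKERS.items():
            if marker in line:
                counts[label] += 1
    return counts
```

It accepts any iterable of lines, so an open file handle for klippy.log (opened with `errors="replace"` to tolerate odd bytes) can be passed in directly.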

The most recent print in the log has a huge jump of TX errors/retries, but it was at the start of the print and didn’t seem to actually cause a shutdown. On previous test prints, I noticed the logged canstat TX retries would increase almost linearly throughout the print until shutdown. Until the most recent print, I never observed any RX errors.
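One way to check the "almost linear" impression is to pull a single counter out of the periodic `Stats` lines in klippy.log and look at its step-to-step increase. The sketch below assumes each `Stats` line is a run of `section:` tokens followed by `key=value` tokens, and that the CAN counters live under a section named like `canstat_EBBCan`; both are assumptions about the log layout, so check them against your own log first.

```python
def parse_stats_line(line):
    """Split a Stats line into {section: {key: value}} (values kept as strings)."""
    sections = {}
    current = None
    for token in line.split():
        if token.endswith(":"):
            # A token such as "EBBCan:" starts a new section.
            current = sections.setdefault(token[:-1], {})
        elif "=" in token and current is not None:
            key, _, value = token.partition("=")
            current[key] = value
    return sections


def counter_series(lines, section, key):
    """Collect one integer counter from every Stats line that contains it."""
    series = []
    for line in lines:
        if not line.startswith("Stats "):
            continue
        value = parse_stats_line(line).get(section, {}).get(key)
        if value is not None:
            series.append(int(value))
    return series


def increments(series):
    """Differences between consecutive samples of a counter."""
    return [later - earlier for earlier, later in zip(series, series[1:])]
```

Roughly constant increments would match the linear growth described above, while occasional large jumps would point to bursts instead.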

The MCU frequency on the toolhead seems to become more erratic until shutdown?

Help?

Any pointers welcome. I have can-utils installed I think, but have yet to figure out how to use them. Based on previous threads, I’m not even sure that would provide much diagnostic utility.

Update: We haven’t been able to run a longer print yet but I went and downgraded to a 32bit OS install of mainsailos on the Pi 3B+. Was the next easiest troubleshooting step I could think of. Had to go find the direct download since the raspi imager tool doesn’t let you directly install the 32-bit OS. So far there aren’t any errors, but it’s probably only been 1-2hrs print time. Fingers crossed this issue is fixed and I can move onto tuning other stuff haha!

I presume you’re using KIAUH to install Klipper (and Moonraker/Mainsail) going this route?

Personally, I’ve had problems with the MainsailOS image available on the Raspberry Pi Imager and I’ve seen it with other people - I always recommend using the “Minimal” OS and loading Klipper using KIAUH.

I’m not sure what you mean by this.


One comment about posting, when you’re posting code use the “Preformatted Text” option (“</>” in the toolbar):

zac@echo ~> getconf LONG_BIT
32

it avoids a lot of the weirdness you get when you try to copy and paste from a terminal window.

Not on this printer :slight_smile: this is using MainsailOS. I downloaded the image directly from their repo instead of using the raspi imager utility to automatically download it. I swapped to using the armhf image instead of the default arm64 image.

The ‘imager’ utility allows you to install MainsailOS by just selecting menu items. However, you don’t get a choice between the 32-bit or 64-bit version of the OS (or at least I don’t). I see in older screenshots however that there used to be an option to choose between 32 and 64.

Hopefully more updates tonight or tomorrow. We had some inconsistency in the Z probe height offset last night when trying to start a test print and I need to let it heat soak or something more in the start macro. Hopefully just need to heat soak otherwise we also need to swap out our bed probe.

If you continue to have problems, I suggest that you download a minimal OS and then load Klipper using KIAUH. The time difference isn’t huge and you can ensure you have the latest versions of all the code as well as nothing extra (which is where I think the problems come in).

This also allows you a more apples to apples comparison between the 32bit and 64bit versions of Klipper on your printer.

Finally, how did you set up your CAN Bus? I recommend the Esoterical guide, as it is well maintained and responsive to issues.

Update: Switching to the 32-bit OS seems to have resolved it. Frankly no idea why. There are no longer massive amounts of re-transmitted packets, total print time now since the fix is probably 10+ hours. On to the next issues!

Update: Bad news. It crashed again after a few more hours of print time. Slightly different errors this time.

klippy.zip (1.1 MB)

Seems to just time out.

This time, bytes_invalid actually DOES increment before the shutdown occurs. So, that’s cool I guess? The bytes_retransmit started incrementing again before the shutdown as well, but still not nearly as badly as before. TX retries still seem to increment linearly, but I still don’t know what that means. Does anyone else’s canstat ‘TX Retries’ increment throughout the print?

(RX_errors started going up before shutdown as well)

The wiki states bytes_invalid is not a hardware issue, but I’m not sure how the software could be randomly starting to fail after printing for hours? I dunno. Maybe one of the 120R jumpers fell off.

This ↑ contradicts esoterical’s guide on troubleshooting similar errors, where it states that it probably is a bad connection. I dunno. We will continue to investigate. If I have to hook up a logic analyzer and capture gigabytes of data, I guess I will.

Interesting that you said…

“Switching to the 32-bit OS seems to have resolved it. Frankly no idea why. There are no longer massive amounts of re-transmitted packets, total print time now since the fix is probably 10+ hours. On to the next issues!”

Regarding your latest log.

Line 45572:
serialhdl.error: mcu 'EBBCan': Serial connection closed

from line 46386:

Extruder max_extrude_ratio=0.266081
mcu 'mcu': Starting serial connect
webhooks client 1970521648: New connection
webhooks client 1970521648: Client info {'program': 'Moonraker', 'version': 'v0.9.3-104-g52e3158'}
Loaded MCU 'mcu' 132 commands (v0.13.0-211-gedbfc6f85 / gcc: (15:12.2.rel1-1) 12.2.1 20221205 binutils: (2.40-2+18+b1) 2.40)
MCU 'mcu' config: ADC_MAX=4095 BUS_PINS_i2c0=P0.28,P0.27 BUS_PINS_i2c1=P0.1,P0.0 BUS_PINS_i2c1a=P0.20,P0.19 BUS_PINS_i2c2=P0.11,P0.10 BUS_PINS_ssp0=P0.17,P0.18,P0.15 BUS_PINS_ssp1=P0.8,P0.9,P0.7 CLOCK_FREQ=120000000 MCU=lpc1769 PWM_MAX=255 RESERVE_PINS_USB=P0.30,P0.29,P2.9 STATS_SUMSQ_BASE=256 STEPPER_OPTIMIZED_EDGE=15 STEPPER_STEP_BOTH_EDGE=1
mcu 'EBBCan': Starting CAN connect
Created a socket
mcu 'EBBCan': Timeout on connect

Measuring temperature on your “EBB SB2209 CAN Toolboard”

[temperature_sensor EBB_NTC]
sensor_type = Generic 3950
sensor_pin = EBBCan:gpio28

You could try for the internal sensor…

[temperature_sensor EBB_NTC]
sensor_type = Generic 3950
sensor_pin = EBBCan:gpio28

[temperature_sensor EBB_int]
sensor_type: temperature_mcu
# Point at the toolhead MCU; "mcu" would read the mainboard instead
sensor_mcu: EBBCan
min_temp: 0
max_temp: 100

…but never tried!

I don’t know the RP2040 that well!
Are you using the internal temperature sensor? I guess not.

If not, see here: Raspberry Pi Pico using onboard Temperature Sensor.

Question to all: Has anyone used the internal sensor of the RP2040 with Klipper?

Could be a temperature problem on your RP2040?

Thanks for the reply!

Sounds like you’re implying the MCU might be overheating and causing issues; I hadn’t really considered that. It looks like klipper supports reading the RP2040 directly so I will add that when I get a chance. I doubt that 58C would cause issues, but I suppose it’s possible :slight_smile:

Update: Went through about 9 hours of printing last night without issue.

Then, when pulling filament out of the extruder this morning it crashed with a timeout error, but no bytes_invalid or anything. Just:
Timeout with MCU 'EBBCan' (eventtime=54867.031712)

Wondering if the filament run-out sensor is somehow messing with the rest of the tool-head… :melting_face:

I think the only other change was running updates through mainsail.

Why don’t you do a test without your filament run-out sensor?

I took a brief look at your logs. It’s not clear to me what the root cause of the issue is. If I had to guess, though, I’d guess it is some kind of canbus issue between the rp2040 and the u2c (for example, wiring, resistors, wire crimps, one of the chips overheating, or similar).

The tx_retries isn’t necessarily an error - it can happen during normal activity, but you are showing a very large number which is odd. Also, the incrementing of the rx_error is suspicious (along with it coming in large bursts).

-Kevin
