CAN Hat RPi, Octopus, EBB, all over CAN

Basic Information:

Printer Model: V-Core 3 500
Klipper: v0.10.0-584-g7527e57e
CanBus: dev-canbus-updates-20220629
MCU0: Raspberry Pi 4
MCU1: Octopus 1.1
MCU2: BTT EBB36 V1.1 STM32G0B

The setup(s):

Off the deep end with CAN bus. I started out with a USB-connected Octopus in USB-to-CAN bus bridge mode, with a BTT EBB connected to the Octopus’s RJ11 CAN interface. 10+ hours of printing without a single issue. Then I got cocky.

Moved the EBB CAN connection from the Octopus to a Raspberry Pi CAN hat. Minimal testing and all seemed to work as expected.

Currently I have the Octopus and EBB both configured to use CAN as their communication interface. Both boards are loaded with CanBoot and Klipper. Physical wiring goes from the Octopus (20cm cable) to the CAN Hat to the EBB (100cm cable). There are 120 ohm termination resistors at both ends of the bus, on the Octopus and on the EBB. The 120 ohm termination has been removed from the CAN Hat board.

Octopus w/ 120 ohm, 20cm wire length <–> RPi w/ CAN Hat board w/o 120 ohm <–> EBB w/ 120 ohm, 100cm wire length.
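
As a quick sanity check on the termination described above: assuming the only resistors across CAN-H and CAN-L are the 120 ohm terminators (wire and transceiver resistance ignored), the resistance a multimeter should see across the bus with power off is just a parallel combination. The function name below is made up for illustration.

```python
def parallel_resistance(*ohms):
    """Equivalent resistance of resistors wired in parallel."""
    return 1.0 / sum(1.0 / r for r in ohms)


# Terminators on the Octopus and the EBB only, as in the layout above.
two_terminators = parallel_resistance(120, 120)

# The same bus if the CAN Hat's 120 ohm resistor had been left in place.
three_terminators = parallel_resistance(120, 120, 120)

print(f"two terminators:   {two_terminators:.0f} ohm")
print(f"three terminators: {three_terminators:.0f} ohm")
```

A reading near 60 ohm suggests two terminators are on the bus; a reading near 40 ohm suggests a third one is still connected somewhere.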

The problem(s):

I cannot seem to get this setup to work properly. The machine does move the correct axis, but then I get the following error(s):

Communication timeout during homing probe
Communication timeout during homing x

At 250K things seem to work, but I get the above timeout(s) while homing. At 500K, if I am lucky I can get the machine to start a print before an MCU times out. At 750K all hell breaks loose and I have a hard time even flashing Klipper over the CAN bus. I believe I have the CAN bus wired correctly, with 120 ohm resistors at the ends.

Flashing:

I flash CanBoot with the dfu-util command (re-compiled for the target board):

dfu-util -a 0 -d 0483:df11 --dfuse-address 0x08000000:mass-erase:force -D out/canboot.bin

Then I use flash_can.py to find the board and upload Klipper (re-compiled for the target board):

python3 flash_can.py -q
python3 flash_can.py -i can0 -f ~/klipper/out/klipper.bin -u Octopus/EBB_serial_number_here

Plea for help:

I am not sure where to go from here. My printer.cfg and all .configs are attached. Please let me know if it looks like I am doing something stupid, or if I might just be a bit too far on the cutting edge.
Thanks
Adam
debug.zip (10.4 KB)

If it was working with the Octopus in bridge mode, why did you add the hat? It’s not necessary, less reliable than the Octopus, and adds cost.


I have a similar issue.

I use a Utoc 1 + EBB42.

Until a few days ago I had it connected over USB, but I have changed the connection to CAN. In my case homing works fine, and I have even been able to get the bed mesh. But when printing, during the first layer this error pops up and the print stops:

MCU ‘EBBCan’ shutdown: Timer too close
This often indicates the host computer is overloaded. Check
for other processes consuming excessive CPU time, high swap
usage, disk errors, overheating, unstable voltage, or
similar system problems on the host computer.
Once the underlying issue is corrected, use the
“FIRMWARE_RESTART” command to reset the firmware, reload the
config, and restart the host software.
Printer is shutdown

Utoc 1 looks like a USB to CAN adapter?

Interesting information. #Peurif, can you let me know if you are getting errors on the CAN bus? Specifically after you get the “MCU ‘EBBCan’ shutdown”.

On the computer/Raspberry Pi that runs Klipper run the following command:

ifconfig can0

I have a sneaking suspicion this might be due to cable impedance, i.e. the wrong type of wire for carrying CAN data. Here are the results of my ifconfig; note the “RX errors”:

pi@ratos:~ $ ifconfig can0
can0: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 128 (UNSPEC)
RX packets 8445 bytes 66507 (64.9 KiB)
RX errors 4 dropped 0 overruns 0 frame 4
TX packets 6890 bytes 46567 (45.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
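
For anyone who wants to track these counters between test runs rather than eyeball them, here is a minimal sketch that pulls the error fields out of ifconfig text like the output above. The function name and the hard-coded sample are my own, and the regular expressions assume the net-tools layout shown here; other ifconfig versions may format the lines differently.

```python
import re

# Sample copied from the ifconfig output above.
SAMPLE = """\
can0: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 128 (UNSPEC)
RX packets 8445 bytes 66507 (64.9 KiB)
RX errors 4 dropped 0 overruns 0 frame 4
TX packets 6890 bytes 46567 (45.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
"""


def parse_ifconfig_errors(text):
    """Return {'RX': {...}, 'TX': {...}} built from the 'errors' lines."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(RX|TX) errors\s+(\d+)(.*)", line)
        if not match:
            continue
        direction, errors, rest = match.groups()
        fields = {"errors": int(errors)}
        # The remaining fields come as "name value" pairs.
        for name, value in re.findall(r"(\w+)\s+(\d+)", rest):
            fields[name] = int(value)
        counters[direction] = fields
    return counters


result = parse_ifconfig_errors(SAMPLE)
print(result["RX"])
print(result["TX"])
```

Here all four RX errors show up as "frame" errors, meaning malformed frames seen by the receiver, which is at least consistent with a signal-quality problem on the wire.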

#jakep_82. I am not going “Off the deep end with CAN bus” because I want reliability. I want to explore what is beyond the known. Or more simply put “nothing ventured, nothing gained”.

Not much is unknown at this point. Klipper has supported CAN for about 18 months, and a number of people (including me) have been using it that long on printers. It simply flew under the radar until recently due to the chip shortage and a lack of board availability.

Regarding the CAN hat, it won’t reliably go above 500k. This is known per tests performed by many people including the Klipper developers, and it’s why the Klipper docs suggest using 500k even though the CAN spec supports 1M.

If you really want to add another device to the mix, use a USB to CAN adapter like the MKS CANable clone. It uses candlelight firmware, and it’s reliable at speeds up to 1M in my testing.

Hmm, I hadn’t thought of that possibility… I’m using 22 AWG silicone wires because they are very flexible.

Specifically these?

What cables would be the most recommended, for a distance of approximately 1 meter?

I have read that the recommended bitrate for the EBB42 is 250000.

Am I wrong?

I am running an Octopus Pro 429 as a CAN bridge, connected to an EBB42 over CAN, at a speed of 750k.
Wiring is ± 140cm twisted pair telephone cable.

I have not experienced any trouble with this setup. I have done 2+ day prints without any issues.

I also have an EBB42, running on CAN through an SKR-Pico (thanks to the awesome rp2040 CANbridge support)

I dealt with “communication timeout during homing probe” on and off since I got the EBB.

I use a shielded cable for the (very short) wiring from the BLTouch to the EBB, and an Igus Chainflex CF113-007-D, which is a 4-conductor 22AWG cable with great shielding that’s specifically designed for CAN bus use in robotics with frequent motion/bending. So I’m fairly confident my wiring is not the issue.

None of that made any noticeable difference. I would still get that error fairly regularly and with no discernible pattern. Rebooting the Pi didn’t help, nothing did. I experienced it with a Creality board and the Waveshare CAN hat, with the SKR-Pico over USB and the Waveshare CAN hat, with a separate rp2040 board acting as the CAN bridge, etc.

What worked for me, and worked perfectly, was to change a single value in ~/klipper/klippy/mcu.py

Changing TRSYNC_TIMEOUT = 0.025 to TRSYNC_TIMEOUT = 0.050 (on Line 128) completely solved the problem for me.

It should be noted that doing so will make your Klipper directory “dirty”, and Mainsail/Fluidd will not want to update it without reversing that change.

You will have to pull updates yourself. If that mcu.py file is ever modified in a future update you will have to tell git how to handle your changes against the new changes. (Or just revert the file with git checkout -- klippy/mcu.py before the pull and edit it again after the pull completes.)

Not a perfect solution in that it involves changing the source files (very slightly) but it does solve this issue for me completely so I think it’s worth mentioning if this problem was frustrating anyone else as much as it was me.

Wow, tons of good info! Much of it helped me hammer out the below debugging process.

After 20 years as a full-time Linux systems administrator I have gotten good at bug hunting, so I started down the path of running this to ground. My test was simple: load CanBoot, flash Klipper 3 times over the CAN bus, then check for bus errors (ifconfig can0). I realize this is a VERY simple test. I will use test prints for more in-depth testing. I made a simple 120 ohm jumper for the Pi CAN Hat as a terminator and directly connected the Octopus 1.1 or the EBB board to the Pi CAN Hat.
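
The "flash, then check the counters" loop described above can be scripted. This is only a sketch: it assumes the standard Linux per-interface counters under /sys/class/net/<iface>/statistics/, and the helper names and the chosen counter list are mine, not part of Klipper or CanBoot.

```python
from pathlib import Path

# Kernel network statistics files that are relevant to the tests above.
COUNTERS = ("rx_errors", "rx_frame_errors", "rx_dropped",
            "tx_errors", "tx_dropped")


def read_counters(iface="can0", root="/sys/class/net"):
    """Read the current error counters for one interface."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def counter_delta(before, after):
    """Return only the counters that changed, with how much they grew."""
    return {name: after[name] - before[name]
            for name in before
            if after[name] != before[name]}


# Intended use around one flash attempt:
#   before = read_counters()
#   ... run flash_can.py here ...
#   after = read_counters()
#   print(counter_delta(before, after) or "no new bus errors")
```

Taking a snapshot before and after each flash attributes errors to a single attempt, instead of relying on totals accumulated since the interface came up.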

Base line testing

Testing at 250K - Oct
Flashed clean, TX errors 1 dropped 1 overruns 0 carrier 1 collisions 0
Flashed clean
Flashed clean

Testing at 500K - Oct
Flashing ERROR:root:Can, Read Error, RX errors 1 dropped 0 overruns 0 frame 1
Flashed clean
Flashed clean, RX errors 1 dropped 0 overruns 0 frame 1

Testing at 750K - Oct
Flashing ERROR:root:Can, RX errors 5 dropped 0 overruns 0 frame 5
Flashing ERROR:root:Can, then flashed but failed to re-connect for verify, RX errors 13 dropped 0 overruns 0 frame 13
Flashing ERROR:root:Can, RX errors 5 dropped 0 overruns 0 frame 5

I got errors at ALL speeds tested. At this point I started debugging the data path, starting with the Raspberry Pi’s SPI interface. The first chip in the path is the MCP2515, connected via SPI. The data sheet calls out “High-Speed SPI Interface (10 MHz)”, but the CAN Hat instructions call for a 2MHz SPI frequency. That’s WRONG! Running at 500K @ 2MHz SPI won’t be stable, as I believe the SPI link is then too slow for the Pi to read frames out of the MCP2515 reliably.

Let’s crank up the SPI interface speed!

#CAN Hat enable (/boot/config.txt)
#dtoverlay=mcp2515-can0,oscillator=12000000,interrupt=25,spimaxfrequency=2000000
dtoverlay=mcp2515-can0,oscillator=12000000,interrupt=25,spimaxfrequency=5000000

Testing at 750K - Oct + spimaxfrequency=5000000
Flashed clean
Flashed clean
Flashed clean

Testing at 750K - Oct + spimaxfrequency=10000000 #10MHz
Flashed clean
Flashed clean
Flashed clean

Testing at 1M - Oct + spimaxfrequency=10000000 #10MHz
Flashing ERROR:root:Can, but still finished successfully, no interface errors
ERROR:root:Can Flash Error
Flashing ERROR:root:Can, RX errors 3 dropped 0 overruns 0 frame 3

Well, well, well! The data shows that the CAN Hat + Octopus combination is stable up to 750k (in this extremely limited, short test). Does the same apply to the EBB?

Testing with a 10MHz SPI interface speed

Testing at 250K - EBB at spimaxfrequency=10000000
Flashed clean
Flashed clean
Flashed clean

Testing at 500K - EBB at spimaxfrequency=10000000
Flashed clean
Flashed clean
Flashed clean

Testing at 750K - EBB at spimaxfrequency=10000000
Flashed clean
Flashed clean
Flashed clean

Testing at 1M - EBB at spimaxfrequency=10000000
ERROR:root:Can Read Error, but still finished successfully, no interface errors
Flashed clean
Flashed clean

Based on this new information I am going to make an assumption that my configuration is “stable” at 750k. Now to put everything back on my CAN bus and see what happens at 750k.

750K at 10MHz SPI testing

#Here goes nothing!
Testing at 750K - EBB + Octopus @ spimaxfrequency=10000000
EBB, Flashed clean
EBB, Flashed clean
EBB, Flashed clean
OCT, Flashed clean
OCT, Flashed clean
OCT, Flashed clean
ZERO bus errors!!!

IT LIVES!!! Klipper is up, both MCUs are running, and homing + a full 7x7 bed leveling passed with flying colors. Then it happened:

CRAP!

12:12:43 Communication timeout during homing probe
12:12:43 Communication timeout during homing probe
12:12:43 Communication timeout during homing probe
12:09:30 Communication timeout during homing z
12:09:30 Communication timeout during homing z
12:09:30 Communication timeout during homing z

And bus errors… ARG!

ratos:~ $ ifconfig can0
can0: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 128 (UNSPEC)
RX packets 472032 bytes 2948451 (2.8 MiB)
RX errors 12 dropped 0 overruns 0 frame 12
TX packets 1047331 bytes 7780160 (7.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Flashed everything back to 500k. Ran a print that took 46 minutes, but still getting some RX errors.

Errors @ 500K with speaker wire connecting to the EBB

ratos:~ $ ifconfig can0
can0: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 128 (UNSPEC)
RX packets 586568 bytes 3417757 (3.2 MiB)
RX errors 13 dropped 0 overruns 0 frame 13
TX packets 2452833 bytes 18673304 (17.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Errors are still happening, even at 500k. Without the interface status (specifically the error counters) of the Octopus and EBB, I am not able to localize where the issue(s) are. I was using leftover red/black 24awg wire (AKA speaker wire) to make the long run to the EBB. It was about 100cm. This should REALLY be twisted pair. So I replaced the EBB connection wire with a single multi-strand twisted pair that I removed from a CAT5 network cable. Started the print again. It took 49m and I still got some RX errors, but nothing that affected the print. Did the new cable help? 13 vs 10 errors? That’s way too close to call:

Errors @ 500K with twisted pair wires connecting to the EBB

ratos:~ $ ifconfig can0
can0: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 128 (UNSPEC)
RX packets 592436 bytes 3449166 (3.2 MiB)
RX errors 10 dropped 0 overruns 0 frame 10
TX packets 2429379 bytes 18500590 (17.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
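
To put a rough number on "too close to call", here is a sketch of an exact binomial check on how the 23 total errors split between the two prints. It assumes errors are independent events and that each run's expected share is proportional to its RX packet count, which is a simplification on my part; the function name is made up.

```python
from math import comb


def split_p_value(errors_a, errors_b, packets_a, packets_b):
    """Two-sided exact binomial test on how the total errors split.

    Null hypothesis: both runs have the same per-packet error rate, so
    run A's share of the errors matches its share of the packets.
    """
    total = errors_a + errors_b
    share_a = packets_a / (packets_a + packets_b)
    pmf = [comb(total, k) * share_a ** k * (1 - share_a) ** (total - k)
           for k in range(total + 1)]
    observed = pmf[errors_a]
    # Add up every split that is no more likely than the observed one.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Speaker wire run vs twisted pair run, numbers from the ifconfig outputs.
p_value = split_p_value(13, 10, 586568, 592436)
print(f"p-value: {p_value:.2f}")
```

The p-value comes out far above the usual 0.05 threshold, so under these assumptions two 45+ minute prints are nowhere near enough data to say the twisted pair helped.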

At this point my printer is running, stable~ish. Both MCUs are on the CAN bus @ 500K. I think finding the SPI speed problem was a BIG win. Before that nothing was working. I also know that the CAN protocol has tons of error checking built in, as it was intended for less than friendly environments. Without more information I cannot say that those 10 RX errors are bad. They could be any number of errors, many of which will not affect the print.

Klipper data rate? In 2940 seconds Klipper transmitted around 17.6 MiB of data. That’s around 6.1 KiB/s, or about 6,293 bytes (roughly 50,000 bits) per second. My current CAN bus is set to 500,000 bits/s… We’re good! Unfortunately I think I need 500K to keep the response time up, otherwise Klipper will time out.
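
The arithmetic can be double-checked with a few lines of Python, using the counters from the last ifconfig output. Two assumptions of mine are baked in: the counters cover only the 2940 second print, and every frame is a standard 11-bit-ID CAN frame costing about 47 bits of framing on top of its payload (bit stuffing ignored). The result is a ballpark, not a measurement.

```python
ELAPSED_S = 2940       # print duration in seconds
BITRATE = 500_000      # CAN bus speed in bits per second

# Counters from the "twisted pair" ifconfig output.
RX_PACKETS, RX_BYTES = 592_436, 3_449_166
TX_PACKETS, TX_BYTES = 2_429_379, 18_500_590

# Assumption: framing bits per standard CAN frame, including the
# inter-frame space and excluding stuffed bits.
OVERHEAD_BITS = 47


def wire_bits(packets, payload_bytes):
    """Approximate bits on the wire for a set of frames."""
    return packets * OVERHEAD_BITS + payload_bytes * 8


tx_bytes_per_s = TX_BYTES / ELAPSED_S
total_bits = (wire_bits(TX_PACKETS, TX_BYTES)
              + wire_bits(RX_PACKETS, RX_BYTES))
bus_load = total_bits / ELAPSED_S / BITRATE

print(f"TX payload: {tx_bytes_per_s:.0f} bytes/s")
print(f"Estimated bus load: {bus_load:.0%}")
```

Even with framing overhead and the RX traffic included, the estimate lands at roughly a fifth of the 500K capacity, so raw bandwidth is not the limit here. Latency is the more likely reason the bitrate matters.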

I am not sure where to go from here. For my part I am VERY happy with the outcome. I will try some larger 10+ hour prints in the coming days and see if anything nasty shows up. I hope this information will be useful to others on their CAN-Klipper journeys.
