CANBUS Communication timeout while homing Z

So, just as a brief update: I had similar issues on mainline Klipper (and obviously I couldn’t actually print anything with it, since my extruder stepper driver is not currently supported there). Out of frustration I gave up, switched to USB and… had the exact same issues. So this is probably a multi-MCU homing issue more so than a CAN bus issue per se, and CAN just happens to be how most of us implement multi-MCU homing for the first time.

I remembered that a while back I was having timing issues when Klipper was using all 8 cores (4 big, 4 little). I did in fact limit Klipper’s affinity to the 4 big cores at one point, but at some point I must have updated the OS or something and that was lost. I reimplemented that affinity change and overall things improved significantly, but once in a while I was still getting the timeouts. So now I have further limited things to two cores that share a hardware clock, and I am hoping that solves it. I may try limiting it to a single core at some point as well.

To change the affinity for the klipper service, add the line

CPUAffinity=[CPUs] to your /etc/systemd/system/klipper.service file in the [Service] section. In my case I’m using

CPUAffinity=6 7

Which limits the klipper service to cores 6 and 7. Obviously different setups may need different configurations, and different setups might have different issues altogether.
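
If you want to sanity-check which cores a CPUAffinity value expands to before restarting the service, here is a minimal Python sketch. `parse_cpu_affinity` is a hypothetical helper (not part of Klipper or systemd), and it assumes the value is a list of CPU indices or dash ranges separated by spaces or commas.

```python
import os
import re


def parse_cpu_affinity(value):
    """Expand a CPUAffinity-style string such as "6 7" or "0-3,6" into a set of CPU indices."""
    cpus = set()
    # Entries may be separated by whitespace or commas.
    for token in re.split(r"[\s,]+", value.strip()):
        if not token:
            continue
        if "-" in token:
            first, last = (int(part) for part in token.split("-", 1))
            if first > last:
                raise ValueError(f"bad CPU range: {token!r}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(token))
    return cpus


if __name__ == "__main__":
    wanted = parse_cpu_affinity("6 7")
    print("requested CPUs:", sorted(wanted))
    # os.sched_getaffinity is Linux-only; pid 0 means "this process".
    if hasattr(os, "sched_getaffinity"):
        print("CPUs this process may run on:", sorted(os.sched_getaffinity(0)))
```

To check the running service rather than the script itself, pass the klippy process ID to `os.sched_getaffinity` instead of 0.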

tl;dr: multi-MCU homing with a 25 ms window is hard, and there are probably a million things that contribute to timing issues. Limiting CPU affinity helped in my case.

Edit 2/8: Just for more info, I’ve done several prints and haven’t had a homing issue yet since making the above change. This is the longest I’ve gone without having a homing fail.

So I can add something here. I have an Octopus 1.1 flashed with Klipper’s USB-CAN bridge firmware and an EBB42 with canboot and Klipper’s CAN firmware, and I keep hitting the timeout error as well. A lot, really. I tried setting CPUAffinity, which didn’t help and actually made it a bit worse. The only thing that gets mine printing properly is changing TRSYNC_TIMEOUT.
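
For anyone weighing the TRSYNC_TIMEOUT change: as far as I can tell it is a constant in klippy/mcu.py, and its stock value of 0.025 s matches the 25 ms window mentioned earlier in the thread. The sketch below is only back-of-the-envelope arithmetic under one assumption of mine: that in the worst case a stepper on another MCU keeps moving at homing speed for roughly one timeout period after communication is lost. The function name and the 5 mm/s homing speed are hypothetical.

```python
def worst_case_extra_travel_mm(homing_speed_mm_s, trsync_timeout_s):
    """Rough upper bound on extra travel if a stop is delayed by one full timeout."""
    if homing_speed_mm_s < 0 or trsync_timeout_s < 0:
        raise ValueError("speed and timeout must be non-negative")
    return homing_speed_mm_s * trsync_timeout_s


if __name__ == "__main__":
    speed = 5.0  # mm/s, hypothetical Z homing speed
    for timeout in (0.025, 0.050, 0.100):
        extra = worst_case_extra_travel_mm(speed, timeout)
        print(f"timeout {timeout * 1000:.0f} ms -> up to about {extra:.3f} mm extra travel")
```

The point is only that a longer timeout trades fewer spurious errors for a larger worst-case overshoot, so it is worth raising it in small steps.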

What does your CAN wiring look like?

Are you using a premade twisted pair or parallel cable for the two CAN data lines?

Mine is still going strong 1 month later with the CPU affinity change. Unfortunately I think there are enough possible causes of this problem that what works for one person won’t work for another.

My other printer which is running a Xeon 1275L V3 (haswell based low voltage quad core) has never had a timing issue ever. Brute forcing power seems to go a long way, and I suspect x86 linux just works better for some of these things.

Hey @koconnor, I don’t suppose you could share an example of how you determined this? I looked at the logs posted and couldn’t figure out where you were looking. I’m having similar problems and would like to understand how to analyze the logs to find the same thing.

Here’s my log with the Communication timeout during homing z error:
klippy.log (657.6 KB)

I’ve done CANbus stress testing (using cangen, candump, and watch ip... to watch for errors) while futzing with the cables in the chamber to see if maybe it’s a loose connection or something and none of that has borne fruit. :confused:
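
One way to watch for errors during a stress test like this is to snapshot the kernel's per-interface counters before and after. This is a minimal sketch, assuming a Linux host where the interface is named can0 and the counters are exposed under /sys/class/net/can0/statistics/; the helper names are hypothetical and it is not a substitute for the tools named above.

```python
from pathlib import Path

COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")


def read_counters(iface="can0", root="/sys/class/net"):
    """Read selected error counters for one network interface from sysfs."""
    stats_dir = Path(root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text().strip()) for name in COUNTERS}


def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in before
        if after.get(name, before[name]) > before[name]
    }


if __name__ == "__main__":
    import time

    start = read_counters()
    time.sleep(10)  # run the stress test or wiggle the cables during this window
    print("counters that increased:", counter_deltas(start, read_counters()))
```

An empty result means none of these counters moved during the window, which would point away from the wiring and towards host-side timing.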

I’m trying to figure out what the next steps to diagnose this are…

Analyzing the logs is a time-intensive activity. I start by using the logextract.py tool as described in the Debugging section of the Klipper documentation. That shows the low-level commands and responses sent between host and micro-controller. With that info, and low-level knowledge of what commands and responses should occur, I try to understand what actually occurred.

I’d analyze your log to demonstrate it, but I noticed you have modified Klipper (at a minimum by deploying a led_effect and ercf module). If you are looking for assistance you should start by reproducing the issue with the pristine Klipper code.

-Kevin

Could you explain how you decide whether a Klipper installation is modified?

What should a user of your wonderful software check before asking?

Quite easy:

  • Klipper displaying the dirty flag OR
  • Configuration apparently containing sections that are not in mainline and thus can only come from “extra” modules
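
Both checks can be roughed out in a few lines of Python. This is a minimal sketch, not anything Klipper ships: the function names are hypothetical, the version pattern assumes strings shaped like the v0.11.0-86-g6026a99a ones posted in this thread, and the list of extra modules contains only the two named earlier (led_effect and ercf), so it is illustrative rather than complete.

```python
import re

VERSION_RE = re.compile(
    r"^(?P<tag>v[0-9.]+)-(?P<commits>\d+)-g(?P<commit>[0-9a-f]+)(?P<dirty>-dirty)?$"
)
# Captures the first word of each [section] header at the start of a line.
SECTION_RE = re.compile(r"^\[([^\]\s]+)[^\]]*\]", re.MULTILINE)


def parse_version(version):
    """Split a version string into its parts, or return None if it does not match."""
    match = VERSION_RE.match(version.strip())
    if match is None:
        return None
    return {
        "tag": match.group("tag"),
        "commits": int(match.group("commits")),
        "commit": match.group("commit"),
        "dirty": match.group("dirty") is not None,
    }


def find_extra_sections(config_text, extra_modules=("led_effect", "ercf")):
    """List config section types that belong to the given non-mainline modules."""
    names = SECTION_RE.findall(config_text)
    return sorted({name for name in names if name in extra_modules})


if __name__ == "__main__":
    print(parse_version("v0.11.0-86-g6026a99a"))
    print(find_extra_sections("[printer]\n[led_effect panel_idle]\n[ercf]\n"))
```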

I want to join in here, since I have the same problem.
Two Voron 2.4r2 printers, each with an RPi 3B, octoprint as USB-CAN bridge, and a Fysetc CAN toolhead, both terminated.
Fresh RPi image, Mainsail 64-bit: many errors during X homing and quad gantry level, even during a print (→ abort); every second quad gantry level gets a timeout.
Fresh RPi image, Mainsail 32-bit: sporadic timeout errors during X homing and quad gantry level, about 1 error per 20 quad gantry levels.
The error rates on the two machines are not equal: they are built and equipped identically, but one has nearly four times as many errors.
Removed the webcams: no change.
No errors on Y or Z homing → maybe a signal quality problem on the CAN bus; I will try another cable.

klippy(3).log (6.7 MB)

mcu (stm32f446xx)
Version: v0.11.0-86-g6026a99a
Load: 0.00, Awake: 0.00, Freq: 180 MHz

mcu head0 (stm32f072xb)
Version: v0.11.0-86-g6026a99a
Load: 0.02, Awake: 0.00, Freq: 48 MHz

Host (armv7l, 32bit)
Version: v0.11.0-50-gf57ff2c0
OS: Raspbian GNU/Linux 11 (bullseye)
Distro: MainsailOS 1.0.1 (bullseye)
Load: 0.84, Mem: 257.6 MB / 870.6 MB, Temp: 63°C
eth0 (10.40.0.201): Bandwidth: 1.3 MB/s, Received: 238.7 MB, Transmitted: 11.3 GB
can0: Bandwidth: 0.6 kB/s, Received: 749.7 kB,

So this is still working well for me since making the changes I described above. I even tested adding a 3rd MCU (X and Y on one, Z on another, and then the toolhead) and it still worked fine.

That said, I have noticed that the bus load when homing Z / probing approaches 20-25% (on a 1M setup), while simply moving Z the load stays at 1-2%. Is this expected behavior?
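
To put those percentages into frames per second, here is a rough estimate. It is a sketch under stated assumptions: classic CAN frames with 11-bit identifiers, 47 overhead bits per frame (including the 3-bit interframe space) plus 8 bits per data byte, and stuff bits ignored. The thread does not say what frame sizes Klipper actually sends, so the 8-byte payload is an assumption and the result is only a ballpark.

```python
def classic_can_frame_bits(data_bytes):
    """Bits on the wire for one classic CAN frame (11-bit ID, no stuff bits)."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("classic CAN carries 0 to 8 data bytes per frame")
    return 47 + 8 * data_bytes


def bus_load(frames_per_second, data_bytes, bitrate):
    """Fraction of the bus bandwidth used by the given frame rate."""
    return frames_per_second * classic_can_frame_bits(data_bytes) / bitrate


def frames_for_load(load, data_bytes, bitrate):
    """Frame rate that would produce the given load fraction."""
    return load * bitrate / classic_can_frame_bits(data_bytes)


if __name__ == "__main__":
    bitrate = 1_000_000  # the "1M" setup from the post above
    for load in (0.02, 0.25):
        rate = frames_for_load(load, 8, bitrate)
        print(f"{load:.0%} load is roughly {rate:.0f} full 8-byte frames per second")
```

Because stuff bits are ignored, the real frame rate behind a given load figure will be somewhat lower than this estimate.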

I’ve tried both a homemade cable where I twisted the two CAN data lines myself and a premade CAN cable that had the 2 power and 2 data lines.

Where did you find the premade CAN cable?

Fabreeko has some, and so does BTT. You would need to cut off the connector on the end and crimp on one that works for you.

I think I have resolved my issue with a solution that nobody has tried or mentioned yet. It took several attempts to get a print going without the error occurring until now.

My 120R resistor for CAN on my EBB36 was loose and did not fit snugly. I simply bent one of the pins so that the jumper would fit snugly and I have not had any issue since then. So if you have a jumper for the resistor make sure it has a nice snug fit.

I can totally confirm this. Downgrading to 32-bit MainsailOS fixed my CAN bus. It seems to be a 64-bit issue, because I’m running bullseye successfully.

I can also confirm this solution with 32bit Mainsail OS on RPi 3 B+.

My setup is with MKS Monster 8 in USB to CAN bus bridge mode and the MKS THR42 on the toolhead with approx. 1m unshielded untwisted H and L wires running at 1.000.000 bitrate and 1024 txqueuelen all with the latest official klipper firmware.

Before changing to 32-bit I had the described issues, and after some time QGL or homing would fail with timeouts. Long prints stopped with ‘Got error -1 in can write: (105)No buffer space available’.

Now it runs stable and with 1.000.000 bitrate even TEST_RESONANCES runs fine up to 133Hz instead of failing at around 100Hz.

After fixing the 120R resistor as described in my previous comment, the error was greatly reduced but not completely eliminated. Since I have a low-profile ice tower on my Pi, I tried overclocking the Pi because of the comments about the 64-bit OS. After doing this the error almost never happens, but every once in a while it still does.

over_voltage=6
arm_freq=2000
gpu_freq=750
gpu_mem=512

It appears the 64 bit issue is the main culprit here. Hopefully a kernel or OS update will fix this at some point. Also my canbus is configured at a 500k rate not 1 million.

It would be nice if this error triggered a retry instead of hard stopping the print. Would not be as annoying at least.

Hello,
I have the same problem.
I have a BTT Klipper PI, BTT EBBCAN, TAP and Octopus.
The new printer was completed 5 days ago, but the problem has only started now.
I’m new to Voron and all of this.

How do I downgrade Mainsail to 32-bit?
Or should I edit the timeout and wait for a patch?

Thank you, and sorry.

This issue came back with a vengeance for me. Overclocking no longer resolves it. I also went through the trouble of reinstalling everything on a 32-bit OS, and tried disabling all of my LEDs in case it was a bandwidth issue. Just constant communication timeout errors. Half the time I can’t even get through QGL any more.

Did you check your wiring and connectors?
