CANBUS Communication timeout while homing Z

charlespick · August 5, 2022, 4:35pm

Whenever I let my printer heat soak for an hour before printing, the probe always gives a “Communication timeout while homing Z” error. It will fail 5 times in a row. Restarting the machine with a firmware restart immediately fixes the problem. When I try running another print after that one completes, it fails again. Restart and it works again.

I’m running an MKS Canable 1.0 with candlelight, 500000bps 1000txqueuelen, and BTT EBB 36 1.0

I suspect there’s some clock drift that gets worse over time and restarting syncs the two controllers together again. Otherwise how would I begin troubleshooting this?

charlespick · August 6, 2022, 3:49pm

And also Communication timeout during homing probe

Boxxy · August 8, 2022, 10:39pm

Communication time out during homing fix

Changed TRSYNC_TIMEOUT value in “/home/pi/klipper/klippy/mcu.py” file from 0.025 to 0.050, i.e. was “TRSYNC_TIMEOUT = 0.025”, and became “TRSYNC_TIMEOUT = 0.050”, and the error “Communication timeout during homing probe” disappeared
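
For anyone who wants to check which value their install currently has before or after making this change, here is a minimal sketch. It assumes the constant is a plain module-level assignment as quoted above; the helper name `find_trsync_timeout` is made up for illustration and is not part of Klipper.

```python
import re

def find_trsync_timeout(source_text):
    """Return the value assigned to TRSYNC_TIMEOUT in the given source text, or None."""
    match = re.search(r"^TRSYNC_TIMEOUT\s*=\s*([0-9.]+)", source_text, re.MULTILINE)
    return float(match.group(1)) if match else None

# Inline sample standing in for the file contents (not a copy of the real mcu.py).
sample = "import sys\nTRSYNC_TIMEOUT = 0.025\n"
print(find_trsync_timeout(sample))
```

On a real install you would pass in the text of the mcu.py file named above. Keep in mind that a locally edited source file is likely to be flagged as modified and may be overwritten on update, and that a longer timeout hides the symptom rather than explaining the lost messages.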

charlespick · August 9, 2022, 4:47am

Is there anywhere I can see the actual “ping” times that I’m getting? While this is nice, it doesn’t solve the problem of clock drift, if that is the problem, or otherwise explain the behavior I’ve now reliably caught.

charlespick · August 9, 2022, 5:59pm

koconnor · August 14, 2022, 9:21pm

It’s impossible to say what could cause that without seeing the full Klipper log from the event.

-Kevin

charlespick · August 20, 2022, 9:43pm

Here’s one log file with many of them in it:

klippy.log.2022-08-09.zip (3.1 MB)

koconnor · August 24, 2022, 2:43am

Your log is showing an incrementing “bytes_invalid” counter for the “tool” mcu. This was a symptom of an older version of candlelight firmware that was severely broken (it would reorder packets). You should confirm that the candlelight firmware is the latest and ensure that the “bytes_invalid” counter is no longer incrementing.

If it is that old version of candlelight, then it must be fixed as Klipper’s CANbus implementation is unlikely to be stable with reordered packets (even if you don’t get “communication timeout during homing” errors).
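
One way to check whether the counter is still climbing is to pull it out of the periodic “Stats” lines in klippy.log. The sketch below is only a rough helper: it assumes each Stats line looks like `Stats <time>: ... <name>: key=value ...` with one `bytes_invalid=` entry per mcu section, and the sample lines are made up for illustration, not taken from the attached log.

```python
def bytes_invalid_by_mcu(log_lines):
    """Collect (timestamp, bytes_invalid) samples per section name from Stats lines."""
    samples = {}
    for line in log_lines:
        if not line.startswith("Stats "):
            continue
        tokens = line.split()
        try:
            timestamp = float(tokens[1].rstrip(":"))
        except (IndexError, ValueError):
            continue
        section = None
        for token in tokens[2:]:
            if token.endswith(":") and "=" not in token:
                section = token[:-1]  # e.g. "mcu:" or "tool:" starts a new section
            elif token.startswith("bytes_invalid=") and section is not None:
                count = int(token.split("=", 1)[1])
                samples.setdefault(section, []).append((timestamp, count))
    return samples

def is_incrementing(samples):
    """True for each section whose last sample is higher than its first."""
    return {name: vals[-1][1] > vals[0][1] for name, vals in samples.items()}

# Made-up sample lines in the assumed format.
sample_log = [
    "Stats 100.0: gcodein=0 mcu: mcu_awake=0.002 bytes_invalid=0 tool: mcu_awake=0.001 bytes_invalid=3",
    "Stats 101.0: gcodein=0 mcu: mcu_awake=0.002 bytes_invalid=0 tool: mcu_awake=0.001 bytes_invalid=9",
]
print(is_incrementing(bytes_invalid_by_mcu(sample_log)))
```

With a real log you would pass the open file object instead of `sample_log`. A counter that stays flat after reflashing the adapter is the result to look for.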

-Kevin

charlespick · August 24, 2022, 2:58am

i’ll flash the latest version again but the candlelight isn’t that old, it’s from about a month ago now.

charlespick · August 24, 2022, 2:24pm

I flashed the latest candlelight firmware last night, left the printer overnight, and just tried homing again and got the same issue. I also lowered my transmission rate to 250k so my canbus should be fully compliant with the distance I’m running. Fully twisted pair, termination resistors at each end. Still failing to probe and still incrementing invalid bytes.

Same problem appears to be coming up here

I did some research and I’m also fairly certain that the candlelight firmware Kevin was talking about was from far before I started using canbus toolheads. Unfortunately I don’t think that is where the problem here is.

koconnor · August 24, 2022, 3:31pm

I analyzed your first log and it shows that the timeout error occurs due to lost messages between toolhead and host. At the same time, the “bytes_invalid” counter increases for the “tool” mcu. So, the homing issue is definitely a direct result of the issue causing the incrementing “bytes_invalid” counter. Whatever the root issue is, it will need to be fixed to get a stable connection.

It is possible that the “invalid bytes” issue is a result of lost canbus messages between toolhead and host. However, it is odd that there is no indication of lost host messages sent to the toolhead (the retransmit counters are not incrementing).

I do not recommend using a canbus speed below 500000. If anything, you’ll want to go up to 1000000. A speed below 500000 does not provide enough bandwidth to accurately perform adxl345 resonance measurements. Lower speeds also notably increase the message round-trip-time, which tends to exacerbate communication issues.
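
To put rough numbers on the speed argument, here is a back-of-the-envelope sketch of how long a single full frame occupies the wire at each bit rate. It assumes classical CAN frames with 11-bit identifiers and worst-case bit stuffing, and it covers only wire time for one frame, not the full round trip through the adapter, USB, and host.

```python
def classic_can_frame_bits(data_bytes, worst_case_stuffing=True):
    """Bits on the wire for a classical CAN frame with an 11-bit identifier."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("classical CAN carries 0 to 8 data bytes")
    # 44 fixed frame bits + 3 bits of interframe space + payload
    bits = 47 + 8 * data_bytes
    if worst_case_stuffing:
        # Stuff bits can appear in the 34 header/CRC bits plus the payload
        bits += (34 + 8 * data_bytes - 1) // 4
    return bits

def frame_time_us(data_bytes, bitrate):
    """Wire time in microseconds for one worst-case frame at the given bit rate."""
    return classic_can_frame_bits(data_bytes) * 1_000_000 / bitrate

for bitrate in (250_000, 500_000, 1_000_000):
    print(bitrate, frame_time_us(8, bitrate))
```

Each halving of the bit rate doubles the time every frame holds the bus, so messages queued behind other traffic wait proportionally longer.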

Finally, if you don’t mind experimenting, you could try flashing Klipper in “usb to canbus bridge” mode to your canbus adapter. If the problem persists with Klipper on the adapter then it would likely rule out any issue with candlelightfw.

Cheers,
-Kevin

koconnor · August 24, 2022, 3:40pm

Oh, another way for you to debug the issue is to perform a Linux capture of the canbus, and then align the homing failure with the actual messages on the canbus. (If you go this route, you’ll need to research the low-level protocol, research the canbus protocol, and align the timestamps between captures/logs - so expect to invest notable time on it).

The candump utility can be used to take canbus captures - for example: candump -t z -Ddex can0,#FFFFFFFF
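
As a starting point for lining a capture up with a homing failure, the sketch below scans a capture for silent periods on the bus. It assumes only that every captured line begins with a parenthesized timestamp in seconds; the rest of each line is ignored, and the sample lines are made up. The 0.025 default mirrors the TRSYNC_TIMEOUT value mentioned earlier in this thread.

```python
import re

# Leading "(seconds.fraction)" timestamp; everything after it is ignored here.
TIMESTAMP = re.compile(r"^\s*\((\d+\.\d+)\)")

def find_gaps(lines, threshold_s=0.025):
    """Return (start_time, gap_length) for each silence longer than threshold_s."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        current = float(match.group(1))
        if previous is not None and current - previous > threshold_s:
            gaps.append((previous, current - previous))
        previous = current
    return gaps

# Made-up capture lines; only the timestamps matter for this check.
sample_capture = [
    "(000.000000) can0 frame-a",
    "(000.010000) can0 frame-b",
    "(000.040000) can0 frame-c",
    "(000.045000) can0 frame-d",
]
print(find_gaps(sample_capture))
```

This treats all traffic as one stream. A more useful version would filter on the toolhead's CAN id so that host-to-toolhead frames do not mask a gap in the other direction.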

Cheers,
-Kevin

charlespick · August 24, 2022, 5:54pm

I’m curious which part of my log you were looking at. There’s a big portion where the bytes_invalid isn’t incrementing.

I’ve built as CANable_MKS_fw (I have an MKS canable 1) and reflashed and am not seeing bytes_invalid at all. I will leave the printer for a while and see if the problem is reproducible after time.

This theory about bytes_invalid is important, but I don’t see how that relates to the behavior I’m seeing where after a restart and for about 30 minutes it’s perfectly fine and then after waiting, (especially after the printer has been idle for hours) it fails reliably.

charlespick · September 2, 2022, 10:53pm

this does seem to have fixed the issue. is it possible the reordering of packets causes clock drift?

eikaf · September 16, 2022, 11:40pm

Hi guys, I have the same problem with my EBB 1.2 connected to a Raspberry Pi through a USB-C cable. The problem appears randomly while doing a 9x9 bed mesh with my klicky probe.

Do you have any hints? I will provide a log as soon as I can save it to my PC.

Thanks.

RebelPhoton · September 19, 2022, 10:23pm

candump-2022-09-18_180603.log (115.8 KB)
canbusload_rp2040_ebb36_12_homing.txt (3.6 KB)
klippy-rp2040.log (2.5 MB)
iplinkstats.txt (1.7 KB)

I’m in the same boat with a BTT EBB36 1.2 toolhead board. I’ve tried a BTT U2C 1.1 adapter, first with the firmware from the BTT github, and then with the canable.io web flasher and the candlelight_fw v2.0 release.

The web flasher didn’t result in a usable board for me but the other two flashed from stm32cubeprogrammer worked ok.

My other board is an SKR Mini E3 2.0 and I was using it over uart originally but tried flashing it to USB for troubleshooting.

After reading this thread I was able to flash a Raspberry Pi Pico board with the USB to CAN bus bridge option, with a CAN transceiver connected.

All my tries have the same result. I can see in mainsail that the board connects, and I’ve verified that extruder motor, fans, heater, MAX31865 with PT1000 and slideswipe probe all work ok. I get communication timeouts while homing. Sometimes it works for half a second, sometimes for a bit more. A couple of times I could finish homing but then failed to probe a bed mesh.

I’ve tried 250000 and 500000 baud, first without canboot and later with canboot. I’ve used two different DIY cables with different connectors (molex first, jst/ferrules later) to rule out bad crimps. Bus resistance measures 60 ohms between can_h and can_l.

I’m attaching a capture with canbusload and candump while trying to home.

What should I try next?

jakep_82 · September 19, 2022, 11:06pm

I use 1M baud and have no issues.

koconnor · September 19, 2022, 11:33pm

Alas, the log did not indicate the cause of the issue. Please retry and issue an “emergency stop” immediately after the “communication timeout” event. This will cause Klipper to write additional information to the log. Please attach that full log here. Please also indicate what usb to canbus adapter you were using during that log (canable, pico, etc.).

-Kevin

RebelPhoton · September 20, 2022, 8:19am

Hi Kevin, thanks so much for answering. I’m attaching another capture of a single homing try and e-stop. I’m using the raspberry pico canbus adapter.
ipLinkStatsBeforeAfter.txt (1.8 KB)
canbusload_rp2040_ebb36_12_homing-2.txt (1.4 KB)
candump-2022-09-20_090655.log (33.6 KB)
klippy-can2040-homing-estop.log (152.0 KB)

I’m not sure what else to try. I have a second unopened EBB36 1.2 but do you think it’s a hardware fault?

koconnor · September 20, 2022, 5:07pm

The log indicates that the canbus was unable to send messages from mcu to host for ~28ms. The micro-controller correctly detected the loss of communication and halted the homing action.

It’s unclear why the bus was unable to communicate for an extended period of time. Eventually all messages were received, so whatever the root cause was, it eventually cleared without any software-based retransmits.

I’d say double check the wiring and terminating resistors, but it sounds like you’ve done that already.

The candump capture you attached correlates with the klippy.log file - both show the communication lapse.

I’m not sure what the underlying issue is. Next step in debugging would be to place a logic analyzer on the can_rx/can_tx lines to see what is happening on the bus at the time of the failure. That is a lot of work though.

-Kevin
