Communication timeout during homing

Basic Information:

Printer Model: Voron 2.4
MCU / Printerboard: Octopus, Canable, EBB36
klippy.log (1.3 MB)

I recently reflashed my Pi, EBB36, and Octopus. I’m getting pretty frequent “Communication timeout during homing” errors. I’m using a Canable for my toolhead and plain USB for the Octopus. The issue comes up about once every other print, I’d say, and always during QGL or mesh leveling. I’m running a 500,000 bitrate on the CAN bus, with terminating resistors and twisted wiring of standard length for a 2.4 300mm with cable chains. Any ideas what I can do to get this reliable again?

Please have a look at these:


Hi,

I didn’t find any shorts, all my connectors look fine, and the printer is able to print for hours at a time, so I don’t think there are any problems with my make menuconfig settings. The issue comes up when the heaters are off, so I don’t think it’s a main power supply issue. The printer also stays on and I’m able to move the toolhead after one of these errors; Klipper isn’t shutting down, it’s just canceling the probing move.

Charles

See Search results for 'timeout homing' - Klipper
There are various reports with different solutions. Maybe one fits your case.


There has been considerable discussion of this issue in the GitHub issues list. Most people have resolved it by running the bitrate at 1M (1,000,000), setting the transmit buffer to 1024, and, failing that, changing to the 32-bit version of Bullseye on their Pi. It can be tough to chase down. I also personally mitigated the problem on my 2.4 by not running loads of RGB LEDs via CAN bus while probing/QGL.

I’m running a Pi 4, an Octopus 1.0 in bridge mode, and a Fly SB2040.
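For reference, a minimal sketch of what the first two changes might look like on the Pi, assuming the CAN interface is brought up via an `/etc/network/interfaces.d/can0` file (the exact file and interface name depend on your setup, and every MCU on the bus has to be reflashed to match the new bitrate):

```
# Example /etc/network/interfaces.d/can0 -- adapt to your system
allow-hotplug can0
iface can0 can static
    bitrate 1000000
    # larger transmit queue; helps avoid dropped frames during bursts
    up ifconfig $IFACE txqueuelen 1024
```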


I see that in one of these threads Kevin refers to the srtt, rttvar, and rto values, but I can’t find any information about what they are or what units they’re in. When the timeout happens these values appear to be:

srtt=0.001 rttvar=0.000 rto=0.025

That seems pretty low given the number of significant figures logged, regardless of what the units are. Nonetheless, is there anywhere that details what all the items in the log are and what units they use?
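Not an official reference, but those names match the classic smoothed round-trip-time estimator used for retransmission timeouts, and the values appear to be in seconds. A rough sketch of that kind of estimator (an assumption about how these numbers are derived, not a copy of Klipper’s code):

```python
# Jacobson/Karels-style RTT estimator (sketch). srtt = smoothed round-trip
# time, rttvar = RTT variance, rto = retransmission timeout, all in seconds.
def update_rtt(srtt, rttvar, rtt_sample, alpha=0.125, beta=0.25, min_rto=0.025):
    if srtt is None:
        # First sample initializes the estimator
        srtt, rttvar = rtt_sample, rtt_sample / 2.0
    else:
        rttvar = (1.0 - beta) * rttvar + beta * abs(srtt - rtt_sample)
        srtt = (1.0 - alpha) * srtt + alpha * rtt_sample
    # Timeout is the smoothed RTT plus a safety margin, clamped to a floor
    # (the 0.025 floor here is an assumption, chosen to match the log)
    rto = max(srtt + 4.0 * rttvar, min_rto)
    return srtt, rttvar, rto

# A steady 1 ms round trip gives a tiny rto that sits at the floor,
# which would explain srtt=0.001 rttvar=0.000 rto=0.025 in the log.
print(update_rtt(0.001, 0.000, 0.001))  # -> (0.001, 0.0, 0.025)
```

If that reading is right, the low values just mean the host-to-MCU link normally responds quickly; the error fires when an expected reply fails to arrive in time.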

I feel your pain,

I had a similar problem and improved, but did not fix, things by doing the following:
- updated the communication speed from 500K to 1M
- increased the CAN bus transmit buffer size in Linux
- stopped any neopixel updates when probing (see the macro sketch below)
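On the neopixel point, the details depend on how your LEDs are driven, but one hedged sketch is to wrap QUAD_GANTRY_LEVEL so the toolhead LEDs are switched off (and left alone) while it runs. The LED name here is a placeholder:

```
[gcode_macro QUAD_GANTRY_LEVEL]
rename_existing: _QUAD_GANTRY_LEVEL
gcode:
    # "toolhead_leds" is a placeholder for your [neopixel ...] section name
    SET_LED LED=toolhead_leds RED=0 GREEN=0 BLUE=0 TRANSMIT=1 SYNC=0
    _QUAD_GANTRY_LEVEL {rawparams}
```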

I tried basically the same config with a Pi 3B+, a Pi Zero 2 W, and an i5, and oddly enough the Pi Zero seemed not to exhibit the problem (huh??).

My solution now is to just run the probe connection directly back to the Octopus. It seemed to be some timing issue between the probe triggering and Klipper coordinating it with the Z movement.
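In config terms that just means pointing the probe’s pin at the mainboard instead of the toolhead board, roughly like this (the pin names below are placeholders; use whatever ports you actually wire to):

```
[probe]
# pin: EBBCan:PB9       # example: probe wired to the CAN toolhead board
pin: ^PG15              # example: probe wired back to a spare Octopus input
# (other probe settings unchanged)
```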

I should add that I tried both an EBB42 and a FLY-SHT42, and with the SHT42 connected both via a USB CAN adapter and directly to the Octopus Pro CAN connector, the result was the same.

I’m not an expert on these numbers, but they look similar to the values in my logs, so likely not the issue’s root cause.

I was able to get the timeout to occur by bed probing twice in a row; it failed about halfway through the second run. I ran M112 right after and exported the log, but I have no idea how to read it. Can someone tell me if there are reordered or lost messages here?

klippy shutdown after failure.log (284.7 KB)

Thanks,
Charles

If you don’t mind checking for me, how many bytes_invalid do you get during a sequence of 75 probes or so? I would assume 0 would be ideal but maybe it’s normal to have some?

See CANBUS Troubleshooting - Klipper documentation. Your log indicates you have an incrementing bytes_invalid counter. That is an indicator that something on the CAN bus is reordering packets (either the CAN bus adapter or the Linux kernel). This must be fixed - the printer will continue to be unstable until it is fixed.
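For anyone wanting to check their own log, the counter shows up in the periodic Stats lines; a quick sketch for pulling it out (the path is just an example):

```python
# Print the last few bytes_invalid samples from klippy.log; a value that
# keeps climbing between samples points at reordered or corrupt packets.
import re

with open("klippy.log") as f:
    samples = [int(v) for v in re.findall(r"bytes_invalid=(\d+)", f.read())]

print(samples[-20:])
```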

-Kevin

Hi Kevin,

When it says

  1. Some Linux kernel builds for embedded devices have been known to reorder CAN bus messages. It may be necessary to use an alternative Linux kernel or to use alternative hardware that supports mainstream Linux kernels that do not exhibit this problem.

what builds are susceptible to this? I am running Debian GNU/Linux 11 (bullseye), 64-bit.

I don’t know. The kernels for embedded devices are all over the place. To the best of my knowledge the mainstream kernel.org kernels don’t have this severe CAN bus networking bug (though I’ve never explicitly tested it).

I haven’t seen the issue on my devices (standard rpi kernels).

-Kevin

This is different from the kernel version, right? How do I tell what build I’m using? I flashed Raspberry Pi OS using Raspberry Pi Imager. I’m going to try connecting the USB cables to another computer, although that’ll be a VM, which might introduce other issues. Never hurts to try.

Well, that didn’t work. The communication timeout happens almost immediately when running Klipper in Debian with Python 3 in a VMware VM on an M1 MacBook, plus an additional USB hub and new cables. That’s a lot of different pieces, so I’m not sure what is causing the problem.

Interestingly, however, when running in this setup there are no bytes_invalid, even during the short time the probe function is running. Does this indicate a problem with the host OS on my Pi, or with the USB cables going to the Octopus/Canable?

klippy-VM-Debian-Python3.log (263.9 KB)


Running Klipper in a VM is not supported, and errors like this are to be expected. See Running Klipper in a Virtual Machine (VM).

FWIW, there are multiple reports that a 32-bit kernel works better than a 64-bit one.

I see…

I’ve never had issues running audio software in VMware (I have in other hypervisors), but I guess the proof is in the pudding.

Next up, I was going to try 32-bit MainsailOS instead of building my image myself. Will report back on that.

Still really curious whether the log tells you about the MCU latency, though, and if so, how you can read it.

Audio software is a different kettle of fish: there the stream is just pushed out or pulled in.

With 3D printing, things are time sensitive. The host has to react to events within a very short time, and the more layers you have between the hardware and the managing software, the greater the risk that data arrives too late to be processed properly.

While this is getting pretty off topic, DAWs often recommend against running in a VM for the same reasons as Klipper. Despite this, I ran Ableton in a Windows VM for quite a while and never had an issue with MIDI timing. I’m doing more research, and it appears the virtualization implementation on Apple Silicon isn’t as good at timing as VMware’s Intel hypervisor. VMware actually has many features explicitly for latency- and timing-sensitive tasks across their various products, but it appears the Apple Silicon version of Fusion has none of them now. For example, you used to be able to enable “Scheduling Affinity” to ensure that a CPU core was always available to the VM, and only that VM. I would’ve expected Klipper to work in this scenario.

Adding to the reports that a 32-bit kernel on a Raspberry Pi just seems to be better. Testing was done on another Pi with different cables, so now it’s time to reflash the Pi inside my printer and go from there.