Recently reflashed my Pi, EBB36, and Octopus. I'm getting pretty frequent "Communication timeout during homing" errors. I'm using a CANable for my toolhead and just USB for the Octopus. The issue comes up about once every other print, I want to say? Always during QGL or mesh leveling. I'm running a 500,000 bitrate on the CAN bus, with terminating resistors and twisted wiring of a standard length for a 2.4 300mm with cable chains. Any ideas what I can do to get this reliable again?
I didn't find any shorts, all my connectors look fine, and the printer is able to print for hours at a time, so I don't think there are any problems with my make menuconfig settings. The issue comes up when the heaters are off, so I don't think it's a main power supply issue. The printer also stays on and I'm able to move the toolhead after one of these errors; Klipper isn't shutting down, it's just canceling the probing move.
There has been considerable discussion of this issue in the GitHub issues list. Most people have resolved it by running the bitrate at 1M (1000000), setting the transmit queue length to 1024, and, failing that, changing to the 32-bit version of Bullseye on their Pi. It can be tough to chase down. I also personally mitigated the problem on my 2.4 by not running loads of RGB LEDs via CAN bus while probing/QGL.
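For reference, on a stock Raspberry Pi OS install the interface settings usually live in /etc/network/interfaces.d/can0, and something along these lines matches the 1M / 1024 suggestion (the file location and interface name can differ on other setups, and the bitrate set in the MCU's make menuconfig has to match):

    # /etc/network/interfaces.d/can0  (typical location on RPi OS)
    allow-hotplug can0
    iface can0 can static
        bitrate 1000000
        up ifconfig $IFACE txqueuelen 1024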
I’m running pi4, Octopus 1.0 in bridge mode, and Fly SB2040.
I see that in one of these threads Kevin refers to the srtt, rttvar, and rto values, but I can't find any information about what they are or what units they're in. When the timeout happens these values appear to be:
srtt=0.001 rttvar=0.000 rto=0.025
Which seems pretty low given the number of significant figures logged, regardless of what the units are. Nonetheless, is there anywhere that details what all the items in the log are and their units?
I had a similar problem and improved things (but did not fix them) by doing the following:
- updated the communication speed from 500K to 1M
- increased the CAN bus transmit buffer size (txqueuelen) in Linux
- stopped any Neopixel updates when probing (see the macro sketch below)
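For that last point, this is roughly the idea; a minimal sketch assuming the toolhead LEDs are a [neopixel] section named sb_leds and that QGL is where the timeouts show up (the names and colors are placeholders, adjust to your own config):

    [gcode_macro QUAD_GANTRY_LEVEL]
    rename_existing: QUAD_GANTRY_LEVEL_BASE
    gcode:
        # turn the toolhead LEDs off so no Neopixel traffic competes with probing
        SET_LED LED=sb_leds RED=0 GREEN=0 BLUE=0
        QUAD_GANTRY_LEVEL_BASE {rawparams}
        # restore the LEDs afterwards
        SET_LED LED=sb_leds RED=0.5 GREEN=0.5 BLUE=0.5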
I tried basically the same config with a Pi 3B+, a Pi Zero 2 W, and an i5, and oddly enough the Pi Zero seemed not to exhibit the problem (huh??).
My solution now is to just run the probe connection directly back to the Octopus. It seemed to be some timing issue between the probe triggering and Klipper coordinating it with the Z movement.
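In config terms that just means pointing the [probe] section at a pin on the Octopus instead of the toolhead board; a minimal sketch, with PG10 only as an example of a spare endstop/DIAG header (check your board's pinout and keep your existing offsets):

    [probe]
    pin: ^PG10   # spare endstop header on the Octopus instead of the toolhead board pin
    # x_offset, y_offset, z_offset, speed, etc. stay the same as before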
I should add that I tried both an EBB42 and a FLY-SHT42, with the SHT42 connected both via a USB CAN adapter and directly to the Octopus Pro CAN connector, and the result was the same.
I was able to get the timeout to occur by bed probing twice in a row; it failed about halfway through the second run. I ran M112 right after and exported the log, but I have no idea how to read it. Can someone tell me if there are reordered or lost messages here?
If you don't mind checking for me, how many bytes_invalid do you get during a sequence of 75 probes or so? I would assume 0 is ideal, but maybe it's normal to have some?
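The counter shows up on the periodic Stats lines in klippy.log, so something like this should pull out the last values after a probing run (the log path assumes a recent install; older setups keep it in /tmp/klippy.log):

    grep -o 'bytes_invalid=[0-9]*' ~/printer_data/logs/klippy.log | tail -n 20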
See CANBUS Troubleshooting - Klipper documentation. Your log indicates you have an incrementing bytes_invalid counter. That is an indicator that something on the CAN bus is reordering packets (either the CAN bus adapter or the Linux kernel). This must be fixed - the printer will continue to be unstable until it is fixed.
Some Linux kernel builds for embedded devices have been known to reorder CAN bus messages. It may be necessary to use an alternative Linux kernel or to use alternative hardware that supports mainstream Linux kernels that do not exhibit this problem.
What builds are susceptible to this? I am running Debian GNU/Linux 11 (bullseye), 64-bit.
I don’t know. The kernels for embedded devices are all over the place. To the best of my knowledge the mainstream kernel.org kernels don’t have this severe CAN bus networking bug (though I’ve never explicitly tested it).
I haven’t seen the issue on my devices (standard rpi kernels).
This is different from the kernel version, right? How do I tell what build I'm using? I flashed Raspberry Pi OS using Raspberry Pi Imager. I'm going to try connecting the USB cables to another computer, although that'll be a VM, which might introduce other issues. Never hurts to try.
Well, that didn't work. The communication timeout happens almost immediately when running Klipper in Debian with Python 3 in a VMware VM on an M1 MacBook, plus an additional USB hub and new cables. That's a lot of different pieces, so I'm not sure what's causing the problem.
Interestingly, however, when running in this setup there are no bytes_invalid, even during the short time the probe function is running. Does this indicate a problem with the host OS on my Pi, or with the USB cables going to the Octopus/CANable?
Audio software is a different story. There the stream is just pushed out or pulled in. With 3D printing, things are time-sensitive: the host has to react to events within a very short time. The more things you have between the hardware and the managing software, the greater the risk that the data arrives too late to be processed properly.
While this is getting pretty off topic, DAWs often recommend against running in a VM for the same reasons as Klipper. Despite this, I ran Ableton in a Windows VM for quite a while and never had an issue with MIDI timing. I've been doing more research, and it appears the virtualization implementation on Apple Silicon isn't as good at timing as VMware's Intel hypervisor. VMware actually has many features explicitly for latency- and timing-sensitive tasks across their various products, but it appears the Apple Silicon version of Fusion has none of them now. For example, you used to be able to enable "Scheduling Affinity" to ensure that a CPU core was always available to the VM, and only that VM. I would've expected Klipper to work in this scenario.
Adding to the reports that a 32-bit kernel on a Raspberry Pi just seems to be better. Testing was done on another Pi with different cables, so now it's time to reflash the Pi inside my printer and go from there.
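For anyone comparing, a quick way to confirm which kernel and userland you're actually running (armv7l/armhf is the 32-bit build, aarch64/arm64 is 64-bit):

    uname -m                    # armv7l = 32-bit kernel, aarch64 = 64-bit kernel
    dpkg --print-architecture   # armhf = 32-bit userland, arm64 = 64-bit userland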