Communication timeout during homing probe

Moving to a 32-bit kernel has fixed the issue with invalid bytes incrementing in the log; however, the "Communication timeout during homing probe" error is still happening.

klippy (3).log (545.4 KB)

This is still a CAN-related error indicating that the 25ms window for "homing communication" has been violated. Make sure to run your CAN bus at 500k or 1 Mbps.
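As a quick way to check or change the bitrate by hand, here is a minimal sketch using the standard SocketCAN ip tooling (not persistent across reboots; the Klipper CANBUS documentation describes the persistent /etc/network/interfaces.d/can0 setup, and 500000 below is just an example value):

# show the currently configured bitrate
ip -details link show can0

# change it: the interface must be down while the bitrate is set
sudo ip link set can0 down
sudo ip link set can0 type can bitrate 500000
sudo ip link set can0 txqueuelen 128
sudo ip link set can0 up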

I’m running at 500000

That is unexpected. If you get a chance, could you reproduce the issue with pristine Klipper (no z_calibration or other modifications), run candump (as described at CANBUS Troubleshooting - Klipper documentation), and upon reproducing the "communication timeout" problem immediately run M112 (or click on emergency shutdown in your GUI)? That will add additional diagnostic information to the log. (The M112 has to occur within a couple of seconds of the "communication timeout" in order to get useful diagnostic information.)

If you can compress and attach the resulting logs (both Klipper log and candump log) I will take a look at it.

Thanks.
-Kevin

Good news or bad news, I haven't been able to reproduce the issue yet. I set up a server to issue a bed mesh probe command to Moonraker every 5 minutes for the past 12 hours, and it hasn't failed once.
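In case anyone wants to reproduce that kind of soak test, here is a minimal sketch using Moonraker's G-code script endpoint (it assumes Moonraker is listening on localhost:7125 and that a plain BED_MESH_CALIBRATE is the command being exercised; swap in whatever probing macro you actually use):

# issue a bed mesh probe through Moonraker every 5 minutes until interrupted
while true; do
    curl -s -X POST "http://localhost:7125/printer/gcode/script?script=BED_MESH_CALIBRATE"
    sleep 300
done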

Is it worth opening a pull request to add information about 32-bit vs 64-bit kernels to the CAN bus page?

Of course, with my luck, the second I go to actually print something, it fails. I had not yet removed z_calibration and did not have candump running, but I will work on that next. In the meantime, here's the klippy.log with the dumped messages at the end.
klippy (4).log (1.7 MB)

Does anyone know of a way to get either Linux or candump to automatically rotate the log file over time? After starting maybe 20 prints, I just got the error again, but of course I thought I was past this and wasn't running candump. I tried attaching block storage over the network to my Pi so I could just leave candump running without burning out my SD card or running out of space, but the resulting file was massive.
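One option that may help is plain logrotate with copytruncate, so candump can keep appending to the same file while older data is compressed and pruned. A sketch of /etc/logrotate.d/candump, assuming candump's output is redirected to /home/pi/candump.log (the path, size cap, and retention count are placeholders):

/home/pi/candump.log {
    size 50M
    rotate 10
    missingok
    notifempty
    copytruncate
    compress
}

Note that logrotate normally runs from a daily timer, so rotation only happens when that timer fires; invoking logrotate from an hourly cron job keeps the file-size bound tighter.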

I want to modify it to automatically shut down and flush the packets to the log when the error occurs, because with it going more than 10 prints between errors, I don't want to be babysitting it all the time, ready to stop it. If I add self.gcode.cmd_M112(None) on line 138 of homing.py, will it do that?

Kevin,

Here is the current candump and klippy.log.

At 8:24 PM the same error occurred during either QGL or bed mesh leveling: "Communication timeout during homing probe".

The candump log was last rotated at midnight, but it was downloaded right after the error, so the data in question should be at the end of the file. Edit: line 766849 (73440.084612 seconds).

Testing was performed on "pristine" Klipper pulled from GitHub earlier today.

Unfortunately, I'm unable to download that zip file from Google Drive (likely some strange JavaScript errors). Can you just attach the zip file here?

-Kevin

I truncated the candump so there isn't as much to look at, but now it's small enough to post here:

candumplog.txt (1.1 MB)
klippy (6).log (1.2 MB)

The logs seem to indicate that some kind of event occurred at the Linux kernel or USB interface level.

Under normal circumstances, you're getting round-trip times of under 1ms to both the main mcu (emulated serial over USB) and the "tool" mcu (CAN bus over USB). However, during the event, the timestamps indicate that there are notable delays in messages from both the main mcu and the tool mcu.

In particular, there are two homing state reports (trsync_state) from the main mcu that are notably delayed (7ms and 5ms). At the same time, there is a state update to the tool mcu that is notably delayed (8ms). These delays add up, and as a result the tool mcu correctly detects that its state updates exceed the 25ms multi-mcu homing time window (Multiple Micro-controller Homing and Probing - Klipper documentation) and correctly halts the homing process.

I did not see any indications in the logs that the software itself is not working correctly, nor any indications that the software did not get sufficient scheduling time. The logs indicate the Klipper code was modified (a z_calibrate module), so it is possible those modifications are causing some issue in the code (though that seems unlikely).

It’s unclear what the root event was. Some questions: What is the host machine hardware type? What is the host machine OS and revision? Are there any other USB devices connected to the host (such as a USB hub or USB webcam)? Was there anything in the system logs at the time of the event? What usb2can adapter firmware and version are you using?

-Kevin

Regarding z_calibration, what in the logs points to these files still being modified? I've commented out the z_calibration module in the config and removed it from the klipper directory. Could it be that the microcontroller firmware was compiled while these files were present?

Does anyone know how to identify the version of candlelight running on the adapter?
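I don't know of an official command for this, but one thing that may work is reading the adapter's USB descriptors, since gs_usb/candleLight firmware may expose a revision in the bcdDevice field (the 1d50:606f VID:PID below is the stock candleLight one and is only an assumption; check plain lsusb output for your adapter's actual IDs):

# find the adapter and dump its device descriptor fields
lsusb
sudo lsusb -v -d 1d50:606f | grep -E "bcdDevice|iProduct|iSerial"

# the kernel's probe messages when the adapter is plugged in can also help
dmesg | grep -i gs_usb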

Raspberry Pi 3 Model B Rev 1.2
MKS Cannable
Raspberry Pi camera interface webcam

“Raspbian GNU/Linux 11 (bullseye)”
moonraker
mainsail on nginx
crowsnest (camera-streamer)
obico moonraker agent

Here are the system logs from the event:

charlespick@voron:~/klipper/klippy/extras $ journalctl --since "2023-09-01 20:20:00" --until "2023-09-01 20:28:00" --no-pager
-- Journal begins at Tue 2023-05-02 17:25:51 MST, ends at Tue 2023-09-19 11:02:04 MST. --
Sep 01 20:23:41 voron python[749]: [job_state.py:_status_update()] - Job Started: Calibration cube.gcode
Sep 01 20:23:42 voron python3[14738]: 2023-09-01 20:23:42,146      INFO  obico.app - detected state change: Operational -> Printing
Sep 01 20:23:42 voron python3[14738]: 2023-09-01 20:23:42,336      INFO  obico.app - print event: PrintStarted (1693625021)
Sep 01 20:23:43 voron python3[14738]: /home/charlespick/moonraker-obico-env/lib/python3.9/site-packages/urllib3/connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
Sep 01 20:23:43 voron python3[14738]:   warnings.warn(
Sep 01 20:23:53 voron python3[14738]: /home/charlespick/moonraker-obico-env/lib/python3.9/site-packages/urllib3/connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
Sep 01 20:23:53 voron python3[14738]:   warnings.warn(
Sep 01 20:24:04 voron python3[14738]: /home/charlespick/moonraker-obico-env/lib/python3.9/site-packages/urllib3/connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
Sep 01 20:24:04 voron python3[14738]:   warnings.warn(
Sep 01 20:24:14 voron python3[14738]: /home/charlespick/moonraker-obico-env/lib/python3.9/site-packages/urllib3/connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
Sep 01 20:24:14 voron python3[14738]:   warnings.warn(
Sep 01 20:24:25 voron python3[14738]: /home/charlespick/moonraker-obico-env/lib/python3.9/site-packages/urllib3/connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
Sep 01 20:24:25 voron python3[14738]:   warnings.warn(
Sep 01 20:24:35 voron python3[14738]: /home/charlespick/moonraker-obico-env/lib/python3.9/site-packages/urllib3/connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
Sep 01 20:24:35 voron python3[14738]:   warnings.warn(
Sep 01 20:24:46 voron python3[14738]: /home/charlespick/moonraker-obico-env/lib/python3.9/site-packages/urllib3/connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
Sep 01 20:24:46 voron python3[14738]:   warnings.warn(
Sep 01 20:24:46 voron python3[14738]: 2023-09-01 20:24:46,325      INFO  backoff - Backing off capture_jpeg(...) for 0.4s (AttributeError: 'ServerConn' object has no attribute 'snapshot_url')
Sep 01 20:24:46 voron python3[14738]: 2023-09-01 20:24:46,700      INFO  backoff - Backing off capture_jpeg(...) for 1.5s (AttributeError: 'ServerConn' object has no attribute 'snapshot_url')
Sep 01 20:24:48 voron python3[14738]: 2023-09-01 20:24:48,154     ERROR  backoff - Giving up capture_jpeg(...) after 3 tries (AttributeError: 'ServerConn' object has no attribute 'snapshot_url')
Sep 01 20:24:48 voron python3[14738]: 2023-09-01 20:24:48,155   WARNING  obico.server_conn - Failed to capture jpeg - 'ServerConn' object has no attribute 'snapshot_url'
Sep 01 20:24:48 voron python[749]: [klippy_connection.py:_process_status_update()] - Klippy has shutdown
Sep 01 20:24:48 voron python3[14738]: 2023-09-01 20:24:48,418      INFO  obico.app - detected state change: Printing -> Offline
Sep 01 20:24:48 voron python[749]: [proc_stats.py:_handle_shutdown()] -
Sep 01 20:24:48 voron python[749]: Moonraker System Usage Statistics:
Sep 01 20:24:48 voron python[749]: System Time: 1693625059.067797, Usage: 5.92%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625060.067285, Usage: 3.17%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625061.070064, Usage: 5.16%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625062.075730, Usage: 3.31%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625063.077522, Usage: 5.65%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625064.078312, Usage: 3.93%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625065.082589, Usage: 5.42%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625066.084147, Usage: 3.47%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625067.081421, Usage: 8.62%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625068.081748, Usage: 3.63%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625069.081032, Usage: 5.74%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625070.080084, Usage: 3.23%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625071.085274, Usage: 5.06%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625072.086127, Usage: 4.0%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625073.085835, Usage: 7.37%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625074.084477, Usage: 4.51%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625075.088935, Usage: 4.9%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625076.087110, Usage: 3.48%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625077.088501, Usage: 9.34%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625078.092522, Usage: 3.28%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625079.092228, Usage: 5.7%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625080.092483, Usage: 3.16%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625081.093604, Usage: 5.05%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625082.093483, Usage: 3.49%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625083.095302, Usage: 6.05%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625084.097880, Usage: 3.5%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625085.103077, Usage: 4.74%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625086.103331, Usage: 3.48%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625087.102789, Usage: 12.19%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: System Time: 1693625088.104292, Usage: 3.3%, Memory: 28420 kB
Sep 01 20:24:48 voron python[749]: CPU Temperature: 61.762
Sep 01 20:24:48 voron python[749]: [proc_stats.py:_handle_shutdown()] - Throttled Flags:

The only nearby event is the error from Obico. I'm not sure what's wrong with that and I need to look into it as well, but searching through other shutdowns and the Obico logs, they don't seem to be related.

charlespick@voron:~/klipper/klippy/extras $ journalctl | grep "WARNING  obico.server_conn - Failed to capture jpeg - 'ServerConn' object has no attribute 'snapshot_url'"
Aug 02 20:44:48 voron python3[10880]: 2023-08-02 20:44:48,777   WARNING  obico.server_conn - Failed to capture jpeg - 'ServerConn' object has no attribute 'snapshot_url'
Aug 05 16:08:51 voron python3[1080]: 2023-08-05 16:08:51,164   WARNING  obico.server_conn - Failed to capture jpeg - 'ServerConn' object has no attribute 'snapshot_url'
Aug 17 09:51:01 voron python3[23398]: 2023-08-17 09:51:01,657   WARNING  obico.server_conn - Failed to capture jpeg - 'ServerConn' object has no attribute 'snapshot_url'
Sep 01 20:24:48 voron python3[14738]: 2023-09-01 20:24:48,155   WARNING  obico.server_conn - Failed to capture jpeg - 'ServerConn' object has no attribute 'snapshot_url'
charlespick@voron:~/klipper/klippy/extras $ journalctl | grep "Klippy has shutdown"
Aug 05 13:33:25 voron python[712]: [klippy_connection.py:_process_status_update()] - Klippy has shutdown
Aug 05 16:08:56 voron python[784]: [klippy_connection.py:_process_status_update()] - Klippy has shutdown
Aug 05 16:17:43 voron python[784]: [klippy_connection.py:_process_status_update()] - Klippy has shutdown
Aug 05 17:37:19 voron python[784]: [klippy_connection.py:_process_status_update()] - Klippy has shutdown
Aug 06 22:22:43 voron python[27937]: [klippy_connection.py:_process_status_update()] - Klippy has shutdown
Sep 01 20:24:48 voron python[749]: [klippy_connection.py:_process_status_update()] - Klippy has shutdown

I recently had this error. I can't say it's a solution, but I uncommented the hdmi_safe=1 line in /boot/config.txt, and though it may happen again, I've had a couple of prints so far and it seems okay. People here have gotten to the point of printing for twelve hours before getting an error, so I can't tell whether this is a solution. If anyone else has already tried it and it didn't work, please let me know.

There is a file klippy/extras/z_calibration.py in your local installation which causes Klipper to report a modified installation.

No.

Okay, thanks for the further info. Alas, I’m not sure what the root cause is.

Are you running Ethernet or WiFi on your rpi3? (Technically, Ethernet uses the USB bus on the rpi3, but it seems unlikely that would be an issue.)

One thing you could try is to test how much Klipper bandwidth you have available on your CAN bus. Basically, run the steps at https://www.klipper3d.org/Benchmarks.html#command-dispatch-benchmark on the toolhead (e.g., sudo service klipper stop; ~/klippy-env/bin/python ~/klipper/klippy/console.py -c can0 <toolheadcanid>; issue a restart command in console.py if you get adc reports from the toolhead; hit ctrl-c to exit console.py; rerun console.py to reconnect; then run the benchmarks at the URL above). The same steps are sketched as shell commands below. On a 1Mbit CAN bus you should be able to get around 52K Klipper command bytes per second.
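Spelled out as shell commands, the steps above look roughly like this (the CAN UUID is a placeholder; the actual benchmark commands to type inside console.py are in the Benchmarks document linked above):

# stop Klipper so console.py can own the toolhead connection
sudo service klipper stop

# connect to the toolhead MCU over CAN (substitute your toolhead's UUID)
~/klippy-env/bin/python ~/klipper/klippy/console.py -c can0 <toolheadcanid>

# if ADC reports start scrolling, type "restart" in console.py, press Ctrl-C,
# then rerun the command above to reconnect before running the benchmark
# commands from the Benchmarks page

# when finished, start Klipper again
sudo service klipper start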

Again, I’m not sure what the root cause of the issue is on your particular setup.

-Kevin


The file was gone, but I guess I may not have restarted the service yet. I am using Ethernet. I will work on the benchmarks when I have time, and I'll do more testing with different OSes, maybe reflash the Cannable and try different cables; I also have a dedicated Linux laptop now (no VMs) that I can test with.

Charles

You probably don't have a software issue but a hardware one.
It is electromagnetic interference in the wires between CAN and power.
It can be solved with shielded wires with common grounding and, maybe, ferrite rings close to the Octopus MCU.
I have had the same issue on my CNC.

I thought I would share my experience with this issue so far. I also ran into a Communication timeout during homing probe error. I've got a setup with the Mellow SHT36v2 (two boards in an IDEX printer, each connected to the Orange Pi5B via a dedicated UTOC-FLY1). The issues I had were on pristine Klipper, mind you, on Armbian 24.2.1 Bookworm, kernel 5.10.160-legacy-rk35xx. The probing would sometimes fail when doing just probing and nothing else, but when trying to gather a bed mesh while also heating the bed, for example, it would almost certainly fail somewhere mid-way through probing the mesh. Notably, I did not observe any non-zero bytes_invalid as was sometimes indicated earlier in this thread. What I initially tried and what did not help:

  • Connecting each UTOC-FLY1 to a different USB port on the Pi (they were initially connected via a USB hub that also had another USB MCU, with an accelerometer, connected).
  • Reflashing the latest Katapult and Klipper firmware, at a 1000000 CAN bus speed (it was 500000 initially), to the SHT36v2 boards.

What seems to have helped (at least so far): adjusting the CPU speed and governor (via armbian-config)

  • Minimum CPU speed 1800 MHz
  • Maximum CPU speed 2352 MHz
  • CPU governor performance

After that, so far, I have had no issues probing the bed or gathering a bed mesh while the heaters are active. What may be happening here is that Klipper, in practice, does not create enough CPU load for Linux to decide the CPU needs to stay in a high-performance state. So the CPU frequency may be scaled down, and perhaps other power-saving measures applied from time to time. Then, depending on the overall bus traffic (e.g. from other USB devices), communication with the CAN adapter may get delayed if the CPU drops its performance considerably while also dealing with other devices. As a result, the round-trip time of the communication with the CAN board may fall outside the window allowed for multi-MCU homing. At least this is something easy to try, and it may help someone.
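For anyone who wants to try the same thing without armbian-config, here is a minimal sketch using the standard cpufreq sysfs interface (paths and available governors can differ by kernel, and this does not persist across reboots):

# switch every core to the "performance" governor
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance | sudo tee "$gov"
done

# confirm the governor and the current clock speed
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq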

There were some tweaks to the Klipper code over the last few months that seem to have helped with issues like the one reported here - PRs #6352, #6353, #6354, #6453, #6563, #6564.

If errors are still occurring with the latest code, I’ll try to take a look through the logs to see if they reveal something.

Cheers,
-Kevin
