Is bytes_retransmit something to be concerned with on a 2 item CAN bus?

Basic Information:

Printer Model: Voron 2.4r2 (RPi 4 w/ 4GB RAM)
MCU / Printerboard: Octopus 1.1
klippy.log: klippy.log (657.6 KB)

Describe your issue:

I’ve just upgraded to CAN bus. I’m using USB-to-CAN on my Octopus with the little RJ-11 plug going to a Mellow SB2040. The setup seems to be working okay, but occasionally I get a “communication timeout during homing probe” error while doing Z homing or QGL. So far this happens roughly once in 30 to 50 probes, and I have not been able to reproduce it on demand. Unfortunately, my klipper config has customizations in it (led_effects and ercf), so that’s an understandable sore point for the purists here.

I’m looking at all the possible feedback paths that CAN bus has to provide me with information and I’ve seen people mention bytes_invalid and bytes_retransmit as listed in the Mainsail SB2040 info screen. I have never seen bytes_invalid display anything other than 0 but bytes_retransmit seems to climb fairly slowly and steadily as I print stuff. I’m trying to figure out how to exploit this, if it is indeed a symptom of the larger potential communication problems across the bus.

I only have the Octopus and the SB2040 on CAN, so communication should be VERY simple (point-to-point). I’ve got the umbilical, which means there are a couple of extra interconnections. I used my power drill to twist the CAN_H and CAN_L wires together, as a pair by themselves, before soldering all the connections. I used only the wiring that was provided with the CAN board. I’m running my CAN bus at 1M with this setup:

allow-hotplug can0
iface can0 can static
  bitrate 1000000
  up ifconfig $IFACE txqueuelen 1024
  pre-up ip link set can0 type can bitrate 1000000
  pre-up ip link set can0 txqueuelen 1024

I’m printing something right now and I can see that the bytes_retransmit value is up to 7259 - it has been printing for about an hour. Seems like every 20 minutes or so it climbs by a few hundred bytes. After writing this post, I checked it again and it had bounced up to 7441.
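To put those readings in perspective, here is a rough back-of-the-envelope calculation. It assumes the counter started near zero at the beginning of the print and that "about an hour" is exactly 3600 seconds; both are approximations, so treat the result as an order of magnitude only.

```python
# Rough average retransmit rate from the figures quoted above.
# Assumptions: the counter started near 0 and the print ran for ~3600 s.
first_reading = 7259    # bytes_retransmit after about an hour of printing
second_reading = 7441   # bytes_retransmit after writing the post
elapsed_s = 60 * 60     # "about an hour", approximated as 3600 seconds

# Average rate over the first hour, in bytes per second.
rate_bytes_per_s = first_reading / elapsed_s

# Increase between the two readings (the time between them is unknown).
increase = second_reading - first_reading

print(f"average: {rate_bytes_per_s:.2f} bytes/s, latest jump: {increase} bytes")
```

That works out to an average of about 2 bytes per second, with a jump of 182 bytes between the two readings.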

I also have tried doing a stress test with my CAN bus by:

  1. forcing traffic on the bus: cangen -e -g 0.12 can0
  2. monitoring the traffic, status, and bus loading using:
    a. watch -n 1 -d "ip -details -statistics link show can0"
    b. watch -n 1 "tc -s qdisc show dev can0"
    c. canbusload can0@1000000 -b -c
  3. playing with the cabling in the chamber while those are running and watching for errors/retransmits.

But that has been unsuccessful! I’ve even pushed the bus up to around 80-90% utilization, which it seems should bring out errors, but everything operates smoothly even while I play with the umbilical cable.
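As a complement to the `watch` commands above, here is a minimal sketch that logs only the moments when a host-side error counter changes, which is easier to correlate with wiggling the cable. It reads the standard Linux per-interface counters under `/sys/class/net/can0/statistics/`. The function names and the choice of counters are illustrative, and with a USB-to-CAN bridge these host-side counters may not reflect every error seen on the wire.

```python
import time
from pathlib import Path

# Counters exposed by Linux for every network interface (including can0).
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")

def read_counters(iface="can0", root="/sys/class/net"):
    """Return the current value of each counter for one interface."""
    stats_dir = Path(root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}

def diff_counters(before, after):
    """Return only the counters that changed, mapped to their increase."""
    return {
        name: after[name] - before[name]
        for name in before
        if after[name] != before[name]
    }

def watch(iface="can0", interval_s=1.0):
    """Print a timestamped line whenever any counter changes (Ctrl-C to stop)."""
    previous = read_counters(iface)
    while True:
        time.sleep(interval_s)
        current = read_counters(iface)
        changed = diff_counters(previous, current)
        if changed:
            print(time.strftime("%H:%M:%S"), changed)
        previous = current
```

Calling `watch("can0")` on the Pi while `cangen` is running should print nothing until one of the counters moves.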

Is there another way to watch for or cause bytes_retransmit?
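One possible way to watch it after the fact is to pull the counter out of the periodic `Stats` lines in klippy.log, which gives a timestamp for each increase. The sketch below assumes each MCU section in a `Stats` line looks like `<name>: mcu_awake=... bytes_retransmit=N ...`; that layout, the regular expressions, and the function name are assumptions to check against your own log.

```python
import re

# Assumed layout: "Stats <time>: ... <mcu name>: mcu_awake=... bytes_retransmit=N ..."
STATS_RE = re.compile(r"^Stats (\d+(?:\.\d+)?):")
SECTION_RE = re.compile(r"(\S+): mcu_awake=.*?bytes_retransmit=(\d+)")

def retransmit_increases(lines):
    """Return (timestamp, mcu_name, increase) for every rise in bytes_retransmit."""
    last_seen = {}
    events = []
    for line in lines:
        match = STATS_RE.match(line)
        if not match:
            continue  # not a Stats line
        timestamp = float(match.group(1))
        for name, value in SECTION_RE.findall(line):
            count = int(value)
            if name in last_seen and count > last_seen[name]:
                events.append((timestamp, name, count - last_seen[name]))
            last_seen[name] = count
    return events
```

Passing an open log file to `retransmit_increases` yields a list of events that can be lined up against what the printer was doing at each timestamp, such as homing, QGL, or LED effects.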

Thanks,
-Greg
BTW: When printing, my CAN bus load is usually 8-20%.

This is an unproven theory, but in my opinion the CAN 2.0 transceivers on most boards, such as the Octopus, don’t handle error recovery or noise well at the higher bitrates (500K to 1M). In contrast, FDCAN transceivers should handle 1M better than the 2.0 chips, since FDCAN chips can run FD modes up to 5M. Currently, 3D printer firmware only supports CAN 2.0.

When I have some spare time, I plan to test CAN 2.0 and FDCAN transceivers at 1M to see whether frame and transmission errors are lower, or absent, when only FDCAN-type transceivers are used on the bus. Note that I am going to do this by swapping the transceiver chip.

For information: it seems that one board BTT manufactures has an FDCAN transceiver, the SKR3/3 EZ.

CAN 2.0: SN65HVD1050
FDCAN: MCP2542FD-E/SN

FWIW, when I last looked at your log, there was a notable amount of traffic being sent to the neopixels during homing (presumably from the led_effects module). That’s not ideal and could certainly impact the homing process. It is true that I’m a “code purist” (analyzing a log can easily take several hours and that would just be wasted with a log from unknown code). But, that aside, I would still recommend you start by verifying the problem is present with more widely tested code.

Cheers,
-Kevin

Thank you for pointing this out, Kevin. I do have the neopixels going bonkers during different operations so I can glance at the printer and know what stage it is in. I hadn’t realized that it would also consume a bunch of bandwidth. It’s definitely overkill and not really necessary, so I’ll rethink that approach now that I have CAN bus. Sorry about the “purist” comment… :)