CANBUS Troubleshooting guide is misleading

Basic Information:

Printer Model: Voron 2.4
MCU / Printerboard: BTT Octopus

Describe your issue:

I just spent the better part of the week figuring out how canbus works from Klipper’s perspective. This comes after having some issues with my setup. The new troubleshooting guide came in handy to understand some of the basic troubleshooting steps, as it’s not straightforward to see the contents of the canbus stream from the line itself. My issues started with this line from the troubleshooting guide:

Incrementing bytes_invalid on a CAN bus connection is a symptom of reordered messages on the CAN bus

I went down the rabbit hole trying to understand why I was getting reordered traffic. After checking pretty much every possible hardware issue, I ended up reading the Klipper source code. It turns out that while out-of-order messages can be one of the reasons, they are not the only one.

In my opinion, the more common cause is whatever ends up in the portion of the code with the comment “skip bad data at beginning of input”. And sure enough, when I dug deeper, my case turned out to be caused by plain old errors on the line.

In my case this is what things look like after adding some logging to msgblock_check():

27152.0:bytes_invalid=30
b'ZZZ invalid message len 219'
b'ZZZ bad message trailer sync'
b'ZZZ need sync'
b'ZZZ invalid message len 93'
b'ZZZ bad message trailer sync'
b'ZZZ need sync'
b'ZZZ invalid message len 93'
b'ZZZ bad message trailer sync'
b'ZZZ need sync'
b'ZZZ invalid message len 93'
b'ZZZ bad message trailer sync'
b'ZZZ need sync'
b'ZZZ invalid message len 93'
b'ZZZ bad message trailer sync'
b'ZZZ need sync'
b'ZZZ invalid message len 129'
b'ZZZ bad message trailer sync'
b'ZZZ need sync'
27501.6:bytes_invalid=202
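
For readers who have not opened the source, the framing check that produces these discards can be sketched roughly as follows. This is a simplified Python model written from a reading of msgblock_check(), not the actual C code. The constants, the CRC routine, and the helper names build_frame, check_frame, and count_invalid are assumptions for illustration and should be checked against the Klipper source.

```python
# Assumed framing: [len][seq][payload...][crc_hi][crc_lo][sync]
MESSAGE_MIN = 5
MESSAGE_MAX = 64
MESSAGE_DEST = 0x10
MESSAGE_SEQ_MASK = 0x0F
MESSAGE_SYNC = 0x7E


def crc16_ccitt(buf):
    """CRC over the header and payload bytes (assumed variant)."""
    crc = 0xFFFF
    for data in buf:
        data ^= crc & 0xFF
        data ^= (data & 0x0F) << 4
        crc = ((data << 8) | (crc >> 8)) ^ (data >> 4) ^ (data << 3)
    return crc


def build_frame(seq, payload):
    """Hypothetical helper: wrap a payload in the assumed framing."""
    msglen = MESSAGE_MIN + len(payload)
    head = bytes([msglen, MESSAGE_DEST | (seq & MESSAGE_SEQ_MASK)]) + payload
    crc = crc16_ccitt(head)
    return head + bytes([crc >> 8, crc & 0xFF, MESSAGE_SYNC])


def check_frame(buf, need_sync=False):
    """Return (result, need_sync).

    result > 0: length of a valid frame at the start of buf
    result == 0: not enough data yet
    result < 0: number of bytes to discard as invalid
    """
    if len(buf) < MESSAGE_MIN:
        return 0, need_sync
    if not need_sync:
        msglen = buf[0]
        if (MESSAGE_MIN <= msglen <= MESSAGE_MAX
                and (buf[1] & 0xF0) == MESSAGE_DEST):
            if len(buf) < msglen:
                return 0, need_sync
            crc = (buf[msglen - 3] << 8) | buf[msglen - 2]
            if (buf[msglen - 1] == MESSAGE_SYNC
                    and crc == crc16_ccitt(buf[:msglen - 3])):
                return msglen, False
    # Error path: discard everything up to and including the next sync byte
    pos = buf.find(bytes([MESSAGE_SYNC]))
    if pos >= 0:
        return -(pos + 1), False
    return -len(buf), True


def count_invalid(stream):
    """Parse a byte stream and return (valid_frames, invalid_bytes)."""
    buf, need_sync, frames, invalid = bytes(stream), False, 0, 0
    while True:
        res, need_sync = check_frame(buf, need_sync)
        if res == 0:
            return frames, invalid
        if res > 0:
            frames += 1
            buf = buf[res:]
        else:
            invalid += -res
            buf = buf[-res:]


# Two well-formed frames back to back
clean = build_frame(1, b"\x01\x02\x03") + build_frame(2, b"\x04\x05")

# Same stream with one bit flipped in the first payload: no reordering involved
corrupted = bytearray(clean)
corrupted[2] ^= 0x40

print(count_invalid(clean))
print(count_invalid(corrupted))
```

In this model a single flipped bit costs the whole first frame even though nothing was reordered, which is the same kind of jump in bytes_invalid as in the log above.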

My point here is that the troubleshooting guide clearly states that out-of-order data is the cause of an increasing bytes_invalid, but that is just one of the possible causes. I would have sent a PR to fix the wording, but I was unable to come up with anything that doesn’t just make things more confusing.

I am assuming you are referring to this guide?
https://www.klipper3d.org/CANBUS_Troubleshooting.html

Yup, that’s the guide. The post had a limit of 2 links, so I had to cut out the more obvious one.

The low-level canbus hardware implements crc checks on the content of messages. It will also automatically retransmit messages should one be corrupted or if it is not acknowledged by any other device on the canbus.

The low-level “invalid bytes” counter just reports invalid characters found in the messages. The primary cause of invalid content in a canbus stream is, however, reordered canbus packets somewhere. (A klipper mcu protocol message typically comprises several canbus packets, and should those canbus packets be reordered, the contents of that message show up as “invalid” to the klipper host parser.)
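
As a rough illustration of the reordering case described above, the sketch below splits one message into 8-byte CAN payloads and swaps two of them. The message contents are made up, the two CRC bytes are placeholders rather than a real checksum, and split_into_can_frames is a hypothetical helper, not a Klipper function.

```python
CAN_PAYLOAD = 8       # a classic CAN data frame carries at most 8 bytes
MESSAGE_SYNC = 0x7E   # assumed trailing sync byte of a Klipper message


def split_into_can_frames(message):
    """Hypothetical helper: chop a message into CAN-sized payloads."""
    return [message[i:i + CAN_PAYLOAD]
            for i in range(0, len(message), CAN_PAYLOAD)]


# Made-up 20-byte message: length, sequence, 15 payload bytes,
# two placeholder CRC bytes (not a real CRC), then the sync byte.
message = (bytes([20, 0x11]) + bytes(range(0x20, 0x2F))
           + bytes([0xAA, 0xBB, MESSAGE_SYNC]))

frames = split_into_can_frames(message)       # payloads of 8, 8 and 4 bytes
reordered = frames[0] + frames[2] + frames[1]  # last two payloads swapped

# Every byte arrived, but the sync byte is now in the middle of the
# stream and the position where the parser expects it holds payload data.
print(message.index(MESSAGE_SYNC), reordered.index(MESSAGE_SYNC))
print(hex(reordered[-1]))
```

No byte is lost or altered here, yet a parser that expects the sync byte at the end of the declared length would reject the message and count its bytes as invalid.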

If there is some other cause of incremented invalid bytes (not reordering) then it would also be a major problem that would need to be addressed.

Cheers,
-Kevin


The key here is reordered or dropped. In my case the main issue is in the transport layer, and I’m figuring that out separately. The point of this post was that the troubleshooting guide states that out-of-order messages are the cause of invalid bytes, which is misleading and can lead to investigating the wrong thing.