[Canbus] Debugging MCU lockups

Hi, I have a toolhead MCU lockup issue that I traced to a Usb2Can adapter glitching under high temps.

I suspect the MCU receives garbled data and then crashes.
A reset of just that MCU brings it back on the bus, and this occurs evenly across 6 toolheads, so I’m fairly certain it’s a software issue.

Any recommendations how to debug this? The klippy log is not helpful at all.
Perhaps another canbus node, logging all the traffic?
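
For what it’s worth, a second node that only logs traffic can be any Linux machine with a SocketCAN interface on the same bus. Below is a minimal sketch using only the Python standard library. It assumes the interface is called `can0` and that the driver reports error frames; `parse_can_frame` and `log_bus` are names made up for this example. Note that a frame failing the controller’s CRC check is not delivered as data, so garbling would show up here, if at all, as error frames rather than as scrambled payloads.

```python
import socket
import struct
import time

# Classic SocketCAN frame as delivered by the Linux kernel:
# 32-bit id+flags, 1-byte length, 3 pad bytes, 8 data bytes.
CAN_FRAME_FMT = "=IB3x8s"
CAN_FRAME_SIZE = struct.calcsize(CAN_FRAME_FMT)  # 16 bytes
CAN_EFF_FLAG = 0x80000000  # extended (29-bit) identifier
CAN_ERR_FLAG = 0x20000000  # error frame generated by the driver
CAN_EFF_MASK = 0x1FFFFFFF
CAN_SFF_MASK = 0x000007FF


def parse_can_frame(raw):
    """Decode one raw frame into (identifier, is_error, payload)."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, raw)
    is_error = bool(can_id & CAN_ERR_FLAG)
    if is_error or can_id & CAN_EFF_FLAG:
        ident = can_id & CAN_EFF_MASK
    else:
        ident = can_id & CAN_SFF_MASK
    return ident, is_error, data[:length]


def log_bus(interface="can0"):
    """Print every frame seen on the interface, error frames included."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    # Ask the kernel to pass driver-reported bus errors to this socket.
    sock.setsockopt(socket.SOL_CAN_RAW, socket.CAN_RAW_ERR_FILTER,
                    socket.CAN_ERR_MASK)
    sock.bind((interface,))
    while True:
        raw = sock.recv(CAN_FRAME_SIZE)
        ident, is_error, payload = parse_can_frame(raw)
        kind = "ERR" if is_error else "   "
        print("%.6f %s %08x %s" % (time.time(), kind, ident, payload.hex()))
```

Calling `log_bus("can0")` on the host runs until interrupted. The `candump` tool from can-utils does the same job without any code.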

https://www.klipper3d.org/CANBUS_Troubleshooting.html

A logic analyzer would probably be the easiest way to see if the data is garbled.

If it is truly a temperature issue, move this adaptor outside of the area where it is getting hot or add a heatsink to the microprocessor and a cooling fan to blow over the board.

Thanks for the link, picking up a logic analyzer.

Just to clarify, I have resolved the immediate issue by replacing the U2C adapter and adding cooling for good measure.

Now I’m looking to see if I can make the code more robust; MCUs should not be locking up from a scrambled signal.

Another idea could be fuzzing the canbus code. Unfortunately the MCU code has no tests to speak of.

Edit: One more data point, I’m seeing this on both RP2040 and STM32 based toolboards.

Since nearly everything in Klipper is timing sensitive, or rather timing critical, why should an MCU not lock up if it no longer receives properly timed messages?

FWIW and without being the authority for its deeper meaning, if you look at the log posted in the above linked thread, you will see that the RTO (Retransmit Timeout) starts dramatically rising for the ET0 MCU around 4 seconds before it crashes.
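
The RTO trend can be pulled out of a klippy.log with a few lines of Python. This is a rough sketch, assuming the `Stats` lines look like `Stats <time>: ... <mcu name>: key=value ... rto=<seconds> ...`. That layout is my reading of typical logs rather than a documented format, and `rto_series` is a made-up name.

```python
def rto_series(lines, mcu_name):
    """Return [(stats_time, rto_seconds), ...] for one MCU section."""
    series = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2 or tokens[0] != "Stats":
            continue
        try:
            when = float(tokens[1].rstrip(":"))
        except ValueError:
            continue  # not a well-formed Stats line
        section = None
        for tok in tokens[2:]:
            if tok.endswith(":"):
                # A bare "name:" token starts a new per-object section.
                section = tok[:-1]
            elif section == mcu_name and tok.startswith("rto="):
                try:
                    series.append((when, float(tok[4:])))
                except ValueError:
                    pass  # skip a damaged value rather than abort
    return series
```

For example, `rto_series(open("klippy.log", errors="replace"), "ET0")` gives a list that can be printed or plotted to see how long before the shutdown the value starts climbing.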

Do you use a heated chamber?

There was a short discussion regarding high temps when using STM32 based CAN interfaces in the thread “No Klipper problem: Maximum CAN bus toolheads temperatures regarding heated chambers Question? Poll?”

A CAN bus node will shut down after reaching an error limit; this is not Klipper but how the CAN bus hardware works.
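
The error limit comes from the fault confinement rules in the CAN specification: each node keeps a transmit error counter (TEC) and a receive error counter (REC), and leaves the bus once the TEC passes 255. Here is a simplified sketch of the transmit side only, assuming +8 per failed transmission and -1 per successful one and ignoring the special cases in the spec. `can_state` and `simulate_tx` are illustrative names, not anything from Klipper.

```python
def can_state(tec, rec):
    """Map the two error counters to the node's fault confinement state."""
    if tec > 255:
        return "bus-off"
    if tec > 127 or rec > 127:
        return "error-passive"
    return "error-active"


def simulate_tx(outcomes):
    """Walk a sequence of transmit results (True = acknowledged OK).

    Returns the final transmit error counter and the resulting state.
    """
    tec = 0
    for ok in outcomes:
        if ok:
            tec = max(0, tec - 1)  # a good frame slowly heals the counter
        else:
            tec += 8               # a failed frame costs much more
        if tec > 255:
            break                  # a bus-off node stops transmitting
    return tec, can_state(tec, 0)
```

With these numbers, 32 failed transmissions in a row are enough to take a healthy node off the bus, which is why a short burst of corruption can look like a sudden lockup.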

Thanks Napcal, that’s good to know. Is that implemented in software in Klipper? I do not think there is dedicated canbus hardware on toolhead boards.

Would it make sense for the tool board to send a shutdown message in a loop to Klipper? Klipper aborts on timeout anyway and this would help debugging.

It is controlled by the CAN bus transceivers via the CAN bus protocol and not by the software.

Hmm, at least for the RP2040 it seems to be implemented by PIO. Or am I missing something?

https://github.com/Klipper3d/klipper/blob/49c0ad6369670da574f550aa878ce9f6e1899e74/lib/can2040/can2040.c#L551

Every toolhead board has a CAN bus transceiver; it’s required by the standard.

Most boards use a:

Including the EBB 42/36

Caveat: There are some BTT mainboards that try to get away with using the CAN H/CAN L signals over USB but that is NOT CAN bus by any definition, despite what they try to call it.

The RX/TX of the actual communication is handled on the MCU, but it goes through a CAN transceiver before it gets to the wiring.

You think there’s a “standard”?

I’ve worked with a number of toolhead boards; the first one was simply a passive plug board for my Voron. Since then I’ve bought intelligent toolhead controllers from at least three different manufacturers (I thought four), as well as different variants of the same boards (i.e. for attaching to 42mm and 36mm steppers). There are a lot of differences in how CAN is implemented, including the connectors used on the toolhead controllers, the main controller boards and the U2C boards.

This isn’t to say that they won’t all work together once they’re wired together; it is just that you can’t assume that they all wire together the same way. You must have a basic understanding of how CAN works along with the ability to read a schematic. Even with that, I always use a multimeter to test the connections before applying power.

I wish there was a standard.

Any TTL-level UART connection can be connected to a CAN bus transceiver as long as the software sending the data is using the CAN bus commands to communicate.

It is an STM32 feature to use CAN bus communications over the same pins as the USB, but a CAN bus transceiver is still required to connect to the bus.

Most U2C adapters (for example, the BTT U2C V2.1) have USB-A connections just for this type.

I meant the ISO 11898 CAN bus standard. There has to be some sort of transceiver implementation to handle the requirements of the CAN bus, be it integrated into an MCU (never actually seen that) or, more commonly, external.

@NAPCAL

That’s my point, the USB-A implementation isn’t CAN bus. It’s just a glorified UART at that point. It’s not differentially signaled, the characteristic impedance is wrong, and there are no dominant/recessive bits that way, which is what defines CAN communication.

In other words, CAN “High”/Dominant is logic “0”, with CAN H being around 3.5 V and CAN L being around 1.5 V. CAN “Low”/Recessive is logic “1”, with both CAN H and CAN L at around 2.5 V. That differential signaling gives it the noise immunity.
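
A small sketch of why the difference between the two wires matters more than either voltage on its own. The 0.9 V and 0.5 V thresholds are the receiver limits as I remember them from ISO 11898-2, so treat them as an assumption and check the transceiver datasheet; `decode_bit` is a name made up for this example.

```python
DOMINANT_MIN_V = 0.9   # assumed: differential at or above this reads dominant
RECESSIVE_MAX_V = 0.5  # assumed: differential at or below this reads recessive


def decode_bit(can_h, can_l):
    """Return 0 (dominant), 1 (recessive) or None (undefined region)."""
    diff = can_h - can_l
    if diff >= DOMINANT_MIN_V:
        return 0
    if diff <= RECESSIVE_MAX_V:
        return 1
    return None


# Nominal levels from the post above.
print(decode_bit(3.5, 1.5))  # dominant
print(decode_bit(2.5, 2.5))  # recessive

# The same bits with 1 V of noise coupled equally onto both wires:
# the difference, and therefore the decoded bit, is unchanged.
print(decode_bit(4.5, 2.5))
print(decode_bit(3.5, 3.5))
```

Noise that hits both wires equally cancels in the subtraction, which is what twisted pair wiring is meant to encourage. A single-ended logic signal has no second wire to subtract, so the same noise lands directly on the bit.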

The STM32 pins are just standard logic GPIO not a differential input. You can prove this easily because the pins can be reconfigured for other uses.

The CAN bus has two layers: the physical layer you are talking about and the data link layer.

Yes, UART and USB are not compatible with the CAN bus physical layer, but that is what CAN bus transceivers are for: to interface between the TTL source and the physical CAN bus.

Bosch created the CAN bus protocols. Refer to their documents for the technical details you are looking for.

Yes, but by ditching the physical layer you lose all noise immunity and nearly all the benefits of the CAN bus. I actually think the data layer requires error management in hardware, which the STM32 doesn’t implement. But that’s a minor factor.

The issue is that most users don’t understand that, and when they have issues they blame CAN as a whole or Klipper, when in reality it’s just BTT cutting corners and making weird design choices by implementing something in a way it was never intended to be implemented.

I’d be fine if BTT labeled it as something else; it’s less confusing that way. Or hell, just use UART and simplify the entire thing.

Even ST’s presentation on bxCAN (the implementation in the low cost MCUs) shows it needs a transceiver.

First and foremost, most 3D print builders and manufacturers ignore the physical layer requirements for shielding the CAN bus cables.

Even though the CAN bus has good noise rejection, it can still be overwhelmed by being too close to stepper wires. Fan and heater wires are also controlled by PWM, and these frequencies can cause high noise on unshielded CAN_L and CAN_H wires.

The builders that do use shielded wiring do not typically have resend error issues and can run at max bus speeds.

Shielding is optional in ISO 11898-2; using twisted pair wiring is mandatory, though.

As I said, the ones that use shielded cables have fewer issues than those that don’t.

The specs were written for industrial and automotive designs, not hobbyists.

Wow, this went south fast.
The canbus transceivers are just dumb differential to TTL signal level converters, right?
The protocol itself, including error handling, is still in software?