[Canbus] Debugging MCU lockups

Viesturz · June 9, 2024, 10:28am

Hi, I have a toolhead MCU lockup issue that I traced to a USB-to-CAN (U2C) adapter glitching under high temperatures.

I suspect the MCU receives garbled data and subsequently crashes.
A reset of just that MCU brings it back on the bus, and this occurs evenly across 6 toolheads, so I’m fairly certain it’s a software issue.

Any recommendations how to debug this? The klippy log is not helpful at all.
Perhaps another canbus node, logging all the traffic?
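
One way to approach the "extra node that logs everything" idea is to capture the bus on the host with `candump` from can-utils and analyse the log afterwards. The sketch below is only an illustration: it assumes the capture was made with `candump -L can0` (one `(timestamp) interface ID#HEXDATA` entry per line), and the names `Frame`, `parse_candump_line` and `longest_gap` are made up for this example. Lines that do not match that layout are skipped rather than guessed at.

```python
import re
from typing import List, NamedTuple, Optional

# One line of `candump -L` output is assumed to look like:
#   (1718000000.123456) can0 108#0102FF
LINE_RE = re.compile(
    r"^\((?P<ts>\d+\.\d+)\)\s+(?P<iface>\S+)\s+"
    r"(?P<id>[0-9A-Fa-f]+)#(?P<data>[0-9A-Fa-f]*)$"
)


class Frame(NamedTuple):
    timestamp: float
    interface: str
    can_id: int
    data: bytes


def parse_candump_line(line: str) -> Optional[Frame]:
    """Parse one log line; return None for anything that does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    hex_data = m.group("data")
    if len(hex_data) % 2:
        return None  # odd number of hex digits cannot be whole bytes
    return Frame(
        float(m.group("ts")),
        m.group("iface"),
        int(m.group("id"), 16),
        bytes.fromhex(hex_data),
    )


def longest_gap(frames: List[Frame], can_id: int) -> float:
    """Largest pause in seconds between consecutive frames of one CAN ID."""
    times = [f.timestamp for f in frames if f.can_id == can_id]
    return max((b - a for a, b in zip(times, times[1:])), default=0.0)
```

Looking at the longest silence per CAN ID just before a lockup is one way to see whether a toolhead stopped talking or the adapter stopped forwarding.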

TheFuzzyGiggler · June 9, 2024, 11:43am

https://www.klipper3d.org/CANBUS_Troubleshooting.html

A logic analyzer would probably be the easiest way to see if the data is garbled.

NAPCAL · June 9, 2024, 12:19pm

If it is truly a temperature issue, move this adaptor outside of the area where it is getting hot or add a heatsink to the microprocessor and a cooling fan to blow over the board.

Viesturz · June 9, 2024, 4:52pm

Thanks for the link, picking up a logic analyzer.

Just to clarify, I have resolved the immediate issue by replacing the U2C adapter and adding cooling for good measure.

Now I’m looking to see if I can make the code more robust; MCUs should not be locking up from a scrambled signal.

Another idea could be fuzzing the canbus code. Unfortunately the MCU code has no tests to speak of.
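
To make the fuzzing idea concrete, here is a minimal host-side sketch. It does not touch the real Klipper or can2040 code: the frame layout (length byte, payload, additive checksum, `0x7E` end marker) and the functions `encode`, `decode`, `mutate` and `fuzz` are all hypothetical stand-ins. The point is only the shape of the harness: corrupt a known-good frame in random ways and require that the decoder rejects it cleanly instead of raising or hanging.

```python
import random
from typing import Optional

SYNC = 0x7E  # toy end-of-frame marker (hypothetical framing for this sketch)


def checksum(payload: bytes) -> int:
    """Toy 8-bit additive checksum; real firmware would use a CRC."""
    return sum(payload) & 0xFF


def encode(payload: bytes) -> bytes:
    """Build a toy frame: length, payload, checksum, end marker."""
    return bytes([len(payload)]) + payload + bytes([checksum(payload), SYNC])


def decode(frame: bytes) -> Optional[bytes]:
    """Return the payload of a valid frame, or None if it is malformed."""
    if len(frame) < 3:
        return None
    length = frame[0]
    if len(frame) != length + 3 or frame[-1] != SYNC:
        return None
    payload = frame[1:1 + length]
    if frame[-2] != checksum(payload):
        return None
    return payload


def mutate(frame: bytes, rng: random.Random) -> bytes:
    """Flip one bit, drop one byte, or insert one random byte."""
    data = bytearray(frame)
    op = rng.choice(("flip", "drop", "insert"))
    pos = rng.randrange(len(data))
    if op == "flip":
        data[pos] ^= 1 << rng.randrange(8)
    elif op == "drop":
        del data[pos]
    else:
        data.insert(pos, rng.randrange(256))
    return bytes(data)


def fuzz(iterations: int = 10_000, seed: int = 0) -> int:
    """Feed mutated frames to decode(); return how many were rejected."""
    rng = random.Random(seed)
    good = encode(b"\x01\x02\x03\x04")
    rejected = 0
    for _ in range(iterations):
        # decode() must never raise, whatever bytes it is given.
        if decode(mutate(good, rng)) is None:
            rejected += 1
    return rejected
```

Fuzzing the actual MCU code would mean compiling the relevant C parsing routines for the host and driving them the same way, which is a bigger job than this sketch suggests.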

Edit: One more data point: I’m seeing this on both RP2040 and STM32 based toolboards.

Sineos · June 9, 2024, 5:10pm

Since pretty much everything in Klipper is timing sensitive, or rather timing critical, why should an MCU not lock up if it no longer receives properly timed messages?

FWIW and without being the authority for its deeper meaning, if you look at the log posted in the above linked thread, you will see that the RTO (Retransmit Timeout) starts dramatically rising for the ET0 MCU around 4 seconds before it crashes.
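
If you want to watch that value yourself, the `rto=` figures can be pulled out of the `Stats` lines in klippy.log. The sketch below assumes each Stats line has the form `Stats <time>: ...  <mcu name>: key=value key=value ...`, with one `rto=` entry per MCU section; that layout is written from memory, so check it against your own log first. `parse_stats_line` and `rto_range` are names invented for this example.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_stats_line(line: str) -> Optional[Tuple[float, Dict[str, float]]]:
    """Return (timestamp, {mcu_name: rto}) for a Stats line, else None."""
    if not line.startswith("Stats "):
        return None
    head, _, rest = line.partition(": ")
    try:
        timestamp = float(head.split()[1])
    except (IndexError, ValueError):
        return None
    section = None
    rto: Dict[str, float] = {}
    for token in rest.split():
        if token.endswith(":"):
            section = token[:-1]  # a token like "mcu:" starts a new section
        elif section is not None and token.startswith("rto="):
            try:
                rto[section] = float(token[4:])
            except ValueError:
                pass  # ignore a damaged value rather than abort the scan
    return timestamp, rto


def rto_range(lines: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Smallest and largest rto seen per MCU across all Stats lines."""
    result: Dict[str, Tuple[float, float]] = {}
    for line in lines:
        parsed = parse_stats_line(line)
        if parsed is None:
            continue
        for name, value in parsed[1].items():
            lo, hi = result.get(name, (value, value))
            result[name] = (min(lo, value), max(hi, value))
    return result
```

An MCU whose maximum is many times its minimum in the seconds before a shutdown is the pattern described above.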

hcet14 · June 9, 2024, 5:22pm

Do you use a heated chamber?

There was a short discussion regarding high temps when using STM32-based CAN interfaces: “No Klipper problem: Maximum CAN bus toolheads temperatures regarding heated chambers Question? Poll?”

NAPCAL · June 9, 2024, 10:30pm

A CAN bus node will shut down after reaching an error limit; this is not Klipper but how the CAN bus hardware works.
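
For readers who have not met this "error limit" before: the CAN protocol defines two error counters per node and three states derived from them. The sketch below is a simplified model of those fault confinement rules (the real specification has more increment cases and a recovery sequence out of bus-off), and the class and method names are only illustrative.

```python
class ErrorCounters:
    """Simplified CAN fault confinement: TEC/REC counters and node state."""

    def __init__(self) -> None:
        self.tec = 0  # transmit error counter
        self.rec = 0  # receive error counter

    def tx_error(self) -> None:
        if self.state != "bus-off":
            self.tec += 8  # a failed transmission costs 8

    def tx_ok(self) -> None:
        if self.state != "bus-off":
            self.tec = max(0, self.tec - 1)  # a good one earns back 1

    def rx_error(self) -> None:
        self.rec += 1

    def rx_ok(self) -> None:
        self.rec = max(0, self.rec - 1)

    @property
    def state(self) -> str:
        if self.tec > 255:
            return "bus-off"        # node stops taking part in the bus
        if self.tec > 127 or self.rec > 127:
            return "error-passive"  # node may still talk, with restrictions
        return "error-active"       # normal operation
```

With these numbers, 16 failed transmissions in a row move a node to error-passive and 32 in a row take it off the bus, which from the host side looks like a node that simply went silent.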

Viesturz · June 10, 2024, 5:58am

Thanks Napcal, that’s good to know. Is that implemented in software in Klipper? I do not think there is dedicated canbus hardware on toolhead boards.

Would it make sense for the tool board to send a shutdown message in a loop to Klipper? Klipper aborts on timeout anyway and this would help debugging.

NAPCAL · June 10, 2024, 7:56am

It is controlled by the CAN bus transceivers via the CAN bus protocol and not by the software.

Viesturz · June 10, 2024, 9:26am

Hmm, at least for the RP2040 it seems to be implemented by PIO. Or am I missing something?

https://github.com/Klipper3d/klipper/blob/49c0ad6369670da574f550aa878ce9f6e1899e74/lib/can2040/can2040.c#L551

TheFuzzyGiggler · June 10, 2024, 12:15pm

Every toolhead board has a CAN bus transceiver; it’s required by the standard.

Most boards use a:

Including the EBB 42/36

Caveat: There are some BTT mainboards that try to get away with using the CAN H/CAN L signals over USB but that is NOT CAN bus by any definition, despite what they try to call it.

The RX/TX of the actual communication is handled on the MCU, but it goes through a CAN transceiver before it gets to the wiring.

mykepredko · June 10, 2024, 2:20pm

You think there’s a “standard”?

I’ve worked with a number of toolhead boards - the first one was simply a passive plug board for my Voron. Since then I’ve bought intelligent toolhead controllers from at least three different manufacturers (I thought four), as well as different variants of the same boards (i.e. for attaching to 42mm and 36mm steppers). There are a lot of differences in how CAN is implemented, including the connectors used on the toolhead controllers, the main controller boards and the U2C boards.

This isn’t to say that they won’t all work together once they’re wired together; it is just that you can’t assume they all wire together the same way. You must have a basic understanding of how CAN works along with the ability to read a schematic. Even with that, I always use a multimeter to test the connections before applying power.

I wish there was a standard.

NAPCAL · June 10, 2024, 2:25pm

Any UART communication TTL connection can be connected to a CAN bus transceiver as long as the software sending the data is using the CAN bus commands to communicate.

It is an STM32 feature to use CAN bus communications over the same pins as the USB, but a CAN bus transceiver is still required to connect to the bus.

Most U2C adapters (example BTT U2C V2.1) have USB-A connections just for this type.

TheFuzzyGiggler · June 10, 2024, 5:50pm

I meant the ISO 11898 CAN bus standard, there has to be some sort of transceiver implementation to handle the requirements of the CAN bus. Be it integrated into an mcu (never actually seen that) or, more commonly, external.

@NAPCAL

That’s my point, the USB-A implementation isn’t CAN bus. It’s just a glorified UART at that point. It’s not differentially signaled, the characteristic impedance is wrong, and there are no dominant/recessive bits that way, which is what defines CAN communication.

In other words, CAN “High”/Dominant is logic “0”, with CAN H at around 3.5 V and CAN L at around 1.5 V. CAN “Low”/Recessive is logic “1”, with both CAN H and CAN L at around 2.5 V. That differential signaling gives it the noise immunity.
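
As a rough illustration of why the differential pair matters, the sketch below decodes a bit from the two line voltages. The 0.9 V and 0.5 V receiver thresholds are the usual ISO 11898-2 figures quoted from memory, so treat them as an assumption, and `decode_bit` / `bus_bit` are names made up for this example.

```python
from typing import Iterable, Optional

DOMINANT_MIN_V = 0.9   # assumed threshold: Vdiff at or above reads dominant
RECESSIVE_MAX_V = 0.5  # assumed threshold: Vdiff at or below reads recessive


def decode_bit(can_h: float, can_l: float) -> Optional[int]:
    """Return 0 (dominant), 1 (recessive) or None if in the undefined band."""
    vdiff = can_h - can_l
    if vdiff >= DOMINANT_MIN_V:
        return 0
    if vdiff <= RECESSIVE_MAX_V:
        return 1
    return None


def bus_bit(driven_bits: Iterable[int]) -> int:
    """Wired-AND: the bus is dominant (0) if any node drives dominant."""
    return 0 if 0 in list(driven_bits) else 1


# Nominal levels from the post above.
print(decode_bit(3.5, 1.5))  # dominant
print(decode_bit(2.5, 2.5))  # recessive

# The same bits with 1 V of noise coupled equally onto both wires:
# the difference is unchanged, so the decoded bits are unchanged too.
print(decode_bit(4.5, 2.5))
print(decode_bit(3.5, 3.5))
```

A single-ended logic signal has no second wire to subtract, so the same 1 V of coupled noise lands directly on the signal.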

The STM32 pins are just standard logic GPIO not a differential input. You can prove this easily because the pins can be reconfigured for other uses.

NAPCAL · June 10, 2024, 6:00pm

The CAN bus has two layers: the physical layer you are talking about and the data link layer.

Yes, UART and USB are not CAN bus physical layer compatible, but that is what CAN bus transceivers are for: to interface between the TTL source and the physical CAN bus.

Bosch created the CAN bus protocols. Refer to their documents for the technical details you are looking for.

TheFuzzyGiggler · June 10, 2024, 7:02pm

Yes, but by ditching the physical layer you lose all noise immunity and nearly all the benefits of the CAN bus. I actually think the data link layer requires error management in hardware, which the STM32 doesn’t implement. But that’s a minor factor.

The issue is, most users don’t understand that and when they have issues they blame CAN as a whole or Klipper. When in reality it’s just BTT cutting corners and making weird design choices by implementing something in a way it was never intended to be implemented.

I’d be fine if BTT labeled it as something else; it’s less confusing that way. Or hell, just use UART and simplify the entire thing.

Even ST’s presentation on bxCAN (the implementation in the low cost MCUs) shows it needs a transceiver.

NAPCAL · June 10, 2024, 7:12pm

First and foremost, most 3D printer builders and manufacturers ignore the physical layer requirements for shielding the CAN bus cables.

Even though the CAN bus has good noise rejection, it can still be overwhelmed by being too close to stepper wires. Fan and heater wires are also PWM-controlled, and those frequencies can induce significant noise on unshielded CAN_L and CAN_H wires.

Builders who do use shielded wiring typically do not have retransmission issues and can run at max bus speeds.

TheFuzzyGiggler · June 10, 2024, 7:18pm

Shielding is optional in ISO 11898-2; using twisted pair wiring is mandatory though.

NAPCAL · June 10, 2024, 7:20pm

As I said, the ones that use shielded cables have fewer issues than those that don’t.

The specs were written for industrial and automotive designs, not hobbyists.

Viesturz · June 10, 2024, 7:37pm

Wow, this went south fast.
The canbus transceivers are just dumb differential-to-TTL signal level converters, right?
The protocol itself, including error handling, is still in software?
