Hey everyone, I’ve been chasing random print freezes on my Voron 2.4 and wanted to share findings — it looks like the extruder driver is radiating EMI that corrupts CAN communication (at least that is my assumption).
Setup
Voron 2.4
BTT Manta M8P v2 + BTT CB1
EBB2209 (RP2040) toolhead over native CAN
Umbilical Mod
Stock BTT CAN cable
Cartographer 3D
24 V PSU
48V PSU for X/Y axis
Latest Klipper firmware on both MCUs
Symptom
During long prints (7 h +), the print randomly halts near the end.
KlipperScreen and SBC stay responsive, temps normal, but motion stops.
klippy.log shows no shutdown or MCU disconnect.
CAN status (canstat_EBBCan) remains bus_state=active, rx_error=0, tx_error=0, but tx_retries skyrockets into the millions right before the stall.
Error message: Lost connection with MCU ‘can’
Tests done
Checked all connectors and crimps → solid.
Verified CAN termination (≈ 60 Ω) → OK.
Wiggle test with live tx_retries monitor → no increase → cable & crimps good.
EMI diagnostic macro toggling heaters/fans → stable, no retries.
Extruder stress test (rapid E moves) → massive retry growth: +100 retries / sec even with motor disconnected.
Tried different driver mode (spreadCycle vs stealthChop).
Twisted motor phases and routed them away from the CAN pair/heater wires.
Lowered extruder run current.
Shorter prints (< 2 hours) complete without a problem. If I keep the printer on and print 2+ shorter prints in a row the issue will happen eventually. After crawling through the internet and testing everything on the printer for days I’m out of ideas. Help is really appreciated. Thank you in advance
Welcome MightyMietz,
Interesting! I might have some ideas, but please give me time until Sunday evening (my time). Your klippy.log is quite big and I have to work tomorrow.
I’m out of ideas. I changed every piece of hardware except for the mainboard. I reflashed and reinstalled every piece of software, triple-checking the configs. I even swapped the SB2209 for a new one and also tried the Mellow Fly alternative to rule out BTT as the problem. The issue persists. After switching from CB1 to CM5 I was able to drop the tx_retries from approx. 100/s to 10-20/s, but that is still too much. I even stripped the mainboard and toolhead board out and powered them with a different power supply, but no luck.
It seems like a software issue to me, because the CM5 switch changed the rate, but that is just an assumption.
Yes, sure. In the meantime I stripped the manta board and EBB from the printer, just connected the PT1000 temp sensor. I commented out some parameters in the printer.cfg to make “fake” printing work with a minimal setup. I started a print and still see rising tx_retries. Please see
Unfortunately the issue you shared is not the same, since I already ruled out EMI or anything related to power draw. My logs also didn’t show any similar error except the rising tx_retries.
Ah, okay.
Well, the thing is, the RP2040 emulates CAN with PIO. IDK the exact details, but it is literally the only one that reports tx_retries.
All others would not report that and silently retry.
So, I guess, it is not a problem, or at least, I’m not sure where the baseline lies to compare against.
About the correlation itself between driver activity and CAN retries:
My pure guess is that it is really just bus arbitration happening.
Quick recap of what I’ve already changed to rule out hardware/software:
MCU / mainboard: Swapped from Manta M8P + CB1 to M8P + CM4, currently running a BTT Kraken + Pi5 8GB
Toolhead boards: Tested two different CAN toolboards (2x BTT SB2209 and Mellow Fly SB2040/Fly Pro).
Cabling: Tried stock cable, Igus shielded cable, and a very short ~30 cm CAN cable.
Setup: Stripped the whole Voron down – now testing on the desk with only mainboard + toolhead board (no motors, no movement, no heaters).
OS / software: Tried Armbian and Raspberry Pi OS Lite (32-bit + 64-bit)
Bus checks:
CAN termination ~59.3 Ω between CANH/CANL (so 2×120 Ω present).
ip -details -statistics link show can0 shows 0 RX/TX errors and bus in ERROR-ACTIVE.
Klipper stats always start with about 26 tx_retries on EBBCan after startup and then stay stable at idle.
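As a sanity check on the termination reading above: two 120 Ω terminators in parallel come out at 60 Ω, so a reading of about 59.3 Ω fits with both being present. A minimal Python sketch of that arithmetic follows. The function names and the ±10 % tolerance band are assumptions made up for illustration, not values from any CAN specification.

```python
def parallel(*resistors_ohm):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


def classify_termination(measured_ohm, tolerance=0.10):
    """Guess how many 120-ohm terminators are present from a CANH-CANL reading.

    tolerance is a hypothetical +/-10 % band chosen only for this example.
    """
    both = parallel(120.0, 120.0)   # both ends terminated
    single = 120.0                  # only one terminator on the bus
    if abs(measured_ohm - both) <= tolerance * both:
        return "two terminators"
    if abs(measured_ohm - single) <= tolerance * single:
        return "one terminator"
    return "unexpected"


print(classify_termination(59.3))
```

A reading near 120 Ω would instead suggest that one terminator is missing, and anything far from both values would point at a wiring fault.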
Key new finding (fully reproducible):
Desk setup, no motors attached:
Using Mainsail to extrude/retract:
1 mm at 1 mm/s, clicked ~1×/s → tx_retries stays flat.
1 mm at 30 mm/s, clicked ~1×/s → tx_retries increases.
So the retries only spike with higher extruder speed, even with no motor connected and no movement. That strongly points to rate-dependent noise / interference from the driver/board itself.
Next step I’m about to test:
Adding a 1500 µF 50 V capacitor directly across 24 V and GND on the toolhead board to stabilize the local supply and see if that reduces or eliminates the tx_retries during high-speed extruder moves.
The tx_retries counter does not itself indicate an error. The value is only reported on rp2040 chips and it indicates the number of times the low-level code attempted to send a canbus packet from mcu to host more than once. It’s normal for canbus packets to be scheduled more than once due to low-level message arbitration rules. That is, for example, if the mcu has a message to send to the host at the same time that the host wants to send a message to the mcu (or at the same time the host is sending messages to/from another mcu) then the counter would increment. The tx_retries counter is only intended as a statistical data point during debugging and is not itself an indicator of a problem.
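To illustrate the arbitration rule described above, here is a rough Python sketch of how a node can lose the bus and have to send again even though nothing is wrong. It models the standard CAN wired-AND behaviour (a dominant 0 bit beats a recessive 1 bit, so the lowest identifier wins). The identifiers 0x100 and 0x101 and the "slots" model are hypothetical and are not taken from Klipper's code.

```python
def arbitrate(ids, id_bits=11):
    """Return the winning ID among nodes that start transmitting together.

    The bus is wired-AND: a dominant bit (0) overrides a recessive bit (1).
    A node that sends recessive but reads dominant backs off and retries later.
    """
    contenders = set(ids)
    for bit in range(id_bits - 1, -1, -1):          # identifier is sent MSB first
        bus_level = min((i >> bit) & 1 for i in contenders)
        contenders = {i for i in contenders if (i >> bit) & 1 == bus_level}
    (winner,) = contenders                          # unique IDs leave one winner
    return winner


def count_retries(slots, my_id):
    """Count slots in which my_id wanted the bus but lost arbitration."""
    return sum(1 for ids in slots if my_id in ids and arbitrate(ids) != my_id)


HOST_ID = 0x100       # hypothetical host-to-mcu identifier
TOOLHEAD_ID = 0x101   # hypothetical mcu-to-host identifier

# Each set lists the nodes that try to start a frame in the same bus slot.
slots = [{HOST_ID}, {TOOLHEAD_ID}, {HOST_ID, TOOLHEAD_ID},
         {HOST_ID, TOOLHEAD_ID}, {TOOLHEAD_ID}]
print(count_retries(slots, TOOLHEAD_ID))
```

In this toy model the toolhead loses every slot it shares with the host, so the more frames the host sends per second, the more often the toolhead has to try again. That would be consistent with a counter that rises with traffic rather than with noise, though it does not prove that this is what happens on a real bus.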
Hello Kevin, thank you for your explanation. I’m investigating tx_retries because longer prints (over 4 hours) or multiple consecutive prints totaling more than 4 hours eventually fail with the ‘Lost connection with MCU ‘can’’ error. The Klippy logs show no other indicators except approximately 4 million tx_retries, which seems excessive to me. This is particularly puzzling since I’m running a standard setup and have already replaced all components. Would you say that even this high number is normal behavior?
It’s really hard to give advice without seeing a log of the error. I’m not aware of any concrete conclusions one could take from just knowing the tx_retries count. The value of other stats leading to the error would likely be more informative (and changes in tx_retries would likely only be useful in comparison to changes in those other stats).
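One way to look at changes in tx_retries next to changes in other stats is to pull the "Stats" lines out of klippy.log and print per-interval differences. The Python sketch below assumes each Stats line consists of "name:" section tokens followed by key=value tokens. The two sample lines are made up for illustration and are not copied from a real log, so the section and field names may need adjusting to match your file.

```python
import re

STATS_RE = re.compile(r"^Stats (\d+(?:\.\d+)?): (.*)$")


def parse_stats_line(line):
    """Split a 'Stats <time>: ...' line into (time, {section: {key: value}})."""
    m = STATS_RE.match(line.strip())
    if not m:
        return None
    timestamp, rest = float(m.group(1)), m.group(2)
    sections, current = {}, "_global"
    for token in rest.split():
        if token.endswith(":") and "=" not in token:
            current = token[:-1]                    # start of a new section
        elif "=" in token:
            key, value = token.split("=", 1)
            sections.setdefault(current, {})[key] = value
    return timestamp, sections


def deltas(lines, fields):
    """Yield (time, {(section, key): change since the previous Stats line})."""
    prev = None
    for line in lines:
        parsed = parse_stats_line(line)
        if parsed is None:
            continue                                # not a Stats line
        t, sections = parsed
        cur = {f: float(sections.get(f[0], {}).get(f[1], "nan")) for f in fields}
        if prev is not None:
            yield t, {f: cur[f] - prev[f] for f in fields}
        prev = cur


# Illustrative lines only; real Stats lines carry many more fields.
SAMPLE = [
    "Stats 100.0: gcodein=0 canstat_EBBCan: bus_state=active rx_error=0 "
    "tx_error=0 tx_retries=26 EBBCan: bytes_retransmit=0 srtt=0.001",
    "Stats 101.0: gcodein=0 canstat_EBBCan: bus_state=active rx_error=0 "
    "tx_error=0 tx_retries=126 EBBCan: bytes_retransmit=0 srtt=0.001",
]
FIELDS = [("canstat_EBBCan", "tx_retries"), ("EBBCan", "bytes_retransmit")]
result = list(deltas(SAMPLE, FIELDS))
for t, change in result:
    print(t, change)
```

To run it on a real log, replace SAMPLE with `open("klippy.log")`. If tx_retries climbs while retransmit counters stay flat, that would fit the explanation that the counter by itself is not an error indicator.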
I don’t have one of these boards so I’m kinda shooting in the dark…
What baud rate are you running? What is the bus load if you drop to 25k? Is there a need to run faster?
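For a rough feel of how bus load scales with bitrate, here is a back-of-the-envelope Python sketch. It assumes classic CAN frames with an 11-bit identifier and uses the usual worst-case bit-stuffing estimate. The figure of 1000 frames per second is a hypothetical placeholder, not a measured value from this printer.

```python
def frame_bits(data_bytes=8, worst_case_stuffing=False):
    """Bits on the wire for one classic CAN frame with an 11-bit identifier.

    47 bits cover the fixed fields plus the 3-bit interframe space.
    """
    bits = 47 + 8 * data_bytes
    if worst_case_stuffing:
        # Only 34 control bits plus the data bits are subject to stuffing.
        bits += (34 + 8 * data_bytes - 1) // 4
    return bits


def bus_load(frames_per_second, bitrate, data_bytes=8, worst_case_stuffing=True):
    """Fraction of bus time occupied; values above 1.0 cannot be sustained."""
    return frames_per_second * frame_bits(data_bytes, worst_case_stuffing) / bitrate


FRAMES_PER_SECOND = 1000   # hypothetical traffic level, for illustration only
for bitrate in (1_000_000, 500_000, 250_000, 25_000):
    load = bus_load(FRAMES_PER_SECOND, bitrate)
    print(f"{bitrate:>9} bit/s -> load {load:.1%}")
```

A full 8-byte frame is 111 bits without stuffing and up to 135 bits with worst-case stuffing, so the same traffic takes four times the share of bus time at 250k that it does at 1M. The actual frame rate would have to be measured on the bus before drawing any conclusion.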
On the manta boards the CAN data is transmitted several cm to the MCU then passed on to the CAN transceiver. At high data rates it seems to me that timing could be difficult to maintain.
It seems that at least once a week there is a new thread about a Manta board not playing nice with a CAN toolhead board. It makes me wonder if there is a fundamental flaw.
Remove everything connected to the M8P via USB and see if anything changes.
By this I mean remove any extra tool boards, cameras, etc. that are plugged into the USB ports of the Manta board.
The reason for suggesting this is that the Manta board has only a single USB D+/D- pair between the Manta board and the CB board.
The Manta board uses an F1.1S USB hub chip onboard to split the USB from the CB between the Manta MCU and the USB ports on the board. (see image below)
This F1.1S hub chip is a blocking type hub (known as STT), and in my experience STT chips, and this one specifically, can cause issues with the timing Klipper requires from USB comms. IMO printer boards should use non-blocking hubs (MTT).
From what I can see in the schematic, the CAN bus comms for the Manta are handled by the MCU on the Manta board, not directly by the CB, meaning you have a bridge-mode MCU.
So anything else using the USB ports may cause interruptions or delays to your CAN bus as well as to the USB comms to the Manta MCU itself.
You may see instances of the Got EOF error message in your klippy.log
You make it sound like that’s a bad thing - it’s actually a very efficient method of connecting the host to the MCU and is the typical way of doing it in the Klipper ecosystem.
The OpenMoko Geschwister Schneider CAN bridge (which is used with Klipper) handles CAN packets at the Data Link layer of the USB interface. It’s its own USB device and does not use another USB device (like CDC or HID) and then convert the data into CAN packets. This means the MCU effectively treats USB data as CAN packets, the same way a CAN bridge would, processing packets meant for it (and replying appropriately) or passing them along to other devices.
This means that the CAN bus packets are being handled directly by the CB1 and MCU.
I’m not sure where you get the term “blocking type” when describing an STT hub. STT, as I understand it, has one translator which passes all USB packets to all the devices connected to the hub at the source’s transmission speed (HS - 480Mbps, FS - 12Mbps or LS - 1.5Mbps). MTT has one translator per port, passes USB packets to devices connected to the hub at the destination’s transmission speed, and only passes USB packets between the appropriate devices.
MTT has a definite advantage in an installation that has multiple HS devices communicating with each other independently of other devices (i.e., multiple PCs using a single paper printer). MTT has a significant disadvantage compared to STT in terms of cost: basic MTT hub chips cost 6x or more than STT chips.
Which is better for a 3D printer? First off, there shouldn’t be multiple devices that communicate directly with each other in a 3D printer. Secondly, if there are LS (or even FS) devices (mice, keyboards) connected to the hub, they shouldn’t be communicating continuously, which would be a disadvantage for an STT hub. So, if the features that differentiate MTT are not likely to be needed in a 3D printer and the cost of an MTT hub is significantly higher than that of an STT hub, then STT is the optimal choice for use in a 3D printer.
No, as explained above, CAN packets passed over USB are handled directly in both the CB1 and the MCU.
This is good advice and often difficult to get people to acknowledge. For some reason, when you get people to list the devices connected to their 3D printer hosts, they never include cameras and other USB devices.
Now, when I’ve asked people to do this in the past, it doesn’t resolve the TTC errors - they’re generally not the cause of the problems.
Similarly, CAN bus retries are a symptom of the problem, not the cause.
It’s nice that BTT provides the block diagram at the top of their schematics but don’t consider them to be authoritative in terms of the actual wiring that’s on the boards - especially in terms of USB.
If you search through this site, you’ll find at least two posts by me in which I’ve discovered errors in the block diagram compared to the actual wiring. You’ll also find things that appear to provide a specific function, but you need to dig deeper to understand exactly what is happening (i.e., the “Type-C”/“Pi-USB”/“RS2227” block between the “CM4” and “F1.1S” does not provide alternative paths for USB but actually implements the mechanism that allows a user to program the optional eMMC built into the host SBC).
A few questions for you in how you’ve set things up.
How have you implemented the CAN bus? Are you using the built-in CAN of the M8P?
What do you mean by “Umbilical Mod”?
What is a “Stock BTT CAN cable”?
Now, when I look at your “klippy.log.zip” files, I don’t feel like I have a complete or fully representative one of the original problem.
The first klippy.log you posted was several hundred thousand lines of output with no configuration information, and most of its messages were produced by macros. The second is more manageable and has configuration information, but it doesn’t have macros that would produce the messages I see in the first one.
What is the current state of the printer and, if it is still failing with the original issue, could you provide a new klippy.log?