Basic Information:
Printer Model: home-built bed slinger, aka “the …Franken8”
MCU / Printer board: BTT Manta m5p | EBB36 toolhead | USB-CAN bridge
Host / SBC: CB1 on board
klippy.log: attached
Describe your issue: After updating my MCUs from v0.12 to v0.13, Klipper reported that it was unable to connect to the EBB36.
Straight up, I am no software dev. Much of what follows is me blindly driving a console, ‘vibe debugging’ at Claude’s direction, but it seems to have resolved the issue. The AI-generated summary of the debugging and the fix applied follows.
Bug Report: usb_canbus stall detection causes CAN toolhead board connection failure on cold power cycle
Summary
The stall detection introduced in commit 2c90c97c (usb_canbus: Detect canbus stalls when in usb to canbus bridge mode) causes CAN toolhead boards to fail to connect after a cold power cycle when using a USB-to-CAN bridge mainboard. The 50ms stall timeout is too short for toolhead boards that require more than 50ms to boot and assert CAN bus presence on cold start.
Environment
| Component | Details |
|---|---|
| Host | BTT CB1 (Linux/ARM) |
| Mainboard | BTT Manta M5P V1.0 — STM32G0B1, USB-to-CAN bridge mode |
| Toolhead board | BTT EBB36 — STM32G0B1, CAN node |
| Bootloader | Katapult v0.0.1-110-gb0bf421 (EBB36) |
| Klipper host | v0.13.0-572-g88a71c3ce |
| Manta MCU firmware | v0.13.0-572-g88a71c3ce |
| EBBCan MCU firmware | v0.13.0-572-g88a71c3ce |
| CAN bus speed | 500000 bit/s |
| CAN interface | gs_usb via Manta M5P USB-CAN bridge |
Symptom
After a cold power cycle (PSU off, then on), Klipper fails to connect to the EBBCan toolhead MCU with repeated timeout errors:

    mcu 'EBBCan': Wait for identify_response
    mcu 'EBBCan': Serial connection closed
    mcu 'EBBCan': Timeout on connect
    mcu 'EBBCan': Unable to connect
Warm restarts (firmware restart, Klipper service restart, or system reboot without PSU off) work correctly every time. The issue is exclusively triggered by a cold power cycle.
Diagnosis
CAN bus traffic analysis
A candump capture taken during a failed cold start shows the Manta bridge receiving and broadcasting SET_NODEID commands for the EBBCan on the admin ID (0x3F0), and the host sending identify requests to the EBBCan’s assigned CAN ID (0x10A). However, the EBBCan never transmits any response on its transmit ID (0x10B):

    3F0#0126D998A69C7A05    ← SET_NODEID broadcast: UUID=26d998a69c7a, nodeid=5
    10A#081101002842247E    ← host → EBBCan: identify request (MCU_RX)
    10A#7E08110100284224    ← host → EBBCan: same request preceded by 0x7E, continued in the next frame
    10A#7E                  ← host → EBBCan: trailing 0x7E of that request
    [... repeated with exponential backoff, no 0x10B traffic ever appears ...]
Querying Katapult after a failed cold start shows that the EBBCan CAN hardware is functional. It responds correctly on the admin channel (0x3F1):

    3F0#00                  ← QUERY_UNASSIGNED broadcast
    3F1#2026D998A69C7A11    ← EBBCan responds: UUID=26d998a69c7a (CAN TX working)
This rules out EBBCan hardware or firmware as the cause.
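To make the CAN IDs in the captures above easier to follow, here is a minimal C sketch that decodes a SET_NODEID frame and derives the per-node IDs. The mapping (receive ID = 0x100 + 2 × node ID, transmit ID = receive ID + 1) and the payload layout (command byte, 6-byte UUID, node ID) are inferred from the frames in this capture, so treat them as assumptions. The names `decode_set_nodeid` and `struct node_ids` are hypothetical and not part of Klipper.

```c
#include <stdint.h>
#include <string.h>

#define ADMIN_ID        0x3F0u  /* admin request ID seen in the capture */
#define CMD_SET_NODEID  0x01u   /* first payload byte of the SET_NODEID frame */

/* Hypothetical helper type: what one SET_NODEID frame tells us */
struct node_ids {
    uint8_t  uuid[6];     /* toolhead board UUID */
    uint8_t  nodeid;      /* node ID assigned by the host */
    uint32_t mcu_rx_id;   /* CAN ID the MCU listens on (host -> MCU) */
    uint32_t mcu_tx_id;   /* CAN ID the MCU replies on (MCU -> host) */
};

/* Decode a SET_NODEID admin frame.
 * Returns 0 on success, -1 if the frame is not a SET_NODEID frame. */
static int
decode_set_nodeid(uint32_t can_id, const uint8_t *data, size_t len,
                  struct node_ids *out)
{
    if (can_id != ADMIN_ID || len != 8 || data[0] != CMD_SET_NODEID)
        return -1;
    memcpy(out->uuid, &data[1], sizeof(out->uuid));
    out->nodeid = data[7];
    /* Assumed mapping, consistent with nodeid=5 <-> 0x10A/0x10B above */
    out->mcu_rx_id = 0x100u + 2u * out->nodeid;
    out->mcu_tx_id = out->mcu_rx_id + 1u;
    return 0;
}
```

Fed the payload of the `3F0#0126D998A69C7A05` frame, this yields node ID 5, receive ID 0x10A and transmit ID 0x10B, which are the IDs to watch for in candump.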
Root cause
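The frames sent by the host to the EBBCan (0x10A) are being silently discarded by the Manta USB-CAN bridge before they reach the CAN bus. They still show up in candump, presumably because the capture runs on the host side of the bridge and so records what the host hands to the bridge rather than what is actually transmitted on the wire.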
Commit 2c90c97c introduced a stall detection mechanism in src/generic/usb_canbus.c. When the bridge cannot successfully send a CAN frame for 50ms, it transitions to a BSS_DISCARDING state and silently drops all subsequent outbound frames until a successful send occurs.
On cold power-on, the EBBCan STM32G0B1 requires more than 50ms to complete its boot sequence (clock initialisation, Katapult bootloader check, Klipper firmware startup, FDCAN peripheral initialisation). During this window, the EBBCan cannot acknowledge CAN frames. The Manta bridge attempts to send early frames, fails to get bus acknowledgement within 50ms, enters BSS_DISCARDING, and then drops all subsequent host-to-EBBCan frames — including the SET_NODEID and identify commands that Klipper depends on for connection establishment.
The relevant code path in src/generic/usb_canbus.c:

    if (UsbCan.bus_send_state == BSS_READY) {
        // Just starting to block - setup stall detection after 50ms
        UsbCan.bus_send_state = BSS_BLOCKING;
        UsbCan.bus_send_discard_time = timer_read_time() + timer_from_us(50000); // ← too short
    }
The state only clears back to BSS_READY on a successful send — which cannot happen if all frames are being discarded.
Warm restarts are unaffected because the EBBCan firmware is already running before Klipper reconnects, so it can immediately acknowledge CAN frames and no stall occurs.
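The behaviour described in this section can be sketched as a small standalone state machine. This is a simplified model written for illustration, not the actual Klipper code: `struct bridge_model`, `model_try_send`, the `send_result` values and the `hw_accepts` flag (standing in for "the CAN hardware accepted the frame") are all hypothetical, and only the three state names come from usb_canbus.c as quoted above.

```c
#include <stdint.h>

enum bus_send_state { BSS_READY, BSS_BLOCKING, BSS_DISCARDING };
enum send_result { SEND_OK, SEND_RETRY, SEND_DROPPED };

/* Hypothetical model of the bridge's outbound stall detection */
struct bridge_model {
    enum bus_send_state state;
    uint32_t discard_time_us;  /* deadline after which frames are dropped */
    uint32_t timeout_us;       /* 50000 in stock firmware per this report */
};

/* Try to send one frame at time now_us.
 * hw_accepts is nonzero when the CAN hardware could take the frame. */
static enum send_result
model_try_send(struct bridge_model *b, uint32_t now_us, int hw_accepts)
{
    if (hw_accepts) {
        /* Any successful send clears the stall state */
        b->state = BSS_READY;
        return SEND_OK;
    }
    if (b->state == BSS_READY) {
        /* Just starting to block: arm the stall timer */
        b->state = BSS_BLOCKING;
        b->discard_time_us = now_us + b->timeout_us;
        return SEND_RETRY;
    }
    /* Wrap-safe comparison of two 32-bit microsecond timestamps */
    if (b->state == BSS_BLOCKING
        && (int32_t)(now_us - b->discard_time_us) < 0)
        return SEND_RETRY;
    /* Deadline passed (or already discarding): drop the frame */
    b->state = BSS_DISCARDING;
    return SEND_DROPPED;
}
```

In this model, a toolhead that stays silent for longer than `timeout_us` pushes the bridge into BSS_DISCARDING, and every later frame is dropped until the hardware accepts one. A toolhead that is already running (the warm restart case) accepts the first frame, so the timer is never armed.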
Fix
Increasing the stall timeout from 50ms to 5000ms in src/generic/usb_canbus.c resolves the issue:

    // Before:
    UsbCan.bus_send_discard_time = timer_read_time() + timer_from_us(50000);
    // After:
    UsbCan.bus_send_discard_time = timer_read_time() + timer_from_us(5000000);
After rebuilding and reflashing the Manta M5P with this change, cold power cycle connection succeeds consistently.
Alternative fixes to consider
A more robust solution might be to reset bus_send_state to BSS_READY whenever a new UUID assignment cycle begins (i.e. when a SET_NODEID admin frame is sent), so that the discarding state cannot persist across a fresh connection attempt. This would preserve the stall detection benefit for genuine mid-session bus failures while avoiding false positives during boot.
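As a rough illustration of that alternative, the sketch below resets a discarding state when a SET_NODEID admin frame passes through the bridge. It is untested against real firmware and is not a patch: `model_reset_on_set_nodeid` is a hypothetical hook, and the admin ID (0x3F0) and command byte (0x01) are taken from the capture in this report rather than from the Klipper source.

```c
#include <stddef.h>
#include <stdint.h>

enum bus_send_state { BSS_READY, BSS_BLOCKING, BSS_DISCARDING };

#define ADMIN_ID        0x3F0u  /* admin request ID seen in the capture */
#define CMD_SET_NODEID  0x01u   /* first payload byte of SET_NODEID */

/* Hypothetical hook, called for each frame the host sends to the bridge.
 * If the frame starts a new node ID assignment and the bridge is
 * currently discarding, go back to BSS_READY so the next send attempt
 * re-arms stall detection instead of being dropped outright.
 * Returns 1 if the state was reset, 0 otherwise. */
static int
model_reset_on_set_nodeid(enum bus_send_state *state, uint32_t can_id,
                          const uint8_t *data, size_t len)
{
    if (can_id != ADMIN_ID || len < 1 || data[0] != CMD_SET_NODEID)
        return 0;
    if (*state != BSS_DISCARDING)
        return 0;
    *state = BSS_READY;
    return 1;
}
```

The reset is deliberately limited to SET_NODEID frames, so ordinary traffic during a genuine mid-session stall would still be discarded as the original commit intended.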
Reproduction steps
- BTT Manta M5P in USB-to-CAN bridge mode with stock v0.13 firmware
- BTT EBB36 (or similar STM32G0B1 toolhead board) as a CAN node
- Power cycle the PSU completely (not just a warm reboot)
- Observe Klipper failing to connect to the toolhead MCU
- Confirm candump shows zero traffic on the toolhead’s transmit CAN ID
Additional notes
- The issue was initially mistaken for an EBBCan firmware problem (bus-off, FDCAN clock misconfiguration, Katapult timing) and required extensive CAN bus traffic analysis to correctly attribute to the bridge’s discard logic.
- The problem affects any configuration where a toolhead board takes longer than 50ms to boot relative to the USB-CAN bridge board on cold start. This is likely to affect other STM32-based toolhead boards beyond the EBB36.
- Tested and confirmed on Klipper v0.13.0-572-g88a71c3ce. The stall detection was not present in v0.12, which did not exhibit this issue.
…
Hope it helps resolve this for others.