Setup
- Siboor Voron 2.4r2 kit from 2022-2023 converted to StealthChanger (4 toolheads)
- Motherboard: BTT Octopus Pro v1.1
- Toolheads: 4× BTT EBB SB2209 (RP2040) v1.0
- USB-CAN adapters tried: BTT U2C v2.1 (factory firmware originally, now updated per Esoterical’s guide) and FYSETC UCAN v1.0 — both fail
- CAN hubs tried: Isik’s Tech MOAR_CAN and a DIY hub (parallel power distribution, CAN bus OUT from one head connected to IN of next) — both fail
- Host: Raspberry Pi 3B, MainsailOS 1.3.2 (Debian 11 Bullseye), kernel 6.1.21-v8+ aarch64. Note: this matches the combination Esoterical flags as causing timing issues. OS reflash is a significant operation (full Klipper/Moonraker/Mainsail rebuild) so I’d like some confirmation this could plausibly cause the specific symptoms described below — not just generic “timing issues” — before committing to it.
- Klipper 0.13.0-455, Katapult 0.0.1-91 on all toolheads
- Wiring: linear bus through the moar_can as designed. Power and CAN-out on SB2209’s XT30 (incoming + outgoing CANH/L crimped into same pin). Also tried using the SB2209 secondary board’s CANH/L breakout — identical failure either way.
- Cable: original SB2209 cable for outgoing leg, twisted pair from cat5e for return leg
- Bus length: ~7.7m total (1.3m from U2C to moar_can + 4× 1.6m round-trip to toolheads)
- Termination: 2× 120Ω, one at U2C, one at moar_can (PCB termination jumper populated). Measured 64Ω across bus, power off. Correct.
- 24V at each toolhead: stable at 24V+, no measurable sag
- Star ground at PSU. Each toolhead has dedicated 24V/GND back to moar_can common point
- Bitrate: tried 1Mbit and 500kbit on both adapters, identical failure at both
- txqueuelen 128, restart-ms 100 (systemd-networkd config)
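As a quick cross-check of the figures in the list above, here is a minimal sketch that recomputes the nominal termination resistance and the total bus length from the quoted values. The helper name `parallel` is just for illustration. The measured 64Ω is within a few ohms of the nominal result.

```python
# Sanity check of the setup figures quoted above (values copied from the list).

def parallel(*resistances_ohm):
    """Equivalent resistance of resistors wired in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances_ohm)

# Two 120 ohm terminators, one at the U2C and one at the moar_can.
expected_termination = parallel(120.0, 120.0)

# 1.3 m adapter-to-hub run plus four 1.6 m round trips to the toolheads.
bus_length_m = 1.3 + 4 * 1.6

print(f"nominal termination: {expected_termination:.1f} ohm")
print(f"total bus length: {bus_length_m:.1f} m")
```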
Issue
CAN bus fails (BUS-OFF on U2C, persistent ERROR-PASSIVE on UCAN) with 3+ toolheads; works fine with any 2 out of 4. All physical-layer checks pass. Tried 2 USB to CAN bus adapters, 2 CAN bus hubs and both 1M and 500k bitrates.
Symptom
Any single toolhead works. Any pair works. Three or more active toolheads → bus fails within seconds of boot — U2C goes to BUS-OFF, UCAN holds in persistent ERROR-PASSIVE. Klipper logs repeated Timeout on connect / Serial connection closed for the first MCU it tries to enumerate either way.
Which 2 work is interchangeable: every combination of 2 I’ve tried works, while no combination of 3 works, and neither do all 4.
Specific signature
With 2 toolheads (working): RX ~19 packets/sec, candump shows all of it, all error counters 0, state ERROR-ACTIVE.
With 3+ toolheads (failing): RX climbs at ~3,000+ packets/sec on U2C, ~25,000+/sec on UCAN, while candump captures almost nothing (only a few normal Klipper protocol frames from one node). Counters and candump don’t match — the controller is registering huge bus activity that doesn’t surface as valid frames. The bus-errors counter stays at 0 throughout, despite massive RX accumulation and bus-off/error-passive events. TX failures with no valid ACKs.
U2C: goes through ERROR-WARN → ERROR-PASSIVE → BUS-OFF rapidly.
UCAN: holds in ERROR-PASSIVE persistently, error-pass counter climbs into the millions without progressing to bus-off.
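For readers less familiar with the state names, the progression above follows the standard CAN fault-confinement thresholds on the transmit and receive error counters (TEC/REC). The sketch below is only an illustration of those thresholds, not code from either adapter's firmware. The function name is hypothetical, and the warning limit of 96 is the commonly used value.

```python
def can_error_state(tec, rec):
    """Map transmit/receive error counters to a CAN error state.

    Thresholds follow the usual CAN fault-confinement rules:
    warning at 96, error-passive above 127, bus-off when TEC exceeds 255.
    """
    if tec > 255:
        return "BUS-OFF"
    if tec > 127 or rec > 127:
        return "ERROR-PASSIVE"
    if tec >= 96 or rec >= 96:
        return "ERROR-WARN"
    return "ERROR-ACTIVE"


# A transmit error normally adds 8 to TEC, so repeated failed
# transmissions walk through the states in this order.
for failed_tx in (0, 12, 16, 32):
    print(failed_tx, can_error_state(8 * failed_tx, 0))
```

Under that assumption of +8 per failed transmission, 32 consecutive failures are enough to reach bus-off from a clean start.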
Example U2C output after a failure:
can state BUS-OFF restart-ms 100
bitrate 500000 sample-point 0.875
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 3 3 2
RX: bytes 17717569 packets 2214713 errors 0 dropped 67
TX: bytes 24 packets 3 errors 0 dropped 10
Example UCAN output after a similar failure window:
can state ERROR-PASSIVE restart-ms 100
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 32 1161925 0
RX: bytes 9296416 packets 1162052 errors 0 dropped 67
TX: bytes 18 packets 4 errors 0 dropped 0
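To make the two dumps easier to compare, here is a small sketch that parses the RX/TX lines exactly as pasted above and computes the average bytes per counted packet. The regular expression matches the condensed layout used in this post, not necessarily the raw multi-line output of the `ip` tool, and the function names are my own.

```python
import re

U2C_SAMPLE = """\
RX: bytes 17717569 packets 2214713 errors 0 dropped 67
TX: bytes 24 packets 3 errors 0 dropped 10
"""

UCAN_SAMPLE = """\
RX: bytes 9296416 packets 1162052 errors 0 dropped 67
TX: bytes 18 packets 4 errors 0 dropped 0
"""

# Matches the condensed "RX: bytes N packets N errors N dropped N" layout.
LINE = re.compile(
    r"^(RX|TX): bytes (\d+) packets (\d+) errors (\d+) dropped (\d+)$",
    re.MULTILINE,
)


def parse_counters(text):
    """Return {'RX': {...}, 'TX': {...}} with integer counter values."""
    counters = {}
    for direction, nbytes, packets, errors, dropped in LINE.findall(text):
        counters[direction] = {
            "bytes": int(nbytes),
            "packets": int(packets),
            "errors": int(errors),
            "dropped": int(dropped),
        }
    return counters


def bytes_per_packet(counter):
    """Average payload size per counted packet (0.0 if no packets)."""
    if counter["packets"] == 0:
        return 0.0
    return counter["bytes"] / counter["packets"]


for name, sample in (("U2C", U2C_SAMPLE), ("UCAN", UCAN_SAMPLE)):
    rx = parse_counters(sample)["RX"]
    print(f"{name}: {bytes_per_packet(rx):.4f} bytes per RX packet")
```

On both adapters the RX ratio comes out at essentially 8 bytes per counted packet, so nearly everything being counted carries a full 8-byte payload.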
Relevant note on can2040 behavior
The SB2209 RP2040 uses can2040 (Kevin O’Connor’s software CAN implementation), which per its own documentation does not implement automatic bus-off transitions and does not transmit error frames. Static analysis of can2040.c confirms: no SJW/TSEG/PHASE_SEG configurable parameters, no bus-off state machine, no escalation past discarding malformed messages. Bit timing is hard-coded at 32 PIO clocks per CAN bit; resync is implemented in PIO assembly.
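To put the "32 PIO clocks per CAN bit" figure into numbers, here is a rough arithmetic sketch for the two bitrates I tested. The 125 MHz system clock is an assumption (the common RP2040 default), not something I have verified on the SB2209, and the divider shown is simply system clock divided by PIO clock, not a value read out of can2040.

```python
PIO_CLOCKS_PER_BIT = 32              # figure quoted above from can2040.c
ASSUMED_SYS_CLOCK_HZ = 125_000_000   # assumption: common RP2040 default


def pio_timing(bitrate, sys_clock_hz=ASSUMED_SYS_CLOCK_HZ):
    """Derive PIO clock, tick length, and clock ratio for a CAN bitrate."""
    pio_clock_hz = PIO_CLOCKS_PER_BIT * bitrate
    return {
        "pio_clock_hz": pio_clock_hz,
        "pio_tick_ns": 1e9 / pio_clock_hz,
        "bit_time_ns": 1e9 / bitrate,
        "divider": sys_clock_hz / pio_clock_hz,
    }


for bitrate in (1_000_000, 500_000):
    t = pio_timing(bitrate)
    print(
        f"{bitrate} bit/s: PIO clock {t['pio_clock_hz'] / 1e6:.0f} MHz, "
        f"tick {t['pio_tick_ns']:.2f} ns, divider {t['divider']:.5f}"
    )
```

With these assumptions, each PIO tick is 31.25 ns at 1 Mbit and 62.5 ns at 500 kbit, and the clock ratio is fractional in both cases.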
This may explain the host-side asymmetry (host goes bus-off or persistent error-passive while toolheads keep retrying indefinitely), but does not explain why the threshold is 3 nodes or why the bus floods with invisible-to-candump traffic.
Outstanding hypotheses I haven’t been able to confirm or rule out
- The OS combination (Bullseye + kernel 6.1 + aarch64) Esoterical warns about. I would appreciate input on whether this could plausibly cause the specific signature above before I commit to the OS reflash. I know some will think this is the obvious next step, but as I said, changing the OS is a major PITA and I would like some evidence supporting it as a cause for this issue. I had been using the same OS before converting to StealthChanger/CAN bus with zero problems.
- Some interaction between can2040’s non-standard error handling and the host adapter’s standard CAN state machine that only manifests with 3+ active nodes
What I’m asking
Has anyone seen this specific signature — bus failure (BUS-OFF or persistent ERROR-PASSIVE) via TX errors with bus-errors counter staying at 0, massive RX counter accumulation that doesn’t appear in candump, threshold at exactly 3 nodes, bitrate-independent, identical across two adapters and two hubs?
Particularly interested in input from anyone with multi-toolhead/IDEX/toolchanger experience, anyone who’s debugged gs_usb driver issues at this depth, or anyone with can2040 internals knowledge.
Ruled out (with measurements where applicable)
- Termination (60-65Ω measured, no extra terminators anywhere)
- Topology / stubs (true linear bus through moar_can, ~12mm physical stubs)
- Cabling (CANH/L on twisted pairs throughout)
- Bus length (7.7m, well within spec at both 500k and 1M)
- Power delivery (24V solid at each head)
- Ground (star at PSU, dedicated returns)
- Bitrate (fails at both 1M and 500k)
- txqueuelen (tried 128, 256, and 1000/1024 at various points; current value 128 per Klipper docs recommendation; no observable change in failure behavior at any value)
- SJW on host side (tested SJW=2; accepted by gs_usb driver but no change in failure behavior)
- Individual toolhead health (each works solo, all pair combinations work)
- USB-CAN adapter (both U2C and UCAN fail; different behaviors, same outcome)
- USB-CAN adapter firmware (U2C factory firmware updated per Esoterical, no change)
- Bridge mode (using transparent gs_usb / candleLight firmware, not Klipper bridge)
- Hubs (MOAR_CAN and DIY hub both fail)
- moar_can per-port jumpers (correctly populated for unused ports when relevant)
- The “two wires in one crimp” connection at SB2209 (changed approach mid-project, no behavior change)
- Solo MCU health (each toolhead enumerates fine when alone on the bus)
klippy2.log (444.1 KB)