CAN bus collapse when connecting 3rd toolhead

Setup

  • Siboor Voron 2.4r2 kit from 2022-2023 converted to StealthChanger (4 toolheads)
  • Motherboard: BTT Octopus Pro v1.1
  • Toolheads: 4× BTT EBB SB2209 (RP2040) v1.0
  • USB-CAN adapters tried: BTT U2C v2.1 (factory firmware originally, now updated per Esoterical’s guide) and FYSETC UCAN v1.0 — both fail
  • CAN hubs tried: Isik’s Tech MOAR_CAN and a DIY hub (parallel power distribution, CAN bus OUT from one head connected to IN of next) — both fail
  • Host: Raspberry Pi 3B, MainsailOS 1.3.2 (Debian 11 Bullseye), kernel 6.1.21-v8+ aarch64. Note: this matches the combination Esoterical flags as causing timing issues. OS reflash is a significant operation (full Klipper/Moonraker/Mainsail rebuild) so I’d like some confirmation this could plausibly cause the specific symptoms described below — not just generic “timing issues” — before committing to it.
  • Klipper 0.13.0-455, Katapult 0.0.1-91 on all toolheads
  • Wiring: linear bus through the moar_can as designed. Power and CAN-out on SB2209’s XT30 (incoming + outgoing CANH/L crimped into same pin). Also tried using the SB2209 secondary board’s CANH/L breakout — identical failure either way.
  • Cable: original SB2209 cable for outgoing leg, twisted pair from cat5e for return leg
  • Bus length: ~7.7m total (1.3m from U2C to moar_can + 4× 1.6m round-trip to toolheads)
  • Termination: 2× 120Ω, one at U2C, one at moar_can (PCB termination jumper populated). Measured 64Ω across bus, power off. Correct.
  • 24V at each toolhead: stable at 24V+, no measurable sag
  • Star ground at PSU. Each toolhead has dedicated 24V/GND back to moar_can common point
  • Bitrate: tried 1Mbit and 500kbit on both adapters, identical failure at both
  • txqueuelen 128, restart-ms 100 (systemd-networkd config)
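
The bus-length figure in the list above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the 5 ns/m propagation delay is an assumed typical value for twisted pair, not something measured on this printer, and transceiver delays are ignored. The function names are made up for illustration.

```python
# Sanity check of the bus-length figures quoted in the setup list.
# ASSUMPTION: ~5 ns/m propagation delay (typical for twisted pair);
# this value was not measured on this printer.
PROPAGATION_NS_PER_M = 5.0


def total_bus_length_m(trunk_m, round_trip_m, toolheads):
    """Length of a linear bus that loops out to each toolhead and back."""
    return trunk_m + toolheads * round_trip_m


def round_trip_fraction_of_bit(length_m, bitrate):
    """Round-trip propagation delay as a fraction of one bit time."""
    bit_time_ns = 1e9 / bitrate
    round_trip_ns = 2 * length_m * PROPAGATION_NS_PER_M
    return round_trip_ns / bit_time_ns


# 1.3 m from the adapter to the hub plus 4 round trips of 1.6 m each.
length = total_bus_length_m(1.3, 1.6, 4)
for bitrate in (500_000, 1_000_000):
    frac = round_trip_fraction_of_bit(length, bitrate)
    print(f"{bitrate} bit/s: {length:.1f} m uses {frac:.1%} of a bit time")
```

Under that assumption the cable delay is a small fraction of a bit time at both bitrates, which is in line with length alone not being the problem.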

Issue

CAN bus fails (BUS-OFF on U2C, persistent ERROR-PASSIVE on UCAN) with 3+ toolheads; works fine with any 2 out of 4. All physical-layer checks pass. Tried 2 USB to CAN bus adapters, 2 CAN bus hubs and both 1M and 500k bitrates.

Symptom

Any single toolhead works. Any pair works. Three or more active toolheads → bus fails within seconds of boot — U2C goes to BUS-OFF, UCAN holds in persistent ERROR-PASSIVE. Klipper logs repeated Timeout on connect / Serial connection closed for the first MCU it tries to enumerate either way.
The “which 2 work” is interchangeable: every combination of 2 I’ve tried works; no combination of 3 works, and neither do all 4.

Specific signature

With 2 toolheads (working): RX ~19 packets/sec, candump shows all of it, all error counters 0, state ERROR-ACTIVE.
With 3+ toolheads (failing): RX climbs at ~3,000+ packets/sec on U2C, ~25,000+/sec on UCAN, while candump captures almost nothing (only a few normal Klipper protocol frames from one node). Counters and candump don’t match — the controller is registering huge bus activity that doesn’t surface as valid frames. The bus-errors counter stays at 0 throughout, despite massive RX accumulation and bus-off/error-passive events. TX failures with no valid ACKs.
U2C: goes through ERROR-WARN → ERROR-PASSIVE → BUS-OFF rapidly.
UCAN: holds in ERROR-PASSIVE persistently, error-pass counter climbs into the millions without progressing to bus-off.
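
For reference, the state names above map onto the standard CAN fault-confinement counters. A minimal sketch, assuming the textbook thresholds (warning at 96, error-passive at 128, bus-off once the transmit counter passes 255) and a flat +8 on the transmit error counter per failed transmission. Real controllers also decrement on success and have exceptions that this ignores, so treat it as an illustration only.

```python
def can_error_state(tec, rec):
    """Classify a controller from its transmit/receive error counters."""
    if tec >= 256:
        return "BUS-OFF"
    if tec >= 128 or rec >= 128:
        return "ERROR-PASSIVE"
    if tec >= 96 or rec >= 96:
        return "ERROR-WARN"
    return "ERROR-ACTIVE"


def failed_transmissions_until(target_state, step=8):
    """Count consecutive failed transmissions (TEC += step) to reach a state."""
    tec = 0
    failures = 0
    while can_error_state(tec, 0) != target_state:
        tec += step
        failures += 1
        if tec > 1000:  # guard against an unknown state name
            raise ValueError(f"unreachable state: {target_state}")
    return failures


for state in ("ERROR-WARN", "ERROR-PASSIVE", "BUS-OFF"):
    print(state, "after", failed_transmissions_until(state), "failed transmissions")
```

With these simplified rules a few dozen unacknowledged transmissions in a row are enough to walk a controller through all three states.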
Example U2C output after a failure:

can state BUS-OFF restart-ms 100
      bitrate 500000 sample-point 0.875
      re-started bus-errors arbit-lost error-warn error-pass bus-off
      0          0          0          3          3          2
RX: bytes 17717569  packets 2214713  errors 0  dropped 67
TX: bytes 24        packets 3        errors 0  dropped 10

Example UCAN output after a similar failure window:

can state ERROR-PASSIVE restart-ms 100
      re-started bus-errors arbit-lost error-warn error-pass bus-off
      0          0          0          32         1161925    0
RX: bytes 9296416  packets 1162052  errors 0  dropped 67
TX: bytes 18       packets 4        errors 0  dropped 0
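
A quick look at the ratios in the two outputs above, using only the numbers already posted. Both adapters average almost exactly 8 bytes per received packet, and on the UCAN the error-pass count nearly equals the RX packet count. One possible reading, which is an assumption and not something confirmed here, is that the RX counters are mostly counting the 8-byte error-notification frames that SocketCAN generates rather than frames from the toolheads.

```python
# Counter values copied from the two "ip" outputs above.
SAMPLES = {
    "U2C": {"rx_bytes": 17717569, "rx_packets": 2214713, "error_pass": 3},
    "UCAN": {"rx_bytes": 9296416, "rx_packets": 1162052, "error_pass": 1161925},
}


def bytes_per_packet(sample):
    """Average payload size of the received packets."""
    return sample["rx_bytes"] / sample["rx_packets"]


def error_pass_per_packet(sample):
    """Error-passive events per received packet."""
    return sample["error_pass"] / sample["rx_packets"]


for name, sample in SAMPLES.items():
    print(
        f"{name}: {bytes_per_packet(sample):.4f} bytes/packet, "
        f"{error_pass_per_packet(sample):.6f} error-pass events/packet"
    )
```

If that reading is right, it would also fit with the counters and candump not matching, but it needs checking against the gs_usb driver before relying on it.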

Relevant note on can2040 behavior
The SB2209 RP2040 uses can2040 (Kevin O’Connor’s software CAN implementation), which per its own documentation does not implement automatic bus-off transitions and does not transmit error frames. Static analysis of can2040.c confirms: no SJW/TSEG/PHASE_SEG configurable parameters, no bus-off state machine, no escalation past discarding malformed messages. Bit timing is hard-coded at 32 PIO clocks per CAN bit; resync is implemented in PIO assembly.
This may explain the host-side asymmetry (host goes bus-off or persistent error-passive while toolheads keep retrying indefinitely), but does not explain why the threshold is 3 nodes or why the bus floods with invisible-to-candump traffic.
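
The fixed 32 PIO clocks per bit mentioned above implies a specific PIO clock for each bitrate. A rough sketch of that arithmetic, assuming a 125 MHz RP2040 system clock (an assumption, not stated anywhere in this post) and a divider with an integer part plus a fraction in 256ths. The function names are made up and this is not can2040's actual code.

```python
PIO_CLOCKS_PER_BIT = 32               # from the note above
ASSUMED_SYS_CLOCK_HZ = 125_000_000    # ASSUMPTION: RP2040 system clock


def pio_clock_hz(bitrate):
    """PIO clock needed so that one CAN bit spans 32 PIO clocks."""
    return bitrate * PIO_CLOCKS_PER_BIT


def pio_divider(bitrate, sys_clock_hz=ASSUMED_SYS_CLOCK_HZ):
    """Return (integer part, fraction in 256ths, exactly representable?)."""
    scaled, remainder = divmod(sys_clock_hz * 256, pio_clock_hz(bitrate))
    return scaled // 256, scaled % 256, remainder == 0


for bitrate in (500_000, 1_000_000):
    whole, frac, exact = pio_divider(bitrate)
    print(f"{bitrate} bit/s: PIO clock {pio_clock_hz(bitrate)} Hz, "
          f"divider {whole} + {frac}/256, exact={exact}")
```

Under these assumptions both tested bitrates divide exactly, so the divider itself would not introduce a bitrate error at either setting.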

Outstanding hypotheses I haven’t been able to confirm or rule out

  1. The OS combination (Bullseye + kernel 6.1 + aarch64) Esoterical warns about. I would appreciate input on whether this could plausibly cause the specific signature above before I commit to the OS reflash. I know some will think this is the obvious next step, but as I said, changing the OS is a major PITA and I would like some evidence supporting it as a cause for this issue. I had been using the same OS before converting to StealthChanger/CAN bus with zero problems.

  2. Some interaction between can2040’s non-standard error handling and the host adapter’s standard CAN state machine that only manifests with 3+ active nodes

What I’m asking

Has anyone seen this specific signature — bus failure (BUS-OFF or persistent ERROR-PASSIVE) via TX errors with bus-errors counter staying at 0, massive RX counter accumulation that doesn’t appear in candump, threshold at exactly 3 nodes, bitrate-independent, identical across two adapters and two hubs?

Particularly interested in input from anyone with multi-toolhead/IDEX/toolchanger experience, anyone who’s debugged gs_usb driver issues at this depth, or anyone with can2040 internals knowledge.

Ruled out (with measurements where applicable)

  • Termination (60-65Ω measured, no extra terminators anywhere)

  • Topology / stubs (true linear bus through moar_can, ~12mm physical stubs)

  • Cabling (CANH/L on twisted pairs throughout)

  • Bus length (7.7m, well within spec at both 500k and 1M)

  • Power delivery (24V solid at each head)

  • Ground (star at PSU, dedicated returns)

  • Bitrate (fails at both 1M and 500k)

  • txqueuelen (tried 128, 256, and 1000/1024 at various points — current value 128 per Klipper docs recommendation; no observable change in failure behavior at any value)

  • SJW on host side (tested SJW=2 — accepted by gs_usb driver but no change in failure behavior)

  • Individual toolhead health (each works solo, all pair combinations work)

  • USB-CAN adapter (both U2C and UCAN fail; different behaviors, same outcome)

  • USB-CAN adapter firmware (U2C factory firmware updated per Esoterical, no change)

  • Bridge mode (using transparent gs_usb / candleLight firmware, not Klipper bridge)

  • Hubs (MOAR_CAN and DIY hub both fail)

  • moar_can per-port jumpers (correctly populated for unused ports when relevant)

  • The “two wires in one crimp” connection at SB2209 (changed approach mid-project, no behavior change)

  • Solo MCU health (each toolhead enumerates fine when alone on the bus)
    klippy2.log (444.1 KB)

Welcome daBigR,

most important is attaching a klippy.log! That tells us almost all the important things.

Best way to do is filling out this form.

Basic Information:

Printer Model:
MCU / Printerboard:
Host / SBC
klippy.log

Fill out the information above and in all cases attach your klippy.log file (use zip to compress it if it is too big). Pasting your printer.cfg is not needed.
Be sure to check our “Knowledge Base” Category first. Most relevant items, e.g. error messages, are covered there

Describe your issue:

As I’ve read the gs usb driver, I don’t think it is relevant to this.
As you mentioned, it looks like a CAN bus hardware fault, where all CAN bus controllers go to the passive state.

I guess the only way forward is to look at the CAN bus with some logic analyzer/oscilloscope, and well, guess what happened.

Klipper should expose hardware error counters for TX/RX
Klipper’s CAN bridge firmware can be flashed to the U2C, so the U2C can be run as a CAN bridge.

So, it is even possible to use output() in specific drivers, to output the type of the error on the canbridge side.

Otherwise, IIRC, all devices should be somewhat synchronized by the SOF.
So, it sounds to me like your bus is failing at synchronization.

I’ve no idea what I’m talking about, so take my words with a grain of salt.

-Timofey


Btw, this was useful to me to understand how it works, hope it can help you too: CAN Bus Errors Explained - A Simple Intro [2025] – CSS Electronics

Alas, I don’t know what the issue is, but I have a few high-level ideas.

What you are describing sounds like a hardware wiring issue. I’d be surprised if any change to software (kernel, can2040, or klipper) would alter the behavior.

No, I haven’t seen reports like that. I can’t think of anything that would cause that other than wiring issues.

I regularly run 4 devices on a single canbus on my local printer and don’t have any issues like that (1 toolhead board, 2 xy motor boards, 1 canbus bridge board).

I know you’ve tested the terminating resistors, but I’d suggest triple/quadruple checking them. What you are describing sounds just like an issue with the resistors.

Make sure you’ve got ~60 ohms of resistance between CANH and CANL wires after all four toolheads are connected.

The issue sounds like additional toolheads are adding additional resistors, and once 1 more toolhead is added it finally brings the resistance so far out of spec that everything fails.

Again, I understand you’ve tested it, but that’s my best guess as to what could bring about your symptoms.
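
The resistor check suggested above comes down to parallel-resistance arithmetic. A quick sketch, assuming ideal 120Ω terminators and ignoring cable and transceiver resistance:

```python
def parallel_resistance(resistors_ohm):
    """Combined resistance of resistors wired in parallel across CANH/CANL."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


# Expected CANH-CANL reading (power off) for 1 to 4 terminators on the bus.
for count in range(1, 5):
    reading = parallel_resistance([120.0] * count)
    print(f"{count} x 120 ohm terminator(s): about {reading:.0f} ohm")
```

Two terminators give about 60Ω, while a third would pull the reading down to about 40Ω and a fourth to about 30Ω, so an extra resistor hiding on a toolhead should show up clearly on a multimeter.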

Another possibility is “line noise” (reflections, impedance mismatch, etc.) that finally add up to something unstable when the fourth line is added. This seems highly unlikely though. You could drop the bitrate to something absurd (like 100000) just for testing to verify if that is the issue.

Maybe that helps a little,
-Kevin

Thanks. Gross wiring errors like swapped CANH & CANL or a short circuit / open circuit can, I think, be discarded by the fact that any two heads work OK just by disconnecting the others from the hub and jumpering those positions. Also, this doesn’t change the terminating resistors at all, since I’m absolutely 100% completely sure there are no terminating resistors on the heads, only on the USB to CAN board and the MOAR_CAN hub. And by now I must have measured them like a bajillion times. I could try testing at a very low bitrate, but since I should be within spec for 1M and lowering to 500k didn’t make any difference, I’m also in the very unlikely camp on this; besides, it’s a pain to reflash all toolhead boards.

Hi, thanks for the input. I’ve been reluctant to probe the bus with an oscilloscope, mainly because I don’t have a differential probe and using two channels and math is not so good in my experience. Also, I think this could answer “what” but not “why”, and I’d still be stuck. As for the bridge suggestions, all other comments I’ve seen, as I understand them, point to the contrary: bridge mode would be worse, and transparent mode is the correct one for this scenario.

Otherwise, IIRC, all devices should be somewhat synchronized by the SOF.
So, it sounds to me like your bus is failing at synchronization.

Possible, but as I said regarding probing the BUS, if that is the case then “why” and how to fix it?

What happens when you go back to your old working two toolhead setup? Is your system stable?

What happens when you go back to your old two toolhead working setup and exchange one old toolhead with the new toolhead. Doing that would eliminate DOA of your new toolhead.

I don’t see where you have verified the UUID of all CAN devices are unique?

The fact that “any pair” works cancels this concern, but maybe there is an address collision.

Maybe I’m not fully understanding your question, but so far as I’ve tested all two heads combinations work. As for going from 3 to 2 with a live system, never done it, I’ve always shut down, connected or disconnected stuff and then turned on again.

Hi, didn’t post it but yes all 4 ids are indeed different.
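
For anyone wanting to double-check the same thing, a small sketch of how duplicate IDs could be spotted in a config. The section names and UUID values below are hypothetical placeholders, not the ones from this printer, and the helper names are made up for illustration.

```python
import re
from collections import Counter

# HYPOTHETICAL config excerpt; the UUIDs are placeholders, not real values.
SAMPLE_CFG = """
[mcu T0]
canbus_uuid: 0123456789ab
[mcu T1]
canbus_uuid: 0123456789ac
[mcu T2]
canbus_uuid: 0123456789ad
[mcu T3]
canbus_uuid: 0123456789ae
"""


def find_canbus_uuids(cfg_text):
    """Collect every canbus_uuid value found in the config text."""
    pattern = r"^\s*canbus_uuid\s*[:=]\s*([0-9a-fA-F]+)"
    return re.findall(pattern, cfg_text, flags=re.MULTILINE)


def duplicate_uuids(uuids):
    """Return the UUIDs that appear more than once (case-insensitive)."""
    counts = Counter(u.lower() for u in uuids)
    return sorted(u for u, n in counts.items() if n > 1)


uuids = find_canbus_uuids(SAMPLE_CFG)
print(f"{len(uuids)} UUIDs found, duplicates: {duplicate_uuids(uuids) or 'none'}")
```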

Could you post more klippy.logs after your tries?

Certainly, but maybe you could be more specific as to which test configurations would you find helpful to diagnose? The log I posted has a working two toolhead config around line 669 and 1411 and a failure in 2142 (only two heads configured in printer.cfg but a third one added to the bus) and 2898 and many others after that. As to which configuration was active in each one I can’t say for certain other than what I just said, but AFAIK all failures surface the same way in that log.

If you tried those combinations, klippy.logs from each try might be helpful, since we are still poking about in a fog.

I think I made a fairly comprehensive post and answered all questions. I know I forgot the Klipper log at first, but added it when prompted and also explained the combinations associated with that log as my memory allowed. I will post new logs with what you ask for, but I think I already explained that whenever I have only two heads (any two heads) on the BUS everything works and you will see a log like on line 669 of klippy2.log and whenever I have three or more heads on the bus, no matter if they are present in printer.cfg or not, the log is as on line 2142 on klippy2.log. I’m sorry if I’m mistaken about the quality of the information I’ve provided but I sincerely don’t think the “poking about in a fog” comment is fair.
Files uploaded: heads_0_1 (only those two heads on the bus, only those configured in .cfg); heads_1_2 (same with 1 & 2); heads_0_1_2 (all three on the bus and in .cfg); heads_0_1_2b (all three on the bus, only 0 in .cfg).
klippy_heads_0_1_2b.log (34.7 KB)
klippy_heads_0_1_2.log (34.5 KB)
klippy_heads_1_2.log (31.7 KB)
klippy_heads_0_1.log (31.4 KB)

Termination: 2× 120Ω, one at U2C, one at moar_can (PCB termination jumper populated). Measured 64Ω across bus, power off. Correct.

I was reluctant to reply, but what stands out to me breezing through the post is that the termination should be on the U2C and the last CAN toolhead. I am at the moment building a toolchanger using a Hexa Distro, which is terminated, and 4 toolheads. I purposely put T0 at the end of the CAN line, and that is terminated, so every other toolhead added in between is not terminated and is trivial to add or disconnect at will with no influence on termination. Please excuse my limited CAN understanding if I am wrong in my comments, which is very likely.

In my setup all toolheads are wired the same with a pair carrying CANH and CANL to the head and another pair returning to the “hub”. As you say one resistor is at the start on the u2c board and instead of having the terminating resistor at the ending head it is at the end of the bus at the hub. Maybe if you read the MOAR_CAN manual it may be more clear than my explanation.

It would be nice to debug this setup, but have you considered a 2nd U2C plugged into your Pi? Attach 2 heads to CAN2 and melt some plastic.

Has anyone successfully put 4 RP2040 on one bus?

Absolutely, that is my plan C at this moment. As a matter of fact, since I have both the U2C and the UCAN I don’t even need a second U2C, and I have already figured out how to split the moar_can into two segments, so I won’t need a separate hub or a cabling change. But I think I’ll go with plan B first, which is changing the Raspberry Pi kernel/OS, even if I find no stronger supporting arguments here.

Did you try testing your setup with termination on the U2C and on the last toolhead board?
The manual says this is possible if I’m right. And this would adhere more to the CAN specs, termination of the bus at the beginning and the end - the CAN bus starts on the U2C.

Yes, I did test with that configuration before settling on the current one. And it doesn’t make any difference to the behaviour of the system, any two heads work, three or more crash. As I said on my original post, after trying and testing and checking everything pointed out on documentation and related forum posts, I think the problem must be one of the two alternatives I mention: OS version timing issues or can2040 SW CAN bus implementation limitations. Thing is both are painful and/or costly to try on what I think is weak evidence.