New Voron build loses communication with EBB (CAN bus)

Basic Information:

Printer Model: Voron 2.4R2
MCU / Printerboard: Manta M8P V2.0 + EBB 2209 RP2040
Host / SBC: BTT CB1
klippy.log: klippy.log (637.4 KB)

Hi. Please help me diagnose and solve this problem…

I assembled a Voron 2.4 with a CAN bus toolhead (Stealthburner with an EBB2209 RP2040 connected to the Manta’s CAN socket) and can’t get rid of an issue where the toolhead loses communication during prints.

What happens:

I start a print from Cura, and then on a random layer (usually 3-30) the toolhead fans stop (the LED color and brightness do not change), but the head keeps moving for 3-5 seconds without extruding. Then it stops and a yellow error message appears in the browser.

I was able to print a complete Benchy twice and two Voron cubes, but I also got many failed prints (30+).
CB1 temperature is always < 50 °C
SBC CPU load during printing is 3-4%
SBC memory usage < 200 MB
No screen or camera installed

I tried:

  • slicing the file at low speed (40 mm/s)

  • slicing the file at high speed (300 mm/s, accel 5000)

  • installing and running the bandwidth tests from "Klipper: communication bus tests" and calling the macro BUS_BANDWIDTH MCU=EBBCan COUNT=20000000, so it transferred gigabytes over CAN for hours (while I moved and shook the toolhead and printer), and there were no errors or retransmits

  • replacing the cable from the Manta to the toolhead (a new one with a new plug)

  • upgrading and flashing the Manta and EBB to the latest Klipper firmware

  • downgrading everything to Klipper release v0.12.0

  • upgrading the operating system (the kernel is now 6.6.66, Armbian)

  • reinstalling the operating system

  • changing the PSU (from a Chinese 400 W unit to a TRIO-PS-2G/1AC/24DC/10)

  • changing the CAN speed from 1000000 to 500000

  • changing microsteps from 16 to 8

  • trying the Klipper git branch “work-canbridge-20250214”

From the log, it looks like an issue with your extruder heating element. It could be a defective heater or issues with the wiring/contacts to it.

I don’t think this is the issue. It is a consequence, not the cause. It appears because the toolhead board stops accepting commands and disables everything attached to it (extruder, heater, blower fan and extruder fan). Was it a toolhead board reset?

The first error message is:

USB CANBUS bridge ‘mcu’ is discarding!
USB CANBUS bridge ‘mcu’ is no longer discarding.

I checked a video: when this message appears, the print head has already been moving for 5 seconds with the fans, extruder and heater turned off.

Well, you are absolutely entitled to think so. However, in my opinion the log tells a different story. The messages

USB CANBUS bridge 'mcu' is discarding!
USB CANBUS bridge 'mcu' is no longer discarding.

appear around 7 seconds after the temperature started dropping steadily from 235.0 °C to 234.3 °C, despite Klipper’s attempts to push it in the opposite direction.
The final error is:

Heater extruder not heating at expected rate
Transition to shutdown state: Heater extruder not heating at expected rate

If the board misses the next PWM update for the heater, it will error out with:

  • “Missed scheduling of next digital out event”

Even with a high CAN tx error rate from the MCU to the EBB, if a PWM update arrived too “late”, it would fail with:

  • “Scheduled digital out event will exceed max_duration”

In other words, there is no such thing as “the heater is working badly because of a bus issue”.

AFAIK, in a shutdown state fans should run at full blast and heaters should be disabled.
You described the fans stopping, which is odd; in a shutdown state they should spin.

So, just a wild guess: if the toolhead still moved with everything turned off, could it be a power issue? That is, the toolhead lost power, so it can’t answer CAN packets or the reset command.

Ah, yes, and “verify_heater” probably triggers because there are no updates from the thermistor.

Hope it helps.

Well, the only thing still working in the toolhead at that moment is the LEDs, and they don’t change at all! I thought a power loss would “reset” them too… But it could be a power loss on the secondary DC-DC stage of the toolhead board (the 5 V -> 3.3 V LDO). I bought a new toolhead board for a second toolhead, so I’ll try swapping it in (the PSU and the cable have already been changed).

I synced the time on the SBC with my phone and then recorded a video with a timestamp. The “Stats” message just before “USB CANBUS bridge ‘mcu’ is discarding!” is the moment when the hotend fan stops making noise / loses power. At this point the extruder and blower fans also stop working. The LEDs do not change.
So it looks like the toolhead microcontroller resets (the fans and extruder are controlled in real time by microcontroller pins, while the LEDs are only commanded at startup and then keep glowing on their own). There are no “Heater extruder not heating at expected rate” errors in the other klippy.log.

Is there any way to tell whether this is a power outage or a software crash in the microcontroller?

klippy.log (637.4 KB)

I’m not sure I understand your new report. The attached log seems identical to the one in the original post.

Sorry, wrong log. I got this one today. It doesn’t contain “Heater extruder not heating at…”, and the video shows the toolhead fan stopping at the Stats 2220.1 timestamp.

klippy_20240221.log (1.8 MB)

Is there any way to tell whether this is a power outage or a software crash in the microcontroller?

I’m not familiar enough with CAN to say anything definitive here.
So, I don’t know whether the MCU will respond to messages without some “CAN initialization”.
Generally, the microcontroller will respond to the reset command but refuse other commands because it is not initialized, so Klipper should error out earlier, i.e. at the moment of the reset.
You can reproduce this with the reset button, I think.

If the controller has rebooted, all pins should be in their default state, possibly low. So the LED should be off if it is PWM-controlled.
I think you could test this: recompile the firmware and set a pin in the “GPIO pins to set at micro-controller startup” option (e.g. PD15); choose whichever one you like. Then, if your board reboots for some reason, you will be able to detect that state externally, e.g. by an LED lighting up or a fan spinning.

The LEDs are not PWM-controlled. They are SK6812s driven over a single wire. They are turned on by commands and then keep glowing on their own until power-off.

Your idea about “GPIO pins to set at micro-controller startup” is brilliant! I’ll try it!

@regressor go into the MCU information details in Fluidd/Mainsail and check that these fields are zero on ALL your MCUs. If any are non-zero, you’ve got a communication issue on your CAN bus and need to check it for problems (loose/dirty connections, bad cabling, termination problems, etc.).

  • bytes_retransmit
  • bytes_invalid
  • retransmit_seq

I found mine were non-zero, and nefelim4ag just commented on my thread and mentioned this one. I think mine may have been related to CAN bus cabling, and I’m cautiously trying a spare cable I had on hand, which at first glance seems to be working better.

Good luck!

This can be checked in klippy.log. In my log, bytes_retransmit becomes non-zero on both MCUs starting at the moment the toolhead fan stops spinning.
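Scanning hundreds of Stats lines by hand is tedious. A minimal sketch in Python, assuming the Stats line format quoted in this thread (space-separated `key=value` pairs grouped under labels such as `mcu:` and `EBBCan:`), finds the first Stats line where any of bytes_retransmit, bytes_invalid or retransmit_seq is non-zero and reports its timestamp. `parse_stats` and `first_nonzero` are hypothetical helper names, not Klipper functions, and the sketch assumes these counters are running totals for the session, so the first non-zero line marks the onset.

```python
from typing import Dict, Iterable, Optional, Tuple

# Counters that should stay at zero on a healthy bus (the three fields listed above).
WATCHED = ("bytes_retransmit", "bytes_invalid", "retransmit_seq")


def parse_stats(line: str) -> Tuple[float, Dict[str, Dict[str, str]]]:
    """Split a 'Stats <time>: ...' line into (time, {section: {key: value}}).

    A token ending in ':' (without '=') starts a new section, e.g. 'mcu:' or
    'EBBCan:'; key=value tokens are filed under the most recent section.
    """
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "Stats":
        raise ValueError("not a Stats line")
    stamp = float(tokens[1].rstrip(":"))
    sections: Dict[str, Dict[str, str]] = {}
    current = ""  # keys before the first section label (e.g. gcodein) land under ""
    for tok in tokens[2:]:
        if "=" in tok:
            key, _, value = tok.partition("=")
            sections.setdefault(current, {})[key] = value
        elif tok.endswith(":"):
            current = tok[:-1]
    return stamp, sections


def first_nonzero(lines: Iterable[str]) -> Optional[Tuple[float, Dict[str, Dict[str, int]]]]:
    """Return (stats_time, {section: {field: value}}) for the first Stats line
    where any watched counter is non-zero, or None if they all stay at zero."""
    for line in lines:
        if not line.startswith("Stats "):
            continue  # skip everything that is not a periodic Stats line
        stamp, sections = parse_stats(line)
        bad: Dict[str, Dict[str, int]] = {}
        for name, fields in sections.items():
            nonzero = {f: int(fields[f]) for f in WATCHED
                       if f in fields and int(fields[f]) != 0}
            if nonzero:
                bad[name] = nonzero
        if bad:
            return stamp, bad
    return None
```

For example, `with open("klippy.log") as log: print(first_nonzero(log))` prints the Stats time and the offending counters per MCU, which can then be lined up against a timestamped video.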

Also, I ran the bandwidth test with gigabytes transmitted, and there were no retransmits.

I checked the CAN bus; it is not the issue.

Not that I know of. If I were you, I would replace cables methodically until that goes to zero. I would also confirm that termination is set so that the two physical ends of the bus have termination enabled.

If all of that looks good, the problem could also be bad hardware. I had that problem before, and it is very difficult to debug without replacing the bad hardware. These daughter boards are made cheaply, some with cheap components, and they can go bad. I think this is something people tend to overlook.

I’ll add that when I had bad hardware in my setup, I was unable to cause a failure synthetically, i.e. by running bandwidth flooding tests and watching for errors. That isn’t a good way to verify that your bus and hardware are working. I had to let Klipper work the bus natively with real commands and watch those fields stay at zero.

Look at the first message: I already tried replacing the CAN cable (a new one from the toolhead to the board) and the PSU.

Then look at my latest klippy_20240221.log, at the message just before “USB CANBUS bridge ‘mcu’ is discarding!”: it is the Stats line with timestamp 2220.1.

Take the second line of the log: it contains the global Unix timestamp and the local timestamp. Subtract the local timestamp from the global one and add the Stats timestamp to get the real GMT time of this event (1740122391.6 - 444.8 + 2220.1 = 1740124166.9). Then paste the result into https://www.unixtimestamp.com/ and you’ll get Fri Feb 21 2025 07:49:26 GMT+0000.
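The conversion described above can also be done locally in Python. `stats_to_utc` is a hypothetical helper, not part of Klipper; it takes the two startup numbers and the Stats value exactly as described in the post, and it assumes the SBC clock was not stepped (for example by NTP) between Klipper startup and the event, since a step would shift the offset.

```python
from datetime import datetime, timezone


def stats_to_utc(start_unix: float, start_monotonic: float, stats_time: float) -> datetime:
    """Map a klippy.log 'Stats <time>' value onto wall-clock UTC.

    start_unix and start_monotonic are the global and local timestamps from the
    second line of klippy.log; stats_time is the number right after 'Stats'.
    """
    # Offset between wall clock and Klipper's local clock, taken at startup.
    unix_time = start_unix - start_monotonic + stats_time
    return datetime.fromtimestamp(unix_time, tz=timezone.utc)


event = stats_to_utc(1740122391.6, 444.8, 2220.1)
print(event.strftime("%a %b %d %Y %H:%M:%S UTC"))  # Fri Feb 21 2025 07:49:26 UTC
```

With the numbers from this thread it gives the same result as unixtimestamp.com (fractional seconds are truncated by `%S`).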

I also have a timestamped video of this print, and Fri Feb 21 2025 07:49:26 GMT+0000 is the moment the hotend fan lost power and stopped making noise.

This is the Stats 2220.1 message:

Stats 2220.1: gcodein=0 canstat_mcu: bus_state=active rx_error=0 tx_error=0 tx_retries=0 mcu: mcu_awake=0.015 mcu_task_avg=0.000002 mcu_task_stddev=0.000001 bytes_write=5297125 bytes_read=1395616 bytes_retransmit=0 bytes_invalid=0 send_seq=116581 receive_seq=116581 retransmit_seq=0 srtt=0.002 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=399996326 canstat_EBBCan: bus_state=active rx_error=0 tx_error=0 tx_retries=101660 EBBCan: mcu_awake=0.012 mcu_task_avg=0.000009 mcu_task_stddev=0.000018 bytes_write=1599769 bytes_read=707427 bytes_retransmit=0 bytes_invalid=0 send_seq=41291 receive_seq=41291 retransmit_seq=0 srtt=0.003 rttvar=0.002 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=12000361 adj=12000484 sd_pos=1739839 heater_bed: target=70 temp=69.9 pwm=0.186 chamber: temp=29.8 EBB_NTC: temp=38.7 CB1: temp=42.0 sysload=0.20 cputime=265.419 memavail=653384 print_time=1778.460 buffer_time=2.037 print_stall=0 extruder: target=235 temp=234.4 pwm=0.319
USB CANBUS bridge ‘mcu’ is discarding!

It has bytes_retransmit=0 for both the mcu and the EBB, but the next 4 Stats messages have non-zero bytes_retransmit for both, so I believe the CAN wiring is not the issue.

Okay, so the fact that the line immediately after the one you posted shows retransmits of 180 bytes and 563 bytes doesn’t concern you? Good luck with your debugging.

  1. The fans and extruder stopped without any command (see the last commands dump)
  2. The toolhead kept moving (so the mcu is probably OK and running the command queue)
  3. Retransmits appeared on the mcu and EBB only after the fans stopped

So the EBB board had already faulted before the retransmits, and I believe the retransmits are an effect, not a cause.

*I am posting my reasoning and logical chains here in case someone finds incorrect conclusions in them; I could be wrong :(

Btw, try enabling RP2040 MCU temperature monitoring.
Just in case: it is rated up to 85 °C.

The Neopixel LEDs (24 V -> 5 V) are probably powered from the same board. So it is safe to say there should be enough power for the MCU (5 V -> 3.3 V).

The only mysterious causes I can suggest are either the watchdog coming into play, which would mean there is a bug in the firmware,
or some sort of hardware failure.

It is hard to guess here.

I enabled temperature monitoring for both the Manta MCU and the RP2040. I also added gpio26 to “GPIO pins to set at micro-controller startup” in the EBB firmware (this is the red LED on the EBB board). Then I added an output pin EBBCan:gpio26 to the Klipper config with an initial value of 0. So when the EBB board resets, the red status LED lights up, and then Klipper turns it off.

The test print failed on the second layer and the red light came on (at the same moment the fan stopped making noise). The EBB MCU temperature at that moment was 35 °C (chamber: temp=27.0 EBB_NTC: temp=37.3 EBB: temp=35.0 MCU: temp=42.8 CB1: temp=40.9 sysload=0.06 cputime=185.621 memavail=659468 print_time=1611.347 buffer_time=2.338 print_stall=0 extruder: target=235 temp=234.9 pwm=0.151).

So yes, a reset of the EBB2209 is the cause of the failure. I’m not sure whether it is a hardware or a software fault, and I don’t know how to debug it. I’ll try replacing the EBB…

OK, some news. I got a new EBB2209_RP2040 board and replaced the old one. This didn’t help: the first print failed on the 40th layer.

Then I decided to add a capacitor to the EBB power line (I only had a big 470 µF 100 V one, so I soldered it to the power input contacts), and it seems to help: I printed a 1.5-hour part and then a 2-hour set of parts.

So I think the cause is a voltage drop on the power wires due to the toolhead load: a 70 W heater block, 3 fans, a PT1000 with a MAX31865, 3 bright LEDs and the extruder drive (650 mA). There is only one large capacitor on the EBB board (100 µF 35 V), and it looks like it is too small for this load.
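The capacitor reasoning can be put into rough numbers with the textbook relation ΔV = I·Δt/C for a capacitor that alone supplies a load for a short time. This is an idealized sketch, not a model of the EBB power stage: it ignores ESR and wiring inductance, uses only the heater current at a nominal 24 V, and the 10 µs interval is a hypothetical figure chosen for illustration, not a measurement.

```python
def droop_volts(load_amps: float, duration_s: float, capacitance_f: float) -> float:
    """Worst-case droop of an ideal capacitor that alone supplies a constant load:
    dV = I * dt / C (ESR and wiring inductance ignored)."""
    return load_amps * duration_s / capacitance_f


heater_amps = 70.0 / 24.0  # 70 W heater on a 24 V supply, about 2.92 A
gap_s = 10e-6              # hypothetical 10 us before the supply wiring catches up

for c_uf in (100, 100 + 470):  # on-board 100 uF alone vs. with the added 470 uF in parallel
    print(f"{c_uf} uF: {droop_volts(heater_amps, gap_s, c_uf * 1e-6):.3f} V")
```

It prints 0.292 V for the on-board 100 µF and 0.051 V with the extra 470 µF in parallel. Whatever the real interval is, the droop scales as 1/C, so the added capacitor reduces it by a factor of about 5.7.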