I assembled a Voron 2.4 with a CAN bus toolhead (Stealthburner with an EBB2209 RP2040 connected to the Manta’s CAN socket) and can’t get rid of a lost-communication issue with the toolhead during prints.
What happens:
I start a print from Cura, and at a random layer (usually between 3 and 30) the toolhead fans stop (the LED color and brightness do not change), but the head keeps moving for 3-5 seconds without extruding. Then it stops and a yellow error message appears in the browser.
I was able to print a complete Benchy twice and two Voron cubes, but I also got many failed prints (30+).
CB1 temperature is always < 50 °C
SBC CPU load during print is 3-4%
SBC memory usage < 200 MB
No screen or camera installed
I tried:
slice file with low speed (40mm/s)
slice file with high speed 300mm/s / accel 5000
install and run the bandwidth tests from here: Klipper: communication bus tests. I called the macro BUS_BANDWIDTH MCU=EBBCan COUNT=20000000, so it transferred gigabytes over CAN for hours (and I tried moving and shaking the toolhead and printer in the process), and there were no errors or retransmits.
change a cable from manta to toolhead (new one with new plug)
upgrade and flash manta and ebb to latest klipper firmware
downgrade all to klipper release v0.12.0
upgrade operating system (kernel now is 6.6.66 - armbian)
reinstall operating system
change the PSU (from a generic Chinese 400 W unit to a TRIO-PS-2G/1AC/24DC/10)
I don’t think this is the issue. This is a consequence, not the cause. It appears because the toolhead board stops accepting commands and disables everything attached to it (extruder, heater, blower fan and extruder fan). Was it a toolhead board reset?
The first error message is:
USB CANBUS bridge ‘mcu’ is discarding!
USB CANBUS bridge ‘mcu’ is no longer discarding.
I checked the video: when this message appears, the print head has already been moving for 5 seconds with the fans, extruder and heater turned off.
Well, it is absolutely your right to think so. However, the log tells a different story in my opinion. The messages
USB CANBUS bridge 'mcu' is discarding!
USB CANBUS bridge 'mcu' is no longer discarding.
appear around 7 seconds after the temperature has been constantly dropping from 235.0 °C to 234.3 °C, despite Klipper’s attempts to move it in the opposite direction.
The final error is:
Heater extruder not heating at expected rate
Transition to shutdown state: Heater extruder not heating at expected rate
If the board misses the next PWM update for the heater, it will error out with:
“Missed scheduling of next digital out event”
And even if there is a high CAN TX error rate from the MCU to the EBB, a PWM update that arrives too “late” results in:
“Scheduled digital out event will exceed max_duration”
In other words, there should be no such thing as “Heater is working badly because of a bus issue”.
AFAIK, in a shutdown state the fans should run at full blast and the heaters should be disabled.
You described that the fans stop spinning, which is odd; in a shutdown state they should spin.
So, just a wild guess: if the toolhead still moved with everything turned off, could it be a power issue? That is, the toolhead lost power, so it can’t answer CAN packets and can’t answer the reset command.
Ah, yes, and “verify heater” probably triggers because there are no updates from the thermistor.
Well, the only thing still working on the toolhead at this moment is the LEDs: they don’t change at all! I thought a power loss should “reset” them too… But this could be a power loss on the secondary DC-DC stage of the toolhead board (LDO 5 V -> 3.3 V). I purchased a new toolhead board for a second toolhead, so I’ll try swapping it in (the PSU and the cable have already been changed).
I synced the time on the sbc with my phone and then took a video with a timestamp. The “stats” message before “USB CANBUS bridge ‘mcu’ is discarding!” is the moment when the hotend fan stops making noise / loses power. At this point, the extruder and blower fans also stop working. The LEDs do not change.
So it looks like the toolhead microcontroller is reset (the fans and extruder are controlled in real time by the microcontroller pins, and the LEDs are just commanded at startup and then light on their own). There are no “Heater extruder not heating at expected rate” errors in the other klippy.log.
Is there any way to know whether this is a power outage or a software oops in the microcontroller?
Sorry, wrong log. Here is the one I got today. It doesn’t contain “Heater extruder not heating at…”, and the video shows the toolhead fan stopping at the Stats 2220.1 timestamp.
Is there any way to know whether this is a power outage or a software oops in the microcontroller?
I’m not familiar enough with CAN to say anything definite here.
So, I don’t know whether the MCU will respond to messages without some “CAN initialization”.
Generally, the microcontroller will respond to the reset command and will refuse other commands because it is not initialized, so Klipper should error out earlier, at the moment of the reset.
You can probably reproduce it with the reset button, I think.
If the controller has been rebooted, all pins should be in their default, probably low, state. So the LED should be off if it is PWM-controlled.
I think you can test this: recompile the firmware and define a pin to be enabled by default via the “GPIO pins to set at micro-controller startup” option (e.g. PD15); choose whichever pin you like. Then, if your board reboots for some reason, you will be able to determine that externally, for example from an LED state or a spinning fan.
The LEDs are not PWM-controlled. They are SK6812s, controlled over a single wire. They are turned on by commands and then keep glowing on their own until power-off.
Your idea about “GPIO pins to set at micro-controller startup” is brilliant! I’ll try it!
@regressor go into the mcu information detail within Fluidd/Mainsail and check for these fields to be zero on ALL your MCUs. If any are non-zero you’ve got a communications issue on your CANbus and you need to check it for problems (loose/dirty connections, bad cabling, termination problems, etc.).
I found mine were non-zero and nefelim4ag just commented on my thread and mentioned this one. I think mine may have been related to CANbus cabling and I’m cautiously trying a spare cable I had on hand which at first blush seems to be working better.
Not that I know of. If I were you, I would work on replacing cables methodically until that goes to zero. I would also confirm that my termination was set such that the two physical ends of the bus had termination enabled.
If all of that looks good, then the problem could also be that some hardware is bad. I had that problem before, and it becomes very difficult to debug without replacing the bad hardware. These daughter boards are made cheaply, some with cheap components, and they can go bad. I think this is something that people tend to overlook.
I’ll add that when I had bad hardware in my setup, I was unable to cause a failure synthetically, I mean by using bandwidth flooding tests and watching for errors. That isn’t a good way to verify that your bus and hardware are working. I had to use Klipper itself, exercising the bus with real commands, and watch for those fields to stay at zero.
Look at the first message: I already tried replacing the CAN cable (a new one from the toolhead to the board) and the PSU.
Then look at my last klippy_20240221.log, at the message before “USB CANBUS bridge ‘mcu’ is discarding!”: it is the Stats line with timestamp 2220.1.
Take the second line from the log: it contains the global Unix timestamp and the local timestamp. Subtract local from global, add the Stats timestamp, and you get the real GMT time of this event (1740122391.6 - 444.8 + 2220.1 = 1740124166.9). Paste the result into https://www.unixtimestamp.com/ and you get Fri Feb 21 2025 07:49:26 GMT+0000.
Also, I have a timestamped video of this print, and Fri Feb 21 2025 07:49:26 GMT+0000 is the time when the hotend fan lost power and stopped making noise.
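The conversion described above can also be done without the website. This is only a minimal sketch: the function and parameter names are made up for illustration, and the three numbers are the ones quoted in this post.

```python
from datetime import datetime, timezone


def stats_to_utc(start_unix, start_local, stats_time):
    """Map a klippy.log Stats timestamp to wall-clock UTC.

    start_unix / start_local: the global and local timestamps taken
    from the second line of the log (names are hypothetical).
    stats_time: the number after "Stats" on the line of interest.
    """
    return datetime.fromtimestamp(
        start_unix - start_local + stats_time, tz=timezone.utc
    )


event = stats_to_utc(1740122391.6, 444.8, 2220.1)
print(event.strftime("%Y-%m-%d %H:%M:%S UTC"))  # 2025-02-21 07:49:26 UTC
```

The result can then be compared directly with the timestamp burned into the video.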
It has bytes_retransmit=0 for both mcu and ebb. The next 4 Stats messages do have non-zero bytes_retransmit for both mcu and ebb, but that is after the fan had already stopped, so I believe the CAN wiring is not the issue.
Okay, so the fact that the line immediately after the one you posted has retransmits of 180 bytes and 563 bytes doesn’t concern you? Good luck with your debugging.
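To settle whether retransmits start before or after the failure, the bytes_retransmit value can be pulled out of every Stats line and compared against the failure timestamp. A rough sketch, assuming each Stats line is a sequence of `name: key=value ...` sections; the sample line is a shortened, made-up example with hypothetical values, not a line from the posted log.

```python
import re

# One "name: key=value key=value ..." section of a Stats line.
SECTION_RE = re.compile(r"(\w+): ((?:\w+=\S+\s*)+)")


def retransmits(stats_line):
    """Return {section_name: bytes_retransmit} for sections reporting it."""
    result = {}
    for name, body in SECTION_RE.findall(stats_line):
        fields = dict(pair.split("=", 1) for pair in body.split())
        if "bytes_retransmit" in fields:
            result[name] = int(fields["bytes_retransmit"])
    return result


# Hypothetical, shortened sample line (values invented for illustration).
SAMPLE = (
    "Stats 2221.1: gcodein=0  mcu: mcu_awake=0.002 bytes_retransmit=0 "
    "EBBCan: mcu_awake=0.010 bytes_retransmit=12"
)

counts = retransmits(SAMPLE)
print(counts)
```

Running this over consecutive Stats lines shows at which timestamp each counter first becomes non-zero.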
Btw, try to enable rp2040 MCU temperature monitoring.
Just in case, it is rated up to 85C.
The Neopixel LEDs (24 V -> 5 V) are presumably powered from the same board, so it is safe to say there should be enough power for the MCU (5 V -> 3.3 V).
The only remaining explanations I can suggest are either that the watchdog comes into play, which would mean there is a bug in the firmware,
or that there is some sort of hardware failure.
I enabled temperature monitoring for both the Manta MCU and the RP2040. I also added gpio26 to “GPIO pins to set at micro-controller startup” in the EBB firmware (this is the red LED on the EBB board). Then I added an output pin EBBCan:gpio26 to the Klipper config with an initial value of 0. So when the EBB board resets, the red status LED lights up, and then Klipper turns it off.
The test print failed on the second layer and the red light came on (at the same time as the fan went quiet). The EBB MCU temperature at that moment was 35 °C (chamber: temp=27.0 EBB_NTC: temp=37.3 EBB: temp=35.0 MCU: temp=42.8 CB1: temp=40.9 sysload=0.06 cputime=185.621 memavail=659468 print_time=1611.347 buffer_time=2.338 print_stall=0 extruder: target=235 temp=234.9 pwm=0.151)
So yes, a reset of the EBB2209 is the reason for the failure. I’m not sure whether it is a hardware or software fault, and I don’t know how to debug this. I’ll try replacing the EBB…
OK, some news. I got a new EBB2209_RP2040 board and replaced the old one. That didn’t help: the first print failed on the 40th layer.
Then I decided to add a capacitor to the EBB power line (I could only find a big 470 µF / 100 V one, so I soldered it to the power input contacts), and it seems to help: I printed a 1.5-hour part and then a 2-hour set of parts.
So I think the cause is a voltage drop on the power wires due to the toolhead load: 70 W heater block, 3 fans, PT1000 with MAX31865, 3 bright LEDs and the extruder driver (650 mA). There is only one large capacitor on the EBB board, 100 µF / 35 V, and it looks like it is too small for this load.
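As a rough sanity check of that reasoning, the time a capacitor alone can carry the load before the rail sags can be estimated with dt = C * dV / I. This is only a back-of-the-envelope sketch: the 24 V rail and the 1 V tolerable sag are assumptions, the fans, LEDs and sensor are ignored, and the real behaviour also depends on cable resistance and inductance.

```python
def holdup_time_s(capacitance_f, allowed_drop_v, load_current_a):
    """Time a capacitor alone can feed the load before its voltage
    drops by allowed_drop_v (ideal capacitor, constant current)."""
    return capacitance_f * allowed_drop_v / load_current_a


HEATER_W = 70.0               # from the post
EXTRUDER_A = 0.65             # from the post
ASSUMED_SUPPLY_V = 24.0       # assumption: 24 V toolhead supply
ASSUMED_ALLOWED_DROP_V = 1.0  # assumption: sag tolerated before a brown-out

# Heater plus extruder driver only; smaller loads are left out.
load_a = HEATER_W / ASSUMED_SUPPLY_V + EXTRUDER_A

stock = holdup_time_s(100e-6, ASSUMED_ALLOWED_DROP_V, load_a)
modded = holdup_time_s(100e-6 + 470e-6, ASSUMED_ALLOWED_DROP_V, load_a)

print(f"load current: {load_a:.2f} A")
print(f"stock 100 uF: {stock * 1e6:.0f} us")
print(f"with extra 470 uF: {modded * 1e6:.0f} us ({modded / stock:.1f}x)")
```

Whatever values are assumed for the sag and the current, the added capacitor extends the ride-through time by the same factor of 5.7, which is consistent with short supply dips no longer resetting the board.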