Problem with CANBUS during print/idle

Basic Information:

Printer Model: Voron 2.4r2
MCU / Printerboard: Spider 1.0 / EBB36 v1.2 / MKS CANable Pro (candlelight_fw)
klippy.log
klippy.log.2022-12-15.log (777.0 KB)


Describe your issue:

Occasionally I have issues with my canbus setup and I’m not sure if it’s a software or hardware problem. Sometimes I get an error during printing and sometimes after some hours of idling. I attached the log of that day. I cut away the stats at the beginning and end. The error should be around here:

b'Got error -1 in can read: (19)No such device'
Timeout with MCU 'can0' (eventtime=190242.397807)
Transition to shutdown state: Lost communication with MCU 'can0'
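
For anyone who wants to locate these lines in a large klippy.log without scrolling, here is a minimal Python sketch. The three patterns are copied from the excerpt above; the function name `find_can_events` is made up for illustration, and other Klipper versions may word the messages differently.

```python
import re

# Patterns copied from the log excerpt above (assumption: the wording
# is the same in other Klipper versions, which may not hold).
PATTERNS = [
    re.compile(r"Got error -?\d+ in can read"),
    re.compile(r"Timeout with MCU"),
    re.compile(r"Lost communication with MCU"),
]


def find_can_events(lines):
    """Return (line_number, text) for every line matching one of PATTERNS."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan a klippy.log file on disk and return the matching lines."""
    with open(path, errors="replace") as log:
        return find_can_events(log)
```

Calling `scan_log("klippy.log")` returns the line numbers, so the surrounding stats lines can be inspected around each hit.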

I used this kit:

I hope someone can help me here.

Best regards,
Florain

Printer: Custom CoreXY
MCU: Octopus 1.1 (Bridge mode), EBB42
CAN speed: 1 000 000 bps

I’m not sure if I am clouding the OP’s issue, but I have very similar issues. After spending a LOT of time getting CAN stable, everything is mostly fine, but I have lost faith in long prints. In the attached log, the print errored out after a number of hours (5-10, not sure; I was not monitoring). At idle things seem fine, but on my last print, which had a cylindrical shape (so many steps), I got timeouts and the print stopped.
klippy.log (354.2 KB)

Looking at the logs, I’m not even sure CAN is at fault. I might get flamed for not having the correct details.

I’m doubtful this is a hardware error. The fact that the print can run for an extended time (hours, without any issue) suggests that something goes wrong somewhere in the software stack. That might very well be a naive statement, but in my experience hardware issues typically manifest much faster.

I have trimmed the log to make it smaller.

Any guidance/input will be helpful.

Glad I’m not the only one :sweat_smile:

Try this
edit /etc/network/interfaces.d/can0

Change the line auto can0 to

allow-hotplug can0

Reboot your Raspberry Pi.

‘allow-hotplug’ lets the interface come back up when the device reconnects to the bus, whereas ‘auto’ only brings it up during the initial setup at boot.

Hi, I already have that.

allow-hotplug can0
iface can0 can static
    bitrate 1000000
    up ifconfig $IFACE txqueuelen 128

Ditto on allow-hotplug. AFAIK this setting is really there to allow the CAN bus to gracefully come back online when firmware_restart is issued.
For ref, here is my interface def:

allow-hotplug can0
iface can0 can static
    bitrate 1000000
    up ifconfig $IFACE txqueuelen 1024
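
Since the two configs in this thread differ in txqueuelen (128 vs 1024), it may be worth confirming which value the kernel actually applied after a reboot. A minimal Python sketch, assuming a Linux host that exposes the value at /sys/class/net/<iface>/tx_queue_len; the function names are made up for illustration.

```python
from pathlib import Path


def read_tx_queue_len(iface="can0", root="/sys/class/net"):
    """Return the transmit queue length the kernel currently uses for iface."""
    return int((Path(root) / iface / "tx_queue_len").read_text())


def queue_len_matches(configured, iface="can0", root="/sys/class/net"):
    """True if the live value equals the one written in the interface file."""
    return read_tx_queue_len(iface, root) == configured
```

For the config above, `queue_len_matches(1024)` should be true if the `up ifconfig ...` line was executed when the interface came up.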

What version of the EBB are you using, F072 or G0B1?

CAN hat, U2C (F072) or (G0B1), or Klipper bridge?

Using CanBoot?

EBB42 CAN V1.2 (G0B1)
Klipper Bridge (Octopus V1.1)
Canboot - Yes

Edit:
CANboot on both: EBB & Octopus

@koconnor any input on what to check?

I guess the MKS CANable Pro acts like a Klipper bridge.
CanBoot is installed on the EBB36 and working; I have already done an update through it.

@flow What version is your EBB (F072) or (G0B1)?

Are any other CAN bus devices on the same bus?

@tiaanv Are any other CAN bus devices on the same bus?

Nope.

Pi3b → USB → Octopus → CAN → EBB42
CAN bus resistance: 63 ohm (twisted pair, 143 cm)
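
As a sanity check on that reading: a CAN bus terminated with a 120 ohm resistor at each end should measure roughly 60 ohm between CAN-H and CAN-L with power off, because the two terminators sit in parallel. A small Python sketch of the arithmetic; the 10% tolerance is an arbitrary assumption for illustration, not a value from any spec.

```python
def parallel(*resistors_ohm):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


def termination_looks_ok(measured_ohm, tolerance=0.10):
    """Compare a reading against two 120 ohm terminators in parallel.

    tolerance is a hypothetical margin (10%) that allows for wire
    resistance and meter error.
    """
    expected = parallel(120.0, 120.0)  # about 60 ohm
    return abs(measured_ohm - expected) <= tolerance * expected
```

With these assumptions, `termination_looks_ok(63.0)` is true, while a reading near 120 ohm (only one terminator present) would fail the check.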

Hi,

I have version 1.2 → G0B1.
No other CAN devices on the bus.

For those of you that are more fluent in interpreting Klippy logs:
Is it apparent that the loss of communication with ‘MCU’ is in fact caused by a CAN error?

I find diagnosing low-level CAN issues somewhat frustrating, as the log does not say much. Even the actual CAN stats rarely show errors or re-transmits, yet the CAN interface is no longer operative. It can generally be remedied by a firmware restart; no power cycle is required.

Not sure what exactly this indicates, other than it could be a recoverable state if managed by klipper… Of course, this would be a deal killer for the ‘real-time’ timing requirements…

Is there somewhere I can look to see more detail on the exact nature of the errors? I have read that using an Emergency stop within some timeframe after an error will produce additional log data, but in my situation, it is rather difficult to do that, since the print fails at random… and sitting in front of the interface for 8 hours is not very practical…

CAN tools and ifconfig can help to find the best CAN bus speed with no loss of packets or errors.

Install can-utils

Instructions for installing and information:

Open two terminal windows via ssh.
In one, run cangen; in the other, run ifconfig every minute or so and check the dropped-packet and error counts.

If they increase, then the speed, the wiring, or the connections are a problem.

You can try decreasing the speed.
Move your wires away from EMI sources.
Check your connections by moving the wires at the connectors to see if the error counts jump.
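
Instead of eyeballing ifconfig every minute, the same counters can be polled from a script. A rough Python sketch, assuming a Linux host where the kernel exposes them under /sys/class/net/<iface>/statistics/; the function names, the one-minute interval, and the number of rounds are made up for illustration. Whether these counters reflect real bus errors when the adapter is a Klipper USB-to-CAN bridge is not something this sketch can confirm.

```python
import time
from pathlib import Path

# Counter files the Linux kernel provides for each network interface.
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")


def read_counters(iface="can0", root="/sys/class/net"):
    """Read the per-interface error and drop counters from sysfs."""
    stats_dir = Path(root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}


def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots."""
    return {k: after[k] - before[k] for k in before if after[k] > before[k]}


def watch(iface="can0", interval_s=60, rounds=5):
    """Print which counters grew during each polling interval."""
    previous = read_counters(iface)
    for _ in range(rounds):
        time.sleep(interval_s)
        current = read_counters(iface)
        print(counter_deltas(previous, current) or "no new errors or drops")
        previous = current
```

Run `watch()` in one terminal while cangen generates traffic in the other, then wiggle the connectors and see whether any counter starts to climb.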

After 2 minutes…
image

no errors, dropped, overruns, nothing…

but:
image

My previous debugging attempts produced similar results… This led me to believe that I can’t debug CAN in this way… or the issue is NOT with the CAN interface itself…

Also remember that I am running an Octopus in bridge mode, so I’m not sure what the impact on CAN debugging will be in that regard…

Either way… this test reproduced EXACTLY what happens during a print…

Edit:

Restarting Klipper then still renders the following:
image

cangen then errors out (expected), since it’s a virtual CAN device:
image

A second restart command returns me to a working state in klipper.

Edit2:

This is consistently reproducible…

Not sure what to make of it… is there a real CAN error situation, or has klipper just based on some criteria decided that there is an error with the can mcu, and goes into an error state…

Struggling to find useful information on the underlying problem.

Running CAN at 250k I have no issues; at 500k I get constant dropouts, and at 1M everything hangs.
However, at 250k I cannot flash through CanBoot, only through DFU. Weird; I’m leaning towards a software issue.

I can leave the printer at idle for days. The comms between the toolhead (EBB42) and the controller (Octopus) have no issues, and the temperature of the hotend updates a couple of times per second. CAN stats look healthy (but I’m not sure they report the truth, at least in bridge mode):
image
This is @ 1Mbps

@koconnor is there a way to get more detailed can debug info from the bridge MCU?

Don’t forget that CanBoot also has to be reflashed with the new speed if it is using the CAN bus.