Fill out above information and in all cases attach your klippy.log file. Pasting your printer.cfg is not needed
Describe your issue:
Occasionally I have issues with my canbus setup and I’m not sure if it’s a software or hardware problem. Sometimes I get an error during printing and sometimes after some hours of idling. I attached the log of that day. I cut away the stats at the beginning and end. The error should be around here:
b’Got error -1 in can read: (19)No such device’
Timeout with MCU ‘can0’ (eventtime=190242.397807)
Transition to shutdown state: Lost communication with MCU ‘can0’
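For anyone digging through a long klippy.log, here is a minimal sketch that pulls out only the lines that look like the three messages above, with their line numbers. The patterns are taken from this one excerpt, so other Klipper messages are not covered, and the function name `find_can_events` is just something I made up for illustration.

```python
import re

# Patterns copied from the log excerpt in this post (assumption: the wording
# is the same in your log; other Klipper messages are not matched).
PATTERNS = [
    re.compile(r"Got error -?\d+ in can read"),
    re.compile(r"Timeout with MCU"),
    re.compile(r"Lost communication with MCU"),
]


def find_can_events(lines):
    """Return (line_number, text) for every line matching one of PATTERNS."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path="klippy.log"):
    """Print matching lines of a log file; 'klippy.log' is a placeholder path."""
    with open(path, errors="replace") as handle:
        for number, text in find_can_events(handle):
            print(f"{number}: {text}")
```

Calling `scan_file("/path/to/klippy.log")` prints the matching lines, which makes it easier to see what was logged just before the timeout.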
I’m not sure if I am clouding the OP’s issue, but I have very similar issues. After spending a LOT of time on getting CAN stable, everything is mostly fine, but I have lost faith in long prints. On the attached log, the print errored out after a number of hours (5-10, not sure; I was not monitoring). At idle things seem fine, but during my last print, which had a cylindrical shape (so many steps), I got timeouts and stopped prints. klippy.log (354.2 KB)
Looking at the logs, I’m not even sure CAN is at fault. I might get flamed for not having the correct details.
I’m doubtful this is a hardware error… the fact that the print can run for an extended time (hours, without any issue), indicates that something goes wrong somewhere in the software stack. That might very well be a naive statement, but in my experience hardware issues typically manifest much faster.
Ditto on allow-hotplug. AFAIK this setting is really there to allow the canbus to gracefully come back online when firmware_restart is issued.
For ref, here is my interface def:
allow-hotplug can0
iface can0 can static
    bitrate 1000000
    up ifconfig $IFACE txqueuelen 1024
For those of you that are more fluent in interpreting Klippy logs:
Is it apparent that the loss of communication with ‘MCU’ is in fact because of a CAN error?
I find diagnosing low-level issues with CAN somewhat frustrating, as the log does not say much. And even if you look at the actual CAN stats, they rarely even show errors or re-transmits… yet the CAN interface is no longer operative… and can generally be remedied by a firmware restart… no power cycle is required…
Not sure what exactly this indicates, other than it could be a recoverable state if managed by klipper… Of course, this would be a deal killer for the ‘real-time’ timing requirements…
Is there somewhere I can look to see more detail on the exact nature of the errors? I have read that using an Emergency stop within some timeframe after an error will produce additional log data, but in my situation, it is rather difficult to do that, since the print fails at random… and sitting in front of the interface for 8 hours is not very practical…
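Since watching the screen for 8 hours is not practical, one option is a small watcher that records when the interface changes state. This is only a sketch under assumptions: it checks whether `/sys/class/net/can0` exists and reads its `operstate` file, which is standard Linux sysfs, but I don't know how meaningful `operstate` is for a bridge-mode device. The "(19)No such device" in the log is errno 19 (ENODEV), which suggests the interface itself went away, so the "missing" state is the interesting one. The function names are made up.

```python
import os
import time


def interface_state(iface="can0", root="/sys/class/net"):
    """Return 'missing' if the interface is gone, else its operstate string."""
    path = os.path.join(root, iface)
    if not os.path.isdir(path):
        return "missing"
    try:
        with open(os.path.join(path, "operstate")) as handle:
            return handle.read().strip()
    except OSError:
        return "unreadable"


def watch(iface="can0", interval=1.0, rounds=None, root="/sys/class/net"):
    """Poll the interface and print a timestamped line on every state change.

    rounds=None polls forever; a number limits the polls (handy for testing).
    Returns the list of entries that were printed.
    """
    changes = []
    last = None
    count = 0
    while rounds is None or count < rounds:
        state = interface_state(iface, root)
        if state != last:
            entry = f"{time.strftime('%Y-%m-%d %H:%M:%S')} {iface} {state}"
            print(entry, flush=True)
            changes.append(entry)
            last = state
        count += 1
        time.sleep(interval)
    return changes
```

Left running with its output redirected to a file during a long print, the timestamps can be lined up with the eventtime of the timeout in klippy.log afterwards.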
CAN tools and ifconfig can help to find the best CAN bus speed with no loss of packets or errors.
Install can-utils
Instructions for installing and information:
Open two terminal windows via ssh
In one, run cangen; in the other, run ifconfig about every minute or so and check the dropped-packet and error counts.
If they increase, then the speed, wires or connections have problems.
You can try decreasing the speed.
Move your wires away from EMI sources.
Check your connections by wiggling the wires at the connectors to see if the error counts jump.
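Instead of re-running ifconfig by hand every minute, the same counters can be read from sysfs and compared automatically. A rough sketch, assuming the interface is called can0 and using the standard `/sys/class/net/<iface>/statistics` files; the function names are my own, and in bridge mode these host-side counters may not tell the whole story.

```python
import os
import time

# Standard per-interface counters exposed by Linux under .../statistics/
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")


def read_counters(iface="can0", root="/sys/class/net"):
    """Read the error and drop counters for one interface into a dict."""
    stats_dir = os.path.join(root, iface, "statistics")
    values = {}
    for name in COUNTERS:
        with open(os.path.join(stats_dir, name)) as handle:
            values[name] = int(handle.read().strip())
    return values


def counter_deltas(previous, current):
    """Return only the counters that changed, with the amount of change."""
    deltas = {}
    for name, value in current.items():
        change = value - previous.get(name, 0)
        if change:
            deltas[name] = change
    return deltas


def watch_counters(iface="can0", interval=60.0, rounds=10):
    """Print any counter increases once per interval (defaults are arbitrary)."""
    previous = read_counters(iface)
    for _ in range(rounds):
        time.sleep(interval)
        current = read_counters(iface)
        print(time.strftime("%H:%M:%S"), counter_deltas(previous, current) or "no change")
        previous = current
```

Run `watch_counters()` in one terminal while cangen runs in the other; any line other than "no change" while wiggling a connector points at that connection.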
My previous debugging attempts rendered similar issues… This led me to believe that I can’t debug CAN in this way… or the issue is NOT with the CAN interface itself…
Also remember that I am running an Octopus in bridge mode, so I’m not sure what the impact on CAN debugging will be in that regard…
Either way… this test manifested in EXACTLY what happens during a print…
Edit:
Restarting Klipper then still renders the following:
cangen then errors out (expected), since it’s a virtual CAN device:
A second restart command returns me to a working state in klipper.
Edit2:
This is consistently reproducible…
Not sure what to make of it… is there a real CAN error situation, or has Klipper just decided, based on some criteria, that there is an error with the CAN MCU, and gone into an error state…
Struggling to find useful information on the underlying problem.
Running CAN at 250k there are no issues, at 500k there are constant dropouts, and at 1M everything hangs.
However, at 250k I cannot flash through CanBoot, only through DFU. Weird; it makes me think this is more of a software issue.
I can leave the printer at idle for days. The comms between the toolhead (EBB42) and the controller (Octopus) have no issues, and the temperature of the hotend updates a couple of times per second. CAN stats look healthy (but I’m not sure they report the truth, at least in bridge mode):
This is @ 1Mbps
@koconnor is there a way to get more detailed can debug info from the bridge MCU?