CAN network failures - M8P CB2 EBB42

You could mechanically “excite” the CB2 board by tapping, pulling, pushing, heating, or cooling it while the toolhead thinks it is printing. If you can find a physical input that triggers a fault, you have a smoking gun.
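If you try this, it helps to write down what you did and when, then line those actions up against the fault timestamps. A minimal sketch, assuming you record actions by hand and copy fault times out of the log; the function name and the 5-second window are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta

def faults_near_actions(actions, faults, window_s=5.0):
    """For each fault time, list the hand-logged actions in the window_s seconds before it.

    actions: list of (datetime, description) pairs recorded while poking the board.
    faults:  list of datetimes at which a fault appeared (e.g. copied from klippy.log).
    """
    window = timedelta(seconds=window_s)
    report = []
    for fault in sorted(faults):
        # Any action shortly before the fault is a suspect trigger.
        suspects = [desc for ts, desc in actions if fault - window <= ts <= fault]
        report.append((fault, suspects))
    return report


if __name__ == "__main__":
    # Hypothetical session: times and actions are made up for illustration.
    actions = [
        (datetime(2024, 1, 1, 12, 0, 0), "tapped CB2 near the connectors"),
        (datetime(2024, 1, 1, 12, 1, 0), "warmed M8P with a heat gun"),
    ]
    faults = [datetime(2024, 1, 1, 12, 1, 3)]
    for fault, suspects in faults_near_actions(actions, faults):
        print(fault, suspects)
```

A fault that repeatedly lines up with the same action is the smoking gun; faults with an empty suspect list point away from that mechanical input.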

Visual inspection under magnification could also identify a bent pin or a bad solder joint.

If you had a full schematic and an oscilloscope, you could look at the signals at the CPU and the MCU and compare them, but that’s overkill unless you are doing failure analysis for BTT.

The network stack doesn’t know one branch is not using transceivers. It treats all communication errors the same.

That was my point. It would take a fault on the SPI connection to physically disrupt the CAN connection to the M8P. Any disruption of the physical CAN transceivers/network wouldn’t affect the other branch.

I have 6 hours left on the job it’s printing now. When it completes, I’ll pull the CAN cable off the EBB42, do a firmware reboot, wait for the connection error, then reconnect and do another firmware reboot to clear it. I’ll capture pictures of the errors that come up and another klippy log. I’m happy to do the CAN bus capture at any point if it will help anyone. I’ll post when it’s done to show the difference between a physical CAN error and whatever this is on my setup.
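For the CAN bus capture, a sketch like the one below can summarize a log recorded with can-utils (candump -l can0 writes lines of the form “(timestamp) can0 ID#DATA”). It counts frames per CAN ID and reports the longest silence, which should line up with the moment the cable was pulled or the fault hit. The function name and return shape are illustrative, not part of can-utils.

```python
import re
from collections import Counter

# candump log-file line: "(seconds.micros) iface ID#DATA"
LINE_RE = re.compile(r"^\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]+)#(\S*)$")

def summarize_candump(lines):
    """Return (frames per CAN ID, longest gap in seconds, timestamp where that gap began)."""
    counts = Counter()
    last_ts = None
    max_gap = 0.0
    gap_start = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognized lines
        ts = float(m.group(1))
        counts[m.group(3).upper()] += 1
        # Track the longest silence between consecutive frames on the bus.
        if last_ts is not None and ts - last_ts > max_gap:
            max_gap = ts - last_ts
            gap_start = last_ts
        last_ts = ts
    return counts, max_gap, gap_start
```

Comparing the summary from a deliberate cable pull with one from the unexplained failure would show whether traffic to a given ID stops in the same way.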

Having built several iterations of the wiring and swapped them out regularly, I have extensive experience troubleshooting physical connection issues. These errors just scream software/firmware to me. My next step is to borrow or build an oscilloscope to check the waveforms for EMI, or to dig into CAN enough to troubleshoot at the protocol layer. Let me know if there is anything else I should do.

Once it’s down, I’ll pull the boards and take a look under the microscope. It’s not super powerful, but it’s good enough for SMC soldering.

I may do the oscilloscope check if the Orange Pi doesn’t fix it, as I need to know whether this is a hardware issue on my end or a software issue. Our setup will require this kind of board combination, and if the software causes a hard failure, that makes it a no-go for us.

I’ll also try to find the other post that pointed me to the power-cycle requirement for fixing this issue, as a way to distinguish it from all the other errors I have now tracked down and fixed. These are now my only untraceable errors in any of our configurations.

Consider:

  • The error in the log, “MCU ‘mcu’ shutdown: Missed scheduling of next digital out event”, is similar to “Timer too close” in its potential causes and solutions
  • The error is raised against the MCU, not against the EBB
  • Nothing in the log indicates CAN errors
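To check the second and third points against a log yourself, a few lines of Python can list every MCU shutdown and which MCU raised it. The regular expression is built from the message quoted above; other Klipper messages may be worded differently, so treat it as a starting point rather than a complete parser.

```python
import re

# Pattern built from the error text quoted in this thread:
#   MCU 'mcu' shutdown: Missed scheduling of next digital out event
SHUTDOWN_RE = re.compile(r"MCU '([^']+)' shutdown: (.+)")

def find_mcu_shutdowns(lines):
    """Return unique (mcu_name, reason) pairs, in the order first seen."""
    seen = {}
    for line in lines:
        m = SHUTDOWN_RE.search(line)
        if m:
            # dict keeps insertion order, so repeats collapse but order is kept
            seen.setdefault((m.group(1), m.group(2).strip()), None)
    return list(seen)
```

Pass it an open klippy.log file object. If the log matches the excerpt discussed here, the result names 'mcu' (the M8P) rather than the EBB.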

Looks like a regular hardware instability.

There are two big problems with the “regular” hardware instability explanation. First, the error happens on the physical connection between the M8P and the CB2. If there is a physical defect on one of the two boards or their connectors, that may make partial sense, since no cable is involved, just the Pi CM connectors. Second, when this error happens, nothing fixes it other than a hard power cycle. Accepting that there is a physical connection fault between the two boards, why is it not cleared by a firmware restart like a “normal” CAN error? If this is simply the “normal” failure mode for the USB-to-CAN bridge mode, is it a defect, or was the system designed to hard-fail when this portion of the CAN network is disrupted, making it a “feature”?

If this is a “normal” hardware instability, I will need to track down what in Klipper itself is causing this failure mode.

As I stated above, I have gone through that post and done everything short of using an oscilloscope to measure the waveforms at the CB2 and at the STM32 chip on the M8P. Since this is on a branch that has no wiring, I’m not sure what else I can do to address anything in that post. If you see any out-of-spec stats in the file, please point them out to me; I don’t see anything even remotely close to a slowdown in CPU cycles, memory, or bandwidth. The EBB may be hot, but since, as you stated, it isn’t the cause of the error, that’s not the issue. The M8P isn’t even remotely hot.

I am more than happy to help troubleshoot in any way to show this is not a “normal” hardware issue.

You are misinterpreting the error.

Your actual error is “Missed scheduling of next digital out event”. The reasons for this error are similar to those for the TTC (Timer too close) error.

The key message in the TTC error is true for your error as well:

  1. In the first place, the error means nothing more than “Hey, I’m the klippy host and I have been waiting for a call from you (the MCU). You did not call and now I’m disappointed and won’t talk to you any more.”
  2. The knowledge base entries give pointers to what could be the reason for this behavior.
  3. Klipper just does not know and thus cannot put anything in the logs.
  4. There is no indication that it is CAN or EBB related.
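Point 3 can be illustrated with a toy model: the side that is waiting only ever observes silence, so nothing it receives explains why the messages stopped. This is a generic sketch of a timeout, not Klipper’s actual code, and the timeout values are arbitrary.

```python
def first_timeout(arrivals, timeout, now=None):
    """Return the time at which a waiting side would give up, or None.

    arrivals: sorted arrival times (s) of messages from the other side.
    timeout:  how long (s) the waiting side tolerates silence.
    now:      current time (s); lets silence after the last message count too.
    """
    for prev, nxt in zip(arrivals, arrivals[1:]):
        if nxt - prev > timeout:
            return prev + timeout  # gave up before the late message arrived
    if arrivals and now is not None and now - arrivals[-1] > timeout:
        return arrivals[-1] + timeout
    return None
```

The return value is only a time; whether the silence came from noise, a crashed MCU, or a cable is invisible to the waiting side, which is why the log cannot name a cause.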

Conclusion:

  • Something in your local setup is causing this. Again, refer to the KB for potential contributors and reasons.
  • You need to identify “something.”
  • In a recent post here, a neon tube installed right over the printer was emitting so much electrical noise that communication was disturbed.

Since this connection is between the M8P and the CB2, which are physically mounted on each other, what could possibly be causing this failure?

This still doesn’t solve the second issue, and in fact makes it worse. If something is interfering with the CAN connection between these devices, why does a firmware restart not fix the issue, but a physical power cycle does?

Something is making Klipper fail in a way that no form of software restart clears. If this is the case, how do I diagnose the cause of this failure mode? I have had hundreds of “normal” CAN errors that are cleared by a software reboot once the connection is re-established.
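One way to narrow down a failure that survives a FIRMWARE_RESTART is to record the state of the host’s CAN interface in the failed state, after a firmware restart, and after a power cycle. A minimal sketch, assuming a Linux host with sysfs; CAN interfaces report link type 280 (ARPHRD_CAN). The function name is illustrative.

```python
from pathlib import Path

ARPHRD_CAN = 280  # link-layer type Linux reports for CAN network interfaces

def can_interfaces(sys_net="/sys/class/net"):
    """Map each CAN network interface to its operstate ('up', 'down', ...)."""
    result = {}
    for iface in sorted(Path(sys_net).iterdir()):
        try:
            if int((iface / "type").read_text().strip()) != ARPHRD_CAN:
                continue  # not a CAN interface (e.g. eth0, wlan0)
            result[iface.name] = (iface / "operstate").read_text().strip()
        except (OSError, ValueError):
            continue  # interface vanished or has no readable attributes
    return result
```

If can0 is missing or down only in the failed state and comes back solely after a power cycle, that suggests the USB-to-CAN bridge itself did not recover, which is a different question from whether the CAN wiring is at fault.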

Can I make a couple of comments?

First off, I saw this statement:

This is incorrect.

The connection between the CB2 (in your case, a “CM4 equivalent” for other devices) and the MCU is USB. There is a USB 2.0 Hub on the M8P with the CB2 providing the upstream data, and the M8P’s MCU (as well as USB peripherals) connects to the Hub as a standard USB device.

The USB connection is generally very solid; the only time you may have problems is if the CB2 isn’t properly seated in the two 100-pin connectors. It really isn’t worth thinking about ‘scoping out the lines because a) they’re probably not easily accessible and b) adding a probe to the lines could upset their termination and cause unrelated problems.


You wrote:

This is something I would like to understand better. What do you mean by “monkey with something”?

If you’re moving the toolhead or gantry and you lose the CAN connection then you need to investigate that.

You might want to share images of your connections. One of the things that I highly recommend is to have stress relief on your CAN Bus cables and make sure there is no flexing at any of the EBB42 CAN Bus connectors (solder joints/crimped connections can break over time with flexing).

I see that @Sineos just replied and, as always, he has good advice.

The only thing I can think to add to it is to experiment a bit and see if you can do something that causes the error to happen repeatedly. From there, we should be able to figure out what’s going wrong.

Yes, the physical connection is over SPI, but the communication between the host and MCU uses the CAN protocol over that SPI connection. The physical layer is SPI; the protocol layer is CAN.

Either way, this makes the physical CAN connection to the M8P run over those connectors. Nothing in my wiring to the EBB should be able to affect this connection.

No.

The connection between the Host and the MCU is USB and, when you’ve set up CAN through the M8P, it goes through the Geschwister Schneider USB/CAN device.

You can see it when you run lsusb, as I did on one of my CAN Bus enabled machines with an M8P.

I don’t know where the idea that it’s an SPI connection came from.
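For anyone following along, a small sketch that picks the bridge out of lsusb output. Assumption: the bridge enumerates with VID:PID 1d50:606f, the ID commonly reported for gs_usb/candleLight-compatible adapters, including Klipper’s USB-to-CAN bridge mode; if your build reports a different ID, change GS_USB_IDS.

```python
import re

# Assumed VID:PID of gs_usb-compatible adapters; adjust if your device differs.
GS_USB_IDS = {("1d50", "606f")}

# lsusb line: "Bus 001 Device 003: ID 1d50:606f <description>"
LSUSB_RE = re.compile(r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-f]{4}):([0-9a-f]{4})\s*(.*)$")

def find_gs_usb(lsusb_output):
    """Return (bus, device, description) for each matching USB-to-CAN device."""
    found = []
    for line in lsusb_output.splitlines():
        m = LSUSB_RE.match(line.strip())
        if m and (m.group(3), m.group(4)) in GS_USB_IDS:
            found.append((m.group(1), m.group(2), m.group(5)))
    return found
```

Running it on lsusb output captured before and after the failure shows whether the device is still enumerated on the hub, which is what the host-to-MCU link depends on.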

@mykepredko When I said “monkey with it” I mean disconnecting/connecting devices. Physically pulling the CAN connectors from devices.

I forgot to mention in the above post that splice joints on the multi-device cables are encapsulated after they are verified as good. I can dig up the exact encapsulation material if anyone needs it. It’s from MG; I just don’t remember which product off the top of my head.


It was from @cardoc originally.

So it’s a software emulation of a CAN connection over USB. Either way, this doesn’t change the fact that there should be no way to interfere with that connection.

This would once again lead me to believe that there is an error somewhere in the BTT fork or in the CB2’s firmware.

A software emulation completely removes any chance of a physical disruption of only the CAN network causing a communication failure with the M8P.

No it’s not. It’s a USB Device that provides CAN protocol services over USB.

This isn’t BTT software. BTT has nothing to do with it.

I’m looking at your previous post and the obvious pride you took in making your cables, but I think you need to take a hard look at your setup and see if there are any possible loose or intermittent connections, any chance for induced voltages/currents, or any chance that you have a power sag or surge (i.e., when all three steppers plus the extruder heater on the CAN Bus power line are active).
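To put a rough number on the power-sag question, Ohm’s law on the supply run is enough. The figures used below (24 V, 3 A, 1.5 m, 0.02 Ω/m) are placeholders, not measurements from this printer; take the real resistance per metre from the wire gauge’s datasheet and the current from your stepper and heater draw.

```python
def voltage_at_load(supply_v, current_a, length_m, ohms_per_m):
    """Estimate the voltage that reaches a load at the end of a two-wire supply run.

    Current flows out on one conductor and back on the other, so the resistive
    path is 2 * length_m. Connector and crimp resistance are ignored here.
    """
    drop = current_a * (2 * length_m * ohms_per_m)
    return supply_v - drop


# Placeholder values for illustration only.
print(voltage_at_load(24.0, 3.0, 1.5, 0.02))
```

A result that stays well above the board’s minimum input voltage makes steady-state sag unlikely; intermittent contacts add resistance that this simple estimate does not capture.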

If you read my past responses to people, you’ll see that at this point I usually insist on seeing a wiring diagram and photographs - I know you feel that you have made good connections but another set (or more) of eyes looking at your system doesn’t hurt.

I still do not understand why you are so fixated on CAN:

  1. “Something” stopped your system dead in its tracks
  2. From the log, it originates from the MCU with no indication of CAN involvement
  3. For whatever reason, your system requires a hard cycle to come back to life
  4. Your main MCU is effectively communicating via USB, just not via the USB protocol but transparently via the CAN protocol
  5. If the crash somehow left your MCU floating dead in the water, then no connection is possible, but this is still not related to CAN

Note:

  • I’m not saying it cannot be related to CAN.
  • I’m saying from the logs, nothing points to it
  • I’m equally not stating that Klipper cannot have bugs. It certainly can, but your system is “sufficiently ordinary” that we would quite likely be swamped with reports if there was something systematic

@mykepredko I am under an NDA and can’t post pictures of devices. Besides that, they are not involved in this error as they are not involved in the connection between the CB2 and the M8P.

I have looked through the specs for the Geschwister Schneider device and candleLight. From what I see on that page, it sets up a virtual, “software emulated” CAN connection. It passes those packets over the USB connection in the payload of the frames it uses. This “device” is emulated both by the firmware on the MCU and by the Linux installation on the host. There is no physical device that does CAN over USB.

I will again point out that this error, as confirmed by @Sineos, has nothing to do with the EBB. The only “cabling” and “connectors” between the CB2 and the M8P are the traces on the boards and the two 100-pin connectors on those boards.

I’m not necessarily fixated on CAN; it’s just the only error in the logs, and it’s the one that persists after reboots. The CAN errors might just be a symptom of something else. If they are, how do I track that down when nothing appears in the Klipper logs? Honestly, I’m not sure where else to look, as everything else in the logs is smooth until the CAN errors show up.

If something else is killing the MCU and the CAN error is a symptom, what are the next troubleshooting steps? What would indicate that it’s not a CAN problem, but something else? You say my MCU has failed and that the host is waiting for a response that never comes; what would the error be if there was actually a CAN failure?

I take that back, digging in a bit more it looks like there is a physical device in the path and software emulation may not be happening.

Either way, it’s CAN protocol information traveling over the physical USB connection. This doesn’t change anything about the error not being caused by a physical disruption in the CAN network with the EBB.
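To make the “CAN over USB” point concrete: once the Linux gs_usb driver binds the adapter, the host sees an ordinary SocketCAN interface (can0) and exchanges the same 16-byte struct can_frame it would use with any other CAN controller. The sketch below mirrors the packing example in Python’s socket documentation; the framing gs_usb uses on the USB wire itself is different and is not shown.

```python
import struct

# Linux classic struct can_frame as seen through SocketCAN:
# 32-bit CAN ID, 8-bit length (DLC), 3 padding bytes, 8 data bytes = 16 bytes.
CAN_FRAME_FMT = "=IB3x8s"

def build_can_frame(can_id, data):
    """Pack a classic CAN frame (at most 8 data bytes) into the SocketCAN layout."""
    if len(data) > 8:
        raise ValueError("classic CAN frames carry at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))

def dissect_can_frame(frame):
    """Unpack a SocketCAN frame into (can_id, dlc, payload)."""
    can_id, dlc, data = struct.unpack(CAN_FRAME_FMT, frame)
    return can_id, dlc, data[:dlc]
```

Whether the controller behind can0 is a USB adapter or an on-board peripheral, the frames the host handles have this shape, which is why host-side software treats both the same way.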
