I’m bringing up a custom board design, and one of the boards periodically crashes. In the latest klippy.log (attached), the reported MCU clock apparently drops to 1 MHz, which causes the Klipper connection to break.
Am I interpreting this correctly?
I have three other identical boards running, but with a Raspberry Pi CM4, an Orange Pi CM4 and a BTT CB1 as hosts, and none of them shows the same issue.
I will try with a different host but I was wondering if anybody had any comments beforehand.
It’s hard to tell exactly what went wrong, but the Klipper host code definitely got very confused by the clock reports from the mcu. It could certainly be a bad crystal (or similar timing issue) on the mcu.
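For anyone wanting to check their own logs for the same symptom, here is a minimal sketch that scans the periodic `Stats` lines in a klippy.log for `freq=` values far from the expected MCU clock. The function name `find_freq_outliers`, the 64 MHz nominal value (assumed here for the STM32G0B1) and the 1% tolerance are all illustrative assumptions, not anything Klipper provides.

```python
import re

# Matches the per-MCU "freq=<integer>" field in a klippy.log "Stats" line.
FREQ_RE = re.compile(r"\bfreq=(\d+)")


def find_freq_outliers(lines, nominal_hz=64_000_000, tolerance=0.01):
    """Return (line_number, freq) pairs for implausible freq= estimates.

    nominal_hz is an assumed MCU clock (64 MHz here); adjust it for your
    board. tolerance is the allowed relative deviation (0.01 = 1%).
    """
    outliers = []
    for lineno, line in enumerate(lines, start=1):
        # Only the periodic statistics lines are of interest.
        if not line.startswith("Stats "):
            continue
        for match in FREQ_RE.finditer(line):
            freq = int(match.group(1))
            if abs(freq - nominal_hz) > tolerance * nominal_hz:
                outliers.append((lineno, freq))
    return outliers
```

To use it, pass an open file, e.g. `find_freq_outliers(open("klippy.log", errors="replace"))`. Note that with more than one MCU configured, every `freq=` on the line is compared against the same nominal value, so a secondary MCU running at a different clock would be flagged on every line.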
Just so you know, I’ve been talking to @sineos about this separately.
Since the initial post, I’ve had seven more crashes of this board/host combination. None of the `klippy.log` files generated after these subsequent crashes shows the same MCU timing problem.
One thing I left out of the initial post (and should have included, now that I have more experience here) is that there is a heartbeat LED that flashes twice every second. When there is a crash, the flashing stops, generally with the LED off (sometimes on), as if the CB2’s CPU has stopped.
My plan is to add a wire to the CB2’s reset pin and see what happens if I force a reset. Right now I’m pulling the power from the board (which has the CB2 mounted on it). The point is to see whether it’s a failure of the main controller board’s MCU (the STM32G0B1): if I reset the CB2 but leave everything else on the board as is, I can isolate whether the problem is with the CB2 or with the main controller board and its MCU.
I guess it’s possible that something is wrong with the timing on the Linux machine. However, if that were the case I’d have expected that Linux itself would report tons of errors in its logs. I suppose that there could be a power management mode that is altering the Linux clock (though it’s not supposed to alter the “monotonic” clock rate). Alas, I don’t have any good suggestions.
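If someone wants a rough look at the host clocks, the sketch below samples the monotonic clock alongside the wall clock and reports their relative rate and the largest gap between samples. The helper names are hypothetical, and the check is limited: both clocks come from the same kernel timekeeping, so it can only reveal steps or stalls (for example, the host freezing for a while), not an error common to both.

```python
import time


def collect_samples(count=10, interval=1.0):
    """Record (monotonic, wall-clock) readings, sleeping between them."""
    samples = []
    for _ in range(count):
        samples.append((time.monotonic(), time.time()))
        time.sleep(interval)
    return samples


def clock_rate_ppm(samples):
    """Rate of the wall clock relative to the monotonic clock, in ppm."""
    (mono_start, wall_start), (mono_end, wall_end) = samples[0], samples[-1]
    mono_elapsed = mono_end - mono_start
    if mono_elapsed <= 0:
        raise ValueError("samples must span a positive monotonic interval")
    return ((wall_end - wall_start) / mono_elapsed - 1.0) * 1e6


def largest_gap(samples):
    """Largest monotonic time between consecutive samples, in seconds."""
    return max(b[0] - a[0] for a, b in zip(samples, samples[1:]))
```

A gap much larger than the requested interval would suggest the host stalled while sampling. A large ppm value would point to the wall clock being stepped or slewed, which on its own should not affect the monotonic clock.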