So just as a brief update - I had similar issues on mainline Klipper (and obviously I couldn’t actually print anything with it, since my extruder stepper driver is not currently supported). Out of frustration I gave up, switched to USB and… had the exact same issues. So this is probably a multi-MCU homing issue more so than a CAN bus issue per se, and CAN just happens to be how most of us implement multi-MCU homing for the first time.
I remembered that a while back I was having timing issues when Klipper was using all 8 cores (4 big, 4 little). I did in fact limit Klipper’s affinity to the 4 big cores at one point, but at some point I must have updated the OS or something and that change was lost. I reimplemented that affinity change and overall things improved significantly, but once in a while I was still getting the timeouts. So now I have further limited things to two cores that share a hardware clock - I am hoping that solves it. I may just try limiting it to a single core at some point as well.
To change the affinity for the klipper service, add the line
CPUAffinity=[CPUs] to your /etc/systemd/system/klipper.service file in the [Service] section. In my case I’m using
CPUAffinity=6 7
Which limits the klipper service to cores 6 and 7. Obviously different setups may need different configurations, and different setups might have different issues altogether.
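If you want to double-check that the setting actually took effect, here is a rough Python sketch (my own helper names, not anything from Klipper). It assumes Linux, and it assumes the value uses the usual systemd form of CPU indices or ranges separated by spaces or commas. It expands a CPUAffinity-style string into a set of cores and compares it with what the kernel reports for a given PID.

```python
import os


def parse_cpu_affinity(spec):
    """Expand a CPUAffinity-style value such as "6 7" or "0-3,6" into a set of CPU indices."""
    cpus = set()
    # Treat commas and whitespace as equivalent separators.
    for token in spec.replace(",", " ").split():
        if "-" in token:
            # A range like "0-3" is inclusive on both ends.
            low, high = token.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(token))
    return cpus


def affinity_matches(pid, spec):
    """Return True if the given PID is allowed to run on exactly the CPUs in spec (Linux only)."""
    return os.sched_getaffinity(pid) == parse_cpu_affinity(spec)
```

For example, affinity_matches(pid, "6 7") with the PID of the running Klipper process should come back True once the change is live. Note that after editing the unit file you normally need to run sudo systemctl daemon-reload and then restart the service (sudo systemctl restart klipper, assuming your service is named klipper) before the new affinity applies.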
tl;dr – multi-MCU homing with a 25ms window is hard, and there are probably a million things that contribute to timing issues. Limiting CPU affinity helped in my case.
Edit 2/8: Just for more info, I’ve done several prints and haven’t had a homing issue yet since making the above change. This is the longest I’ve gone without having a homing fail.