In general, I’d like to see an analysis of the root cause of the problem, before changing the code; otherwise “fixing symptoms” often results in unanticipated regressions.
I think this issue has been discussed into oblivion at this point. After 11 months, the only thing I have read about in all this time that works for everyone without any side effects is increasing TRSYNC_TIMEOUT to 0.05. It has gotten to the point where there is now a script to keep the change even after a Klipper update.
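For anyone wondering what such a script boils down to, here is a minimal sketch, not the actual script people are passing around. It assumes the constant lives in klippy/mcu.py as a line like `TRSYNC_TIMEOUT = 0.025`, so check that against your own checkout first. The function name `patch_trsync_timeout` is made up for illustration.

```python
import re

# Assumption: Klipper defines the constant at the start of a line, e.g.
# "TRSYNC_TIMEOUT = 0.025" in klippy/mcu.py. Verify against your checkout.
# Anchoring at line start keeps TRSYNC_SINGLE_MCU_TIMEOUT untouched.
_PATTERN = re.compile(r"^(TRSYNC_TIMEOUT\s*=\s*)[0-9.]+", re.MULTILINE)


def patch_trsync_timeout(source: str, new_value: float = 0.05) -> str:
    """Return `source` with the TRSYNC_TIMEOUT assignment set to `new_value`."""
    patched, count = _PATTERN.subn(
        lambda m: f"{m.group(1)}{new_value}", source
    )
    if count != 1:
        # Refuse to guess if the line is missing or appears more than once.
        raise ValueError(f"expected one TRSYNC_TIMEOUT line, found {count}")
    return patched
```

You would read the file, pass its text through the function, and write it back after each update. Note that editing the file this way is exactly what marks the install as dirty.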
Either allow changing this parameter via printer.cfg, if possible with a big warning about it in the documentation, or apply the change permanently, please. I am about to just throw this CANBUS into the garbage, because every time I think I have ‘fixed’ it, it simply returns.
The discussions are mostly on Discord, like the Voron Discord. Some are on other forums like TeamFDM.
At least let those who do have the issue change the setting without making the install dirty. That is all I am asking.
Edit: I just tagged 9 people on Discord who have reported this issue on new builds in the last month. Hopefully some will chime in.
This has already all been done in the last 11 months.
The TL;DR of it has been:
If you have dropped packets, you have a problem. If you don’t have dropped packets, you need to increase TRSYNC_TIMEOUT via the script.
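One place to look for dropped packets is the kernel's per-interface counters. The sketch below is only a rough check under some assumptions: the interface is called can0, and the standard Linux statistics files under /sys/class/net are what you want to read. The helper names are made up, and Klipper's own log statistics may tell you more.

```python
from pathlib import Path

# Standard Linux network statistics files, present for any net device.
COUNTERS = ("rx_dropped", "tx_dropped", "rx_errors", "tx_errors")


def read_can_counters(iface: str = "can0",
                      sys_root: str = "/sys/class/net") -> dict:
    """Read drop and error counters for `iface` (assumed name: can0)."""
    stats_dir = Path(sys_root) / iface / "statistics"
    return {
        name: int((stats_dir / name).read_text().strip())
        for name in COUNTERS
    }


def any_nonzero(counters: dict) -> bool:
    """True if any counter is above zero, i.e. the bus deserves a look."""
    return any(value > 0 for value in counters.values())
```

Calling `any_nonzero(read_can_counters())` after a failed homing or probing run gives a quick yes or no on which branch of the TL;DR you are in.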
Of course, I am one of those who have tried all of these things and still get this issue.
Half of these “fixes” seem like workarounds that shave a few milliseconds off to get under the requirement without actually fixing an underlying issue. Most of the time, trying these fixes only reduces the likelihood of the error and does not actually fix anything. Redoing the wire crimps is the only real fix I see here and is usually recommended if you have dropped packets.
It seems to prefer BTT boards, but that is probably just because they are more popular.
Switching to a 32-bit OS sometimes fixes it.
Overclocking the Pi sometimes fixes it.
Redoing the wire crimps sometimes fixes it.
Disabling the camera service sometimes fixes it. This is to reduce load on the Pi.
Switching from the official candlelight firmware to the BTT fork sometimes fixes it.
Sometimes none of these fix it.
Sidenote: I miss back in the day when there was one niche forum where everything was, instead of nowadays where everything is spread across Reddit, Discord, Facebook, and several forums. At least back then everyone was on the same page.
The most frustrating thing about the situation is that it’s hard, if not impossible, to troubleshoot the problem. Most replies are also focused on fixing problems with the CAN network, ignoring reports that the CAN network has been tested and no faults were found.
Searching further, I’ve also found reports of this exact problem from non-CAN systems running 2 mainboards over USB, notably old Voron builds and builds running ERCF.
I almost forgot. There are two other fixes I have seen a few reports of for those with a stable CANBUS connection. One is switching from an ARM-based SBC to an x86-based one, like an Intel NUC. Perhaps the common thread for this issue is the Pi4B being borderline on its response latency. The other: I have also seen one report of someone resolving the issue with a custom kernel on the Pi that is built for latency-sensitive applications.
I don’t have these problems on either of my printers. One is a 2.4 with CAN boards on the A and B motors as well as the toolhead. The other is a switchwire with just a toolhead board.
Most of the people I see having these issues seem to be using the U2C. I use the MKS CANable clone running a recent version of candlelight I built from source. I’m not saying the USB to CAN devices other people are using are the problem, but I simply can’t reproduce the issue with my hardware.
This is exactly the point:
Not knowing what the root cause is and introducing just another band-aid to cover it up is rarely a wise choice. Especially since this band-aid may have unwanted consequences on the homing precision.
You may be someone who is experienced and who knows what you are doing.
Most users just see some setting and play around with it. If needed or not, it doesn’t matter. If they know the consequences or not, doesn’t matter. If they can later troubleshoot the consequences, doesn’t matter.
And if it is mentioned in the documentation doesn’t matter at all, since it is not read.
Well, unless someone is planning on paying an electrical engineer with CANBUS experience to diagnose this, I don’t think the root cause is going to be determined. We are just kicking the can down the road at this point. I no longer believe this is a hobbyist-solvable problem. We have reached a knowledge wall. We need $10,000 oscilloscopes and bachelor’s degrees to proceed.
Seeing as this is going nowhere fast I guess we have no choice but to use a sledgehammer to swat a fly.
I switched my Pi4B out for a ZimaBoard 432… over 1,500 bed mesh probes later, I would say my issue is resolved. I think this confirms the Pi4B is borderline on being able to run CANBUS reliably at the default trsync_timeout, especially if you have all the bells and whistles turned on: KlipperScreen, webcams, Mobileraker, LED Effects, etc. Even a 2 GHz overclock did not seem to help.
I would just like to add… EVERYTHING is faster on the ZimaBoard. KlipperScreen is snappier, the webcam has a more consistent frame rate, and LED Effects does not drop frames when things get busy.
For anyone else who wants to do this, here is how you set up the CAN interface on Ubuntu Server.
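A minimal sketch of one possible setup, assuming Ubuntu Server brings the interface up through systemd-networkd and the interface is named can0. The file name 80-can0.network and the helper `render_can_network` are hypothetical. The 500k bitrate is the one discussed in this thread; whatever value you pick has to match the bitrate flashed into the CAN adapter and the toolhead firmware.

```python
def render_can_network(iface: str = "can0", bitrate: int = 500_000) -> str:
    """Return the text of a systemd-networkd .network file for a CAN device.

    Uses the [CAN] section's BitRate= setting, given in bits per second.
    """
    if bitrate <= 0:
        raise ValueError("bitrate must be positive")
    return (
        "[Match]\n"
        f"Name={iface}\n"
        "\n"
        "[CAN]\n"
        f"BitRate={bitrate}\n"
    )


# Hypothetical destination: /etc/systemd/network/80-can0.network
# Write the text there as root, then restart systemd-networkd.
```

This only covers the bitrate. Other settings, such as the transmit queue length, are not handled by this sketch.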
Here is how to rotate the display if you are using KlipperScreen. I am using the WaveShare 5.5" AMOLED.
To rotate the terminal:
sudo nano /etc/default/grub
You will need to export DISPLAY=:0 and then run xrandr to see what the Identifier is. I am using mini-DisplayPort to HDMI, so it shows up as HDMI.
sudo nano /etc/X11/xorg.conf.d/10-monitor.conf
Did you ever try running the bus at 1M as the Klipper docs recommend, or did you only test at 500k as your Ubuntu configuration indicates you’re using now? I’ve only ever run my bus at 1M, and I’m using CB1s which are less powerful than the Pi4.
I have only used 500k. When probing a bed mesh or running QGL, the CANBUS usage is only at 23% utilization max at 500k. If not using 1M were the issue, then I would still have the issue even after switching to the ZimaBoard. Input Shaper actually uses 153% CANBUS utilization at 500k, and I never had a problem with that. I have been printing A LOT since fixing this. The issue is definitely resolved. My takeaway is that the Pi 4B has latency issues with CANBUS that become increasingly likely to cause problems the more additional services you add. Either this is an architectural issue with the ARM CPU, a kernel optimization issue on ARM, or the CPU does not have enough cache. The Pi4B has 1MB of L2 and the ZimaBoard 432 has 2MB. The BTT CB1 also has 1MB of L2, so maybe not. People who have this issue usually fail to mention what host they are running. Maybe they are all Pi4B users. I did try running rpi-update a while ago to try to fix the issue, with no luck.
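To put the utilization numbers side by side, here is a back-of-the-envelope sketch. It assumes the same traffic is sent at both bitrates, so utilization scales inversely with bitrate, and it ignores any protocol-level differences. The function name is made up.

```python
def scale_utilization(percent: float, from_bitrate: int,
                      to_bitrate: int) -> float:
    """Estimate bus utilization at `to_bitrate` for the same traffic.

    Assumption: identical traffic, so utilization is inversely
    proportional to the bitrate.
    """
    if from_bitrate <= 0 or to_bitrate <= 0:
        raise ValueError("bitrates must be positive")
    return percent * from_bitrate / to_bitrate


# Figures quoted above, measured at 500k, rescaled to 1M.
probing_at_1m = scale_utilization(23, 500_000, 1_000_000)
shaper_at_1m = scale_utilization(153, 500_000, 1_000_000)
```

Under that assumption, probing would sit at roughly half of its 500k figure on a 1M bus, which fits the point that bandwidth is not the bottleneck during probing.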