Synchronisation issue between two MCUs

Basic Information:

Printer Model: custom printer
MCU / Printerboard: mcu1:Arduino Due mcu2:Arduino Due
klippy.log (7.4 MB)

Hardware:

  • Host: Raspberry Pi 4
  • MCU: One Arduino Due controlling two z motors and the y motor + one Arduino Due controlling two x motors and two extruders
  • Motors: X/Y closed-loop servo motors, Z NEMA 23 motors with Trinamic 5160, E NEMA 17 motors with Trinamic 5160

Describe your issue:

Hello, I am having trouble with the synchronization of my two MCUs. During some of my prints the Y axis suddenly lags behind after a while and the print gets “shifted”. To investigate, I did some test prints and found that the issue is most reproducible with cylindrical/round objects. The round print becomes oval at some point: the X and Y axes both complete their full movement, but they are out of phase. I printed a lot of these cylindrical towers while varying parameters such as speed, acceleration and microsteps, but I couldn’t trigger the error on demand; it seems to happen at random times.
After a lot of testing I found that when the error occurs in a print (and it starts to print oval) and I cancel the print and start a new one without homing, the printer still prints oval. But when I home the printer after the error occurs, the new print job comes out round again.
Is there some synchronization happening when homing the printer?

Here is a picture of the error:
left: longer print until the error happens (first round, then suddenly oval)
mid: same gcode started after “left” without homing (oval)
right: same gcode started after “mid” with homing (round)

EDIT: Uploaded the wrong klippy.log

To test whether the issue really comes from the MCUs not being in sync, I connected the Y axis to the same MCU as the X axis. All prints since then have been perfectly round (I printed five cylinders).

But now I get “Lost communication to mcu” (the mcu which controls the xy movement) after some prints.
Could it be that the Arduino Due is overloaded?

Hello everyone, I did a lot of testing in the last weeks, so here is an update on this topic.
Since my last reply I tested a lot of things, such as different USB cables, config settings etc.
Nothing changed; the X and Y axes still go out of sync after a while.
Next I wanted to test whether the error comes from the motor drivers / servo motors. So I borrowed an oscilloscope from a friend and hooked it up to the step signals of X, Y and Z (only two at a time). Then I wrote a small macro which drives the toolhead diagonally and the Z axis down, and recorded the step signals of the axes at the start of the macro. In theory all axes should start at nearly the same time.
Here are my measurements after the printer was homed:

Y is the blue line, X the yellow one


So everything seems fine. (The small time delay of 4.6 ms may come from the different rotation distances.) The Z axis is also in sync with X and Y.

But now comes the interesting part!
If I repeat the same procedure without homing after a print job where the “oval printing” issue occurred, the results look like this:

Y is the blue line, X the yellow one

Y is the blue line, Z the yellow one

Z is the blue line, X the yellow one

The step signals of Y are over 300 ms off compared to Z and X. The really strange part is that the steps of Y and Z are generated on the same board.
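The offset between two channels can also be computed from exported scope samples rather than read off the screen. This is a minimal Python sketch, not the method used for the measurements above. It assumes the scope can export a shared time base in seconds plus one voltage list per channel, and that the step pins use 3.3 V logic (hence the 1.65 V threshold). The function names are made up for illustration.

```python
def first_rising_edge(times, volts, threshold=1.65):
    """Return the time of the first low-to-high crossing of threshold, or None."""
    for i in range(1, len(volts)):
        if volts[i - 1] < threshold <= volts[i]:
            return times[i]
    return None


def start_offset_ms(times, chan_a, chan_b, threshold=1.65):
    """Delay of the first step pulse on chan_b relative to chan_a, in milliseconds."""
    a = first_rising_edge(times, chan_a, threshold)
    b = first_rising_edge(times, chan_b, threshold)
    if a is None or b is None:
        raise ValueError("no step pulse found on one of the channels")
    return (b - a) * 1000.0


if __name__ == "__main__":
    # Hypothetical, heavily down-sampled traces: X starts stepping at 0.1 s, Y at 0.4 s.
    t = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    x = [0.0, 3.3, 0.0, 3.3, 0.0, 3.3]
    y = [0.0, 0.0, 0.0, 0.0, 3.3, 0.0]
    print(f"Y starts {start_offset_ms(t, x, y):.1f} ms after X")
```

A positive result means the second channel started stepping later than the first. The resolution is limited by the sample interval of the export.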

Does anyone have an idea what could cause this behavior?

Here is a new update on this topic.
I purchased two new Arduino Due boards to rule out faulty hardware and ran the same tests as described in my previous post. Sadly the printer is again going out of sync, and the step signals of X and Y are not starting at the same time.

I’m not aware of any issues with clock synchronization.

The pictures posted above seem to me to be more likely the result of some mechanical issue than software synchronization.

You can use the graphstats.py tool with the -f option to plot Klipper's estimate of each mcu's clock frequency (and the corresponding adjustments). See the Debugging page of the Klipper documentation.

For example, your log above (after filtering out the reconnects at the start of the log):

The flat lines (consistent frequency estimates no more than a few microseconds/second different over a long time span) indicate everything is working normally.
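The same check can be done numerically. This is a minimal Python sketch that is not part of Klipper. It assumes the "Stats" lines in klippy.log are a space-separated list of `name:` section markers and `key=value` pairs, with one `freq=` value per mcu section. The sample lines are hypothetical, and 1 ppm equals 1 microsecond/second.

```python
from collections import defaultdict


def mcu_frequencies(log_lines):
    """Collect the freq= estimates per mcu section from 'Stats' lines."""
    freqs = defaultdict(list)
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2 or parts[0] != "Stats":
            continue
        section = ""
        for token in parts[2:]:
            if "=" not in token:
                # A token such as "mcu:" starts a new section.
                section = token.rstrip(":")
                continue
            key, value = token.split("=", 1)
            if key == "freq":
                freqs[section].append(int(value))
    return dict(freqs)


def freq_spread_ppm(values):
    """Spread between highest and lowest estimate, in parts per million."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 1e6


if __name__ == "__main__":
    # Hypothetical, shortened Stats lines for two mcus named "mcu" and "mcu2".
    sample = [
        "Stats 100.0: gcodein=0 mcu: mcu_awake=0.001 freq=84000100 mcu2: mcu_awake=0.001 freq=83999500 adj=83999400",
        "Stats 101.0: gcodein=0 mcu: mcu_awake=0.001 freq=84000142 mcu2: mcu_awake=0.001 freq=83999500 adj=83999410",
    ]
    for name, values in mcu_frequencies(sample).items():
        print(f"{name}: {freq_spread_ppm(values):.2f} ppm over {len(values)} samples")
```

To run it on a real log, pass the lines of an opened klippy.log file instead of the sample list. A spread of a few ppm over a long print corresponds to the flat lines described above.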

-Kevin

Hello Kevin,
Thank you very much for your reply!
Yeah the pictures could indicate this, but my measurements with the oscilloscope indicate that the step generation is not in sync.
Thanks for the tip with the graphics tool. And thanks for analyzing the log.

I have some good news, it seems to work now. The step signals of X and Y are now in sync and the toolhead is moving as it should.
I changed the following things (all at once):

I will revert each change step by step and test again to determine where the issue is coming from. Maybe this will help someone in the future.

From a cursory look at the source, NTP should not be an issue, as Klipper has used the monotonic clock since July 2018. I’m not familiar enough with Klipper’s scheduling to understand why changing the USB connection could affect this. If you can reproduce this issue via NTP, I’d be interested in the details.

Hello everyone,
I tested the points mentioned above and got the printer to run reliably!
In the end it was the programming port that caused the problems. As soon as I switched to the native port, everything worked like a charm. Changing the RTC or NTP settings didn’t have any effect.
If you are using an Arduino Due, use the native USB port!
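To check which port a board is connected through, the device names on the host can help. This is a minimal Python sketch under two loudly stated assumptions: that a Due running Klipper on its native port shows up under /dev/serial/by-id with a name starting with "usb-Klipper_", and that the programming port (the on-board USB-serial bridge) shows up with "Arduino" in its name. Verify both against the actual names on your system. `classify_port` and `list_ports` are hypothetical helper names.

```python
import glob
import os


def classify_port(by_id_path):
    """Guess the port type from a /dev/serial/by-id entry (naming is an assumption)."""
    name = os.path.basename(by_id_path)
    if name.startswith("usb-Klipper_"):
        return "native"       # firmware's own USB device
    if "Arduino" in name:
        return "programming"  # assumed name of the on-board USB-serial bridge
    return "unknown"


def list_ports(pattern="/dev/serial/by-id/*"):
    """Map every matching serial device path to its guessed port type."""
    return {path: classify_port(path) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    for path, kind in list_ports().items():
        print(f"{kind:12s} {path}")
```

Run on the Raspberry Pi, this prints one line per serial device. Any entry reported as "programming" for a board that carries motion axes is worth moving to the native port.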

Thanks to everyone who helped find the source of the issue.
I will mark this as the solution. Maybe this information will be helpful for someone in the future.