Does it make sense to actually sync MCU clock speed?

Disclaimer:
There is no problem, it’s just my curiosity and maybe a little perfectionism.

I spent some time looking at the clock sync. My initial thought was, “I see a time deviation on the graph. Can I improve that deviation? Maybe a different clock synchronization approach would make it better.”

I tried to implement the convex hull approach from this paper, and it did not provide any benefit, which confused me. Then, after some digging, I realized there is no “synchronization” underneath, and maybe it is not needed at all.

The basic idea underneath is pretty simple: we compute most times from the latest sample:

self.clock_est = (self.time_avg + self.min_half_rtt,
                  self.clock_avg, new_freq)

We know the approximate host time at which the MCU had a given clock value: it is approximately send_time + half_rtt.
Each such sample is folded into clock_avg through a linear regression with exponential decay (which looks like a sort of Kalman filter, I think). The speed of the remote clock is then computed by dividing the covariance between the remote clock and local time by the variance of the local time:

new_freq = self.clock_covariance / self.time_variance
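
For intuition, here is a minimal, self-contained sketch of that decaying linear regression (my own reconstruction of the idea rather than Klipper’s exact code; the DECAY constant, the missing outlier filtering, and the use of the latest half_rtt instead of a tracked min_half_rtt are simplifications):

    DECAY = 1. / 30.  # assumed smoothing constant

    class ClockEstimator:
        def __init__(self):
            self.time_avg = self.clock_avg = 0.
            self.time_variance = self.clock_covariance = 0.
            self.clock_est = (0., 0., 1.)  # (host time, mcu clock, freq)
        def update(self, send_time, receive_time, mcu_clock):
            # Assume the mcu read its clock halfway through the round trip
            half_rtt = .5 * (receive_time - send_time)
            sample_time = send_time + half_rtt
            # Exponentially decayed means, variance, and covariance
            diff_time = sample_time - self.time_avg
            self.time_avg += DECAY * diff_time
            self.time_variance = (1. - DECAY) * (
                self.time_variance + diff_time**2 * DECAY)
            diff_clock = mcu_clock - self.clock_avg
            self.clock_avg += DECAY * diff_clock
            self.clock_covariance = (1. - DECAY) * (
                self.clock_covariance + diff_time * diff_clock * DECAY)
            if self.time_variance:
                # Regression slope = estimated mcu frequency (ticks/sec)
                new_freq = self.clock_covariance / self.time_variance
                self.clock_est = (self.time_avg + half_rtt,
                                  self.clock_avg, new_freq)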

Let’s peek at actual log values (I removed the fields that are not useful for us):

Stats 876.2: gcodein=0
  mcu:   srtt=0.000 rttvar=0.000 rto=0.025 freq=400021155
  ebb42: srtt=0.000 rttvar=0.000 rto=0.025 freq=64000514 adj=63997558
  host:  srtt=0.000 rttvar=0.000 rto=0.025 freq=50000030 adj=49997628
  print_time=55579.871

So, if we compare the data from the graph and from the logs, the microsecond deviation on the graph is simply the difference between the configured MCU clock rate and the “actual” one.
On the graph, we see ~50us for the main MCU, which is simply:

400_000_000 / 400_021_155 = 0.999947115
1 - 0.999947115 = 0.000052885 ~ 53us per second

I may be wrong here, but it feels a little misleading, because it is not a deviation from the “predicted” clock but a deviation from the “perfect/expected” clock rate. It looks like the actual “deviation” is the wiggle on the graph (that part of the graph is where initialization happens, before the clocks stabilize):

I do not fully understand how the adjustment happens here; it feels like something like this happens underneath:

64_000_514 * 0.999947115 = 63997129.33
# This is close to the adj value 63997558 for ebb42 in the log.
BTW, I’m a little confused by this:

We account for the actual clock speed when converting to system time, but print time is converted back using the expected clock speed from the MCU config.
It feels like this could introduce drift with high uptime, but I may be wrong here.

    # print time conversions (these use the nominal frequency from the config)
    def print_time_to_clock(self, print_time):
        return int(print_time * self.mcu_freq)
    def clock_to_print_time(self, clock):
        return clock / self.mcu_freq
    # system time conversions
    def get_clock(self, eventtime):
        sample_time, clock, freq = self.clock_est
        return int(clock + (eventtime - sample_time) * freq)
    def estimate_clock_systime(self, reqclock):
        sample_time, clock, freq = self.clock_est
        return float(reqclock - clock)/freq + sample_time
    def estimated_print_time(self, eventtime):
        return self.clock_to_print_time(self.get_clock(eventtime))
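
To make the conversion chain concrete, here is a worked example plugging in numbers in the spirit of the log above (the clock and eventtime values are made up for illustration; only the frequencies come from the log):

    # Hypothetical clock_est sample for the main mcu: host time 876.2,
    # with the estimated frequency 400021155 from the log.
    sample_time, clock, freq = 876.2, 22_231_000_000, 400_021_155

    eventtime = 877.2  # one second after the sample
    est_clock = int(clock + (eventtime - sample_time) * freq)
    # est_clock = 22_631_021_155

    # Converting back to print time uses the *nominal* 400MHz from the config:
    mcu_freq = 400_000_000
    print_time = est_clock / mcu_freq
    # print_time = 56577.55289..., while a nominal clock would give
    # 56577.5 exactly - the extra 21_155 ticks are the ~53us per second.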

Okay, so there is a metric, but does it make sense to make it better?
On STM32 there is an HSITRIM register that can be used to adjust the speed of the internal clock. Because it has a finite resolution, it could be wrapped in a “timer” that trims back and forth N times a second until the clock is in sync with the expected MCU speed.
Similar to PWM, with a cycle time of 0.01s we could control the on_time to slow the clock down by several microseconds and then set it back to normal speed (a rough sketch of this idea follows the note below).

  • This is similar to an NTP service, which adjusts the local clock speed to sync your computer’s clock with a remote source of time.
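
A minimal sketch of that PWM-style trimming, assuming a hypothetical set_hsitrim() helper for writing the trim field (the register access, trim step, duty, and cycle time here are all illustrative, not tested on hardware):

    import time

    def set_hsitrim(value):
        # Hypothetical helper: on real hardware this would write the
        # HSITRIM field in RCC_CR (or RCC_ICSCR, depending on the part).
        pass

    def trim_pwm(duty, base_trim=16, cycle_time=0.01, cycles=100):
        # Dither between two adjacent trim steps so the *average* clock
        # rate lands between two discrete frequencies, PWM style:
        # duty=0.0 stays at base_trim, duty=1.0 stays at base_trim+1.
        for _ in range(cycles):
            set_hsitrim(base_trim + 1)
            time.sleep(duty * cycle_time)         # "on time"
            set_hsitrim(base_trim)
            time.sleep((1. - duty) * cycle_time)  # back to nominal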

On the STM32H7, which uses PLL1 as its clock source, there is PLL1FRACR, which can be used in the same manner for runtime adjustment of the clock speed (it is less documented, so I don’t know how precise it is).

To put things in perspective, using the above logs: technically, the clock would drift by about 1 second after 5.5 hours:

20000 * 0.999947115 = 19998.9423  # ~1.06s of drift over 20000s
20000 / 3600 = 5.555  # hours

But because we mostly compute times from the last sync, which happens every second, there should be no real issue; we are still talking about a deviation of several microseconds here.

Does it make sense to actually sync the MCU clock speed to the host clock/real clock?
I don’t know, but it looks like there is mostly no real reason to.

In Klipper all timing is based on the clock specified in the [mcu] config section. It is the “main” clock that everything else runs off of.

One of the challenges when understanding the timing code is recognizing that there is no “correct time”. Everything is relative to a particular clock and we don’t know (and don’t really care) how accurate that clock is.

There’s no reason to suspect that the host clock is a better clock than the mcu clock, so we use the mcu clock. (The host times are taken via its “monotonic” clock, which is driven by a crystal just like the crystal on the micro-controllers, and there is no reason to suspect the host crystal is any better than the mcu’s crystal.) In this regard, the freq= reports in the logs can be thought of as measuring the inaccuracy of the host clock, not as a measurement of the inaccuracy of the main mcu clock.

So, as an example, if we had a stepper on the main mcu that needs 100 pulses to move 1mm, the main [mcu] nominally runs at 20,000,000 clock ticks per second, and we move that stepper at 50mm/s, then the stepper pulses are issued at exactly 4000 ticks per step pulse (20000000 / (50 * 100)). This timing is done regardless of the host timing and regardless of any host-detected changes in timing, as the main mcu is considered the main clock.

Things only get interesting when there are multiple mcus. In this case, the main [mcu] is still considered the main clock, and we need to figure out the drift on each of the secondary mcus. This is what the adj= parameter in the logs is reporting. So, a report of ebb42: ... freq=64000514 adj=63997558 is indicating that the secondary mcu is running fast relative to the host clock (which we don’t care about) and slow relative to the main mcu clock (which is important). So, using the example above, if the stepper was on this mcu, the code would generate step pulses at ~12799.5116 clock ticks per pulse (63997558 / (50 * 100)). That is, we have to schedule the timing so that motors on the secondary mcus are actually running at the same rate as motors on the main mcu.
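
As a quick sanity check of that arithmetic (my restatement, using the numbers from the log above):

    steps_per_mm = 100
    speed = 50  # mm/s

    # Main mcu: the nominal frequency is authoritative.
    main_freq = 20_000_000
    print(main_freq / (speed * steps_per_mm))    # 4000.0 ticks per step

    # Secondary mcu: use the adjusted frequency - its rate as measured
    # against the main mcu - not its nominal 64MHz.
    ebb42_adj = 63_997_558
    print(ebb42_adj / (speed * steps_per_mm))    # 12799.5116 ticks per step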

Things get a little tricky in that the clocks can change over time (and there can also be measurement error). The code to handle that is in the SecondarySync class in clocksync.py. It gets tricky because by the time we measure a change in clock rate, there is already an accumulated deviation and the code not only has to correct for the new clock rate but also has to correct for that accumulated deviation. Again though, the main mcu always has the “right time” and we only ever correct the secondary mcus. When looking at the logs/graphs the only thing important is that the lines remain mostly stable - we don’t want rapid “wiggles” as that would indicate an inability to sync the secondary mcus and/or an inability to actually measure the secondary clocks.
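
To illustrate the shape of that correction (a rough sketch of the slewing idea only - not the actual SecondarySync code, and the 4-second horizon is an invented number):

    def slew_clock_est(cur_est, new_est, host_now, horizon=4.0):
        # cur_est / new_est are (sample_time, clock, freq) tuples as in
        # clock_est above. Rather than jumping straight to new_est
        # (a step change in all scheduled times), pick a frequency that
        # moves the current estimate onto the new one over `horizon`
        # seconds, absorbing the accumulated deviation gradually.
        def clock_at(est, t):
            sample_time, clock, freq = est
            return clock + (t - sample_time) * freq
        clock_now = clock_at(cur_est, host_now)               # where we are
        clock_target = clock_at(new_est, host_now + horizon)  # where we must be
        slew_freq = (clock_target - clock_now) / horizon
        return (host_now, clock_now, slew_freq)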

I don’t think that would help. Changing the mcu timing would introduce measurement jitter, which would make it harder to measure the actual clock rate. This is the same reason that we don’t use the ntp adjusted clock on the host - we don’t care what the “real” time is, we just want a way of accurately predicting the clock rates. Any adjustments made to the clocks reduce our ability to make accurate predictions.

Cheers,
-Kevin


Thank you very much for your informative and detailed post. I’m no programmer, but I know a bit about hardware. The clock on the host is usually better regarding jitter and phase noise than the mcu (printer board) clock. If I could choose the main clock for Klipper, I would always choose the host clock.

No! Usually mainboards or RPis use better clock sources.

Just for example

I didn’t read the datasheets of any of those oscillators, but I bet my money on the RPi 5 oscillator.

Best regards,
hcet14

That statement isn’t backed up by fact; you’ll find that your Asus/MSI motherboard uses the same parts and circuitry as the MCUs used in SBCs and the main/toolhead controller boards that are used in 3D printing.

Crystal-based oscillator circuits have been used with single-chip MCUs since the early 1970s, and the science behind them is very well understood. You will find excellent frequency accuracy and minuscule jitter in basically all boards, with the defining factor being design/layout.

When I look at the two boards, the RPi 5 crystal is on the backside of the board, which means the signals go through a number of layers. It should be okay, but I wouldn’t be surprised to see some distortion on the clock waveform at the MCU pins.

But the main controller board has the crystal right beside the MCU. If it’s a four-layer board with the next layer being ground with local load caps, it’s as good as you can get. If you were to look at the clock at the MCU pins, it will be as close to a perfect sine wave as possible.

For all intents and purposes, both clocks are probably equally accurate and either one could be used but there are probably operational/coding reasons why Kevin went with the clock on the main controller board and not the host.


In common single mcu setups, using the main mcu as the timing reference means that measurement noise doesn’t impact kinematic timing. (Since all the peripherals are on the main mcu, we can time all movements relative to that main mcu, and any noise in the correlation between host and mcu timing doesn’t alter those movements.)

Even in multi-mcu setups, if the X and Y are on the main mcu then, again, measurement error won’t impact the vast majority of movements.

Separately, just to be clear, the goal isn’t to find the “right” time, but to accurately predict the clocks relative to each other. Common crystals are accurate to within about 1 part per million. So, if one commands the toolhead to move at 100mm/s and instead it moves at 100.00001mm/s then no one cares. In contrast, if the code can’t detect and account for a one part per million drift between clocks, then a print will be complete “spaghetti” within about 10 minutes of a print. That’s what I meant earlier by, “everything is relative to a particular clock and we don’t know (and don’t really care) how accurate that clock is”. We’re not looking for accuracy, we’re looking for predictability.

Cheers,
-Kevin


@koconnor @mykepredko

Please don’t get me wrong!

When I read threads like
https://klipper.discourse.group/t/lack-of-memory-barriers
I can hardly follow! I’m sure I get the point, but I’m no programmer!

My point was, e.g., rpi@54MHz.

My stupid thought was: if I divide a high-running clock down (in software), I get a better value, because I make the error of the source clock smaller.

Common MCUs like the STM32 have 8MHz crystals and multiply them up to 48MHz. So the error of the source clock will be multiplied.

When you multiply or divide a clock frequency, you’re going to multiply or divide its error proportionately. If you have a 2% error at 8MHz then you’ll have a 2% error at 48MHz when you multiply the clock by 6 and a 2% error at 2MHz when you divide the clock by 4.
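
In the same spirit as the arithmetic earlier in the thread, a quick check of why the relative error is unchanged (the 2% figure is just the example number from above):

    8_000_000 * 1.02 = 8_160_000      # 2% fast at the 8MHz source
    8_160_000 * 6 = 48_960_000        # after the x6 PLL
    48_960_000 / 48_000_000 = 1.02    # still exactly 2% fast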

Clock multiplication/division is not done arbitrarily, it is done to ensure that the clock is working at the device’s specification.

For example, 48MHz is required as the base speed for USB 1.0 (which actually operates at 12MHz, so the 48MHz is divided by 4 in the USB hardware). As the crystal frequency of the MCU is arbitrary (it doesn’t have to be 8MHz, that’s just what’s in most boards that are used in 3D printers/Klipper), there is hardware which can increase/multiply or decrease/divide the crystal frequency to match the requirements of the MCU’s hardware.

I would never calculate it like you did, but I’m too tired and am off to bed.

Ah, it doesn’t work that way. Common electronics will use a quartz crystal that resonates at a particular frequency (often somewhere in the range 32kHz - 20MHz). To get higher speeds the chips will use a PLL to generate a higher frequency from the crystal. Crucially, the accuracy does not improve when doing this - so, for example, if a 10MHz crystal drifts by 8 parts per million (9.99992MHz to 10.00008MHz) then a 10GHz clock generated from it will still have the same 8 part per million drift (9999.92MHz to 10000.08MHz).

To actually improve accuracy, one would typically deploy a temperature-compensated crystal oscillator (TCXO) or even a voltage-controlled, temperature-compensated crystal oscillator (VCTCXO). For even more accuracy, one can utilize a GPS clock or an atomic clock. None of these are standard on general purpose computers.

Thus, in practice, the raw mcu clocks are just as accurate as the raw general purpose computer’s clock.

That said, the general purpose computers can use the internet and NTP to account for static offsets in the crystal frequency, but that isn’t meaningful in this application.

Cheers,
-Kevin
