Disclaimer:
There is no problem, it’s just my curiosity and maybe a little perfectionism.
I spent some time looking at the clock sync. My initial thought was, “I see a time deviation on the graph. Can I improve that deviation? Maybe a different clock synchronization approach would make it better.”
I tried to implement the convex hull approach from this paper, and it did not provide any benefit, which confused me. Then, after some digging, I realized there is no “synchronization” underneath, and maybe it is not needed at all.
The basic, pretty simple idea underneath is that most times are computed from the latest sample:
self.clock_est = (self.time_avg + self.min_half_rtt,
                  self.clock_avg, new_freq)
We know the approximate host time at which the MCU reported a given clock value: roughly send_time + half_rtt.
Each clock sample is folded into clock_avg through a linear regression with exponential decay (which looks a bit like a Kalman filter to me). The speed of the remote clock falls out of the same regression: it is essentially the change in remote clock divided by the change in local time.
new_freq = self.clock_covariance / self.time_variance
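To double-check my understanding, here is a toy version of that decayed regression. This is my own simplified sketch for illustration, not the actual clocksync.py code; ToyClockSync, note_sample and DECAY are made-up names, and outlier filtering / RTT aging are left out.

# Exponentially decayed linear regression of remote clock vs. host send time.
DECAY = 1. / 30.   # assumed decay constant, just for the toy

class ToyClockSync:
    def __init__(self, send_time, half_rtt, clock):
        # Seed the averages with the first sample
        self.time_avg = send_time
        self.clock_avg = clock
        self.min_half_rtt = half_rtt
        self.time_variance = self.clock_covariance = 0.
        self.clock_est = (send_time + half_rtt, clock, 0.)

    def note_sample(self, send_time, half_rtt, clock):
        self.min_half_rtt = min(self.min_half_rtt, half_rtt)
        # Decayed averages of host send time and remote clock
        diff_time = send_time - self.time_avg
        self.time_avg += DECAY * diff_time
        self.time_variance = (1. - DECAY) * (
            self.time_variance + diff_time**2 * DECAY)
        diff_clock = clock - self.clock_avg
        self.clock_avg += DECAY * diff_clock
        self.clock_covariance = (1. - DECAY) * (
            self.clock_covariance + diff_time * diff_clock * DECAY)
        if not self.time_variance:
            return  # not enough spread between samples to estimate a slope yet
        # Regression slope = remote clock ticks per local second
        new_freq = self.clock_covariance / self.time_variance
        self.clock_est = (self.time_avg + self.min_half_rtt,
                          self.clock_avg, new_freq)

# Feed it perfectly linear samples and the slope is the remote frequency:
sync = ToyClockSync(send_time=0., half_rtt=0.0002, clock=0)
sync.note_sample(send_time=1., half_rtt=0.0002, clock=400_021_155)
sync.note_sample(send_time=2., half_rtt=0.0002, clock=800_042_310)
print(sync.clock_est[2])   # ~400021155.0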
Let’s peek at actual log values (I removed the fields that are not useful for us):
Stats 876.2: gcodein=0
mcu: srtt=0.000 rttvar=0.000 rto=0.025 freq=400021155
ebb42: srtt=0.000 rttvar=0.000 rto=0.025 freq=64000514 adj=63997558
host: srtt=0.000 rttvar=0.000 rto=0.025 freq=50000030 adj=49997628
print_time=55579.871
So, if we compare the data from the graph with the logs, the microsecond deviation on the graph is simply the difference between the configured MCU clock rate and the “actual” measured one. On the graph we see ~50us for the main MCU, which is simply:
400_000_000 / 400_021_155 = 0.999947115
1 - 0.999947115 = 0.000052885 ≈ 53us
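The same arithmetic as a quick sanity check in code, using the numbers from the Stats line:

config_freq = 400_000_000   # [mcu] frequency from the printer config
stats_freq = 400_021_155    # freq= value from the log above
deviation = 1.0 - config_freq / stats_freq
print(deviation * 1e6)      # ~52.9, i.e. roughly 53us per second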
I may be wrong here, but it feels a little misleading, because it is not a deviation from the “predicted” clock, it is a deviation from the “perfect/expected” clock rate. It looks like the actual “deviation” is the wiggle on the graph (that part of the graph is initialization, before the clocks stabilize):
I do not fully understand how the adjustment happens here; it feels like something like this happens underneath:
64_000_514 * 0.999947115 = 63997129.33
# This is close to the adj=63997558 value for ebb42 in the log.
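Expressed as a quick check with the log numbers (this is only my guess at the relationship, not the actual calibration code):

scale = 400_000_000 / 400_021_155    # main MCU: configured / measured
print(64_000_514 * scale)            # ~63_997_129, log says adj=63997558 for ebb42
print(50_000_030 * scale)            # ~49_997_386, log says adj=49997628 for host

Both land within a few hundred Hz of the logged adj values, so the real adjustment seems to be in this spirit even if the exact formula is different.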
BTW, I'm a little confused by this:
We account for the actual clock speed for system time, but print time is converted back with the expected clock speed from the MCU config. It feels like this could introduce drift with high uptime, but I may be wrong here (there is a toy calculation below, after the code).
def print_time_to_clock(self, print_time):
    # print_time -> mcu clock ticks, using the configured (nominal) frequency
    return int(print_time * self.mcu_freq)
def clock_to_print_time(self, clock):
    return clock / self.mcu_freq
# system time conversions
def get_clock(self, eventtime):
    # host time -> mcu clock ticks, using the measured frequency from clock_est
    sample_time, clock, freq = self.clock_est
    return int(clock + (eventtime - sample_time) * freq)
def estimate_clock_systime(self, reqclock):
    # mcu clock ticks -> host time (the inverse of get_clock)
    sample_time, clock, freq = self.clock_est
    return float(reqclock - clock)/freq + sample_time
def estimated_print_time(self, eventtime):
    # host time -> print_time: measured clock ticks divided by nominal frequency
    return self.clock_to_print_time(self.get_clock(eventtime))
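A toy example of the drift I am worried about, plugging in the numbers from the log and pretending the host clock is perfect (my illustration, not Klipper code):

MCU_FREQ = 400_000_000        # configured frequency, used for print_time
MEASURED_FREQ = 400_021_155   # freq from clock_est in the log

# Pretend the last sync sample was at host time 0.0 with mcu clock 0
def toy_get_clock(eventtime):
    return int(eventtime * MEASURED_FREQ)

def toy_clock_to_print_time(clock):
    return clock / MCU_FREQ

elapsed = 3600.0   # one hour of host time
print_time = toy_clock_to_print_time(toy_get_clock(elapsed))
print(print_time - elapsed)   # ~0.19, print_time runs ~0.19s "fast" per hour

So print_time effectively counts seconds of the nominal MCU clock rather than host seconds; whether that actually matters in practice is exactly the question above.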
Okay, so there is a metric, but does it make sense to try to improve it?
On STM32 there is an HSITRIM register which can be used to adjust the clock speed. Because it has a finite resolution, it could be wrapped in a “timer” that trims back and forth N times per second until the clock is in sync with the expected MCU speed.
Similar to PWM: with a cycle time of 0.01s, we can control the on_time to slow the clock down for several microseconds and then set it back to normal speed (rough duty-cycle math below).
- similar to an NTP service, which adjusts the local clock speed to keep your computer's clock in sync with a remote time source.
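Roughly the duty-cycle math I have in mind (pure arithmetic sketch; the HSITRIM step size below is an assumed number, the real step depends on the chip and its reference manual):

# The clock currently runs ~52.9 ppm fast (from the log above)
clock_error_ppm = (400_021_155 / 400_000_000 - 1) * 1e6

# Assumption: one trim step slows the clock by ~3000 ppm (made-up value)
trim_step_ppm = 3000.0

cycle_time = 0.01                               # "PWM" period, seconds
on_fraction = clock_error_ppm / trim_step_ppm   # duty cycle of the slowed state
print(on_fraction, on_fraction * cycle_time)    # ~1.8% duty, ~176us per cycle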
On STM32H7, which uses PLL1 as the clock source, there is PLL1FRACR, which could be used in the same manner for runtime adjustment of the clock speed (it is less documented, so I don't know how precise it is).
To put things in perspective with the numbers from the logs above: technically, the time would drift by about 1 second after roughly 5.5 hours:
20000 * 0.999947115 = 19998.9423
20000 / 3600 = 5.555 hours
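Or, computed directly from the frequency error:

error = 1 - 400_000_000 / 400_021_155   # fractional error, ~52.9e-6
print(1 / error)                        # ~18_909 s until 1 second of drift
print(1 / error / 3600)                 # ~5.25 hours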
But because we mostly compute times from the last sync, which happens every second, there should be no real issue; we are still talking about a deviation of a few microseconds here.
Does it make sense to actually sync the MCU clock speed to the host clock / real clock?
I don't know, but it looks like there is mostly no real reason for that.