I was able to take a look at this issue. Unfortunately, I have been unable to track down the root cause.
The logs seem to indicate a timer rollover issue in the low-level host c code. The error does not appear to be a delayed message being sent at a later time.
In particular, all the logs I’ve seen (4 logs - 3 here and 1 from discord) show the same pattern. There is a tsync_state message from the main mcu (shown here after post-processing using logextract.py):
Receive: 98 1150213.712746 1150213.708287 14: seq: 10(13791), trsync_state oid=16 can_trigger=1 trigger_reason=0 clock=1577674306(1150213.711630)
The code in klippy/chelper/trdispatch.c correctly responds by creating a trsync_set_timeout message for the SHT36 mcu:
mcu 'SHT36': Sent 99 1150213.712969 1150213.712783 12: seq: 11(4465), trsync_set_timeout oid=2 clock=2877267925(1150205.477910)
However, the time sent is ~8.259 seconds less than what it should have been. 8.259 seconds is nearly 2**32 clock ticks on the main mcu.
So, in the extracts above, it seems that trdispatch.c should have converted clock 1577674306 to time 1150213.711630 (as logextract.py does in the snippet above), but instead it seems to convert it to time 1150205.452910 . The mcu then correctly raises an error when it receives the incorrect command.
All the logs show the same high-level pattern.
Unfortunately, I can see no reason why the trdispatch.c code would do that. The code seems correct, this code hasn’t changed for years, and the same code is run on all homing/probing attempts on every printer running Klipper.
You might want to try wiping out the entire klipper/ directory and recloning the code (eg, rm -rf ~/klipper/ ; git clone https://github.com/Klipper3d/klipper ; sudo service klipper restart). I guess it might be possible that something corrupted the chelper build and rebuilding from scratch may clear the issue.
If you are comfortable with programming, you could also try adding debugging information to the trdispatch.c code. Something like the following may help:
--- a/klippy/chelper/trdispatch.c
+++ b/klippy/chelper/trdispatch.c
@@ -88,8 +88,23 @@ handle_trsync_state(struct fastreader *fr, uint8_t *data, int len)
}
// mcu is still working okay - update last_status_clock
+ struct clock_estimate old_ce;
+ memcpy(&old_ce, &tdm->ce, sizeof(old_ce));
+ uint64_t old_clock = tdm->last_status_clock;
serialqueue_get_clock_est(tdm->sq, &tdm->ce);
tdm->last_status_clock = clock_from_clock32(&tdm->ce, clock);
+ if (tdm->last_status_clock <= old_clock)
+ errorf("tr_state %p: lsc=%lld ce=%lld,%lld,%.6f,%.3f"
+ " o=%lld oce=%lld,%lld,%.6f,%.3f"
+ , tdm, (long long)tdm->last_status_clock
+ , (long long)tdm->ce.last_clock
+ , (long long)tdm->ce.conv_clock
+ , tdm->ce.conv_time, tdm->ce.est_freq
+
+ , (long long)old_clock
+ , (long long)old_ce.last_clock
+ , (long long)old_ce.conv_clock
+ , old_ce.conv_time, old_ce.est_freq);
// Determine minimum acknowledged time among all mcus
double min_time = PR_NEVER, next_min_time = PR_NEVER;
@@ -222,5 +237,8 @@ trdispatch_mcu_setup(struct trdispatch_mcu *tdm
tdm->expire_ticks = expire_ticks;
tdm->min_extend_ticks = min_extend_ticks;
serialqueue_get_clock_est(tdm->sq, &tdm->ce);
+ errorf("tr_mcu_setup %p: ce=%lld,%lld,%.6f,%.3f"
+ , tdm, (long long)tdm->ce.last_clock, (long long)tdm->ce.conv_clock
+ , tdm->ce.conv_time, tdm->ce.est_freq);
pthread_mutex_unlock(&td->lock);
}
If changing the code, be sure to do a full restart of the Klipper host software (sude service klipper restart).
Maybe that helps a little,
-Kevin