Klipper overload cause 'verify_heater' shutdown

heaters.py has the following code:

    def get_temp(self, eventtime):
        print_time = self.mcu_pwm.get_mcu().estimated_print_time(eventtime) - 5.
        with self.lock:
            if self.last_temp_time < print_time:
                return 0., self.target_temp
            return self.smoothed_temp, self.target_temp

In case of klipper host overload, timings can be like this > 5s:

Stats 12162.8: gcodein=0  mcu: mcu_awake=0.017 mcu_task_avg=0.000002 mcu_task_stddev=0.000002 bytes_write=1226580 bytes_read=293755 bytes_retransmit=0 bytes_invalid=0 send_seq=24846 receive_seq=24845 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=400021877 host: mcu_awake=0.000 mcu_task_avg=0.000003 mcu_task_stddev=0.000004 bytes_write=2532 bytes_read=9966 bytes_retransmit=0 bytes_invalid=0 send_seq=396 receive_seq=396 retransmit_seq=0 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=50000072 adj=49997886 sd_pos=245736 W_dryer: target=0 temp=23.0 pwm=0.000 raspberry_pi: temp=55.1 heater_bed: target=40 temp=40.2 pwm=0.008 sysload=1.06 cputime=196.820 memavail=3605024 print_time=382.526 buffer_time=1.612 print_stall=0 extruder: target=160 temp=159.8 pwm=0.197
Stats 12169.1: gcodein=0  mcu: mcu_awake=0.016 mcu_task_avg=0.000002 mcu_task_stddev=0.000002 bytes_write=1253207 bytes_read=297516 bytes_retransmit=0 bytes_invalid=0 send_seq=25288 receive_seq=25287 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=400021775 host: mcu_awake=0.000 mcu_task_avg=0.000002 mcu_task_stddev=0.000003 bytes_write=2538 bytes_read=9993 bytes_retransmit=0 bytes_invalid=0 send_seq=397 receive_seq=397 retransmit_seq=0 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=50000072 adj=49997767 sd_pos=253920 W_dryer: target=0 temp=23.0 pwm=0.000 raspberry_pi: temp=55.6 heater_bed: target=40 temp=40.1 pwm=0.275 sysload=1.06 cputime=203.137 memavail=3602672 print_time=388.870 buffer_time=1.651 print_stall=0 extruder: target=160 temp=160.3 pwm=0.103

My question/request here, in that specific case:
Host Overload → timing go mad → Klipper shows ‘Heater extruder not heating at expected rate’

This is misleading, a nice solution would be not to crash here.
But at least to provide more specific user info here (log warning maybe?)

Thanks


I tried just to drop the checking code, and now I see the normal ‘Timer too close’ error. Looks like now it is mostly unneeded.

Added here: adccmds: Continue to query analog inputs after a shutdown · Klipper3d/klipper@d03cf2b · GitHub
If I shut down MCU by webhook, according to log, I still get current temps from the thermistor.
I2C temperature sensors stop working on MCU shutdown anyway.

Hi,

That code in get_temp() is there so that we don’t mislead users about temperatures if the mcu is disconnected from the host. Otherwise, if a connection to an mcu is lost, the code may continue to report the last temperature seen, which could be confusing to a user. The idea is to report a zero value instead to alert users that there is no connection.

A massive host overload extending over 5 seconds would be quite extreme. In such an extreme failure I’d expect many different error would likely pop up (like lost connection with mcu, timer too close). I guess it’s possible to extend the verify heater code to provide a better hint in this situation, but for what it is worth, it’s really really hard to provide good error descriptions if there are massive timing problems like that. For what it is worth, when things go that terribly wrong, I’d focus on a safe shutdown - one could spend a lifetime developing error message code and still not accurately describe every possible root cause.

Cheers,
-Kevin

1 Like