Extruder thermal runaway without Klipper noticing

Basic Information:

Printer Model: diy printer
MCU / Printerboard: mcu1:Arduino Due mcu2:Arduino Due
klippy.log (2.7 MB)

Describe your issue:

Hello everyone,

Today I started a test print which didn’t go well. At some point I noticed an odd smell and realized that the PETG I was printing with was coming out of the nozzle completely molten and dripping onto the print bed.
I aborted the print immediately. After looking at the klippy.log, I saw that at line 1681 the extruder stopped heating for two lines. After that it started heating again, but the temperature didn’t rise. Klipper kept heating and kept increasing the PWM signal up to 1.000, i.e. full power.
The reported temperature was constantly at 239.9°C (target 240°C), so it seems the temperature was not being updated.
Does anyone know whether Klipper checks that the heater is actually getting hotter when it increases the power?

Thank you all in advance.

Klipper does check whether the temperature is within range; see the [verify_heater] section in the config reference for the default settings.

Also, I notice that you have the extruder max_temp set to 550C, which seems a tad on the high side.

Having said that though, verify_heater would not be able to do anything, as the cause of the fault was that Klipper was not able to read the extruder temperature from the thermocouple, and it shut down accordingly. At least, that’s what the log indicates.

Thanks for the reply. I checked the [verify_heater] section before, and in my case it wouldn’t have caught the error.

The 550°C is on purpose; I experiment with high-temperature polymers. But thanks for checking.

But if Klipper couldn’t read the temperature from the thermocouple, why was it still heating? Klipper kept heating until line 3004, where I aborted the print manually.

I’m sorry, I should have paid more attention to what you had written, my fault.

As you said, the MCU to which your thermocouple is attached reported the extruder temperature as 239.9 from line 1577, with the blip you mentioned at line 1681, right up until you shut it down. That means the only safety net left was [verify_heater], but it didn’t kick in. The documentation says:

Specifically, the temperature is inspected once a second and if it is close to the target temperature then an internal "error counter" is reset

It doesn’t detail exactly what “close” means, but I guess we can deduce that the tolerance is larger than 0.1°C.
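To make the “close to target” behaviour concrete, here is a rough Python sketch of the error-counter idea described in the documentation. It is not Klipper’s actual code: the function name is made up, the heating_gain / check_gain_time path is left out, and the defaults used (hysteresis 5, max_error 120) are the values I understand the config reference to list for [verify_heater].

```python
def simulate_verify_heater(readings, target, hysteresis=5.0, max_error=120.0):
    """Return the second at which a fault would be raised, or None.

    Hypothetical, simplified model of the documented check: one reading
    per second, error counter reset whenever the reading is within
    `hysteresis` of the target. The heating-gain logic is omitted.
    """
    error = 0.0
    for second, temp in enumerate(readings):
        if temp >= target - hysteresis:
            # Reading counts as "close to target": reset the counter.
            error = 0.0
            continue
        # Otherwise accumulate how far below the tolerance band we are.
        error += (target - hysteresis) - temp
        if error >= max_error:
            return second
    return None


# A reading frozen at 239.9 with a 240 target never accumulates any error.
print(simulate_verify_heater([239.9] * 1400, 240.0))  # None

# A reading stuck far below target trips the check after a few seconds.
print(simulate_verify_heater([200.0] * 60, 240.0))  # 3
```

Under these assumptions a sensor value frozen anywhere within 5°C of the target looks perfectly healthy to this kind of check, no matter how long it stays frozen.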

In any case, I think the question I would be asking is why does the MCU keep reporting 239.9, even if the thermocouple has gone offline?

Sorry I couldn’t be of more help. I hope you get it sorted.

Yeah no problem.

I think that a temperature difference of 0.1°C is seen as “close to target” and therefore the “error counter” is being reset (didn’t check it in the code, please correct me if I’m wrong).

As for why the thermocouple temperature stopped updating, I asked myself the same question, but I don’t have an answer. Anyway, I think it would be good to check whether a heater is actually heating up when the commanded power (i.e. the PWM duty cycle) is being increased. This could also be beneficial for other heater and temperature sensor combinations, even when the temperature is being updated. A little redundancy wouldn’t hurt.
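As an illustration of the kind of plausibility check meant here, a minimal Python sketch follows. This is not an existing Klipper feature; the function name and the thresholds (`min_pwm`, `window`, `min_rise`) are invented for the example and would need tuning per heater.

```python
def find_stalled_heating(samples, min_pwm=0.9, window=20, min_rise=0.5):
    """Return the index where heating looks stalled, or None.

    samples: list of (temperature, pwm) tuples, one per second.
    A stall is flagged when PWM has stayed at or above `min_pwm` for
    `window` consecutive seconds while the temperature rose by less
    than `min_rise` degrees over that window. All thresholds here are
    hypothetical.
    """
    run_start = None  # index where the current high-power run began
    for i, (temp, pwm) in enumerate(samples):
        if pwm < min_pwm:
            run_start = None
            continue
        if run_start is None:
            run_start = i
        if i - run_start >= window:
            rise = temp - samples[i - window][0]
            if rise < min_rise:
                return i
    return None


# Frozen reading at full power, as in the log: flagged once the window is full.
print(find_stalled_heating([(239.9, 1.0)] * 60))

# Normal heat-up at full power: temperature keeps rising, nothing is flagged.
print(find_stalled_heating([(200.0 + 0.5 * i, 1.0) for i in range(60)]))
```

A real implementation would have to avoid false alarms, for example a heater that legitimately needs near-full power to hold a very high temperature, so this should be read as a starting point for discussion only.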

The thermocouple code was changed not too long ago to improve immunity to intermittent “data dropouts”. I tested the 31865 code quite extensively, but not the 31856 code. I have no idea how the 31856 might differ from the 31865 in its thermocouple fault detection and handling, but if you have the time and the desire you might want to compare the code and the two data sheets.

You could also do a “brute force” test by shorting the thermocouple input right at the 31856, open-circuiting it, or even jumping it through a resistor to induce a reading shift. The results in the data would show empirically what the behaviour is. I wonder if the ASIC just keeps sending the last good value while it’s indicating a fault, and perhaps the fault is not being caught by Klipper?

Unfortunately I do not have the time myself to try to decipher the code, but I did look at it quite extensively in the past:

thermocouple: implement sensor fault tolerance by ReXT3D · Pull Request #5627 · Klipper3d/klipper (github.com)


Thank you for your reply. I will compare both chips and will also do a “brute force” test.
I will also take a look at the code. If I discover anything, I will update you.

As mentioned previously, the temperature of 239.9 was considered “in range” (as set by the verify_heater hysteresis parameter, which defaults to 5) and thus verify_heater never engaged. The heater power kept increasing because of the “integral” term of the PID, which keeps raising the output in an attempt to move the temperature from 239.9 to the target temperature of 240.
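The integral effect can be illustrated with a generic PI loop in Python. This is a textbook sketch and not Klipper’s PID implementation; the gains `kp` and `ki` are made-up values chosen only so that the output is on a 0..1 scale.

```python
def pi_output_over_time(error, kp, ki, seconds, dt=1.0):
    """Return the clamped PI output for each second of a constant error.

    Generic textbook PI controller with hypothetical gains; the output
    is clamped to the 0..1 range used for a PWM duty cycle.
    """
    integral = 0.0
    outputs = []
    for _ in range(seconds):
        integral += error * dt               # a constant error keeps accumulating
        out = kp * error + ki * integral     # P term is fixed, I term keeps growing
        outputs.append(max(0.0, min(1.0, out)))
    return outputs


# A constant 0.1 degree error (239.9 measured vs. 240 target).
outputs = pi_output_over_time(error=0.1, kp=0.1, ki=0.01, seconds=2000)
print(outputs[0], outputs[500], outputs[-1])
```

With a frozen reading the error never shrinks, so the integral term grows until the output saturates at full power, which is consistent with the PWM value climbing to 1.000 in the log.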

The root cause of the failure seems to be a missing fault notification from the MAX31856 device. It’s unclear why that happened. Your log has several other successful fault detections from that device. There doesn’t seem to be anything interesting in the log at the point where the device gets stuck at 239.9.

-Kevin