Fatal Errors not so fatal ....?

smccully · March 8, 2024, 2:11am

I recently had a few long prints fail due to various driver issues. The latest was caused by the extruder driver reporting OvertempWarning.

While this is something that should be handled manually, would it make sense to have an option to pause the print instead of killing it, to allow manual intervention such as lowering the chamber temp or reducing driver current? I’m not sure if this has been discussed before.

I would be willing to investigate and create a PR if this makes sense: allowing the user to define a pause macro to run under certain error conditions, instead of hard failing.

Obviously, I can also see the flip side: not wanting to do this so as not to risk serious electrical malfunctions.

TMC 'extruder' reports DRV_STATUS: 001e0101 otpw=1(OvertempWarning!) t120=1 cs_actual=30
Stats 234583.7: gcodein=0  mcu: mcu_awake=0.005 mcu_task_avg=0.000011 mcu_task_stddev=0.000011 bytes_write=3042467 bytes_read=2139857 bytes_retransmit=9 bytes_invalid=0 send_seq=141309 receive_seq=141309 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=16 upcoming_bytes=0 freq=179999963 BTT_EBB42: mcu_awake=0.004 mcu_task_avg=0.000012 mcu_task_stddev=0.000019 bytes_write=3790895 bytes_read=1661914 bytes_retransmit=0 bytes_invalid=0 send_seq=127199 receive_seq=127199 retransmit_seq=0 srtt=0.001 rttvar=0.001 rto=0.025 ready_bytes=32 upcoming_bytes=0 freq=63998882 adj=63998810 sd_pos=236596 MellowSB2040v2: temp=23.2 heater_chamber: target=50 temp=47.8 pwm=1.000 heater_bed: target=110 temp=109.9 pwm=0.456 sysload=0.37 cputime=12859.061 memavail=3266472 print_time=2681.766 buffer_time=3.605 print_stall=0 extruder: target=260 temp=260.1 pwm=0.302
Stats 234584.7: gcodein=0  mcu: mcu_awake=0.005 mcu_task_avg=0.000011 mcu_task_stddev=0.000011 bytes_write=3042804 bytes_read=2140291 bytes_retransmit=9 bytes_invalid=0 send_seq=141334 receive_seq=141334 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=179999966 BTT_EBB42: mcu_awake=0.004 mcu_task_avg=0.000012 mcu_task_stddev=0.000019 bytes_write=3790999 bytes_read=1662066 bytes_retransmit=0 bytes_invalid=0 send_seq=127205 receive_seq=127205 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63998881 adj=63998825 sd_pos=236596 MellowSB2040v2: temp=23.3 heater_chamber: target=50 temp=47.8 pwm=1.000 heater_bed: target=110 temp=109.9 pwm=0.456 sysload=0.37 cputime=12859.097 memavail=3239032 print_time=2681.766 buffer_time=2.604 print_stall=0 extruder: target=260 temp=260.1 pwm=0.287
TMC 'extruder' reports DRV_STATUS: 001e0000 cs_actual=30
Stats 234585.7: gcodein=0  mcu: mcu_awake=0.005 mcu_task_avg=0.000011 mcu_task_stddev=0.000011 bytes_write=3043690 bytes_read=2140779 bytes_retransmit=9 bytes_invalid=0 send_seq=141361 receive_seq=141361 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=179999963 BTT_EBB42: mcu_awake=0.004 mcu_task_avg=0.000012 mcu_task_stddev=0.000019 bytes_write=3791277 bytes_read=1662196 bytes_retransmit=0 bytes_invalid=0 send_seq=127212 receive_seq=127212 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63998878 adj=63998815 sd_pos=236663 MellowSB2040v2: temp=23.3 heater_chamber: target=50 temp=47.8 pwm=1.000 heater_bed: target=110 temp=109.9 pwm=0.529 sysload=0.37 cputime=12859.124 memavail=3508396 print_time=2683.457 buffer_time=3.295 print_stall=0 extruder: target=260 temp=260.1 pwm=0.323
TMC 'extruder' reports DRV_STATUS: 001e0101 otpw=1(OvertempWarning!) t120=1 cs_actual=30

Sineos · March 8, 2024, 7:19am

Personally, I’m not in favor of such ideas at all:

A hardware error is a hardware error
It should not have happened in the first place and points to an underlying problem
Potentially leaves the printer in an undefined state, e.g. power to the motors is cut, so the printer needs rehoming, etc.

WRT your example:

AFAIK, otpw=1 is only reported and does not lead to a shutdown, since it is only a warning.
otpw=1 plus ot=1 would be an Overtemp Error and cause a shutdown. In that case, the points mentioned above do apply.
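To make the warning/error distinction concrete, here is a minimal Python sketch (not Klipper’s actual code) that classifies a raw DRV_STATUS word using only the two overtemperature bits. It assumes the TMC2209 register layout, where otpw is bit 0 and ot is bit 1; that agrees with the words quoted in this thread, e.g. 001e0101 reports otpw=1 and 001e01c3 reports otpw=1 plus ot=1. The function name classify_overtemp is hypothetical.

```python
# Overtemperature flag bits in the TMC2209 DRV_STATUS register
# (assumed layout, per the TMC2209 datasheet).
OTPW_BIT = 1 << 0  # otpw: overtemperature pre-warning
OT_BIT = 1 << 1    # ot: overtemperature error (driver shuts down)


def classify_overtemp(drv_status: int) -> str:
    """Classify a raw DRV_STATUS word as 'ok', 'warning' or 'error'.

    Follows the rule described above: otpw alone is only a warning,
    while ot set means an overtemperature error and a shutdown.
    """
    if drv_status & OT_BIT:
        return "error"
    if drv_status & OTPW_BIT:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Raw words taken from the logs in this thread.
    for word in ("001e0101", "001e0000", "001e01c3"):
        print(word, classify_overtemp(int(word, 16)))
```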

smccully · March 11, 2024, 9:03pm

I meant to respond to this sooner.

That makes sense, though it is a little frustrating from a user perspective. Then again, I don’t have a lot of experience with this kind of hardware/real-time system.

There is also this log line, which I initially overlooked, showing the actual error:

klippy.log:TMC 'extruder' reports DRV_STATUS: 001e01c3 otpw=1(OvertempWarning!) ot=1(OvertempError!) ola=1(OpenLoad_A!) olb=1(OpenLoad_B!) t120=1 cs_actual=30
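For reference, the raw hex word maps directly onto the flags Klipper prints. A minimal Python sketch, assuming the TMC2209 DRV_STATUS layout from the datasheet (status flags in bits 0 to 11, cs_actual in bits 16 to 20; the stealth and stst bits are ignored here), decodes 001e01c3 back into exactly the fields listed above. decode_drv_status is a hypothetical helper, not part of Klipper.

```python
# Assumed TMC2209 DRV_STATUS flag layout: bit position -> field name.
FLAG_BITS = {
    0: "otpw", 1: "ot", 2: "s2ga", 3: "s2gb", 4: "s2vsa", 5: "s2vsb",
    6: "ola", 7: "olb", 8: "t120", 9: "t143", 10: "t150", 11: "t157",
}
CS_ACTUAL_SHIFT = 16
CS_ACTUAL_MASK = 0x1F  # 5-bit field in bits 16..20


def decode_drv_status(word: int) -> dict:
    """Return the non-zero fields of a DRV_STATUS word, in bit order."""
    fields = {name: 1 for bit, name in FLAG_BITS.items() if (word >> bit) & 1}
    cs_actual = (word >> CS_ACTUAL_SHIFT) & CS_ACTUAL_MASK
    if cs_actual:
        fields["cs_actual"] = cs_actual
    return fields


if __name__ == "__main__":
    print(decode_drv_status(0x001e01c3))
```

The same decoding explains the earlier log: 001e0101 is otpw plus t120, and 001e0000 carries no flags at all, only cs_actual=30.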

Igl · March 12, 2024, 12:18am

At least you could introduce some ‘isCritical = yes|no’ option for sensors like chamber temperature.

That way, a failing temperature sensor (like ambient/enclosure) would be ignored instead of killing the whole print.
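A minimal Python sketch of how such a per-sensor option could behave. Everything here is hypothetical: Klipper has no isCritical option, and SensorConfig and on_sensor_fault are made-up names for illustration. The default keeps today’s behavior, where any sensor fault aborts the print.

```python
from dataclasses import dataclass


@dataclass
class SensorConfig:
    name: str
    # Hypothetical 'isCritical' option; True keeps the current abort-on-fault behavior.
    is_critical: bool = True


def on_sensor_fault(sensor: SensorConfig, reason: str) -> str:
    """Return the action to take when a sensor reports a fault (hypothetical policy)."""
    if sensor.is_critical:
        # Current behavior: any sensor fault aborts the print.
        return "shutdown"
    # Monitor-only sensor: log the fault and keep printing.
    print(f"warning: ignoring fault on non-critical sensor '{sensor.name}': {reason}")
    return "ignore"


if __name__ == "__main__":
    print(on_sensor_fault(SensorConfig("enclosure", is_critical=False), "sensor disconnected"))
    print(on_sensor_fault(SensorConfig("extruder"), "temperature out of range"))
```

One design caveat: a sensor that controls a heater (such as one driving heater_chamber) should probably stay critical, since ignoring its fault would leave the heater running without feedback. The option makes most sense for monitor-only sensors.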

theophile · March 12, 2024, 7:25pm

Depending on which TMC driver you have, you might be able to use a delayed_gcode loop to poll its internal temperature and issue a PAUSE command if it gets within (say) 10 degrees of the shutdown threshold.

Even if your driver doesn’t report the internal temperature, you might be able to poll drv_status to look for otpw=1(OvertempWarning!) and issue a PAUSE or some other custom gcode before it gets to the point of ot=1(OvertempError!).
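The decision made on each poll could look roughly like the following. This is a plain Python sketch of the logic, not a Klipper macro: in Klipper the same check would live in a delayed_gcode that reschedules itself with UPDATE_DELAYED_GCODE. The class name and the shutdown_temp and margin parameters are hypothetical; the 10-degree default margin is only the example figure from the post, and shutdown_temp has to come from your driver’s datasheet.

```python
from typing import Optional


class OvertempWatchdog:
    """Per-poll decision on whether to pause the print (hypothetical sketch)."""

    def __init__(self, shutdown_temp: float, margin: float = 10.0):
        self.shutdown_temp = shutdown_temp  # driver shutdown threshold, from the datasheet
        self.margin = margin                # pause this many degrees below the threshold
        self.paused = False                 # latch so PAUSE is issued only once per event

    def poll(self, otpw: bool, temperature: Optional[float] = None) -> Optional[str]:
        """Return "PAUSE" if the print should be paused now, else None.

        otpw: whether DRV_STATUS currently reports otpw=1.
        temperature: driver-internal temperature, or None if the driver
        does not report one (then only the otpw flag is used).
        """
        if self.paused:
            return None
        near_limit = (
            temperature is not None
            and temperature >= self.shutdown_temp - self.margin
        )
        if otpw or near_limit:
            self.paused = True
            return "PAUSE"
        return None

    def reset(self) -> None:
        """Re-arm the watchdog, e.g. after the user resumes the print."""
        self.paused = False
```

Fed the otpw sequence from the log at the top of the thread (otpw=1, then 0, then 1 again), it returns PAUSE on the first poll and nothing afterwards until reset() is called, so the user is not flooded with repeated pauses while the warning flickers.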

But of course, that should only be used as a protection mechanism to try to avoid losing a failed print, not as a solution to whatever is causing the warning and error in the first place.

smccully · March 12, 2024, 10:26pm

Yeah, I will look into this more and try to figure out reasonable ways to reduce halts on failures.

A few things to consider:

If the errors are raised by the board firmware and require restarting the firmware, handling them may be out of reach or unreasonable.
If there are other errors that can be safely resumed from, then it may be reasonable to offer a way to avoid a firmware restart.
This error in particular is a good example where I think theophile makes a reasonable point. It would have been good to know this type of warning was occurring before the actual error. I think something in the Mainsail or Fluidd UI that shows driver status or reports driver warnings would be a reasonable place to start making changes.

system · May 11, 2024, 10:26pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
