I posted this earlier on Discord, but it will be more readily accessible and therefore more useful long-term here.
This is a word of caution to those using the Duet RTD or Thermocouple daughter boards. As I recently started expanding the operating envelope of my WIP Voron build, I suddenly started experiencing really strange issues with the hot end temperature. I am using a Duet 3 Mini 5+ with the extruder temperature provided by a Duet dual MAX31865 PT100 daughter card that I modified for use with PT1000 RTDs. I am using only one of the two channels of the daughter card; the second channel is not currently connected.
While performing some extended extrusion tuning with the hot end at temperature, Klipper suddenly and randomly started shutting down and reporting a whole series of random “thermocouple faults”. As I was troubleshooting, this behaviour escalated to the point where the printer would shut down just a few seconds after each Klipper restart. And the reported faults were totally nuts: effectively all fault bits were being set most of the time (but not always), for example:
Max31865 Overvoltage or undervoltage fault
Max31865 RTD input is disconnected
Max31865 RTD input is shorted
Max31865 VREF- is greater than 0.85 * VBIAS, FORCE- open
Max31865 VREF- is less than 0.85 * VBIAS, FORCE- open
Max31865 VRTD- is less than 0.85 * VBIAS, FORCE- open
Max31865 Overvoltage or undervoltage fault
Max31865 RTD input is disconnected
Max31865 RTD input is shorted
Max31865 VREF- is greater than 0.85 * VBIAS, FORCE- open
Max31865 VREF- is less than 0.85 * VBIAS, FORCE- open
Max31865 VRTD- is less than 0.85 * VBIAS, FORCE- open
Max31865 Overvoltage or undervoltage fault
MCU 'mcu' shutdown: Thermocouple reader fault
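To make the "all fault bits set" observation concrete, here is a minimal Python sketch of how a single fault status byte maps onto the messages in the log above. The bit masks follow the MAX31865 Fault Status Register layout as I understand it from the datasheet, and the message strings are copied from the log; the names `FAULT_BITS` and `decode_faults` are made up for this illustration and are not Klipper's own code.

```python
# Bit masks for the MAX31865 Fault Status Register (assumed from the datasheet).
# Message text is copied from the Klipper log above.
FAULT_BITS = [
    (0x80, "RTD input is disconnected"),   # RTD high-threshold fault
    (0x40, "RTD input is shorted"),        # RTD low-threshold fault
    (0x20, "VREF- is greater than 0.85 * VBIAS, FORCE- open"),
    (0x10, "VREF- is less than 0.85 * VBIAS, FORCE- open"),
    (0x08, "VRTD- is less than 0.85 * VBIAS, FORCE- open"),
    (0x04, "Overvoltage or undervoltage fault"),
]


def decode_faults(status):
    """Return the fault messages whose bit is set in the status byte."""
    return [msg for mask, msg in FAULT_BITS if status & mask]


if __name__ == "__main__":
    # A byte read back as all ones reports every fault at once.
    for message in decode_faults(0xFF):
        print("Max31865", message)
```

A status byte of 0xFF decodes to all six messages at once, including "disconnected" and "shorted" together, which cannot both be true for one real reading. That pattern looks more like corrupted SPI data than a genuine analog fault, which would be consistent with the steady voltage I measured at the terminals.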
This became so bizarre that I actually connected a high-end bench multimeter to measure the RTD voltage right at the daughter card terminals, and I noticed that the voltage was perfectly fine and steady at the time Klipper was declaring a fault. For fun, I am attaching the “wild ride” log file if anyone is interested in seeing how bad this became. I do also have a question for @koconnor: in all these fault cases the MCU has shut down, but klippy continued to run, as evidenced by continuing log entries with the PWM output and extruder temperature “frozen” at the last valid value. I am not sure if that is the desired behaviour?
klippy-thermocouple reader fault 4.zip (195.4 KB)
These shutdowns became so incessant, with no apparent RTD signal anomalies, that at some point I decided it might be useful to see what the second MAX chip was doing during these faults. I defined a new temperature_sensor section and connected a fixed 1.8k resistor to the second MAX chip. Following this change I could no longer induce the faults, so I paused the troubleshooting (it was also 2 am). I also started suspecting that the shutdowns might be related to having only one MAX chip in use, with the second one not connected to an RTD, while both are on the same SPI bus.
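For anyone repeating the fixed-resistor test, here is a rough Python sketch of what the second channel should report with a 1.8k resistor in place of the RTD. It assumes the standard IEC 60751 Callendar-Van Dusen coefficients for platinum RTDs and the 15-bit ratiometric reading of the MAX31865; the function names are hypothetical and this is not how Klipper itself is written.

```python
import math

# IEC 60751 Callendar-Van Dusen coefficients (assumed; valid for T >= 0 C).
CVD_A = 3.9083e-3
CVD_B = -5.775e-7


def adc_to_resistance(adc_code, r_ref):
    """MAX31865 reports a 15-bit ratio of RTD resistance to the reference resistor."""
    return adc_code * r_ref / 32768.0


def resistance_to_temp(r, r_nominal):
    """Solve r = r_nominal * (1 + A*T + B*T^2) for T on the T >= 0 C branch."""
    disc = CVD_A ** 2 - 4.0 * CVD_B * (1.0 - r / r_nominal)
    return (-CVD_A + math.sqrt(disc)) / (2.0 * CVD_B)


if __name__ == "__main__":
    # Fixed 1.8k test resistor on a channel configured for PT1000.
    print(round(resistance_to_temp(1800.0, 1000.0), 1), "C")
```

With rtd_nominal_r set to 1000, a 1.8k resistor should read as a steady temperature of around 210 °C, so any jump away from that value on the test channel points at the readout rather than at the sensor.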
After some creative searching I came across this old post from Kevin: Max31856: Cold Junction High Fault · Issue #2326 · Klipper3d/klipper · GitHub. Reading this post prompted me to explicitly force the chip select of the unused MAX chip high by means of static_digital_output. So far, with over 24 hours of testing, the random thermocouple faults have disappeared.
For the sake of completeness, here is the pertinent configuration excerpt for the modified PT100 daughter card on my Duet 3 Mini 5+:
[samd_sercom sercom7]
## define PT100 daughter card SPI sercom
sercom: sercom7
tx_pin: PC12
rx_pin: PC15
clk_pin: PC13

[static_digital_output rtd_disable]
## disable the unused MAX31865 on the same SPI bus
pins: PC7 # disable RTD2 SPI chip

[extruder]
## DRIVER0
sensor_type: MAX31865 # using PT100 daughter board
sensor_pin: PD11 # RTD1
spi_bus: sercom7 # sercom must be defined earlier in the file
rtd_nominal_r: 1000 # using PT1000
rtd_reference_r: 4000 # PT100 daughter board modified for PT1000
rtd_num_of_wires: 2
I will post if the situation changes, but for now it appears to me that the chip-select pins of all unused MAX chips must be explicitly forced high, since the chips all share the same SPI bus. This is not immediately obvious, and I do not believe it is in the documentation.
Peter.