Duet RTD & thermocouple daughter card configuration

I posted this earlier on Discord, but it will be more readily accessible and therefore more useful long-term here.

This is a word of caution for those using the Duet RTD or thermocouple daughter boards. As I recently started expanding the operating envelope of my WIP Voron build, I suddenly began experiencing very strange issues with the hot end temperature. I am using a Duet 3 Mini 5+, with the extruder temperature provided by a Duet dual MAX31865 PT100 daughter card that I modified for use with PT1000 RTDs. Only one of the card's two channels is in use; the second channel is currently not connected.

While I was performing some extended extrusion tuning with the hot end at temperature, Klipper suddenly and randomly started shutting down and reporting a whole series of random “thermocouple faults”. As I was troubleshooting, this behaviour escalated to the point where the printer would shut down just a few seconds after each Klipper restart. The reported faults were also totally nuts: effectively all fault bits were being set most of the time (but not always), for example:

Max31865 Overvoltage or undervoltage fault
Max31865 RTD input is disconnected
Max31865 RTD input is shorted
Max31865 VREF- is greater than 0.85 * VBIAS, FORCE- open
Max31865 VREF- is less than 0.85 * VBIAS, FORCE- open
Max31865 VRTD- is less than 0.85 * VBIAS, FORCE- open
Max31865 Overvoltage or undervoltage fault
Max31865 RTD input is disconnected
Max31865 RTD input is shorted
Max31865 VREF- is greater than 0.85 * VBIAS, FORCE- open
Max31865 VREF- is less than 0.85 * VBIAS, FORCE- open
Max31865 VRTD- is less than 0.85 * VBIAS, FORCE- open
Max31865 Overvoltage or undervoltage fault
MCU 'mcu' shutdown: Thermocouple reader fault

This became so bizarre that I connected a high-end bench multimeter to measure the RTD voltage right at the daughter card terminals, and the voltage was perfectly fine and steady at the moment Klipper was declaring a fault. For fun, I am attaching the “wild ride” log file in case anyone is interested in seeing how bad this became. I also have a question for @koconnor: in all these fault cases the MCU shut down, but klippy continued to run, as evidenced by continuing log entries with the PWM output and extruder temperature “frozen” at the last valid value. Is that the intended behaviour?

klippy-thermocouple reader fault 4.zip (195.4 KB)

These shutdowns became so incessant, with no apparent RTD signal anomalies, that at some point I decided it might be useful to see what the second MAX chip was doing during these faults. I defined a new temperature_sensor section and connected a fixed 1.8 kΩ resistor to the second MAX chip's input. Following this change I could no longer induce the faults, so I paused the troubleshooting (it was also 2 am). I also started to suspect that the shutdowns might be related to having only one MAX chip in use, with the second one not connected to an RTD, while both share the same SPI bus.
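A sketch of what such a diagnostic section could look like for the second channel. It assumes that RTD2's chip select is PC7 (as the static_digital_output comment in the configuration excerpt below indicates) and that the second channel was modified for PT1000 in the same way as the first; the section name rtd2_check is hypothetical. Because Klipper generally refuses to let two config sections claim the same pin, this would replace the static_digital_output section rather than sit alongside it.

```ini
[temperature_sensor rtd2_check]
## hypothetical diagnostic sensor on the second MAX31865 channel (RTD2)
## with a fixed 1.8k resistor wired in place of an RTD
sensor_type: MAX31865
sensor_pin: PC7                        # RTD2 chip select (assumed)
spi_bus: sercom7                       # same sercom as the extruder's MAX31865
rtd_nominal_r: 1000                    # assumes RTD2 was also modified for PT1000
rtd_reference_r: 4000
rtd_num_of_wires: 2
```

Configuring the chip this way also means its chip select is actively driven by Klipper, which may be why the faults stopped once this section was added.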

After some creative searching I came across this old post from Kevin: “Max31856: Cold Junction High Fault”, issue #2326 in the Klipper3d/klipper repository on GitHub. Reading it prompted me to explicitly force the chip select of the unused MAX chip high by means of static_digital_output. So far, after more than 24 hours of testing, the random thermocouple faults have disappeared.

For the sake of completeness, here is the relevant configuration excerpt for the modified PT100 daughter card on my Duet 3 Mini 5+:

[samd_sercom sercom7]
## define PT100 daughter card SPI sercom
sercom: sercom7
tx_pin: PC12
rx_pin: PC15
clk_pin: PC13 

[static_digital_output rtd_disable]
## disable the unused MAX31865 on the same SPI bus
pins: PC7                              # disable RTD2 SPI chip

[extruder]
## DRIVER0
sensor_type: MAX31865                  # using PT100 daughter board
sensor_pin: PD11                       # RTD1
spi_bus: sercom7                       # sercom must be defined earlier in the file
rtd_nominal_r: 1000                    # using PT1000
rtd_reference_r: 4000                  # PT100 daughter board modified for PT1000
rtd_num_of_wires: 2

I will post if the situation changes, but for now it appears to me that the chip selects of all unused MAX chips must be explicitly forced high, since they all share the same SPI bus. This is not immediately obvious, and I do not believe it is mentioned in the documentation.
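If more than one device on the bus is unused, static_digital_output accepts a comma-separated list of pins, so all the unused chip selects can be held high from a single section. A minimal sketch, assuming a hypothetical second unused device whose chip select is written as the placeholder CS_PIN_2 (not a real pin name; substitute the actual pin for your board):

```ini
[static_digital_output unused_spi_cs]
## hold every unused chip select on the shared SPI bus high (inactive)
## CS_PIN_2 is a placeholder, not a real pin name
## do not prefix these pins with "!": that would drive them low and select the chips
pins: PC7, CS_PIN_2
```

The MAX31865 chip select is active low, which is why the pins are driven high here; static_digital_output sets a pin high unless its name is prefixed with "!".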

Peter.

I’m not sure what you are asking. A shutdown causes the mcus to transition to a “shutdown state” which is designed to disable motors and heaters. The host software continues to run, so that users may diagnose the event and to attempt a restart.

This is an important wiring requirement for all SPI devices. Every device that is physically wired to an SPI bus must be configured in Klipper, even if the device is not nominally in use. If a chip is not properly configured then its “chip select” wire may not be set correctly and the chip may respond to messages on the SPI bus that are not intended for it. Those unintended responses may conflict with messages received from the intended chip and thus cause corruption of those messages. This is not a “software issue”, but a low-level requirement of SPI hardware wiring.

Cheers,
-Kevin

Thanks Kevin, you answered my question. I do not recall ever before seeing logs where the MCU was shut down but the host continued running, so this seemed “unusual”. Good to know that this is the expected behaviour.

Understood, and I was in no way suggesting that there is any software or implementation issue. I posted this because I suspect this SPI hardware requirement may elude many users in the same way it eluded me until I started seeing these strange issues.

If you think it would be useful to explain this hardware requirement explicitly in the Klipper documentation, I would be happy to prepare a PR, perhaps by adding a note to the Configuration reference.

Peter.