I am thrilled to report that I seem to have found a simple workaround for now and, indirectly, a confirmation of the root cause.
The issue is definitely related to timing on the TMC UART. While testing the work-tmc-20210715 branch, I decided to tweak some TMC2209 settings I had read about in the data sheet. Since I suspected a timing issue, the SLAVECONF register was of particular interest. Ultimately, changing the SENDDELAY value from the default of 8 to 6 or lower completely eliminates all TMC read faults. In many hours of testing with both my Duet Mini and the SKR Mini, I was not able to trigger a single retransmit! I tested at baud rates of 9,000, 40,000 and 57,600, on both the work-tmc-20210715 branch and the main (trunk) branch of Klipper. It looks like Klipper expects the response from the drivers a little sooner than it receives it with the default SENDDELAY setting of 8.
Hopefully this is easy to fix, but for now I am using the following macro as a workaround:
[gcode_macro TMC_SENDDELAY]
gcode:
    SET_TMC_FIELD STEPPER=stepper_x FIELD=SENDDELAY VALUE=2
    SET_TMC_FIELD STEPPER=stepper_y FIELD=SENDDELAY VALUE=2
    SET_TMC_FIELD STEPPER=stepper_z FIELD=SENDDELAY VALUE=2
    SET_TMC_FIELD STEPPER=stepper_z1 FIELD=SENDDELAY VALUE=2
    SET_TMC_FIELD STEPPER=extruder FIELD=SENDDELAY VALUE=2
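For anyone curious what the macro actually changes: SENDDELAY is a 4-bit field in the TMC2209 SLAVECONF register (address 0x03). The sketch below only illustrates the bit manipulation; it assumes the mask `0x0f << 8` that I believe Klipper's tmc2208.py uses for `senddelay`, and `pack_field` is a hypothetical helper, not Klipper's API.

```python
# Illustrative sketch (not Klipper code): packing a SENDDELAY value into
# the TMC2209 SLAVECONF register. Assumption: SENDDELAY occupies bits 8..11,
# i.e. mask 0x0f << 8.
SLAVECONF_ADDR = 0x03
SENDDELAY_MASK = 0x0F << 8


def pack_field(reg_value: int, mask: int, value: int) -> int:
    """Return reg_value with the field selected by mask replaced by value."""
    shift = (mask & -mask).bit_length() - 1  # index of the mask's lowest set bit
    if (value << shift) & ~mask:
        raise ValueError("value does not fit in the field")
    return (reg_value & ~mask) | ((value << shift) & mask)


# SENDDELAY=2 (as in the macro) -> register value 0x200
print(hex(pack_field(0, SENDDELAY_MASK, 2)))
```

Writing SENDDELAY=2 into an otherwise-zero register yields 0x200; a value of 16 or more is rejected because it would spill outside the 4-bit field.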
EDIT: On another look at the (very confusing) TMC data sheet, I realized that the default value for SENDDELAY is actually “8 bit times”, which I believe corresponds to SENDDELAY VALUE=0. I am therefore retesting with SENDDELAY VALUE=2 (the next step of delay above the default). This does not invalidate my earlier results showing flawless operation with SENDDELAY VALUE=6. It does, however, invalidate some of my statements above, such as Klipper expecting the response sooner: it seems the response actually arrives too soon and must be delayed.
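To put numbers on this, here is a rough sketch converting a SENDDELAY value into a reply delay. It assumes the SLAVECONF table as I read it in the TMC2209 data sheet (values are paired: 0 and 1 give 8 bit times, 2 and 3 give 3*8, and so on up to 15*8), and that one bit time is simply 1/baud. The helper names are hypothetical.

```python
# Sketch based on my reading of the TMC2209 SLAVECONF table (assumption):
#   0,1 -> 8 bit times (the data sheet notes this is not allowed with
#   multiple slaves); 2,3 -> 3*8; 4,5 -> 5*8; ... 14,15 -> 15*8.


def senddelay_bit_times(senddelay: int) -> int:
    """Reply delay in UART bit times for a SENDDELAY value (0..15)."""
    if not 0 <= senddelay <= 15:
        raise ValueError("SENDDELAY is a 4-bit field (0..15)")
    # Each pair of values shares one odd multiple of 8 bit times.
    return (2 * (senddelay // 2) + 1) * 8


def senddelay_us(senddelay: int, baud: int) -> float:
    """Reply delay in microseconds, assuming one bit time = 1/baud seconds."""
    return senddelay_bit_times(senddelay) * 1e6 / baud


# Baud rates and SENDDELAY values mentioned in this thread.
for baud in (9000, 40000, 57600):
    for value in (0, 2, 6):
        print(f"{baud:>6} baud  SENDDELAY={value}: "
              f"{senddelay_bit_times(value):3d} bit times = "
              f"{senddelay_us(value, baud):7.1f} us")
```

At 40,000 baud, for example, this works out to 200 µs for the default (VALUE=0) versus 600 µs for VALUE=2.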
EDIT2: I can confirm that there are no TMC UART retransmits with SENDDELAY VALUE=2. I have revised the macro above accordingly. I consider this part of the investigation closed.