PID tuning... It's broken, but how is best to fix it?

PID autotuning in Klipper currently uses the Ziegler-Nichols method to generate PID parameters, with the critical point determined by the Astrom-Hagglund method but using a relay with hysteresis. It’s all in this file.

At runtime, the controller is a PID loop with Conditional Integration.

It appears such a setup has numerous shortcomings, and fixing each one just makes the others more problematic… They are:

  • The hysteresis level in the Astrom-Hagglund method (TUNE_PID_DELTA, a 5°C constant) would ideally be zero. By changing that constant, we can see just how problematic that is:
Send: PID_CALIBRATE HEATER=heater_bed TARGET=50 TUNE_PID_DELTA=0.5
Recv: // PID parameters: pid_Kp=217.643 pid_Ki=24.183 pid_Kd=489.697

Send: PID_CALIBRATE HEATER=heater_bed TARGET=50 TUNE_PID_DELTA=1.0
Recv: // PID parameters: pid_Kp=152.234 pid_Ki=10.356 pid_Kd=559.459

Send: PID_CALIBRATE HEATER=heater_bed TARGET=50 TUNE_PID_DELTA=2.0
Recv: // PID parameters: pid_Kp=80.175 pid_Ki=3.448 pid_Kd=466.015

Send: PID_CALIBRATE HEATER=heater_bed TARGET=50 TUNE_PID_DELTA=5.0
Recv: // PID parameters: pid_Kp=41.185 pid_Ki=0.708 pid_Kd=599.243

  • Yes, the Kp constant really does vary by a factor of about 5 depending on how it’s measured! I think this is the main reason people complain of a slow response and long heatup times. They are using grossly undertuned PID parameters.

The TUNE_PID_DELTA value cannot be set much smaller because the relay test then becomes vulnerable to temperature sensor noise. It would be necessary to simulate an ideal relay and use frequency analysis to find the critical frequency, but since the temperature sensors don’t sample at consistent intervals you need some pretty specialized methods to do this (i.e. you’d end up needing most of scipy as a dependency). I did this anyway to get the perfect parameters… Which brings me to the next shortcoming:

  • The PID loop uses Conditional Integration, i.e. we don’t update the integral term if the output is saturated at max power or zero. With undertuned PID constants, this works fine… But when correctly tuned with the Astrom-Hagglund method, the output value frequently hits maximum or zero because the constants are much higher, while the sampling rate and noise are unchanged. That in turn means that we miss enough integral updates to start to affect performance and stability. I tried removing Conditional Integration and stability is now great… but obviously there are now integral windup issues instead.
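To make the two shortcomings above concrete, here is a minimal sketch in Python. It is not Klipper's actual code: the function and class names, the parameter names, and the textbook Ziegler-Nichols constants (Kp = 0.6 Ku, Ti = Tu/2, Td = Tu/8) are assumptions for illustration. The first part shows why a larger hysteresis band (which produces a larger measured oscillation amplitude) pulls Kp down, and the second part shows how conditional integration discards integral updates whenever the output saturates.

```python
import math


def ziegler_nichols_pid(relay_amplitude, osc_amplitude, period):
    """Classic Ziegler-Nichols gains from a relay-test limit cycle.

    Assumes the ideal-relay describing function Ku = 4*d / (pi*a), where
    d is the relay amplitude and a is the measured oscillation amplitude.
    With hysteresis, a grows, so the estimated Ku (and Kp) shrinks.
    """
    ku = 4.0 * relay_amplitude / (math.pi * osc_amplitude)
    kp = 0.6 * ku
    ti = 0.5 * period      # integral time
    td = 0.125 * period    # derivative time
    return kp, kp / ti, kp * td


class ConditionalIntegrationPID:
    """PID loop that only commits the integral when the output is unsaturated."""

    def __init__(self, kp, ki, kd, max_power=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.max_power = max_power
        self.integ = 0.0
        self.prev_temp = None

    def update(self, temp, target, dt):
        err = target - temp
        # Derivative on the measurement, zero on the very first sample.
        deriv = 0.0 if self.prev_temp is None else (temp - self.prev_temp) / dt
        self.prev_temp = temp
        integ = self.integ + err * dt
        co = self.kp * err + self.ki * integ - self.kd * deriv
        bounded = max(0.0, min(self.max_power, co))
        # Conditional integration: keep the new integral only if not clipped.
        if co == bounded:
            self.integ = integ
        return bounded
```

In this sketch, an oscillation amplitude five times larger gives a Kp five times smaller for the same period. Separately, a controller with high gains clips on most samples, so `integ` is rarely updated.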

Here are some ideas for solutions:

  • Run the tuning twice, with TUNE_PID_DELTA=5.0 and then =2.5. Use that to linearly extrapolate the values that would have been measured with an ideal relay (i.e. delta=0). That should get close to the correct PID values, at the expense of double the tuning time.

  • Modify the PID loop to back-calculate the integrator value as an antiwindup mechanism. Ie. figure 6.7 here.
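Both ideas above can be sketched in a few lines of Python. This is only an illustration under assumptions, not a patch against Klipper: `extrapolate_to_zero_delta`, `BackCalculationPID`, and the `tracking_time` parameter are hypothetical names, and the back-calculation form used here is the common textbook one in which the difference between the clipped and unclipped output is fed back into the integrator.

```python
def extrapolate_to_zero_delta(delta_a, value_a, delta_b, value_b):
    """Linearly extrapolate a measured quantity to delta = 0.

    value_a and value_b are the same quantity (for example oscillation
    amplitude or period) measured at hysteresis levels delta_a and delta_b.
    """
    if delta_a == delta_b:
        raise ValueError("the two runs must use different delta values")
    slope = (value_a - value_b) / (delta_a - delta_b)
    return value_a - slope * delta_a


class BackCalculationPID:
    """PID loop with back-calculation anti-windup (textbook form)."""

    def __init__(self, kp, ki, kd, tracking_time, max_power=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.tracking_time = tracking_time  # assumed tuning parameter
        self.max_power = max_power
        self.integ_term = 0.0  # integral contribution, already scaled by ki
        self.prev_temp = None

    def update(self, temp, target, dt):
        err = target - temp
        deriv = 0.0 if self.prev_temp is None else (temp - self.prev_temp) / dt
        self.prev_temp = temp
        co = self.kp * err + self.integ_term - self.kd * deriv
        bounded = max(0.0, min(self.max_power, co))
        # The integrator always runs, but while the output is clipped the
        # (bounded - co) term pulls it back so it cannot wind up.
        self.integ_term += (self.ki * err
                            + (bounded - co) / self.tracking_time) * dt
        return bounded


# Hypothetical measurements of one quantity at delta=5.0 and delta=2.5.
ideal_value = extrapolate_to_zero_delta(5.0, 6.0, 2.5, 3.5)
```

Unlike conditional integration, the integrator here is updated on every sample, so no integral updates are skipped. The cost is one extra parameter (`tracking_time`) that sets how quickly the integrator is pulled back during saturation.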

Ideas welcome.

Well, for my setup the PID tuning works well (good enough for me), but you are right, there are edge cases where it does not yield the wanted results. You might want to check:

I’m sure it is possible to improve the PID and PID_CALIBRATE process. However, my own experience is that one can spend dozens of hours attempting to tune/code a control system and end up with only minor improvements. In many cases it even subtly reduces robustness. FWIW, I therefore place high value on keeping things simple.

Some random comments:

TUNE_PID_DELTA: As you indicate, it is there to account for signal measurement noise. I’ve read papers that said it is okay to use this type of delta during the relay test, but I don’t have the citations handy. I always felt it was a bit odd to have it, but it is a common mechanism in relay tests.

Conditional integration: As indicated it is a common mechanism to avoid “integral windup”. It’s very common in PID controllers and this is the first I’ve heard that it is a significant issue. I agree that “back calculating” the impact of a full-on/full-off heat setting would be theoretically better - however it would require an accurate model of the heater - which would add significant complexity and risk a loss of robustness.

“the temperature sensors don’t sample at consistent intervals”: I don’t know why you say this. The time between temperature measurements should be very precise (handful of microseconds).

-Kevin