Experimental PID improvement changes

Over the last month or so, I’ve dug into the PID calibration and PID control sections of the codebase. I’ve come up with some significant, though not blatantly obvious, performance improvements. I can only test and validate so much myself, so I wanted to make them available to the community for additional testing and feedback.

pid calibrate

If you set the WRITE_FILE flag when running a tune, you can plot the output and get a good idea of the system’s dynamics. The following two plots are of tuning runs on my hotend and bed. The calibration temperature was 220 for the hotend and 40 for the bed. I chose 40 for the bed because it’s at the upper end of TPE temps, and it is an extreme tuning case that will expose any issues with the tuning algorithm.

What you see above is the output of a standard relay test. If you would like a general overview of what the relay test does, you can read about it [here](https://d1.amobbs.com/bbs_upload782111/files_36/ourdev_614499E39LAH.pdf).
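For readers who have not met the relay test before, here is a minimal sketch of how PID gains are commonly derived from it. It assumes the classic Åström-Hägglund estimate of the ultimate gain followed by the Ziegler-Nichols rules; the function and parameter names are made up for illustration, and any fixed output scaling a real tuner applies is left out. For what it’s worth, the gains quoted later in this post satisfy Kp/Ki ≈ 4·Kd/Kp, which is what these rules produce.

```python
import math


def relay_test_gains(relay_amplitude, temp_amplitude, period):
    """Estimate PID gains from one relay-test oscillation (illustrative only).

    relay_amplitude: half the swing of the heater power (d)
    temp_amplitude:  half the peak-to-peak temperature swing (a)
    period:          oscillation period in seconds (Tu)
    """
    ku = 4.0 * relay_amplitude / (math.pi * temp_amplitude)  # ultimate gain
    ti = 0.5 * period      # Ziegler-Nichols integral time
    td = 0.125 * period    # Ziegler-Nichols derivative time
    kp = 0.6 * ku
    return kp, kp / ti, kp * td


# Made-up numbers: power swings +/-0.5, temperature swings +/-2.0 C, 40 s period.
kp, ki, kd = relay_test_gains(relay_amplitude=0.5, temp_amplitude=2.0, period=40.0)
print(kp, ki, kd)
```

Because the temperature amplitude sits in the denominator and the period sets both time constants, a biased or asymmetric oscillation feeds directly into all three gains.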

Both graphs show two undesirable phenomena, bias and asymmetry.

Bias is most easily seen as the difference between the power-on and power-off times. If the system were perfect, the power would be on for exactly as long as it was off. Since most printers can’t remove heat from the hotend and bed as efficiently as they can add it, the system will never be perfect and will always have some amount of bias. However, a good calibration will minimize bias as much as possible.

Asymmetry is most easily seen as the difference in amplitude of the temperature waveform. If the amplitudes of the peaks above and below the target temperature differ, the system is asymmetric. The more asymmetric a system is, the worse the output of the calibration run will be. Thankfully, asymmetry can be controlled by varying the maximum power used during the relay test.
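To make the two terms concrete, here is a small sketch of how they could be measured from one relay cycle. Both functions and their normalisation are assumptions made for illustration; the bias figure the tuner prints in its own log output may be defined differently.

```python
def relay_bias(on_time, off_time):
    """Signed imbalance between heater-on and heater-off time.

    Returns a value in [-1, 1]; 0.0 means the relay spent equal time on and off.
    """
    return (on_time - off_time) / (on_time + off_time)


def relay_asymmetry(peak_high, peak_low, target):
    """Difference (in degrees) between the swing above and below target.

    0.0 means the temperature waveform is symmetric about the target.
    """
    return (peak_high - target) - (target - peak_low)


# Made-up cycle: heater on for 12 s, off for 28 s, peaks at 222.5 and 218.5.
print(relay_bias(on_time=12.0, off_time=28.0))    # -0.4
print(relay_asymmetry(222.5, 218.5, 220.0))       # 1.0
```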

This paper presents several methods that can be used to minimize asymmetry and bias. I implemented the iterative “Peak Based Correction” method, as it works well with multiple types of systems.

The next two plots are from calibration runs done with the updated algorithm. As you can see, asymmetry has been removed, and bias has been significantly reduced.

comparison of calibration results

Bed:

previous:
pid_Kp=37.957 
pid_Ki=0.146
pid_Kd=2469.585

updated:
pid_Kp=10.047
pid_Ki=0.052
pid_Kd=481.684

hotend:

previous:
pid_Kp=20.448 
pid_Ki=0.940 
pid_Kd=111.188

updated:
pid_Kp=15.647 
pid_Ki=0.745 
pid_Kd=82.144

The following two plots are 20-minute test runs using the old and new calibration parameters for my hotend. For this test the old controller is used so we have a like-for-like comparison. As you can see from the plots, there is less oscillation across the board.

The old calibration/tune overshot by 3.58C and took 114.9 seconds to settle in at the target temperature. After reaching the target temperature, the maximum deviation was +0.51C and -0.46C.

The new calibration/tune overshot by 4.15C and took 127.5 seconds to settle in at the target temperature. After reaching the target temperature, the maximum deviation was +0.31C and -0.29C.

The new calibration/tune overshoots more and takes longer to settle in because of a limitation of the controller (more on that later). However, once it does settle in, it oscillates about the setpoint significantly less.
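If you want to pull the same kind of figures out of your own logged data, a rough sketch follows. The definition of “settled” used here (every later sample stays within a fixed band of the target) and the 0.5C default band are assumptions, not necessarily the criteria behind the numbers above.

```python
def step_metrics(times, temps, target, band=0.5):
    """Return (overshoot, settle_time, max_dev_above, max_dev_below).

    settle_time is the time from the first sample to the first sample after
    which the temperature never leaves target +/- band. If the trace never
    settles, the last three values are None.
    """
    overshoot = max(temps) - target
    settle_index = None
    # Walk backwards from the end until a sample falls outside the band.
    for i in range(len(temps) - 1, -1, -1):
        if abs(temps[i] - target) > band:
            break
        settle_index = i
    if settle_index is None:
        return overshoot, None, None, None
    settled = temps[settle_index:]
    return (overshoot,
            times[settle_index] - times[0],
            max(settled) - target,
            min(settled) - target)


# Tiny made-up trace, one sample per second.
times = [0, 1, 2, 3, 4, 5, 6]
temps = [20.0, 150.0, 223.0, 221.0, 220.3, 219.8, 220.1]
print(step_metrics(times, temps, target=220.0))
```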

To visually show how much better the new calibration is, here is a plot that shows the running sum of the absolute deviation from the target temperature over time.
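A curve like that can be reproduced from any logged temperature trace with a few lines. This is only a sketch: it assumes evenly spaced samples and simply accumulates |temperature - target|, which may differ in detail from how the plot here was generated.

```python
from itertools import accumulate


def cumulative_abs_deviation(temps, target):
    """Running sum of |temp - target| for evenly spaced samples."""
    return list(accumulate(abs(t - target) for t in temps))


# Made-up samples around a 220 C target.
print(cumulative_abs_deviation([220.5, 219.5, 220.25, 220.0], 220.0))
# [0.5, 1.0, 1.25, 1.25]
```

A flatter curve means the heater is spending less time away from the target, so two tunes can be compared by the slope of their curves once both have settled.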

In my next post (after dinner) I’ll discuss the pid controller.


pid control

I’ve been in the Klipper community for less than a year, and it’s already been brought to my attention by several people that the PID controller is a contentious topic. Thus, rather than going into detail about what I found wrong with the existing controller, I’ll just say it had issues related to integral windup and smoothing that I’ve fixed or removed. I’ve also made some general performance improvements and tweaked how it handles windup, based on a recommendation in the Åström & Hägglund book Advanced PID Control.

Test runs of my bed will best illustrate the validity of my changes and improvements.

The following plot is a test run using the old calibration output and the old controller.

It overshoots by 2.65C, and at the end of the 20-minute test, the closest it got to the target temperature was +0.78C.

The next plot is a test run using the new calibration output and the old controller. Note the significant decrease in noise/oscillation thanks to the new calibration.

It overshoots by 3.74C and took 985.8 seconds to settle in at the target temperature. After reaching the target temperature, the maximum deviation was +0.37C and -0.32C.

The final plot is of the new calibration output and the new controller.

It overshoots by 3.56C and took 873.3 seconds to settle in at the target temperature. After reaching the target temperature, the maximum deviation was +0.31C and -0.16C.

The settling in time is approximately 11.4% faster than the previous run and shows less deviation overall.

the files

In the attached zip file, you will find the following 3 files:
pid_calibrate.py - the calibration class file
heaters.py - the file that contains the PID controller class
heaters-l.py - a version of heaters.py that will log data to the /tmp directory any time the target temperature isn’t 0

To test these files, you need to put them in place of their equivalents in the klippy extras directory.
On my install that’s /home/pi/klipper/klippy/extras. For the logging version, you will need to rename it to heaters.py.

files.zip (11.6 KB)

Testing

  1. do you get less oscillation about the target temperature when at steady state (not while printing)?
  2. do you see less oscillation while printing?
  3. do you see more or less overshoot?
  4. does it take more or less time to reach the target temperature?
  5. do you see any negative side effects?

what I’m working on now

  1. switching from the positional form of PID control to the velocity form. The velocity form is inherently better at preventing integral windup. How much better? Initial testing shows up to a 63% reduction in overshoot (the primary side effect of integral windup).
  2. better input smoothing. The velocity form of pid control is more affected by noisy input than the positional form, so we need to upgrade to a better algorithm than the current modified moving average.
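For anyone unfamiliar with the velocity form mentioned in item 1, here is a textbook sketch. It is not the code being worked on; the class name, gains and output limits are placeholders. Each update computes a change in output rather than the output itself, so there is no separate integral sum to wind up, and clamping the stored output is the only limiting needed.

```python
class VelocityPID:
    """Textbook velocity-form (incremental) PID, for illustration only."""

    def __init__(self, kp, ki, kd, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.output = 0.0
        self.e1 = 0.0  # error one step ago
        self.e2 = 0.0  # error two steps ago

    def update(self, target, measured, dt):
        e = target - measured
        # Change in output since the previous step.
        delta = (self.kp * (e - self.e1)
                 + self.ki * e * dt
                 + self.kd * (e - 2.0 * self.e1 + self.e2) / dt)
        # Clamp the accumulated output to the allowed power range.
        self.output = min(self.out_max, max(self.out_min, self.output + delta))
        self.e2, self.e1 = self.e1, e
        return self.output


pid = VelocityPID(kp=0.1, ki=0.05, kd=0.0)
print(pid.update(target=200.0, measured=20.0, dt=0.3))   # saturates at 1.0
print(pid.update(target=200.0, measured=210.0, dt=0.3))  # backs off to 0.0
```

The second difference in the derivative term is also why this form is more sensitive to noisy readings: any jitter in the measured temperature is amplified twice.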

If/when I can get the velocity form working effectively and reliably, I will post it. I’ll also post an update if/when I get a better smoothing algorithm in place.
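As background for the smoothing item, this is what a generic modified moving average looks like. It is a sketch of the general technique only; the window length and function name are arbitrary, and it is not a copy of the existing controller’s smoothing code.

```python
def modified_moving_average(samples, n):
    """Smooth samples with avg += (x - avg) / n, seeded with the first sample."""
    smoothed = []
    avg = None
    for x in samples:
        avg = x if avg is None else avg + (x - avg) / n
        smoothed.append(avg)
    return smoothed


print(modified_moving_average([10.0, 20.0, 20.0], n=4))
# [10.0, 12.5, 14.375]
```

The larger n is, the smoother the result but the longer it lags behind a real change, which is the trade-off a better filter would need to improve on.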


Many thanks for your contribution :+1:

Could you provide a fork of Klipper mainline with a new branch containing your changes? I’d prefer to manage changes to my Klipper installation with git rather than copying and replacing files by hand.

I haven’t had my morning dose of caffeine yet, but I’m pretty sure I’ve got everything in place.

Thanks a lot.

FWIW when building a small robot with my son back some time, we needed PID too and I really enjoyed reading the series of blog entries here: Improving the Beginner’s PID – Introduction « Project Blog
But I guess you are way past this point :slight_smile:

I keep running into issues where I get errors such as

11:22:20
pid_calibrate interrupted
11:22:20
calculated power to high
11:22:20
sample:2 bias:-0.3509 pwm:0.6000 new pwm:0.6501
11:21:40
sample:1 bias:0.0522 pwm:0.6000 new pwm:0.6000

I had my max power set to 0.6 in the firmware. Before, I had it set to 0.5. Still didn’t get around this issue, however. This is the bed heater.

And on the hotend side, I got an error about “PID tune taking too long”. I am using a 60W heater and a PT1000 on a Dragon UHF hotend.

At first glance it seems like your max power is too low, as the calibration is trying to increase it. I have the code set up to specifically prevent going beyond the max value the user has set. You might want to try setting the max power to 1.0 and seeing if that solves the problem.

For the hotend, it looks like your system might be a little too noisy to hit the very tight tolerance I currently have set in the code.

What the code does is alter the maximum power output until it gets TUNE_PID_SAMPLES + 1 consecutive samples within ±TUNE_PID_TEMP_TOL of the target temperature.

If you are OK with manually editing the code, you might want to increase TUNE_PID_TEMP_TOL to 0.2 or 0.3. Ultimately, I think this should probably be an optional input with a larger default value.
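The stopping rule described above can be sketched roughly like this. Only the two constant names come from the code; the values given to them here, the function name, and the idea of feeding it one deviation per sample are placeholders for illustration.

```python
TUNE_PID_SAMPLES = 3      # placeholder value, not taken from the attached files
TUNE_PID_TEMP_TOL = 0.2   # placeholder value


def has_converged(deviations, samples=TUNE_PID_SAMPLES, tol=TUNE_PID_TEMP_TOL):
    """True once the most recent samples + 1 deviations are all within +/- tol."""
    needed = samples + 1
    if len(deviations) < needed:
        return False
    return all(abs(d) <= tol for d in deviations[-needed:])


print(has_converged([0.5, 0.1, -0.1, 0.15, 0.05]))  # True: last 4 are within 0.2
print(has_converged([0.1, 0.1, 0.3, 0.1]))          # False: 0.3 breaks the run
```

A noisy sensor can keep knocking a single sample just outside the band, which is why loosening the tolerance lets the run finish.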


I’d rather not set the max power to 1, as I’m trying to avoid tacoing the bed due to too fast heating. For the hotend, I’ve changed the parameter you mentioned to 0.2 as you suggested, and it works!

I’ll take a look at it tonight, but I can for sure cap the max power instead of throwing an error for attempting to exceed it.

I’m not sure how it will affect the outcome of the calibration though. I assume it will either take longer to converge on a solution, or the tolerance will need to be increased to prevent the run timing out.


Sounds good. After running the calibration, I ended up getting this. Which… doesn’t look much different from the original PID tune.

I’ve been running this experimental PID controller on my machine for a few weeks now. It’s made a pretty massive difference in overall performance on my machine. I haven’t had a chance to see how his calibration routine does yet, but intend to do so in the next few days.

I’ve traditionally had two issues that have made me suspicious of the behavior of the existing controller and tuner:

  • During initial warmup, the bed heater would overshoot pretty badly. It would then take an extremely long time (10+ minutes) to settle at the setpoint. More concerningly, it would spike PWM duty cycle while still above setpoint. This set of issues in particular has always made me highly suspicious of integral windup in the existing controller.
  • The hotend generally had trouble maintaining setpoint. I typically print hot and fast – 280C @ 200mm/s (~16mm^3/s) w/ 1m/s travels is a pretty typical print profile for me these days. The PID controller would consistently struggle to stay within 5-10C below setpoint. I’ve had quite a few mid-print failures due to faults from the HE temp dipping too far below setpoint.

For reference, the machine in question is a Railcore with a ~340x330x6.35mm Mic6 bed, 750W AC heater, Slice Mosquito w/ 50W heater, and a single 5015 parts cooling fan.

Worth noting there’s an older Github issue where multiple people were reporting an extremely similar issue with Artillery Sidewinders – in that case, adjusting heater smooth time was enough of a bandaid to get the printers functional. I experimented a fair bit with this and never got performance I’d consider acceptable.

(Personally, I suspect the combination of a relatively high thermal mass bed with a high powered AC heater brings the issues with the current controller to the forefront; the fact that the Railcore also uses a top channel mounted bed thermistor, which increases delay in heat soak, probably exacerbates it further. Print conditions that saturate heater output for extended periods are also more likely to be impacted.)

I spent a fair amount of time hand tweaking the PID terms and eventually landed where I could get settling from overshoot down to 5-10 minutes and the HE would typically stay within 5-8C below setpoint (and thus wouldn’t fault), so a big improvement but still far from ideal.

Unfortunately, I’m not anywhere near as rigorous as DanS in gathering data, but the performance I’ve been getting with the experimental controller has been pretty night and day. Bed warmup from ambient to settled at setpoint is now around 2-3 minutes total with little to no overshoot and hotend temperature generally sits within +0.5/-2C over the course of a print. Note that that’s using the exact same hand tuned PID values I had been running previously (and I had inklings that the HE I term tuning was still a little off – just didn’t want to sink more time into tuning it). As said, I intend to give the tuner a shot in the next few days and report back on how it’s performing with the PID terms that sets.

tl;dr – huge difference on my machine, even without updated PID terms. Other machines with less stressful print profiles may not see much if any change in performance.


I think that’s to be expected given how noisy your hot end seemed during the tuning process. You most likely won’t see a noticeable change until a better smoothing algorithm is put in place.

@Lbibass I’ve updated the pid calibration class.

You can now pass in the tolerance at runtime, and I have increased the default value to 0.2.

PID_CALIBRATE HEATER=extruder TARGET=220 WRITE_FILE=1 TOLERANCE=0.5

The code will also no longer throw an error if it tries to set the power level higher than the max. It will just set it to the max value and continue the tuning process. You will have to use a larger tolerance in this scenario if you want the calibration to finish successfully.
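In other words, the behaviour changes from “raise an error” to “clamp and carry on”. A tiny sketch of that idea, with a made-up function name and a return flag added purely for illustration:

```python
def cap_power(requested, max_power):
    """Clamp a requested relay power to the configured maximum.

    Returns (power_to_use, was_capped).
    """
    capped = min(requested, max_power)
    return capped, capped < requested


# Values from the bed log earlier in the thread: max power 0.6, requested 0.6501.
print(cap_power(0.6501, 0.6))   # (0.6, True)
print(cap_power(0.55, 0.6))     # (0.55, False)
```

Once the power is pinned at the cap, the tuner can no longer correct the remaining asymmetry, which is why a larger tolerance is needed for the run to finish.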


Setup:

  • Extruder with 50W cartridge
  • Current PID parameters
    • pid_kp = 20.446
    • pid_ki = 0.995
    • pid_kd = 105.043
  • Bed with 6mm aluminum plate and 600W AC heater
  • Current bed PID parameters
    • pid_kp = 57.734
    • pid_ki = 2.391
    • pid_kd = 348.567
  • Overall very happy with the current PID parameters, coming from native Klipper auto-tuning

1st run extruder:

  • PID_CALIBRATE HEATER=extruder TARGET=260 WRITE_FILE=1 TOLERANCE=0.2
  • Result: Shutdown with IndexError: list index out of range
  • Logs: 1st_extruder.zip (66.1 KB)

2nd run extruder:

  • PID_CALIBRATE HEATER=extruder TARGET=260 WRITE_FILE=1 TOLERANCE=0.5
  • Result: Shutdown with Heater extruder not heating at expected rate (which I think is false)
  • Logs: 2nd_extruder.zip (108.6 KB)

1st run bed:

  • PID_CALIBRATE HEATER=heater_bed TARGET=105 WRITE_FILE=1 TOLERANCE=0.2
  • Result: Finished tuning run without error
    • pid_kp = 34.571
    • pid_ki = 1.048
    • pid_kd = 284.992
  • Logs: 1st_bed.zip (139.1 KB)

@Sineos I’ve looked at your logs, and both your hotend/extruder runs have the same issue.

Heater extruder not heating at expected rate

For the first run, it looks like it caused some kind of race condition, and the calibration run was allowed to continue even though Klipper errored out. This is why it gave an index error. Your second run just got further along before it failed.

In your second attempt your logs show you ran the part cooling fan to 100% roughly 30 seconds before you started the calibration run.

Received 120622.351042: {"id": 281473331767504, "method": "gcode/script", "params": {"script": "M106 S255"}}

Read 120651.059866: 'PID_CALIBRATE HEATER=extruder TARGET=260 WRITE_FILE=1 TOLERANCE=0.5\n'

At this point I have a concern: I would expect a 50W cartridge to easily be able to cope with your average part cooling fan at max output. So, unless you are running a leaf blower as a cooling fan (joking), you might want to see if your cartridge is going bad, or if it isn’t making good contact with the block.

The data from the logs and the graph is saying your heater can’t properly handle your fan at 100%.

Even at max power you are 3.4 to 3.8C under target.

sample:1 deviance:-3.4133 pwm:1.0000 new pwm:1.0000
sample:2 deviance:-3.4313 pwm:1.0000 new pwm:1.0000
sample:3 deviance:-3.4312 pwm:1.0000 new pwm:1.0000
sample:4 deviance:-3.6850 pwm:1.0000 new pwm:1.0000
sample:5 deviance:-3.4132 pwm:1.0000 new pwm:1.0000
sample:6 deviance:-3.7940 pwm:1.0000 new pwm:1.0000
sample:7 deviance:-3.4676 pwm:1.0000 new pwm:1.0000
sample:8 deviance:-3.4494 pwm:1.0000 new pwm:1.0000
sample:9 deviance:-3.7214 pwm:1.0000 new pwm:1.0000
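If you want to check your own runs the same way, the sample lines are easy to summarise with a short script. This is a sketch that assumes the line format shown above; the function name and the “saturated” flag are my own additions.

```python
import re

LINE = re.compile(
    r"sample:(\d+) deviance:(-?\d+\.\d+) pwm:(\d+\.\d+) new pwm:(\d+\.\d+)")

LOG = """
sample:1 deviance:-3.4133 pwm:1.0000 new pwm:1.0000
sample:2 deviance:-3.4313 pwm:1.0000 new pwm:1.0000
sample:3 deviance:-3.4312 pwm:1.0000 new pwm:1.0000
sample:4 deviance:-3.6850 pwm:1.0000 new pwm:1.0000
sample:5 deviance:-3.4132 pwm:1.0000 new pwm:1.0000
sample:6 deviance:-3.7940 pwm:1.0000 new pwm:1.0000
sample:7 deviance:-3.4676 pwm:1.0000 new pwm:1.0000
sample:8 deviance:-3.4494 pwm:1.0000 new pwm:1.0000
sample:9 deviance:-3.7214 pwm:1.0000 new pwm:1.0000
"""


def summarize(log_text):
    """Return (min, max, mean) deviance and whether pwm stayed at 1.0 throughout."""
    rows = [tuple(float(g) for g in m.groups()) for m in LINE.finditer(log_text)]
    deviances = [r[1] for r in rows]
    saturated = all(r[2] >= 1.0 and r[3] >= 1.0 for r in rows)
    return (min(deviances), max(deviances),
            sum(deviances) / len(deviances), saturated)


print(summarize(LOG))
```

When every sample is already at pwm 1.0 and the deviance never moves toward zero, the tuner has no headroom left to correct with.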

Do you run your fan at 100% when you are printing at 260C?

3rd run extruder

  • PID_CALIBRATE HEATER=extruder TARGET=260 WRITE_FILE=1 TOLERANCE=0.2
  • No fan
  • Stopped calibrating after 20 attempts to converge
  • Logs: 3rd_extruder.zip (68.5 KB)

4th run extruder

  • Klipper Main Line
  • PID_CALIBRATE HEATER=extruder TARGET=260
  • Fan off
  • Converged after 5 attempts
    • pid_Kp=33.721
    • pid_Ki=2.555
    • pid_Kd=111.280
  • Logs: 4th_extruder.zip (29.8 KB)

Looking at the main line PID run in more details:

  • Heating up is fast
  • Easily reaches 262.5°C
  • No errors about not heating at expected rate

5th run extruder

  • Klipper Main Line
  • PID_CALIBRATE HEATER=extruder TARGET=260
  • Fan 100% (Nozzle at printing height over the bed)
  • Converged after 5 attempts
    • pid_Kp=31.751
    • pid_Ki=1.210
    • pid_Kd=208.366
  • Struggles a bit more and only reaches 260.2°C
  • Logs: 5th_extruder.zip (47.5 KB)

6th run extruder

  • Klipper Main Line
  • PID_CALIBRATE HEATER=extruder TARGET=270
  • Fan 0% (With fan at 100% the run fails at 270°C)
  • Converged after 5 attempts
    • pid_Kp=33.937
    • pid_Ki=2.542
    • pid_Kd=113.264
  • Reaches 272.4°C
  • Logs: 6th_extruder.zip (36.8 KB)

So far I have always done my PID runs under worst-case conditions, and of course I run the fan at 100% during prints at 260°C, but only occasionally, e.g. for low layer times or bridging.

Side note:
I printed a part (but only PETG this time at 235°C / 65°C) with the new controller and my old extruder PID values and the new bed PID values. Stability and accuracy was on the level of my previous experiences, e.g. extruder +0.3 °C / -0.2°C. No issues here.

@Sineos this run is better, as shown by the output.

sample:1 deviance:-0.9064 pwm:1.0000 new pwm:1.0000
sample:2 deviance:-0.8700 pwm:1.0000 new pwm:1.0000
sample:3 deviance:-0.8339 pwm:1.0000 new pwm:1.0000
sample:4 deviance:-0.8152 pwm:1.0000 new pwm:1.0000
sample:5 deviance:-0.9425 pwm:1.0000 new pwm:1.0000
sample:6 deviance:-0.7066 pwm:1.0000 new pwm:1.0000
sample:7 deviance:-0.6699 pwm:1.0000 new pwm:1.0000
sample:8 deviance:-0.8154 pwm:1.0000 new pwm:1.0000
sample:9 deviance:-0.8337 pwm:1.0000 new pwm:1.0000
sample:10 deviance:-0.8336 pwm:1.0000 new pwm:1.0000
sample:11 deviance:-0.7246 pwm:1.0000 new pwm:1.0000
sample:12 deviance:-0.9606 pwm:1.0000 new pwm:1.0000
sample:13 deviance:-0.9064 pwm:1.0000 new pwm:1.0000
sample:14 deviance:-0.7791 pwm:1.0000 new pwm:1.0000
sample:15 deviance:-0.8703 pwm:1.0000 new pwm:1.0000
sample:16 deviance:-0.9063 pwm:1.0000 new pwm:1.0000
sample:17 deviance:-0.7975 pwm:1.0000 new pwm:1.0000
sample:18 deviance:-0.7975 pwm:1.0000 new pwm:1.0000
sample:19 deviance:-0.7246 pwm:1.0000 new pwm:1.0000
sample:20 deviance:-0.7612 pwm:1.0000 new pwm:1.0000

For this run you bounced between roughly 0.7 and 0.9C under the symmetry target, so you got closer to the default tolerance of ±0.2 but are still outside of it.

This is better than your previous run, where you were 3.4 to 3.8C under the symmetry target.

I did some quick hand calculations on your data to see if the derivative method would yield better results. Below are the derivatives from your 3rd run; the units are C/s. Positive values are heat up and negative are cool down. If your hotend were working symmetrically, the magnitudes would be equal but opposite in sign. The averages are 0.9891285714 and -1.291260317, so it is still asymmetric.

0.9699222222
-1.252688889
1.010333333
-1.293111111
1.010344444
-1.293166667
0.9698888889
-1.293255556
1.050711111
-1.293244444
0.8890222222
-1.212377778
1.010288889
-1.2932
0.9698666667
-1.252744444
1.050666667
-1.2528
0.9699222222
-1.374
0.9295111111
-1.252777778
1.010288889
-1.333644444
0.9698444444
-1.333588889
0.9699
-1.252744444
1.050666667
-1.252811111
0.9698444444
-1.293111111
0.9699
-1.2932
0.9698888889
-1.293166667
0.9295
-1.374
1.050655556
-1.293222222
1.050733333
-1.333611111
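The two averages can be reproduced from the list above by splitting it on sign. This is just a sketch of the arithmetic, with variable names chosen for illustration:

```python
from statistics import mean

# Heating/cooling rates in C/s, in the order listed above.
rates = [
    0.9699222222, -1.252688889, 1.010333333, -1.293111111, 1.010344444, -1.293166667,
    0.9698888889, -1.293255556, 1.050711111, -1.293244444, 0.8890222222, -1.212377778,
    1.010288889, -1.2932, 0.9698666667, -1.252744444, 1.050666667, -1.2528,
    0.9699222222, -1.374, 0.9295111111, -1.252777778, 1.010288889, -1.333644444,
    0.9698444444, -1.333588889, 0.9699, -1.252744444, 1.050666667, -1.252811111,
    0.9698444444, -1.293111111, 0.9699, -1.2932, 0.9698888889, -1.293166667,
    0.9295, -1.374, 1.050655556, -1.293222222, 1.050733333, -1.333611111,
]

mean_heat = mean(r for r in rates if r > 0)
mean_cool = mean(r for r in rates if r < 0)
ratio = abs(mean_cool) / mean_heat

print(round(mean_heat, 4), round(mean_cool, 4), round(ratio, 2))
```

The ratio comes out at about 1.3, meaning the hotend sheds heat roughly 30% faster than the heater can add it at full power.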

Are you perhaps running a Mosquito or Mosquito Magnum? I’ve heard from some other users that they shed heat really fast, and that is probably the issue.

It’s sounding like this is an issue for a lot of people, either because they are worried about warping their bed, or because their heater cartridge can’t keep up with the heat loss to the environment.

I’ve thought about it over breakfast, and I think I have a solution that will take me a few days to implement and test. If I switch to looking at convergence of maximum power, users who have enough power overhead to reach symmetry will still get the optimum PID parameters. Users who do not, either by choice or because of limitations of their machine, will get parameters as close to optimum as possible.

Thanks for coming back to this.
Some remarks:

  • It is a Dragon UHF hotend, thus similar to the Mosquito Magnum
  • It reaches 185°C quite comfortably (highest temperature I printed so far)
  • During regular printing I never observed any temperature drops. It has been rock steady
  • No silicone sock installed
  • Indeed it cools quite fast

WRT heat bed:

  • Taco’ing it has never been an issue for me since it is 6mm precision milled cast aluminum
  • It is 250 mm x 250 mm with a 600W heater → 0.96 W per square centimeter, which is quite a lot
  • The NTC is mounted on the heating mat. Typically my bed’s top is 5°C cooler than indicated by the NTC
  • Both the Klipper main line and your branch do a very good job controlling it

Interesting. I look forward to seeing your final results.

For what it is worth, I was not aware it is a contentious topic. For my own part, I have spent notable time experimenting with alternative approaches and I am interested in seeing the results of other people’s experiments.

It is true, that I rarely make or accept changes to Klipper’s PID code. I can provide some high-level thoughts on why that is, but ultimately I would not want to deter one from running experiments, or customizing their code.

I place a high value on “implementation simplicity” in the master Klipper PID code. In general, I view the current PID code as a “basic PID implementation” with performance that is “generally okay, but not great”. To wit, it seems to me that many alternatives add significant complexity to ultimately obtain performance that is “generally okay, but not great”. That is, the benefit is often hard to measure, and I fear some complex implementations may actually be “overfitting” to a subset of hardware.

Separately, I agree the “integral windup” part of the “basic PID” seems not ideal. (Specifically, that the code needs to “relearn” the base PWM value for a given temperature on every restart.) The non-symmetry of the PID_CALIBRATE relay test certainly also seems non-ideal.

Anyway, just some high-level thoughts. Again, not intended to discourage anyone from experimenting.

Cheers,
-Kevin