Experimental PID improvement changes

@Lbibass, @Sineos I’ve implemented and tested the changes to work with systems that are underpowered. It works by looking for convergence in the maximum power setting, using this method:

    def converged(self):
        powers = float(len(self.powers))
        if powers < TUNE_PID_SAMPLES + 1:
            return False
        powers = self.powers[-1*(TUNE_PID_SAMPLES+1):]
        if (max(powers)-min(powers)) <= TUNE_PID_TOL:
            return True
        return False

The convergence tolerance now defaults to 0.01, as it’s looking for convergence between numbers with a different magnitude than it was previously. You can alter the tolerance at run time just as you could with the previous iteration (if your system is too noisy). I’ll be interested to hear whether people get convergence with the default or not, as picking a default is always a difficult decision.

PID_CALIBRATE HEATER=extruder TARGET=220 WRITE_FILE=1 TOLERANCE=0.02
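A standalone sketch of the same check, for experimenting outside Klipper. The function name `powers_converged`, the sample history, and the default `samples=3` are assumptions for illustration (the method above only shows that `TUNE_PID_SAMPLES + 1` power settings are compared); the 0.01 tolerance is the default mentioned above.

```python
def powers_converged(powers, samples=3, tolerance=0.01):
    """Return True when the last samples+1 power settings agree within tolerance.

    samples=3 is an assumed stand-in for TUNE_PID_SAMPLES; tolerance=0.01 is
    the default mentioned in the post.
    """
    window = samples + 1
    if len(powers) < window:
        return False  # not enough history yet
    recent = powers[-window:]
    return (max(recent) - min(recent)) <= tolerance


# Hypothetical power history: the last four values span about 0.008.
history = [1.0, 0.62, 0.455, 0.449, 0.452, 0.457]
print(powers_converged(history))                   # default tolerance
print(powers_converged(history, tolerance=0.005))  # stricter tolerance
```

With this made-up history the default tolerance accepts the run and the stricter one does not, which is the trade-off the TOLERANCE parameter exposes.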

To test it on my end, I intentionally set max_power far lower than it should be.

[extruder]
max_power: 0.4

This is what Klipper master returns:

pid_Kp=14.195 
pid_Ki=0.627 
pid_Kd=80.378

This is what the code I just pushed to GitHub returns:

pid_Kp=14.141 
pid_Ki=0.616 
pid_Kd=81.132

Those values are within what I would expect for run-to-run variation.

In summary, if your system is underpowered you will get the same calibration results as Klipper master currently gives. If your system is overpowered, you will get better results, as the algorithm will decrease maximum power (during calibration) to make your system run in a symmetric way.

My comment was based on private conversations I had with several people.

The main issue with windup isn’t implementation related; it is the inherent limitation of the positional form of PID control when used with a system/actuator that saturates.

I personally blame the establishment for this. In pretty much every text they spend a substantial chunk of time deriving the positional form based on P and PD controllers. They then spend another significant chunk of time discussing the various methods of dealing with windup: clamping/conditional integration, tracking, back calculation, etc.

If you’re lucky, some time later you will see something like this!

From Advanced PID Control by Åström & Hägglund:

13.4 Velocity Algorithms

The algorithms described so far are called positional algorithms because the
output of the algorithms is the control variable. In certain cases the control
system is arranged in such a way that the control signal is driven directly by
an integrator, e.g., a motor. It is then natural to arrange the algorithm in such
a way that it gives the velocity of the control variable. The control variable is
then obtained by integrating its velocity. An algorithm of this type is called a
velocity algorithm.

Velocity algorithms were commonly used in many early controllers that
were built around motors. In several cases, the structure was retained by the
manufacturers when technology was changed in order to maintain functional
compatibility with older equipment. Another reason is that many practical issues,
like wind-up protection and bumpless parameter changes, are easy to implement
using the velocity algorithm. This is discussed further in Sections 3.5
and 13.5. In digital implementations velocity algorithms are also called
incremental algorithms.

I did a rough pass of the velocity form, and it worked exceptionally well compared to the positional form when the temperature reading had low noise. However, it was more susceptible to noise, so I’m working on dealing with that.
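For readers who haven’t seen the velocity (incremental) form, here is a minimal textbook-style sketch. It is not the code from the branch discussed here; the function name, gains, and output limits are assumptions chosen for illustration.

```python
def make_velocity_pid(kp, ki, kd, dt, out_min=0.0, out_max=1.0):
    """Return a step(error) function implementing incremental (velocity) PID."""
    state = {"u": 0.0, "e1": 0.0, "e2": 0.0}

    def step(error):
        # Change in output, computed from the last three error values.
        delta = (kp * (error - state["e1"])
                 + ki * dt * error
                 + kd * (error - 2.0 * state["e1"] + state["e2"]) / dt)
        # The stored output is clamped directly. There is no separate
        # integral accumulator that can keep growing while saturated.
        state["u"] = min(out_max, max(out_min, state["u"] + delta))
        state["e2"], state["e1"] = state["e1"], error
        return state["u"]

    return step


# Integral-only example with made-up gains: hold a large error, then flip it.
pid = make_velocity_pid(kp=0.0, ki=0.1, kd=0.0, dt=1.0)
saturated = [pid(5.0) for _ in range(20)]  # output pins at out_max
after_flip = pid(-5.0)                     # output drops on the very next step
print(saturated[-1], after_flip)
```

After twenty steps at saturation the output still backs off as soon as the error changes sign, which is the windup behaviour the quoted passage refers to. Because each increment is built from differences of consecutive errors, sensor noise feeds straight into the output, which may be part of the noise sensitivity mentioned above.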

Awesome contribution, thank you. I’ll give this a go when I get my printer working again.

How dare you call my system underpowered! This is grossly insulting … Just kidding :innocent:

Generally, I think this is very valuable information that could be placed more prominently. If such a tuning loop detects an imbalance between heating capacity and heat loss, that is a good indication that certain aspects of the setup could be improved.

I did some tests with your new code:

1st run:

  • PID_CALIBRATE HEATER=extruder TARGET=260 WRITE_FILE=1 TOLERANCE=0.02
  • Fan off
  • No silicone sock
  • Converged after 3 loops
  • Quite similar to Klipper main

2nd run:

  • PID_CALIBRATE HEATER=extruder TARGET=260 WRITE_FILE=1 TOLERANCE=0.02
  • Fan off
  • No silicone sock
  • Converged after 3 loops
  • Quite similar to Klipper main
  • Same as first, I was surprised because it only did 3 runs

3rd run:

  • PID_CALIBRATE HEATER=extruder TARGET=260 WRITE_FILE=1 TOLERANCE=0.02
  • Fan on
  • No silicone sock
  • Errored out

4th run:

  • PID_CALIBRATE HEATER=extruder TARGET=260 WRITE_FILE=1 TOLERANCE=0.02
  • Fan off
  • Silicone sock installed
  • Converged after 3 loops
  • Quite similar to Klipper main

5th run:

  • PID_CALIBRATE HEATER=extruder TARGET=260 WRITE_FILE=1 TOLERANCE=0.02
  • Fan on 100%
  • Silicone sock installed
  • Converged after 3 loops

Data for the runs: pid_runs.zip (205.6 KB)

I attempted to run your PID calibration code on my heater_bed since I have the Klipper mainline PID constants as well as my manually obtained PID constants. Unfortunately it timed out before converging.

I have not looked at your code (time constraints), but I am curious why it is taking such a long time to converge on the “symmetrical” power PWM value?

In principle, following the first run you should be able to get the max heating slope (at PWM=1.0) and the max cooling slope (at PWM=0.0). You can then take a guess at the PWM that would give you approximately the same heating slope as the natural cooling slope. You could then iterate from that PWM value to converge…
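One way to turn that suggestion into a starting guess, sketched under the assumption that the net slope varies linearly with PWM between the two measured points. The function name and the example slopes are hypothetical, and this is not how the posted tuner works.

```python
def guess_symmetric_pwm(heat_slope, cool_slope):
    """Estimate the PWM whose heating rate matches the natural cooling rate.

    heat_slope: measured rise in degC/s at PWM=1.0 (positive magnitude)
    cool_slope: measured fall in degC/s at PWM=0.0 (positive magnitude)
    Assumes the net slope is linear in PWM between those two measurements.
    """
    if heat_slope <= 0 or cool_slope <= 0:
        raise ValueError("slopes must be positive magnitudes")
    # Linear model: net(pwm) = pwm * (heat_slope + cool_slope) - cool_slope.
    # Solving net(pwm) = cool_slope gives the expression below.
    pwm = 2.0 * cool_slope / (heat_slope + cool_slope)
    # A result above 1.0 means the heater cannot match the cooling rate,
    # so the guess is capped at full power.
    return min(1.0, pwm)


print(guess_symmetric_pwm(1.5, 0.5))  # heats three times faster than it cools
print(guess_symmetric_pwm(0.4, 0.6))  # cools faster than it heats
```

The first case gives a guess of 0.5; the second is capped at 1.0, which corresponds to the underpowered situation discussed earlier in the thread.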

From your logs, the error in your 3rd run is:

Heater extruder not heating at expected rate
Transition to shutdown state: Heater extruder not heating at expected rate
See the 'verify_heater' section in docs/Config_Reference.md
for the parameters that control this check.

You might want to loosen your verify_heater settings a little, as it seems you are currently running on the razor’s edge.

For sake of completeness and to add to my above post, here is a run for the bed:

Bed run

  • PID_CALIBRATE HEATER=heater_bed TARGET=110 WRITE_FILE=1 TOLERANCE=0.02
  • Did fight quite hard across 15 samples to come to a result

bed_run.zip (106.9 KB)

[verify_heater extruder]
max_error: 140
check_gain_time: 40

Hmm, shouldn’t be too strict, I would think

Edit: I’ve been using these settings for ages (don’t even remember why I set them so lax).

@koconnor or someone else who knows more than me about the heater settings might need to chime in about what to change.

If I’m reading the documentation correctly, check_gain_time & max_error are inputs for 2 different tests.

check_gain_time would be what’s looked at when you are going from target temp 0 to whatever value you are tuning for.

max_error would be what’s looked at when the relay test has started cycling on and off.

Considering run 3 came up to temp and started cycling, if I had to guess, you are falling foul of max_error. It defaults to 120, and you aren’t too far off that.
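To get a feel for how quickly a max_error budget is used up, here is a simplified model of the cumulative error check as the documentation describes it: the temperature is inspected once a second, the counter resets when the reading is close to the target, and otherwise grows by the shortfall. This is not Klipper’s actual implementation, it ignores the check_gain_time path entirely, and the function name and the hysteresis value of 5 are assumptions.

```python
def seconds_until_max_error(temps, target, max_error=120.0, hysteresis=5.0):
    """Return the index (in seconds) at which the error budget is exceeded.

    temps holds one reading per second. Returns None if the budget is never
    exceeded. Simplified model only, not the real verify_heater code.
    """
    error = 0.0
    for second, temp in enumerate(temps):
        if temp >= target - hysteresis:
            error = 0.0  # close enough to target: counter resets
            continue
        error += (target - hysteresis) - temp
        if error > max_error:
            return second
    return None


# Hypothetical case: target 260, heater stuck at 250 with the fan on.
stuck = [250.0] * 60
print(seconds_until_max_error(stuck, 260.0))                   # default budget
print(seconds_until_max_error(stuck, 260.0, max_error=140.0))  # laxer budget
```

In this model a heater sitting 10 degrees under target uses up the default budget in under half a minute, and raising max_error to 140 only buys a few more seconds.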

@Sineos @ReXT3D I reworked the logging based on some of your feedback.

Output during a run now looks like this:
[image: logging]

Hopefully this makes it clearer what’s going on.
sample: self-explanatory.
pwm: the PWM setting used for that sample.
asymmetry: how asymmetric the sample was. Positive is overpowered, negative is underpowered.
tolerance: the value you are trying to minimize.

@Sineos The calibration requires at least 4 samples; you just didn’t see the output of the last one previously because the logging call came after the tolerance check.

@ReXT3D The tolerance field should hopefully let you see if you need to use a custom tolerance or not.

I guess you are right. Since the error is cumulative and is never reset while the target is not reached, the check fails.

What I do not understand is, when looking at the CSV for the failed 3rd run:

  • With PWM at 1 it takes from Timestamp 254.76 (250°C) to reach 260°C at Timestamp 300.96
  • Timestamp 300.96 to 322.26 is well above 260, even nearly 263
  • After 322.26 it stays well below target, even down to 250 unable to recover although seemingly PWM is at 1

You’re seeing two things here.

  1. Everyone’s hotend and bed has delay. By that I mean that when the power setting is changed, it takes a period of time for that change to show up at the sensor. This happens because heat doesn’t instantaneously propagate through the block. If you used a thermal camera you would see that the block is hotter by the cartridge than it is by the sensor. That’s why the temperature continues to rise for a while even after pwm goes to zero: the heat is still propagating through the block. This happens in reverse as well; even after the power is turned back on, the temperature will continue to fall for a period of time.

  2. The second issue is that it’s much harder for the cartridge to change the temperature of the block as you get further away from the ambient temperature. The further away from ambient you are, the faster you will lose heat to the atmosphere.

You can see these phenomena in all your previous runs. Take this run for example. You are losing heat at a much faster rate than you can put it back in.
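The second point can be illustrated with a single lumped thermal mass that loses heat in proportion to its distance from ambient. Every number below (heater wattage, loss coefficient, heat capacity, ambient temperature) is a made-up assumption for illustration, not a measurement from any of the runs above, and the model ignores the sensor delay from the first point.

```python
def net_rate(temp, pwm, heater_watts=40.0, loss_w_per_k=0.15,
             heat_capacity_j_per_k=10.0, ambient=25.0):
    """Rate of temperature change in degC/s for one lumped thermal mass.

    All default parameters are hypothetical values chosen for illustration.
    """
    heat_in = heater_watts * pwm
    heat_out = loss_w_per_k * (temp - ambient)  # grows with distance from ambient
    return (heat_in - heat_out) / heat_capacity_j_per_k


for temp in (100.0, 250.0):
    heating = net_rate(temp, 1.0)  # full power
    cooling = net_rate(temp, 0.0)  # heater off
    print(temp, heating, cooling)
```

With these assumed numbers the block heats faster than it cools at 100 degrees, but at 250 degrees it cools several times faster than it can heat at full power, which is the kind of asymmetry described above.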


So, some observations after having a chance to run @DanS’s updated tuner:

  • Hotend tuning failed with the initial version posted as with other users
  • Tuning with the more recent version works successfully and does yield PID terms similar to the extant tuner as noted
  • As already mentioned, I’m fairly confident in agreeing this is the case because the Mosquito hotend I’m running sheds heat far more readily than it gains it, especially at higher temperatures. Given the tuning algorithm was initially trying to find unbiased terms, it makes sense that it’d fail with a fundamentally asymmetric system.

The bed is where, IMO, things are much more interesting.

As I mentioned previously, the bed is approx 330x340x6.35mm mic6 with a 750W AC heater and a top channel thermistor separate from the heater.

  • With the current tuner and PID controller, it pretty severely overshoots after about 2-3 minutes and takes at least 10-15 minutes to settle at an 80C setpoint, if it does at all. It’s not unusual for it to fail to settle entirely.
  • I have been running a hand tuned profile the last few months that manages to get to settle at set point from ambient in about 2-3 minutes total with little to no overshoot. This is the case with both the current Klipper and this experimental controller. The experimental controller seems to hold temp tighter to setpoint, however.
  • DanS’ tuner yields terms that cause a couple of degrees of overshoot along with a gradual cooldown and settle at 80C at about 8 minutes from ambient.

BUT! even though the terms generated by this tuning algorithm are slower than my hand tuned values for the bed, I actually think they likely make more sense as a general purpose tuner.

The huge difference between this tuner and both the extant tuner and my hand tuned values is that the latter are much more aggressive in calling for PWM duty cycle. They’ll constantly spike and cut PWM value. This new tuner results in a near constant PWM output as it gets near setpoint; the reason it’s slower is that it holds the PWM value fairly constant. This makes for slower cooldown after overshoot (which is probably the result of some windup and dead time due to the remoteish thermistor), but means there’s less sudden temperature shock to the bed.

I actually think this is better behavior overall (at least as a default) even if it’s somewhat slower. Holding the heater at a steady output should result in less temperature swing and, while I haven’t yet tested this theory, I’d bet it should result in fewer of the subtle Z banding artifacts caused by bed temperature fluctuation.

Overall, my experience so far is that the new controller is a clear win. I’m seeing more stable performance whether I run terms generated by the current tuner, terms from this new tuner, or terms I’ve hand tuned myself. This tuning algorithm looks very promising, but definitely generates less aggressive tunes than what we have now. I think it’ll require testing across a range of machines to determine whether that’s a net win. I think it’s probably headed in the right direction, though (and I know DanS is already toying with some ideas for other approaches).

I had a bit more time to experiment with your PID auto-tune code and in my case the results are, as I suspected, somewhat complicated.

I was able to get your code to converge and to produce PID constants for the bed after I increased TUNE_PID_MAX_PEAKS = 80. It took 33 samples to converge:

// sample:27 pwm:0.4796 asymmetry:0.0542 tolerance:0.0138
// sample:28 pwm:0.4696 asymmetry:0.0497 tolerance:0.0209
// sample:29 pwm:0.4606 asymmetry:0.0320 tolerance:0.0236
// sample:30 pwm:0.4548 asymmetry:0.0334 tolerance:0.0249
// sample:31 pwm:0.4489 asymmetry:-0.0155 tolerance:0.0207
// sample:32 pwm:0.4515 asymmetry:-0.0158 tolerance:0.0117
// sample:33 pwm:0.4543 asymmetry:0.0328 tolerance:0.0059

However, in my case the results are unfortunately inferior to the current mainline auto-tune and substantially inferior to my manual tune:

Klipper mainline: pid_Kp=70.14 pid_Ki=1.218 pid_Kd=1010.1

DanS: pid_Kp=33.81 pid_Ki=0.456 pid_Kd=626.4

Manual Cohen-Coon: pid_Kp=212.1 pid_Ki=15.76 pid_Kd=423.5

It is worth noting that my manual tune uses Cohen-Coon derivation and is very aggressive. This seems to work extremely well in my case and is stable with the long bed time constant.
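For context, the textbook Cohen-Coon rules for a first-order-plus-dead-time model look roughly like the sketch below. The process values in the example are invented, the function name is an assumption, and the procedure actually used to arrive at the constants above isn’t shown in this thread. Mapping the result onto Klipper’s pid_Kp, pid_Ki and pid_Kd would also involve Klipper’s own scaling, which is not covered here.

```python
def cohen_coon_pid(gain, tau, dead_time):
    """Textbook Cohen-Coon PID settings for a first-order-plus-dead-time model.

    gain: process gain (output change per unit of input change)
    tau: process time constant in seconds
    dead_time: process delay in seconds
    Returns (kc, ti, td): controller gain, integral time, derivative time
    for the ideal (non-interacting) PID form.
    """
    if gain == 0 or tau <= 0 or dead_time <= 0:
        raise ValueError("gain must be nonzero; tau and dead_time positive")
    r = dead_time / tau
    kc = (1.0 / gain) * (1.0 / r) * (4.0 / 3.0 + r / 4.0)
    ti = dead_time * (32.0 + 6.0 * r) / (13.0 + 8.0 * r)
    td = dead_time * 4.0 / (11.0 + 2.0 * r)
    return kc, ti, td


# Hypothetical process: unit gain, 10 s time constant, 1 s dead time.
kc, ti, td = cohen_coon_pid(gain=1.0, tau=10.0, dead_time=1.0)
print(round(kc, 3), round(ti, 3), round(td, 3))
```

The controller gain scales with tau / dead_time, so a process with a long time constant relative to its delay ends up with a large proportional gain, which fits the description of the tune as very aggressive.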

I by no means want to sound discouraging, but I think this further illustrates Kevin’s earlier point that with PID control there really is no “one size fits all” solution. The auto-tuning only gets us a somewhat acceptable result. For best compromise in control performance an iterative manual tune will always be required IMHO.

I will next try to experiment with your modified PID control code…

A while ago I made some experiments with my heat bed (6mm cast aluminum / 250x250 / 600W) in an attempt to improve my engineering material printing experience (bed temperatures between 100°C and 120°C)

  1. PID via the NTC on the silicone heating mat
  • Pro: Rock solid temperature control
  • Con: After the magnetic sheet and the spring steel, the top is roughly 5°C below set point
  2. PID via a PT100 screwed into the aluminum plate
  • Already more difficult to obtain proper values
  • Better set point control on the surface of the spring steel sheet (-2°C)
  3. PID via PT100 foil sensor between magnetic sheet and spring steel sheet
  • Of course not meant as solution since it throws off bed leveling
  • Nearly impossible to PID tune. Hand tuning worked but very unstable and would constantly swing (using FOPDT branch as starting point)
  • Spot on temperature on the surface but would swing +/- 1 °C

I returned to solution 1 and simply set the target temperature higher.

Personally I think this is one of the many 3D printing myths and I have never noticed this. Maybe on the first handful of layers this could play a role, but definitely not thereafter. Maybe if your bed temp swings by +/- 20°C, but even then I’d be surprised if it is ever “noticed” by the current layer when you are at Z=10 and above.

Z height doesn’t particularly matter here. Artifacting driven by temp fluctuation is largely going to be a result of slight changes in deflection as the bed and anything it’s mounted to expand and contract throughout the heating cycle. Things like bed material and whether the bed’s kinematically mounted are going to have a huge impact on whether it makes a visible difference.

I’m usually pretty quick to push back against 3DP broscience, but enough people have demonstrated that you can induce this sort of artifacting by, e.g., switching between bang bang and PID control that it seems pretty well settled to me.

And to be clear – I don’t think the difference between these two PID tunings is going to be that drastic, just noting it as a potential benefit to this approach as a general purpose calibration. Heavy, kinematically mounted beds like mine are fine with an aggressive tune – machines like, say, a Prusa with a thin sheet bed or a Voron where they’re already limiting bed heater output because the overconstrained bed mounting can taco the bed may benefit more from a tune that errs on the side of staying closer to a steady state.

Empirical evidence attached:

RHS is the original “as shipped” CR-10S Pro firmware (I bought one of the very early units) that shipped with very poorly customised Marlin using bang-bang bed control. Not sure why they did this, but I suspect that they tried to awkwardly protect reliability of the awful bed SSR. But that’s a totally different story.

LHS is one of my early attempts at fixing some of the printer issues where I used mainline Marlin with PID bed control enabled.

I don’t have any test prints, but the bed “breathing” with the bang-bang control was resulting in vertical artifacts on the prints. This is what prompted me to investigate the root cause in the first place…

Although going off topic here:
Thermal expansion coefficient for aluminum: 23E-6 m/(m x °C)
Bed Size: 300 mm
Delta T: +/- 1 °C
Expansion = 1 °C x 23E-6 /°C x 300 mm = +/- 0.0069 mm

I do believe there is an effect on thermal expansion when going from room temp to printing temp (e.g. tacoing, z-offset etc). I also do believe there could be an effect on the first layer(s). On a halfway stabilized (temperature wise) system with a swing of +/-1 °C, I do not believe in further effects.

That’s assuming uniform heating and expansion of both the bed and its mounting system. It’s the fact that that doesn’t happen in practice that causes the bed to deflect unevenly during prints.

Stefan from CNCKitchen ran into this exact issue a while back and saw artifacting with bang bang with the type of +/-1C variance you’re talking about:

Like I said, it’s the fact that people have been able to demonstrate this in a repeatable manner that sold me on it being a real thing.

You need to use a larger tolerance, not more peaks. Given the slice of output you posted above, I’d say you need to pass in a tolerance between .012 and .015.
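A quick way to check that against the posted slice is to parse the log lines and find the first sample whose logged tolerance falls within a given limit. This is a small helper sketch, not part of the tuner; the function name and regular expression are assumptions based on the log format shown above, and only the posted samples (27 to 33) are considered.

```python
import re

LOG = """\
// sample:27 pwm:0.4796 asymmetry:0.0542 tolerance:0.0138
// sample:28 pwm:0.4696 asymmetry:0.0497 tolerance:0.0209
// sample:29 pwm:0.4606 asymmetry:0.0320 tolerance:0.0236
// sample:30 pwm:0.4548 asymmetry:0.0334 tolerance:0.0249
// sample:31 pwm:0.4489 asymmetry:-0.0155 tolerance:0.0207
// sample:32 pwm:0.4515 asymmetry:-0.0158 tolerance:0.0117
// sample:33 pwm:0.4543 asymmetry:0.0328 tolerance:0.0059
"""

LINE = re.compile(
    r"sample:(\d+) pwm:([\d.]+) asymmetry:(-?[\d.]+) tolerance:([\d.]+)")


def first_converged_sample(log_text, tolerance):
    """Return the first sample number whose logged tolerance is within the limit."""
    for match in LINE.finditer(log_text):
        sample = int(match.group(1))
        logged = float(match.group(4))
        if logged <= tolerance:
            return sample
    return None  # never converged within the given slice


for limit in (0.01, 0.012, 0.015):
    print(limit, first_converged_sample(LOG, limit))
```

For this slice, the default of 0.01 is first met at sample 33, a limit of 0.012 at sample 32, and a limit of 0.015 already at sample 27. The logged tolerance also matches the spread of the last four pwm values (for sample 33, 0.4548 - 0.4489 = 0.0059), consistent with the converged() method posted earlier in the thread.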