Default [extruder] smooth_time is too long, causing temperature oscillations

Greetings,

I am posting this to document my findings for some feedback before opening an issue on GitHub.

Background

Before switching to Klipper, while I was researching it, I came across random posts and comments from people declaring that Klipper PID control was nowhere near as good as in other firmware. I found that hard to believe.

Once I converted my printer to Klipper I did notice that the hot end temperature was somewhat less stable and tended to oscillate or “buzz” slightly around the set point. In the last few months I upgraded the hot end on my printer from stock Creality to Micro Swiss and I increased the heater power from 40W to 50W. I immediately noticed that the oscillations became worse with higher peak-to-peak amplitude. This prompted me to investigate.

Findings

Klipper uses 2 seconds as the default smooth_time value for both the extruder (hot end) and the heater bed. Heater beds generally have very large thermal inertia, which results in slow warm-up and cool-down times, so a 2 second (or even longer) filter is perfectly acceptable. For extruders, the thermal gain (heating and cooling rates) is significantly higher, particularly with high-performance hot ends equipped with higher-power heaters. In those cases the 2 second filter is far too long and destabilizes the PID control loop. The result is periodic oscillation around the temperature set point. The following image shows the temperature stability of my hot end with the default 2 second value, followed by a shortened 1 second value. The heater power clearly shows the PID loop instability:

In my setup I ultimately settled on 0.4 seconds, which provides a perfect balance between stability and noise immunity (filtering). The following image shows how well the PID loop behaves with the optimized smooth_time:

Recommendation

I continue to see people reporting hot end temperature instability on Discord, and I keep recommending that they experiment with the smooth_time value to fix it. I would therefore recommend reducing the default value for the extruder. I would also recommend adding notes to the documentation, perhaps in the PID tuning section, so that users can optimize the value for their specific installation. That would give them stable hot end temperatures while keeping adequate electrical noise protection.

Peter.


Interesting. Thanks.

There’s no harm in changing the smooth_time. I can certainly see where a larger value could delay the pid response (specifically, the derivative response of the pid). As I understand it, smooth_time should be sufficiently high to avoid a derivative response to sensor “noise”, but should otherwise be low to reduce response time.

It’s hard to make a good general default, because of all the different boards and available sensors. It would be interesting to see what test results people get with a 1 second value on various boards.

-Kevin

What are other firmwares doing? Is this generally a board-dependent setting, or static in the firmware source?

I don’t know.

-Kevin

I had a very rudimentary look at Marlin and it appears to do what they call “oversampling” where they very rapidly read the signal 16 times and average the value. I think… They also may be doing something else downstream, but I was not able to find it. Unfortunately I really do not have huge amounts of time to look through the code and my understanding of the code is also extremely basic.

Klipper seems to implement a first-order low-pass filter (Kevin, please confirm whether I am correct). At every iteration, output = output + constant * (input - output), and the filter constant is calculated from the selected smooth_time:

def calc_smooth(self, read_time, read_value, last):
    last_time, last_value = last
    time_diff = read_time - last_time
    value_diff = read_value - last_value
    adj_time = min(time_diff * self.inv_smooth_time, 1.)
    smoothed_value = last_value + value_diff * adj_time
    return (read_time, smoothed_value)

A first order LPF is IMHO the most elegant and appropriate solution, far superior to simple averaging. So the only issue at hand is optimizing the filter constant, i.e. the smooth_time.
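Below is a minimal standalone sketch of the same update rule as the calc_smooth() excerpt above, applied to a step change in a reading. The function name smooth_series is hypothetical. The 0.3 s spacing between readings is an assumption chosen for illustration. The sketch shows how the blend factor min(dt / smooth_time, 1) controls how quickly the output follows the input.

```python
def smooth_series(samples, smooth_time):
    """Apply a first-order low-pass filter to a list of (time_s, value) pairs.

    Hypothetical helper using the same update rule as calc_smooth():
        adj = min(dt / smooth_time, 1)
        smoothed += (value - smoothed) * adj
    """
    inv_smooth_time = 1.0 / smooth_time
    last_time, last_value = samples[0]
    out = [last_value]
    for read_time, read_value in samples[1:]:
        # Fraction of the remaining gap closed on this reading (capped at 1).
        adj_time = min((read_time - last_time) * inv_smooth_time, 1.0)
        last_value = last_value + (read_value - last_value) * adj_time
        last_time = read_time
        out.append(last_value)
    return out


# Step from 0 to 1 after the first reading, readings every 0.3 s (assumed).
step = [(i * 0.3, 0.0 if i == 0 else 1.0) for i in range(8)]

for st in (2.0, 1.0, 0.25):
    print(st, [round(v, 3) for v in smooth_series(step, st)])
```

With smooth_time = 2.0, the output closes only 15% of the remaining gap per reading. With smooth_time = 0.25, which is shorter than the reading interval, the factor saturates at 1 and the output simply equals the input.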

I totally agree that it would be very difficult to select one value that suits every installation, especially with the huge variation in control boards, wire routing, power supply quality, etc. Fundamentally, each installation has two parameters that are difficult to quantify without measurement:
(1) the level of noise on the temperature signal to the ADC;
(2) the dynamic thermal characteristics of the hot end. These are partly reflected in the auto-tuned PID constants, but they are also influenced by the smooth_time value used during the tune, particularly if the filtering is heavy.

In my case I had the eureka moment when I plotted the extruder power in Fluidd. The plot clearly showed the oscillating nature of the PID output with the default value of 2. I then reduced the smooth_time value until the oscillations disappeared under steady-state conditions. Next, I confirmed good PID stability with step changes in the input (picking different temperature set points). Finally, I temporarily reduced the smooth_time by a further factor of 4-5. The temperature remained stable and showed no obvious noise influence, which confirms a good noise margin, at least under the test conditions.
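The steady-state check described above can be put into numbers with a small helper. It reports the peak-to-peak swing and the maximum deviation from the set point over a window of logged samples. This is a hypothetical sketch. The function name is invented, and the sample values are synthetic, not measurements from this printer.

```python
def steady_state_stats(samples, target, start_time):
    """Return (peak_to_peak, max_abs_deviation) for samples at or after start_time.

    samples: list of (time_s, value) pairs, e.g. temperature or heater power
    target: set point used for the deviation figure
    start_time: ignore the warm-up and overshoot before this time
    """
    values = [v for t, v in samples if t >= start_time]
    if not values:
        raise ValueError("no samples after start_time")
    peak_to_peak = max(values) - min(values)
    max_dev = max(abs(v - target) for v in values)
    return peak_to_peak, max_dev


# Synthetic temperature log: warm-up, then a small oscillation around 225 C.
log = [(0.0, 180.0), (1.0, 226.0), (2.0, 225.2), (3.0, 224.8), (4.0, 225.2), (5.0, 224.8)]
p2p, dev = steady_state_stats(log, target=225.0, start_time=2.0)
print(f"peak-to-peak {p2p:.2f} K, max deviation +/-{dev:.2f} K")
```

Running the same check before and after each smooth_time change gives a simple way to compare settings.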

As a side note, I just tested my printer with smooth_time 0.001, and there was still no obvious sign of noise influencing the hot end temperature. But I did go totally overboard with signal routing and isolation in the printer, where possible.


Klipper also oversamples in the micro-controller. Eight samples are taken with a period of 1ms each. The sum is passed to the host. See SAMPLE_TIME and SAMPLE_COUNT in klippy/extras/adc_temperature.py.

The calc_smooth() code you quoted is specific to the Duet boards, which have a funky ADC system. On the Duet, that code only smooths the vref and vssa pins.

The primary code for the PID is in klippy/extras/heaters.py. The raw “unsmoothed” values are passed to the PID, and the PID does its own smoothing of the derivative term based on the smooth_time config option. See temperature_update() in ControlPID in that file.

Temperatures are sampled every 300ms (see REPORT_TIME in klippy/extras/adc_temperature.py) and so using any smoothing value less than that will have no further effect (all smoothing is disabled in that case).

-Kevin
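Here is a simplified, hypothetical model of what smoothing the derivative term does. The raw dT/dt between consecutive readings passes through the same kind of first-order filter, with smooth_time as the time constant. The class SmoothedDerivative is invented for illustration. It is not Klipper's ControlPID code; see temperature_update() in klippy/extras/heaters.py for the real implementation.

```python
class SmoothedDerivative:
    """First-order low-pass filtered dT/dt (hypothetical helper for illustration)."""

    def __init__(self, smooth_time):
        self.smooth_time = smooth_time
        self.prev_time = None
        self.prev_temp = None
        self.deriv = 0.0

    def update(self, read_time, temp):
        if self.prev_time is not None:
            dt = read_time - self.prev_time
            raw_deriv = (temp - self.prev_temp) / dt
            # Blend factor saturates at 1: with smooth_time <= dt the raw
            # derivative is used unchanged (no smoothing).
            adj = min(dt / self.smooth_time, 1.0)
            self.deriv += (raw_deriv - self.deriv) * adj
        self.prev_time, self.prev_temp = read_time, temp
        return self.deriv


# Temperature rising at a constant 3 K/s, reported every 0.3 s (assumed).
for st in (2.0, 0.25):
    d = SmoothedDerivative(st)
    history = [d.update(i * 0.3, 25.0 + 3.0 * i * 0.3) for i in range(10)]
    print(st, [round(x, 2) for x in history])
```

With smooth_time = 2.0, the filtered derivative of a steady 3 K/s ramp is still well below 3 K/s after ten readings. With smooth_time = 0.25 it tracks the raw value immediately. That delay is the lag the PID loop sees.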

Thanks very much, as always, for your detailed explanation. This certainly helps with my understanding of the architectural implementation in Klipper, since I usually only skim the surface when I look into the code. Basically, I don’t know what I don’t know…

Peter.

I have had a very similar issue since I switched from a 50W heating cartridge to a 70W one. Running PID calibration doesn’t solve it. I’ll try the solution mentioned above.

EDIT: the method above also fixes my issue.

Good to hear. My main motivation for posting this here was the number of times I have seen oscillation mentioned on Discord.

Can you please do me a favour and calculate the maximum slope of your hot end temperature vs. time, since you have a rather high-powered hot end? I am curious what rate, in degrees C per second, your hot end achieves. This would normally occur very near room temperature, and the slope should decrease as the temperature increases. It would also be interesting to know what the cooling slope is.

@koconnor Was dT/dt slope limiting ever considered as the “filter”? The nice thing here is that the PID tune actually gives you fairly accurate slopes, both for heating and for cooling. Those values, or values derived from them, could then define a slope limiter that would effectively filter out all signal noise…
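The slope-limiting idea could look something like the sketch below. Each new reading is clamped so that the implied rate of change stays within limits taken from the measured heating and cooling rates. This is purely hypothetical: the function name is invented, and the example limits (+5 K/s heating, -1.5 K/s cooling) are assumptions picked to resemble the rates reported later in this thread.

```python
def slope_limit(samples, max_rise, max_fall):
    """Clamp each reading so the rate of change stays within [max_fall, max_rise] K/s.

    samples: list of (time_s, temp_c) pairs; max_fall is negative (cooling rate).
    Returns the list of limited temperatures.
    """
    last_time, last_temp = samples[0]
    out = [last_temp]
    for t, temp in samples[1:]:
        dt = t - last_time
        lo = last_temp + max_fall * dt   # coldest physically plausible value
        hi = last_temp + max_rise * dt   # hottest physically plausible value
        last_temp = min(max(temp, lo), hi)
        last_time = t
        out.append(last_temp)
    return out


# A 10 K noise spike at t = 0.6 s, with readings every 0.3 s.
readings = [(0.0, 225.0), (0.3, 225.0), (0.6, 235.0), (0.9, 225.0)]
print(slope_limit(readings, max_rise=5.0, max_fall=-1.5))
```

In this example the spike is cut to 226.5 C. Afterwards the limited value can only fall at the cooling limit, so it is still about 1 K high on the next reading. With asymmetric limits, an upward noise spike is therefore attenuated but not fully rejected.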

Here are the values:

Without part cooling enabled and with a 235°C target temperature, logged via heattest.txt:
Starting at 25.445°C and reaching 236.792°C after 54.3 s gives a delta of 211.347 K. This results in ~3.892 K/s or ~0.257 s/K.

Same again with part cooling enabled:
Starting at 40.590°C and reaching 236.025°C after 53.4 s gives a delta of 195.435 K. This results in ~3.66 K/s or ~0.273 s/K.

That covers the heating phase. I’ll also post the cooling results in a few minutes.

EDIT: Here are the cooling averages…

With part cooling enabled:
Cooling from 235°C to 50°C in 290.4 s gives a delta of 185 K. This results in ~0.637 K/s or ~1.57 s/K.

Without part cooling enabled:
Cooling from 235°C to 50°C in 464.6 s gives a delta of 185 K. This results in ~0.398 K/s or ~2.511 s/K.
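The average slopes are just the temperature change divided by the elapsed time. This short script recomputes them from the numbers reported above; it adds no new measurements.

```python
# (start_temp_c, end_temp_c, elapsed_s) as reported in the posts above.
runs = {
    "heating, part cooling off": (25.445, 236.792, 54.3),
    "heating, part cooling on": (40.590, 236.025, 53.4),
    "cooling, part cooling on": (235.0, 50.0, 290.4),
    "cooling, part cooling off": (235.0, 50.0, 464.6),
}


def average_slope(start, end, elapsed):
    """Return (rate in K/s, inverse in s/K); the rate is negative when cooling."""
    rate = (end - start) / elapsed
    return rate, 1.0 / abs(rate)


for name, (start, end, elapsed) in runs.items():
    rate, inv = average_slope(start, end, elapsed)
    print(f"{name}: {rate:+.3f} K/s, {inv:.3f} s/K")
```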

Thanks very much! So these are the average slopes. I am curious what the maximum slope would be. You can calculate it with a rolling average over, let’s say, 5 readings near the starting point…

For reference, my 50W cartridge heating Micro Swiss hot end results in maximum slope of approximately 3.3 K/s and average slope of approximately 2.3 K/s.
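One way to get a maximum slope from logged data, as suggested above, is to take the temperature change across a sliding window of 5 readings and divide it by the time the window spans. This is a hypothetical helper. It assumes you have already extracted (time, temperature) pairs from the log, and the readings in the example are made up.

```python
def max_rolling_slope(samples, window=5):
    """Return (largest, smallest) slope in K/s over any `window` consecutive readings.

    samples: list of (time_s, temp_c) pairs in time order.
    Each window's slope is (last temp - first temp) / (last time - first time).
    """
    if len(samples) < window:
        raise ValueError("not enough samples for the requested window")
    slopes = []
    for i in range(len(samples) - window + 1):
        t0, temp0 = samples[i]
        t1, temp1 = samples[i + window - 1]
        slopes.append((temp1 - temp0) / (t1 - t0))
    return max(slopes), min(slopes)


# Synthetic warm-up readings every 0.3 s, with the rate falling off as it heats.
readings = [(0.0, 25.0), (0.3, 26.5), (0.6, 28.0), (0.9, 29.2),
            (1.2, 30.3), (1.5, 31.2), (1.8, 32.0)]
print(max_rolling_slope(readings))
```

For evenly spaced readings, this equals the mean of the per-interval slopes inside the window, because the intermediate differences cancel.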

I’m sorry if this sounds a little dumb, but I have no idea how to calculate a rolling average. I can provide my klippy.log instead…
klippy.log (1.7 MB)

No problem. Thanks for the log. It looks like your maximum is around 5 K/s and minimum is around -1.5 K/s.

Thank you. That is what I would have suspected from my observations.

Per Kevin’s first response here, would you be able to test your board with smooth_time set to 1 second to see if it also fixes the oscillations? Thank you.

With smooth_time = 1, this results in a deviation of about ±0.2 K.

Edit: with smooth_time = 0.5, the deviation is about ±0.1 K.

I also tested this on my voron2 printer (which has a 50W heater cartridge). I did see minor oscillations before and changing to smooth_time=1 improved temperature stability for me. I get even better results disabling smoothing (smooth_time=0.1).

I’ve created a github PR to change the default: Change default smooth_time from 2 seconds to 1 second by KevinOConnor · Pull Request #4646 · KevinOConnor/klippe . I’m inclined to make this change prior to the v0.10.0 release and, assuming no issues, look to further reduce smooth_time after the v0.10.0 release.

-Kevin


I have actually been experimenting with this, time allowing, over the last few weeks. The irony is that my primary specialty is control systems. However, I don’t deal with the inner closed loops but rather with outer loops, functional logic, man-machine interfaces, etc. So this is a bit of a learning curve for me, as effective PID tuning seems to be a bit of a “dark art”, meaning that it leans heavily on empirical experience.

Depending on the references and recipes used, the PID constants for 3D printing temperature control tend to be on the very aggressive side. They try to keep the temperature as constant as possible under all external influences. I was able to obtain nice, stable temperature control with a 2 second smoothing time and much reduced PID constants. However, that naturally results in more overshoot and longer settling times after the overshoot. It would also mean a poorer response to external disturbances such as extrusion rate changes.

I also looked at the current behaviour with 2 second long smoothing time and I captured some debug data from my printer. The following plot illustrates various amounts of derivative smoothing on my CR-10S Pro with Micro Swiss all-metal hot end and 50W heater cartridge:

I obtained the plot by heating the printer from room temperature to 225C and holding the temperature steady long enough to show the oscillations with the default 2 second smoothing time. I then disabled the heater. The plot includes the raw derivative readings, sampled every 300ms, and the filtered derivative values obtained with 0.5s, 1s and 2s smoothing times. It nicely illustrates the magnitude of the derivative noise on my printer and the effectiveness of the derivative smoothing algorithm. (It took me a while to digest the smoothing calculations before I realized that it’s just a simple first-order LPF.) It also shows the ~2 second lag associated with the 2 second smoothing filter. Another thing it shows very clearly is that a derivative limiter would be totally ineffective, because the noise magnitude, at least on my printer, is small compared to the operational range of the derivative.
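The lag/noise trade-off visible in the plot can be estimated for a first-order filter y += (x - y) * a, with a = min(dt / smooth_time, 1). For a steady ramp, the output settles to a lag of dt(1 - a)/a, which equals smooth_time - dt whenever a < 1. That is about 1.7 s for smooth_time = 2 at a 0.3 s reading interval, in the same ballpark as the ~2 s seen in the plot. Consider noise that is uncorrelated between readings (an idealization). For such noise, the output standard deviation shrinks by sqrt(a / (2 - a)). The sketch below tabulates both quantities; lpf_tradeoff is a hypothetical name.

```python
import math

READING_INTERVAL = 0.3  # seconds between readings (assumed, matching the 300ms above)


def lpf_tradeoff(smooth_time, dt=READING_INTERVAL):
    """Ramp lag and noise attenuation of the filter y += (x - y) * a.

    Returns (a, lag_s, noise_ratio) where a = min(dt / smooth_time, 1),
    lag_s is the steady-state delay when tracking a linear ramp, and
    noise_ratio is output std / input std for uncorrelated input noise.
    """
    a = min(dt / smooth_time, 1.0)
    lag = dt * (1.0 - a) / a                 # equals smooth_time - dt when a < 1
    noise_ratio = math.sqrt(a / (2.0 - a))   # from Var(y) = a * Var(x) / (2 - a)
    return a, lag, noise_ratio


for st in (2.0, 1.0, 0.5):
    a, lag, ratio = lpf_tradeoff(st)
    print(f"smooth_time={st}: a={a:.2f}, lag={lag:.2f} s, noise std x{ratio:.2f}")
```

Halving smooth_time roughly halves the lag, but it lets through proportionally more noise. That trade-off is the one discussed throughout this thread.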

At this point my personal thoughts align with yours on reducing, or potentially even eliminating, it. Derivative smoothing / filtering would be very important in a mechanical system. There, it avoids unnecessary actuator wear caused by wild swings in the PID controller output driven by raw derivative noise. In our case these swings have no detrimental effect on the heater PWM switching, as long as the overall PID loop remains stable. In fact, even in mechanical control systems it is sometimes desirable to add “random” output noise (dither) to eliminate effector hysteresis. I have seen this done quite often in precision controls.

I do think, however, that it is useful to retain the filter code in Klipper, even if the decision is made to disable smoothing by default. I am sure some installations are noisy enough to require it.

Peter.

As a side note, I’ve noticed better noise rejection when the samples are spread over the report time. With a setting like this in adc_temperature.py:

SAMPLE_TIME = REPORT_TIME / (SAMPLE_COUNT + 1)

Also, STM32F103 and faster mcus can handle a shorter report interval and a higher sample rate:

SAMPLE_COUNT = 16
REPORT_TIME = 0.100

These are only changes suggested empirically from fiddling around, I haven’t done any in-depth comparisons yet.

Maybe some of these constants could be made configurable?

If I understand your changes correctly, you are effectively creating a rolling-average smoothing that spans the REPORT_TIME interval. Compared with taking 8 samples 0.001 seconds apart, this lets you “smooth out” lower-frequency components, i.e. changes in the reading that are slower than 0.008 s (125 Hz). So with REPORT_TIME at the default 0.300 (3.3 Hz), you get a much smoother temperature reading every report time, and therefore less noise in the PID controller output. Reducing REPORT_TIME to 0.100, as you suggested, will shift the rolling-average “filter” frequency back up to a higher value (10 Hz). Compared to 0.300, that will increase the PID controller output noise.
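The frequencies above can be checked with a little arithmetic. An average of N samples spaced sample_time apart behaves like a boxcar filter whose first response null sits at 1 / (N * sample_time). The sketch below compares the current scheme with the suggested SAMPLE_TIME = REPORT_TIME / (SAMPLE_COUNT + 1). The function name is hypothetical. The sketch ignores the ADC's own conversion time and the aliasing of components near multiples of 1 / sample_time.

```python
def sampling_window(report_time, sample_count, sample_time=None):
    """Return (sample_time_s, window_s, first_null_hz) for an N-sample average.

    If sample_time is None, samples are spread over the report interval using
    the suggested SAMPLE_TIME = REPORT_TIME / (SAMPLE_COUNT + 1).
    """
    if sample_time is None:
        sample_time = report_time / (sample_count + 1)
    window = sample_count * sample_time   # time span covered by the averaged samples
    return sample_time, window, 1.0 / window


configs = {
    "8 samples, 1 ms apart": (0.300, 8, 0.001),
    "8 samples spread over 0.300 s": (0.300, 8, None),
    "16 samples spread over 0.100 s": (0.100, 16, None),
}
for name, (report, count, st) in configs.items():
    s, w, f = sampling_window(report, count, st)
    print(f"{name}: sample_time={s * 1000:.1f} ms, window={w * 1000:.0f} ms, null at {f:.1f} Hz")
```

The spread-out cases come out at about 3.75 Hz and 10.6 Hz rather than exactly 3.3 Hz and 10 Hz. This is because N samples spaced REPORT_TIME / (N + 1) apart span only N / (N + 1) of the report interval.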

Maybe Kevin could shed some light on why the implementation reads 8 samples in quick succession and then waits until REPORT_TIME elapses to take another 8 samples, as opposed to time averaging the samples across equal length SAMPLE_TIME. I suspect it’s to reduce MCU load.