Porting some ideas to Klipper: freq / amp modulated output shaper (can someone suggest a proper name for this methodology?)

djamu · December 14, 2021, 1:36am

Hi all,
@koconnor suggested I should re-post a couple of ideas here.
The suggested improvements are already written and production-proven. I thought it would be worth contributing them to Klipper, and I have a couple of spare weeks to implement them.

A little bit of history on myself and this project:

The original project is similar (both in age and hardware) to the Mechaduino, but the software, methodology and motivation are very different. I have been running it for years on my machines, so I deem it production safe…
The project started off as an anti-ringing system, which inevitably made it a closed-loop one, but not necessarily…
A bit of a problem is that most people associate closed-loop controllers with precision, but that is (IMHO) a misconception. Closed loop just makes (hopefully) sure your motor doesn’t miss any steps, or is able to catch up. Having it doesn’t mean motors will run more precisely; as owners have no doubt experienced, having a PID on board doesn’t really help fix ringing… I’ll elaborate (much) more later.

Most of the proposed improvements can be used in both open- and closed-loop systems, and will actually improve print quality. I’m very interested in how input shaping will work with this; chances are they will complement each other, or anything but… no idea…
I myself call it adaptive closed loop or output shaping… ( maybe someone can suggest a name for this kind of process ? )

The analogy I use to describe it to people unfamiliar with the topic is the history and development of CD players (and their associated closed-loop systems). Before you had auto-focus, anti-skip, anti-vibration etc., CD players needed to be heavy and precisely machined… they became a flimsy tray thanks to a lot of control software. It’s the same with (3D) printers and other household appliances: software allows compensating for the lack of sturdiness… current cheap 400€ printers outperform 3000€ machines from 10 years ago…
and frankly I think we should get rid of the step/dir interface
That is what this thread will be about and much more…

Thanks @koconnor for pointing me to #1038, which is exactly what I was looking for to get started…

Some graphs of what I will be implementing, makes it easier to explain my reasoning later on.
Motors are not linear; don’t assume all stepper driver steps are equal in distance.
The magnetic field differs across poles due to mechanical, metallurgical and electrical material properties: differences in wire gauge, unmatched magnets, different numbers of windings on poles, different electrical properties of the H-bridge driver, detent torque etc…

Here’s how it looks when one maps its manufactured offset: the actual position of a random 200 step / rev bipolar stepper motor.
X scale is 200 steps
Y is 32usteps/step.
Measurement is 16x over-sampled ( 8 times fwd / 8 bckw )

What you see is a cyclical and an absolute error. The cyclical error repeats itself 50 times per revolution (200 steps / 4 steps per electrical cycle) and corresponds with the 4 phases.
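A minimal sketch of how such an offset map could be built from repeated forward and backward passes, and how the cyclical part can be separated by folding the map over one electrical cycle. The function names and the toy readings are illustrative assumptions, not the code or data behind the plot.

```python
def build_offset_table(passes):
    """Average several measurement passes into one offset per commanded position."""
    n = len(passes[0])
    return [sum(p[i] for p in passes) / len(passes) for i in range(n)]


def cyclic_component(table, period):
    """Fold the table over `period` positions (one electrical cycle) and average."""
    cycles = len(table) // period
    return [sum(table[c * period + k] for c in range(cycles)) / cycles
            for k in range(period)]


# Toy data (made up): 8 full steps = 2 electrical cycles, one reading per step.
forward = [0.0, 2.0, 0.0, -2.0, 1.0, 3.0, 1.0, -1.0]
backward = [0.0, 4.0, 0.0, -4.0, 1.0, 5.0, 1.0, -3.0]

table = build_offset_table([forward, backward])
cyclic = cyclic_component(table, period=4)
```

Averaging forward and backward passes cancels direction-dependent effects such as backlash in the measurement, and what remains after subtracting the cyclic part is the absolute (once-per-revolution) error.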

Next are 2 sets of plots, each with an absolute and a normalized version; both were run under the same conditions (same motor, driver, current, gcode, acceleration…).

The 1st set is a normally driven motor without any of the proposed algorithms. Both sets, however, have the above error correction applied, so the actual offset would be even bigger…

The 2nd set has a third curve (yellow); in the 1st set that curve is equal to the blue line. It represents the position of the magnetic field in relation to (and on the same scale as) the intended path. The result is orange: the actual path this motor followed. The algorithm predicts future movements without prior knowledge.

Sample rate is 10 kHz. Acceleration is 9000 mm/s²; max velocity in the plotted fragment is only about 43.75 mm/s (using a 32 mm pulley, i.e. 16-tooth GT2). The motor is unloaded and the belts were disconnected, so the visible swing is only due to the motor’s own inertia; loaded, it is much worse…
I plotted it unloaded as the algorithms have to work harder; reaction under load is slower / easier to predict.
Horizontal scale is in 1/10000 sec.
Vertical scale is in usteps (32/step).
blue: position as received by the stepper driver.
red: actual motor position.
yellow: position of the magnetic field (3rd and 4th graph); in graphs 1 & 2 this is the same as blue, so I left it out.
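To put the plot scales into physical units, here is a short worked calculation. It assumes the pulley moves the belt 32 mm per revolution (16 teeth at the 2 mm GT2 pitch); the other figures come from the post.

```python
STEPS_PER_REV = 200       # full steps per revolution
USTEPS_PER_STEP = 32      # microsteps per full step (vertical plot scale)
MM_PER_REV = 16 * 2.0     # 16-tooth GT2 pulley, 2 mm belt pitch (assumption)
SAMPLE_RATE_HZ = 10_000   # horizontal plot scale
VELOCITY_MM_S = 43.75     # max velocity in the plotted fragment

usteps_per_rev = STEPS_PER_REV * USTEPS_PER_STEP              # 6400
mm_per_ustep = MM_PER_REV / usteps_per_rev                    # 0.005 mm
usteps_per_second = VELOCITY_MM_S * usteps_per_rev / MM_PER_REV  # 8750
usteps_per_sample = usteps_per_second / SAMPLE_RATE_HZ        # 0.875

# A 40-ustep swing expressed as belt travel.
swing_mm = 40 * mm_per_ustep                                  # 0.2 mm
```

So at this speed the commanded position advances by a little under one microstep per sample, and a 40-ustep swing corresponds to about 0.2 mm at the belt.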

and normalized version

Most will instantly recognize this as ringing: visible is the motor swinging around its course at its resonance frequency. One can see that the max deviation is almost one full step
(the maximum is well over 40 usteps). This is quite normal, although for some this may be hard to believe.
A lot of people think that ringing is caused by belts, but that is not the case: magnets work like springs. Some of you will have noticed that creating a bit of slack in the belts can actually (up to a point) reduce the effect (due to damping by the belts); motors barely damp.

The next set is what my proposed additions do, under the same conditions.

The orange line is the actual motor path, which now barely deviates from the intended course: 2-3 usteps vs 32 (40+)…
The magnetic field now acts in order for the actual position to match the intended position.
This can’t be done using a step/dir interface, nor by using a PID controller (way too slow for this).

that’s it for now,
The next post will be about PID’s, why they’re not suited, and the alternatives I replaced them with: perceptron filtering, freq / ampl modulation of waveforms etc…

koconnor · December 16, 2021, 2:50am

Interesting. Thanks for sharing.

This is something that I’ve been interested in over the last few years. One of the driving reasons for me to deploy angle sensors on my Voron Zero printer was to investigate things like this (eg, the effects of belt spring vs stepper motor lag). It’s good to see that you’ve been able to get an implementation working and are seeing positive results.

In the past I’ve seen these types of systems referred to as “servo steppers”. I agree that the main advantage to them is the ability to improve performance, and not necessarily to reduce power or to prevent “lost steps”.

One of the challenges I ran into with the mechaduino (and the “smart stepper” and clones) is that the hardware was built on older a4950 h-bridge chips which run hot. The mechaduino gets around that issue by using a PID loop to control current - but it seems that tends to introduce a “salmon skin effect” to prints - and it now seems to me that it is an inherent problem of trying to reduce current at runtime (because changing current itself has a tendency to induce rotor movement).

It seems you’ve found a mechanism to directly control stepper motor current using tmc5160 chips (via the XDIRECT register). That’s interesting because these chips are widely available and don’t have the high RDSon issue that the a4950s have. (It seems this would also work with tmc2130s, but that chip is less interesting today.)

One challenge I think you’ll have is that this type of “servo stepper” interface will need micro-controller code, and I suspect it will be a challenge to make that code portable to other micro-controllers. One of the big advantages of Klipper today is that the mcu code is simple and portable - we basically run nearly the same code on all micro-controllers. This certainly helps with debugging and rollout - for the most part users can choose any printer board they want and they should get basically the same results. However, I suspect implementing precise coil control in the mcu would require hardware specific code - in particular DMA, SPI via DMA, and timing via DMA engine. This is one of the challenges I had with the demo Mechaduino code that I wrote - to get precise timing between the SPI angle sensor and the PWM coil control would have required utilizing the SAMD21 DMA engine, but no other chip has a similar DMA engine so the effort would have been substantial and result in a “one-off” implementation.

FWIW, as @dmbutyugin mentioned in another thread, it may be possible to implement a form of “coil control” using a step/dir interface. That is, it may be possible to alter the timing of step/dir pulses so as to obtain the effect of direct coil control. Said another way, if we can identify that a particular requested motor phase actually provides slightly more/less torque then it may be possible to account for that by delaying/advancing the corresponding step pulse. I agree the implementation of this wouldn’t be as straight-forward as direct coil control, but it may help with portability - in particular it shouldn’t require hardware specific features to implement (nor a particular stepper motor driver). It’s likely such a feature would still require mcu code, as I suspect a host implementation would not work well with the current Klipper step compression code. However, hopefully it could be portable mcu code. Just thinking out loud here.

Cheers,
-Kevin

djamu · December 16, 2021, 3:45pm

@koconnor
That was a lot of questions, good ones. It seems you fully understand the goals I wanted to achieve.
( actual precision ),
As mentioned before, the hardware development of my driver took place at the exact same time as the Mechaduino one, but at the time I didn’t want to release hardware / software that I deemed not ready…

The whole setup is very similar to the Mechaduino (the sensor is identical, btw). It’s only 2 years since I ported it to the TMC’s, which means that in my code it’s just a #define to switch between ‘dumb’ H-bridge drive (4 bridge pins + 2 AD) and SPI… I figured the TMC’s would do a better job as a dual current controller…
The PWM code is interchangeable with that of SPI.
( in the end it’s just symmetrical amplitude modulation, symmetrical across coils )
Portability:
All math is integer. I’m using a magic number for the wave pointer (819200) that allows for sufficient integer compute headroom (4096 usteps / 200/400 steps) and a 15-bit sensor position. It’s almost all boolean, so it runs fast even on 8-bit AVR’s. I’m not using DMA; there’s no need.
Long story short, it will run just fine on the mechaduino without overheating the a4950…
The sine table is hardcoded as uint16_t wave_sine[4096]; that’s 8k, which might be too much for some 8-bit mcu’s. (I have another waveform too, but that can wait.)
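The numbers above fit together neatly in integer math. The sketch below only checks the arithmetic and builds a table of the stated size; the mid-scale offset of the table and the helper `sensor_to_wave` are assumptions made for illustration, not the actual firmware.

```python
import math

TABLE_SIZE = 4096
WAVE_UNITS_PER_REV = 819200   # the "magic number" for the wave pointer
SENSOR_COUNTS = 1 << 15       # 15-bit sensor position

# 4096 wave units per full step on a 200-step motor, 2048 on a 400-step one.
units_per_step_200 = WAVE_UNITS_PER_REV // 200
units_per_step_400 = WAVE_UNITS_PER_REV // 400

# The magic number is also an exact multiple of the sensor range,
# so sensor counts map to wave units with a single integer multiply.
WAVE_UNITS_PER_COUNT = WAVE_UNITS_PER_REV // SENSOR_COUNTS   # 25


def sensor_to_wave(sensor_position):
    """Map a 15-bit sensor reading to wave pointer units (hypothetical helper)."""
    return sensor_position * WAVE_UNITS_PER_COUNT


# uint16_t-style sine table; centring zero current at mid-scale is an assumption.
wave_sine = [round(32767.5 + 32767.5 * math.sin(2 * math.pi * i / TABLE_SIZE))
             for i in range(TABLE_SIZE)]
TABLE_BYTES = TABLE_SIZE * 2   # two bytes per entry, 8192 bytes in total
```

Because 819200 = 4096 × 200 = 2048 × 400 = 32768 × 25, no division or floating point is needed at runtime for either motor type.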

Let’s do this, since you’re obviously interested.
I’ve reviewed your sample code / changes for the mechaduino, and have a couple of suggestions to make the configuration more generic and backwards compatible, with few changes to klippy and the mcu code.

Let’s agree on the following:
An MCU will have the following operation modes and conform to the following conditions.

step / dir OR wavedrive > wavedrive has 2 submodes > pwm / spi
if wavedrive > either open- or closed-loop.
if open-loop > support for multiple steppers / heaters / endstops etc…
if closed-loop > single stepper / dual endstop.
We’re not going to support step/dir driven closed loop, are we? I strongly object because of the inherent latency of this interface, caused by the consecutive pulses needed for correction.

This means configuration changes to klippy can be implemented as follows with an additional stepper definition.

[stepper_{n}]
interface:
#interface defaults to stepdir, make it backwards compatible, doesn’t need to be provided.
interface: pwm # requires one to provide 4 control pins of which 2 are pwm capable + at least 1 AD pin
interface: spi # required pins for interface are already defined in the specific driver section [tmc2130][tmc2209] etc…

(optionally)
[closed_loop stepper_{n}]
sercom: sercom4
miso_pin:
mosi_pin:
clk_pin:
ss_pin:
sensor_type:
pid_values: p,i,d #(*) this is a tricky one, I’ll elaborate later, it’s actually PIIP(s) or PI2P(s) don’t know what to name it yet…
filter_values: n,p # (**) same here…

Since closed-loop mcu’s have only 1 stepper, there’s no problem in running the complete feedback control (the other kind of pid) inside the mcu.
Another good reason for not running control software on a host is usb/can bus latency and its associated chatter. pid (or whatever we’re going to call it) cannot be queued/planned. It’s all very lightweight and portable anyway…
This way all currently existing (including my own) drivers can be supported. One can then seamlessly switch to wavedrive if the drivers support it. Can we agree on this?
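The backwards-compatible default proposed for the `interface` option can be illustrated with the standard-library `configparser`. This is only a sketch of the intended behaviour; it is not Klipper’s own config code, and the section contents (`step_pin: PA0`) are placeholders.

```python
import configparser

VALID_INTERFACES = ("stepdir", "pwm", "spi")


def stepper_interface(config_text, section):
    """Return the interface of a stepper section, defaulting to 'stepdir'."""
    parser = configparser.ConfigParser(delimiters=(":", "="),
                                       inline_comment_prefixes=("#",))
    parser.read_string(config_text)
    # A missing option falls back to the existing step/dir behaviour.
    value = parser.get(section, "interface", fallback="stepdir")
    if value not in VALID_INTERFACES:
        raise ValueError("unknown interface %r in [%s]" % (value, section))
    return value


EXAMPLE = """
[stepper_x]
step_pin: PA0

[stepper_y]
interface: spi  # pins come from the driver section
"""

x_interface = stepper_interface(EXAMPLE, "stepper_x")
y_interface = stepper_interface(EXAMPLE, "stepper_y")
```

Existing configs that never mention `interface` keep working unchanged, which is the point of making step/dir the default.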

virtual_stepper.c / h
For now this is usable; however, since it basically transforms time/step into steps/timeframe, it’s kind of wasteful to spend interrupts inside the mcu just to do that.
Eventually, any time one switches to wavedrive, the planner / kinematics should switch to steps/timeframe.
The ideal interval is 1/10000 sec.
Load on mcu’s will become (practically) constant and easier to predict, and allow for crazy speeds and/or usteps…
So ideally, the klippy/mcu protocol should either be extended or changed altogether to steps/timeframe…
It’s probably not something you’re keen on considering, but please do: replace it altogether, it won’t make any difference for the step/dir users.
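The difference between the two representations can be shown with a small conversion from timestamped steps (time/step) to a signed step count per fixed 100 µs frame (steps/timeframe). The event format and the function name are assumptions for illustration only.

```python
def steps_per_frame(step_events, frame_us=100):
    """Convert (time_us, direction) step events into signed steps per frame.

    direction is +1 or -1; frame_us=100 matches the proposed 1/10000 s interval.
    """
    if not step_events:
        return []
    last_time = max(t for t, _ in step_events)
    counts = [0] * (last_time // frame_us + 1)
    for time_us, direction in step_events:
        counts[time_us // frame_us] += direction
    return counts


# Four forward steps, then one step back (times in microseconds).
events = [(10, 1), (60, 1), (110, 1), (250, 1), (320, -1)]
frames = steps_per_frame(events)
```

With frames, the work per interval is constant regardless of speed: one number per 100 µs instead of one timer event per step.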

(*) Why PID is not suited for fast feedback, I mentioned it before, here it is.
Noted as follows (can’t use mathematical notation here, that’s unfortunate),
where Kp, Ki, Kd are its coefficients and e(t) is the positional error… simplified, it’s
P = Kp * e(t)
I += Ki * e(t)
de = e(t) - e(t-1)
D = Kd * de
One of the misconceptions in tuning a PID for stepper motors is the misuse of Kp, which shouldn’t be a tunable parameter but a constant matching your hardware.
if Ki = 0, Kd = 0
then Kp should match the size of the smallest step (1 ustep); zeroing Ki/Kd should make a motor run as if it has no control loop at all… (simplest closed loop)
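The simplified update written out above translates directly into code. This is a plain textbook discrete PID in the same notation, shown only to make the roles of the three terms concrete; the class name is arbitrary.

```python
class PID:
    """Discrete PID using the simplified form from the post."""

    def __init__(self, kp, ki, kd):
        self.kp = kp
        self.ki = ki
        self.kd = kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        p = self.kp * error                       # P = Kp * e(t)
        self.integral += self.ki * error          # I += Ki * e(t)
        d = self.kd * (error - self.prev_error)   # D = Kd * (e(t) - e(t-1))
        self.prev_error = error
        return p + self.integral + d


# With Ki = Kd = 0 the output is just the scaled error.
plain = PID(kp=1.0, ki=0.0, kd=0.0)
plain_output = plain.update(3.0)

full = PID(kp=1.0, ki=0.5, kd=2.0)
first = full.update(2.0)    # P=2, I=1, D=4
second = full.update(2.0)   # P=2, I=2, D=0
```

With a constant error the D term vanishes after the first sample while I keeps growing, which is the windup behaviour discussed next.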

if Ki != 0 Kd != 0
The derivative allows for damping during velocity changes.
At a constant velocity, I will wind up/down until P = 0; since velocity is constant, D = 0 is also true.
This makes I the sole factor for overcoming friction at a constant velocity.
if Kd = 0:
Since I is an integral of e(t), too large a Ki will make it oscillate, and too low a Ki will make it even slower to react. If Kd = 0 this windup will cause a PID to always overshoot on deceleration and vice versa; one has to tune Kd to overcome that…
The problem with D and Kd is that the faster one samples the error, the smaller D becomes; at very fast sample rates D becomes binary… and that is bad, as it creates a lot of noise in the output.
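The "binary D" effect is easy to reproduce with a quantised signal. The sketch below assumes a value that drifts at a constant rate and is read in whole counts; the rates are illustrative, not measured.

```python
def quantized_differences(rate_per_sample, n_samples):
    """Sample-to-sample differences of a steadily drifting, quantised signal.

    Quantisation is by truncation, so only non-negative rates are meaningful here.
    """
    readings = [int(rate_per_sample * k) for k in range(n_samples + 1)]
    return [b - a for a, b in zip(readings, readings[1:])]


# Slow sampling: the signal moves several counts per sample, so the
# difference is a fairly steady number.
slow = quantized_differences(8.75, 4)

# Ten times faster sampling of the same motion: less than one count per
# sample, so the difference can only be 0 or 1.
fast = quantized_differences(0.875, 8)
```

Multiplied by Kd, the fast-sampled difference becomes a train of pulses rather than a smooth damping term, which is where the output noise comes from.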

The overshoot, or the noise introduced by the control algorithm, makes a PID unsuitable for closed loop. To make matters worse, the AMS 5047D chip, advertised as having 14-bit resolution, in reality has only a stable 10/11-bit output, which is not even close to the 6400 positions per revolution required for 32 usteps × 200 steps resolution, and so will introduce even more noise…
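For scale, this is what the quoted bit depths mean in microsteps per sensor count. It is pure arithmetic on the figures in the paragraph above.

```python
REQUIRED_POSITIONS = 200 * 32   # 6400 microsteps per revolution

counts = {bits: 1 << bits for bits in (10, 11, 14)}
usteps_per_count = {bits: REQUIRED_POSITIONS / n for bits, n in counts.items()}
```

At 10 or 11 stable bits one sensor count spans 6.25 or 3.125 microsteps, while a true 14 bits would resolve about 0.39 microsteps.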
oversampling doesn’t work as it creates lag, and therefore limits RPM…
kalman filtering is too heavy for an 8-bit MCU…

No doubt this is your conclusion too,
next post is about PIIP(s), and filtering

Cheers
Jan

koconnor · December 16, 2021, 6:26pm

Okay. I look forward to seeing your results.

Some random thoughts…

> We’re not going to support step/dir driven closed loop, are we? I strongly object because of the inherent latency of this interface, caused by the consecutive pulses needed for correction.

If using step/dir doesn’t make sense then that’s fine. I don’t think latency is an issue though - a 10K update rate is 100us between updates and a TMC driver could be sent 1000 pulses in that time.

> Can we agree on this?

I understand your config proposal. I suspect there would be a lot of review, discussion, and testing next. I don’t know enough to agree to anything at this point.

> virtual_stepper.c / h
> For now this is usable; however, since it basically transforms time/step into steps/timeframe, it’s kind of wasteful to spend interrupts inside the mcu just to do that.

Yes, the virtual_stepper.c code was just a hack to enable testing of the mechaduino.

I did consider the possibility of encoding and compressing fixed time interval updates from the host code. However, in a similar discussion on Discord a few months back, dalegaard had a really good point - it’s possible for the mcu to calculate the position at a given time using the existing queue_step messages. That is, with a little math, the code doesn’t need to use “virtual interrupts”. In a similar way, the code can get at velocity and acceleration, which some servo stepper algorithms make use of.
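A rough illustration of that idea: given one `queue_step` command, the number of steps already issued at a given time can be computed from its fields alone. The sketch assumes the usual meaning of the fields (first step after `interval` ticks, then `add` is added to the interval after each step) and uses a plain loop rather than the closed-form math; the function name is made up.

```python
def steps_taken(interval, count, add, elapsed):
    """Steps of one queue_step(interval, count, add) issued after `elapsed` ticks."""
    step_time = 0
    steps = 0
    while steps < count:
        step_time += interval
        if step_time > elapsed:
            break
        steps += 1
        interval += add   # each following interval changes by `add`
    return steps


# An accelerating move: intervals 1000, 900, 800, 700, 600 ticks,
# so steps occur at 1000, 1900, 2700, 3400 and 4000 ticks.
before_third = steps_taken(1000, 5, -100, 2699)
at_third = steps_taken(1000, 5, -100, 2700)
after_move = steps_taken(1000, 5, -100, 5000)
```

The step times form a quadratic in the step index, so the loop could be replaced by solving that quadratic, and velocity follows from the current interval.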

That said, all options are open.

> replace it altogether, it won’t make any difference for the step/dir users.

No, that won’t work, as 100us is too coarse. It would break stealthChop mode for sure.

> kalman filtering is too heavy for an 8-bit MCU…

I wouldn’t worry about 8bit MCUs. They are becoming less common, and I don’t think there’s any reason new hardware (with angle sensors) would use them.

It did seem to me that kalman filters may not be a good fit - in addition to the cpu overhead, it seemed that calibrating it would be difficult. FWIW, when I last looked at this it did seem that an “Alpha beta filter” might be useful.
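For reference, an alpha-beta filter is small enough to show in full. This is the generic textbook form with fixed gains, not code from any Klipper branch, and the gain values are arbitrary.

```python
class AlphaBetaFilter:
    """Textbook alpha-beta filter tracking position and velocity."""

    def __init__(self, alpha, beta, dt, position=0.0, velocity=0.0):
        self.alpha = alpha
        self.beta = beta
        self.dt = dt
        self.position = position
        self.velocity = velocity

    def update(self, measurement):
        # Predict forward one sample assuming constant velocity.
        predicted = self.position + self.velocity * self.dt
        residual = measurement - predicted
        # Correct both estimates by a fixed fraction of the residual.
        self.position = predicted + self.alpha * residual
        self.velocity += self.beta * residual / self.dt
        return self.position


filt = AlphaBetaFilter(alpha=0.5, beta=0.25, dt=1.0)
first_estimate = filt.update(4.0)
second_estimate = filt.update(4.0)
```

Unlike a Kalman filter there are no covariance matrices to maintain, only two gains to tune, which addresses both the CPU overhead and part of the calibration concern.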

Cheers,
-Kevin

koconnor · December 16, 2021, 6:49pm

That’s surprising. I was getting very good accuracy with the magnetic hall sensor chips. In particular I’ve deployed TLE5012b sensors on my Voron Zero printer, and I’m seeing repeatability in the “13bit range”. These sensors do need to be calibrated and calibration can be tricky. I have a separate branch with the latest angle sensor code at https://github.com/Klipper3d/klipper/tree/work-angle-20210722 . I have not done checks to see how temperature skews the angle sensor measurements though.

-Kevin

djamu · December 16, 2021, 11:38pm

1000 pulses / 100us, 100ns / pulse? That’s a pulse rate of 10 MHz. If true, fine, but I find that hard to believe. Almost 2 years ago I hacked a pulse library for a Cortex-M4F (NXP MK66FX) that works in hardware. The original author got it to 300 kHz and I managed to double it to 600 kHz, with pins set and unset by hardware… very lightweight. This particular controller’s CPU is no slouch, and yet that’s a far cry from 10 MHz.
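The arithmetic behind those figures, using integer nanoseconds to keep it exact:

```python
NS_PER_SECOND = 1_000_000_000
UPDATE_RATE_HZ = 10_000

update_period_ns = NS_PER_SECOND // UPDATE_RATE_HZ   # 100 us between updates
pulse_period_ns = update_period_ns // 1000           # 1000 pulses per update
pulse_rate_hz = NS_PER_SECOND // pulse_period_ns     # required pulse rate

# What a 600 kHz pulse generator manages within one update period.
pulses_per_update_at_600khz = 600_000 // UPDATE_RATE_HZ
```

So 1000 pulses per update implies 10 MHz, whereas 600 kHz allows 60 pulses per 100 µs update.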

I need to look at the queue code, you’re right, as an alternative one might be able to “slurp” the queue time-frame/frame, given messages have a timestamp on them…

I haven’t tested stealthChop like this; I need to verify. At the moment I only have a couple of 2130’s and no spare 2209’s… will check. I’m not using stealthChop (we had this discussion last week): I’m all about accuracy and don’t mind a little bit of buzz, which is still nowhere near what a drv8825 produces. stealthChop doesn’t work well at higher velocities; the TMC crashes when revving it up with the XDIRECT register…
I don’t know how it works with the 2209’s, but for the 2130 it’s pretty useless except at the lowest velocities IMHO…

Too coarse ?
In the graph I previously posted (for a Wantai 42BYGHW811 stepper) I got it to resonate at around 105 Hz, which is a period of around 9.5 ms. 10 kHz / 100us control is roughly 95 times faster than that, far beyond a motor’s ability to respond to changed coil current. Beyond that (105 Hz) this particular motor (and any similar one) is dead weight: you can tell it to move a full step, and by the next 100us update it will have barely started moving, regardless of whether it’s being driven by waveform or step/dir.
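A quick check of the timescales involved, using only the two frequencies quoted above:

```python
RESONANCE_HZ = 105
CONTROL_RATE_HZ = 10_000

resonance_period_ms = 1000 / RESONANCE_HZ             # about 9.5 ms
control_period_ms = 1000 / CONTROL_RATE_HZ            # 0.1 ms
updates_per_resonance_period = CONTROL_RATE_HZ / RESONANCE_HZ   # about 95
```

One swing of the rotor at its resonance therefore spans roughly 95 control updates.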

You’re right it’s not, that was just an example, I have a good filter, the perceptron based one.
I was about to document it together with the PIIP(s) (PID alternative) thingy today but you beat me to it…

Did recheck, the plot I have at hand is labeled AS5048, but the chips are AS5047P’s, I’m definitely using the DAEC feature only 5047D & P have, must be a typo…

Looks like 2-3 bit noise to me; the red line is the filter I’m using. OK, I exaggerated a bit.

Next is some background on the closed loop and filter. It’s a pretty long story, as I didn’t think much of those, so I didn’t document them until recently. I got a visit from a math professor (his assistant, an implementation …), and that was enlightening.
But that’s really for next post…

btw. I got something that looks like Bresenham too. I thought it up when I was very young and always thought everybody implemented it wrong… but that’s not the case (that same guy told me). It’s altogether different: it doesn’t need a lookup table / sin / cos / tan… but strangely, in code it looks very similar to Bresenham…
The math behind it is very different, very weird, Pythagorean theorem completely down the drain. It runs like C = A + B (+ C for 3D movements). When 1 axis steps, the timer reload multiplier is 1; when 2 axes step, it uses sqrt(2) as the reload multiplier; when 3 axes step, (cuberoot 3). That’s it, really simple. It can be implemented in integer math easily, including acceleration in whatever direction… one just needs to determine the longest travel axis, which you get from gcode.
Due to inertia it still runs like C = sqrt(A² + B²): it runs like an approximation, but it’s exact, and much faster in execution…
It’s getting late, next post tomorrow…

dmbutyugin · January 24, 2022, 11:25pm

I was thinking of some sort of ‘output shaper’ to compensate for stepper motion irregularities lately. So I tried to collect some data to see how exactly the stepper motor motion behaves. Alas, I don’t have an angle sensor, so I worked purely with the accelerometer data. And I think I got some interesting data points.

I ran this simple test in spreadcycle mode (starting from X==0)

G4 P500
G0 X200 F4800
G4 P500
G0 X0 F4800
G4 P500

basically moving from X==0 to X==200 with a velocity of 80 mm/sec and then back. I used data_logger.py to capture the statistics during this test.

In order to process the data, I’ve made a simple analyzer for motan scripts to map adxl345 readings to tmc stepper phases (averaging the readings for the same phase). Since the script plots the data only by time, I just made the script repeat the values for the same phases at different times. I generated two charts with

scripts/motan/motan_graph.py -s 3.25 -d 2.4 test_x -g '[["trapq(toolhead,x_velocity)"], ["step_phase(tmc5160 stepper_x)"], ["average_by(adxl345(hotend,x),step_phase(tmc5160 stepper_x))"]]'
scripts/motan/motan_graph.py -s 6.3 -d 2.4 test_x -g '[["trapq(toolhead,x_velocity)"], ["step_phase(tmc5160 stepper_x)"], ["average_by(adxl345(hotend,x),step_phase(tmc5160 stepper_x))"]]'

Here are the charts (I enlarged two full phase periods) for the forward move (top) and the backward move (bottom). Note that I run the printer with 256 microsteps per full step and without interpolation (hence the phase is in the range from 0 to 1023).
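The averaging step itself is simple. The sketch below is not the motan analyzer, just the idea behind it, applied to made-up (phase, acceleration) pairs.

```python
from collections import defaultdict


def average_by_phase(samples):
    """Average accelerometer readings that share the same stepper phase.

    samples: iterable of (phase, acceleration) pairs, phase in 0..1023.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phase, accel in samples:
        sums[phase] += accel
        counts[phase] += 1
    return {phase: sums[phase] / counts[phase] for phase in sums}


# Illustrative readings only.
samples = [(0, 1.0), (512, -2.0), (0, 3.0), (512, -4.0), (1023, 0.5)]
averages = average_by_phase(samples)
```

Anything that is not synchronous with the stepper phase averages out, so what remains is the disturbance that repeats every 1024 microsteps.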

Here I noticed a few things. First, the data is non-periodic at full steps (256 microsteps) and at double-steps (512 microsteps), even though with a double-step period it is almost periodic with some small deviations. These deviations are repeatable across many different runs, though, so I think this is not a measurement error.

Second, backward and forward pass look almost identical modulo negation when acceleration is plotted with time as X axis (in the backward pass, the toolhead moves in the opposite direction, so acceleration naturally is inverted). However, I would normally expect the acceleration to demonstrate that symmetry when plotted with a stepper phase as X axis. Basically, I’d normally expect the “disturbing force” of a stepper to be the same at a certain stepper position, and that force to accelerate the toolhead when it moves in one direction, and decelerate it - when it moves in the opposite direction. Here we observe a different effect - as if poles of the stepper motor repel or attract the rotor regardless of which direction it moves.

If the latter effect is true, it means that a simple model to compensate for stepper motion inconsistencies like

X'(t) = S(X(t))

with S function accounting for stepper motion non-linearities won’t work, because the effect is direction-dependent. One would need to use something like X'(t) = S(X(t), dX(t)/dt), which becomes quite more complicated to implement and tune.
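The two models can be contrasted with lookup tables indexed by stepper phase. Everything here is a toy: the table values, the four-entry period and the function names are hypothetical, and the sketch only shows why the direction-dependent variant needs twice the calibration data.

```python
def compensate(x, table):
    """X' = S(X): correction depends only on the phase of the position."""
    return x + table[x % len(table)]


def compensate_directional(x, velocity, forward_table, backward_table):
    """X' = S(X, dX/dt): a separate correction table per direction of motion."""
    table = forward_table if velocity >= 0 else backward_table
    return x + table[x % len(table)]


# Toy correction tables over a 4-position phase period.
forward_table = [0, 1, 0, -1]
backward_table = [0, -1, 0, 1]

same_either_way = compensate(5, forward_table)
moving_forward = compensate_directional(5, +1.0, forward_table, backward_table)
moving_backward = compensate_directional(5, -1.0, forward_table, backward_table)
```

A real S(X, dX/dt) would likely also depend on the magnitude of the velocity, not just its sign, which is what makes it harder to implement and tune.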

Another thing is that the results depend on the velocity. That is, at different velocities I get quite different acceleration profiles (but also repeatable within the same velocity). The most plausible explanation for this effect is that, depending on the speed, you may hit different resonances of the frame, which may amplify the vibrations (up to a few times) at different frequencies, changing the acceleration profile substantially.

The raw data from my test is attached:
test_x.zip (429.7 KB)

One can also repeat a similar test on their printer using mainline Klipper, and use the linked analyzer to process the results.
