Motion analysis by stepper phase

A couple of months ago I attempted to see if the “motion analysis” (motan) tools could be used to facilitate spreadCycle tuning. The high-level idea was to use an adxl345 accelerometer as a substitute for a “current probe” during the tuning process. That is, to see if it is possible to detect mechanical vibrations induced by poor TMC driver settings and use that as a tool for finding good TMC driver settings.

Unfortunately, the experiment didn’t work well, at least on my Voron Zero test printer. No matter what settings I chose, I couldn’t induce a mechanical jitter (or, at least couldn’t find jitter in the data with my simple analysis tools).

The test code is available at: https://github.com/KevinOConnor/klipper-dev/tree/work-motan-20211124

The code is very raw and likely only of interest to other developers.

At a high level, the analysis idea involves creating macros (see config/sample-phase.cfg on that branch) that emit timestamps (via action_call_remote_method("motan_log")) at the start and end of particular test events. This data is recorded using the ./scripts/motan/data_logger.py tool. It can then be analyzed by the new ./scripts/motan/phase_graph.py tool. That tool collates the data by stepper phase, determines the median value for each sensor for each stepper phase, and then graphs the results. Unfortunately, the phase_graph.py tool is currently pretty slow (on some of my tests it could take a few minutes even when run on a desktop class machine).
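To make the "collate by stepper phase, then take the median" step concrete, here is a minimal sketch. It is not the code in phase_graph.py; the function name median_by_phase, the (phase, value) sample layout, and the default of 1024 phases are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

def median_by_phase(samples, num_phases=1024):
    """Group (phase, value) samples by stepper phase; return the median per phase.

    `samples` is a hypothetical iterable of (phase, sensor_value) pairs.
    """
    buckets = defaultdict(list)
    for phase, value in samples:
        buckets[phase % num_phases].append(value)
    return {phase: median(values) for phase, values in sorted(buckets.items())}

# Toy data: the outlier 100.0 at phase 0 does not move the median.
samples = [(0, 1.0), (1, 5.0), (0, 3.0), (1, 7.0), (0, 100.0)]
profile = median_by_phase(samples)
```

A median per phase is robust against occasional outliers, which is presumably why it is attractive for noisy accelerometer data, but it also discards information about how widely the values spread at each phase.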

@dmbutyugin - FYI. I saw that you did something similar. I haven’t looked closely at your implementation, but I thought I would publish my previous work in this area in case it is interesting to you.

-Kevin

Here is an example graph I generated after modifying the TMC HEND setting:

According to the TMC specs, I should have seen some increased jitter on low HEND values. But, as a whole, all the tests show remarkably similar results. (There is variation, but it seems mostly to be run-to-run variance and not indicative of a change in behavior due to HEND tuning.)

Since running the above test I found that my stepper motors were set to irun=16 when they could have been set to irun=31 (see Klipper3d/klipper Pull Request #5150, “Rework tmc run_current selection to prefer vsense=1”), and it seems that the HEND setting is impacted by irun. So, it’s possible I just need to rerun the test now that the steppers are using a better irun configuration.

It’s also possible that using a “median of values at each phase” isn’t a good way to detect induced mechanical jitter. An FFT analysis may be a better approach.
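As a rough illustration of what an FFT-based check could look like, the sketch below finds the dominant frequency in a block of accelerometer samples using numpy. It is not part of the motan tools; the function name dominant_frequency, the 3200 Hz sample rate, and the synthetic test signal are all assumptions.

```python
import numpy as np

def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest spectral peak, ignoring DC."""
    samples = np.asarray(samples, dtype=float)
    # Remove the mean so the DC bin does not dominate the spectrum.
    spectrum = np.abs(np.fft.rfft(samples - samples.mean()))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate_hz)
    return freqs[np.argmax(spectrum)]

# Synthetic one-second capture: a strong 200 Hz tone plus a weak 50 Hz tone.
rate = 3200.0  # assumed sample rate
t = np.arange(3200) / rate
signal = np.sin(2 * np.pi * 200.0 * t) + 0.1 * np.sin(2 * np.pi * 50.0 * t)
peak = dominant_frequency(signal, rate)
```

Comparing such spectra between runs with different HEND values might reveal induced jitter that a per-phase median hides, though that remains to be tested.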

-Kevin

EDIT: FYI, in the graph above, the data on the left is for “forward direction moves” while the data on the right is for “reverse direction”.

EDIT2: FYI, the above graph was generated from data collected using a variant of the TMC_TEST_HEND macro found in config/sample-phase.cfg on the test branch. It was generated with the tool ./scripts/motan/phase_graph.py stest-20211212/hend_100.

@dmbutyugin - in “Porting some idea's to Klipper, freq / amp modulated output shaper, (can someone suggest a proper name for this methodology?)” (#7 by dmbutyugin) you indicated that you saw a repeating pattern by phase during your accelerometer analysis.

I also saw a repeating pattern - as can be seen in the “HEND graph” above. There does appear to be a correlation between accelerometer and stepper phase, as well as a correlation between angle sensor data and stepper phase. There is also a pattern in the forward and reverse direction - and indeed the pattern seems to change slightly based on direction. It’s not entirely clear to me if the change in pattern due to direction change is due to a systemic difference or due to run-to-run variance. (Though stepper motor lag definitely dominates the angle sensor data and that is clearly influenced by direction.)

-Kevin

Thanks, Kevin. I think my approach is very similar, and only the implementation is somewhat different (I didn’t make any changes to emit timestamps, so I had to manually restrict the processing of the logged data to regions of interest). I also used some numpy “magic”, like the unique and reduceat functions, to speed up the post-processing of the results considerably. The plot script works very smoothly on a desktop-class machine.
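For readers unfamiliar with that numpy pattern, here is a minimal sketch of how unique and reduceat can aggregate values per phase without a Python loop. It is not the actual script; the function name mean_by_phase is hypothetical, and it computes a mean rather than a median because reduceat works with ufuncs such as add.

```python
import numpy as np

def mean_by_phase(phases, values):
    """Average `values` per distinct phase using sort + unique + reduceat."""
    phases = np.asarray(phases)
    values = np.asarray(values, dtype=float)
    # Sort so that all samples of the same phase are contiguous.
    order = np.argsort(phases, kind="stable")
    sorted_phases = phases[order]
    sorted_values = values[order]
    # `starts` holds the first index of each phase group in the sorted arrays.
    unique_phases, starts, counts = np.unique(
        sorted_phases, return_index=True, return_counts=True)
    # Sum each contiguous group in one vectorized call.
    sums = np.add.reduceat(sorted_values, starts)
    return unique_phases, sums / counts

phases = [2, 0, 2, 1, 0]
values = [4.0, 1.0, 6.0, 10.0, 3.0]
uniq, means = mean_by_phase(phases, values)
```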

According to the TMC specs, I should have seen some increased jitter on low HEND values.

It might indeed be the case that the steppers are operating in such conditions that they are not exhibiting the jitter (it might be the case for 3D printers in general, or may be more specific to your configuration). TBH, from reading the TMC datasheets, it seemed that the jitter, even if present, could end up being high-frequency, something that the accelerometer wouldn’t be able to detect (its limit is at 1600 Hz due to the sampling rate).

… indicated that you saw a repeating pattern by phase during your accelerometer analysis.

I think more like ‘reoccurring’? The generated pattern is ultimately accel(phase) and is not periodic. But it is periodically repeated (as phases periodically repeat themselves with time) in the charts for plotting purposes. Though the repeating pattern does exist in the raw accelerometer readings, just with some variability and noise at different positions.

There is also a pattern in the forward and reverse direction - and indeed the pattern seems to change slightly based on direction.

In my experiments, I could tell that the patterns between forward and reverse directions are reversed in a manner I did not expect:

backward_accel(phase) ~= -forward_accel(num_phases-phase-1)

* I expected it to be

backward_accel(phase) ~= -forward_accel(phase)

I could tell that it is like this due to asymmetries in the forward_accel and backward_accel profiles. These asymmetries allow one to map different parts of the forward and backward motion to each other, and see that backward_accel(phase) != -forward_accel(phase) with high certainty.
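One way to compare the two hypotheses numerically is to compute the RMS mismatch of each against the measured per-phase profiles. The sketch below uses a synthetic asymmetric profile rather than real data; the function name mismatch and the 16-phase toy profile are assumptions for illustration.

```python
import math

def mismatch(forward, backward, mirrored):
    """RMS error of backward[phase] ~= -forward[src] over all phases.

    mirrored=True  tests src = num_phases - phase - 1
    mirrored=False tests src = phase
    """
    n = len(forward)
    total = 0.0
    for phase in range(n):
        src = n - phase - 1 if mirrored else phase
        total += (backward[phase] + forward[src]) ** 2
    return math.sqrt(total / n)

# Synthetic, deliberately asymmetric forward profile over 16 phases.
n = 16
forward = [math.sin(2 * math.pi * p / n) + 0.5 * math.sin(4 * math.pi * p / n + 0.7)
           for p in range(n)]
# Construct the backward profile according to the mirrored relation.
backward = [-forward[n - p - 1] for p in range(n)]

err_mirrored = mismatch(forward, backward, mirrored=True)
err_same = mismatch(forward, backward, mirrored=False)
```

With real data neither error would be zero, but an asymmetric profile should make one hypothesis fit clearly better than the other.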

It’s not entirely clear to me if the change in pattern due to direction change is due to a systemic difference or due to run-to-run variance.

In my experiments, this seems to be a systemic difference. That is, the profile itself and its asymmetries are repeatable across multiple runs of the same test (and even between different days with the machine powered off in between).

Yes. Some time after I ran these tests I tried going through the tmc2209 spreadcycle spreadsheet and I found that it too recommended a very low HEND. It was in that process that I found that HEND is dependent on IRUN and that I had a poor IRUN setting. I still need to rerun the tests with those changes.

I’ve found the accelerometer and angle sensors to be surprisingly accurate. If there is some jitter that it can’t pick up, I’d be surprised that “the print” could pick it up either. So, same end result - nothing to tune. That said, as above, the sensors (and “print”) may observe the issue, but my analysis of the data may be lacking.

I agree the data should be mirrored. I’m not sure I understand what you are reporting though. I updated my phase_graph.py tool and ran your data through it:


This is with commit 17fb2bd2 and the command: ./scripts/motan/phase_graph.py dtest-20220124/test_x -m 'tmc5160 stepper_x' -r '[[3.25, 5.65, "x"], [6.3, 8.7, "x"]]' -g '[["adxl345(hotend,x)"]]'

It looks to me like the “reverse” graph is mirrored across the X axis and is shifted about a quarter full step to the left. This seems about right to me - the acceleration is mirrored because of different movement direction and the phases are shifted due to stepper lag. This was with stealthChop, I assume? Am I missing something?

-Kevin

EDIT: It seems the graph labels are a little misleading in the graph above. I think the left graph is actually the “negative cartesian x motion” and the right graph is “positive cartesian x motion”. The tmc driver application of phases is goofy, so it’s not immediately clear to me which one is “incrementing phase” and which one is “decrementing phase”.

If there is some jitter that it can’t pick up, I’d be surprised that “the print” could pick it up either.

I agree. I mean it more like, high-frequency jitter may result in unpleasant noise, hissing, and/or higher power dissipation in stepper motors.

I agree the data should be mirrored. I’m not sure I understand what you are reporting though.

Well, I was reporting that you need to mirror not only over the Y axis, but also over the X axis. This was unexpected to me. For instance, if you think about detent forces in the stepper, such a force F depends only on the position: F = F(phase). It won’t mirror if the stepper rotates in the opposite direction. However, the data suggests that whatever forces are acting on the toolhead during constant speed motion get flipped when direction is reversed. Essentially, accel_backward(phase=10) == -accel_forward(phase=1013) and accel_backward(phase=500) == -accel_forward(phase=523).

Why I think it matters: if that’s really true, it makes a simple output_shaper like

X'(t) = OS(X(t))

infeasible. In fact, earlier I did some experimentation where I tried to make an output shaper per stepper motor based on cubic splines. And indeed I was able to tune it on a forward motion somewhat to reduce the amplitude of vibrations, say, in the forward motion, from ~10K mm/sec^2 to ~1K mm/sec^2, which indicates that it sort of works. However, the backward motion was not affected so much, and I did not observe a reduction in the vibrations when the toolhead moves backwards. So, you need to compensate for forward motion and backwards motion differently. This also indirectly confirms the results I showed above.

And ultimately, well, it’s too bad. Because it seems that a simple output shaper won’t work. Not unless some clever scheme can be devised, which I’m not sure is possible. Of course, I may be missing something.

… phases are shifted due to stepper lag. This was with stealthChop, I assume?

No, as I mentioned in the other thread, this was done in SpreadCycle mode and 256 microstepping mode without interpolation. I used TMC spreadsheet to calculate the SpreadCycle parameters using the theoretical data from the stepper datasheet (it probably does not matter much, I just mention it for completeness).

[tmc5160 stepper_x]
...
interpolate: False
run_current: 0.800
stealthchop_threshold: 0
# For 17HS15-1504S
driver_TBL: 2
driver_TOFF: 3
driver_HSTRT: 2
driver_HEND: 3

The tmc driver application of phases is goofy, so it’s not immediately clear to me which one is “incrementing phase” and which one is “decrementing phase”.

[stepper_x]
dir_pin: P2.6

so this probably means that the forward motion [3.25, 5.65] has phase increasing, and backwards motion [6.3, 8.7] - decreasing.

EDIT: to clarify: the experiments I did with the output shaper were also done in SpreadCycle mode, where it makes sense to adjust the timing of the stepper pulses. That’s not really viable in StealthChop mode.

Okay, I understand.

FWIW, I didn’t notice a similar pattern on my Voron Zero. That may be due to corexy kinematics and also possibly more measurement noise in general.

I need to rewire the accelerometer to my Zero to run more tests. But, just as an example, I graphed a random segment of data from a capture taken a couple of months ago:

I looked into that further last night. In case you are curious, the Klipper code always reports an incrementing step phase with an incrementing nominal stepper position. So, for example, if on a cartesian axis, then a positive X move will result in a positive nominal stepper position, and thus an increasing reported stepper phase. On the tmc drivers, a non-inverted dir_pin will actually result in a mirroring of the reported step phase (reported_phase = 1023 - tmc_phase). So, on your particular printer, with a positive X move, the TMC driver will actually be internally decrementing the step phase. That said, it also occurred to me last night that this doesn’t matter, as the tmc sine wave is always a mirror image of itself anyway. So, it doesn’t really matter if it is incrementing or decrementing internally.
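The phase mirroring described above is small enough to write out. This is only an illustrative sketch of the stated relation, not the Klipper source; the helper name reported_phase and the constant NUM_PHASES are made up for the example.

```python
NUM_PHASES = 1024  # implied by the 1023 in the relation above

def reported_phase(tmc_phase, num_phases=NUM_PHASES):
    """Mirror the driver's internal phase: reported_phase = 1023 - tmc_phase."""
    return (num_phases - 1) - tmc_phase

# The driver internally decrements its phase during a positive move...
tmc_phases = [5, 4, 3, 2]
# ...while the reported phase increments.
reported = [reported_phase(p) for p in tmc_phases]
```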

-Kevin

P.S. Graph above was with corrected labels (commit 58a3d5a6) and with: ./scripts/motan/phase_graph.py stest-20211129/moves_10 -m 'tmc2209 stepper_x' -r '[[28.0, 28.5, "x"], [29.0, 29.5, "x"]]' -g '[["adxl345(adxl345,x)"],["deviation(angle(angle_x),stepq(stepper_x))"]]'

It looks like Prusa have implemented this idea in their firmware:

C++ code: Prusa-Firmware-Buddy/lib/Marlin/Marlin/src/feature/phase_stepping at f69118723a41ecef761be9df96205b9cf4571331 · prusa3d/Prusa-Firmware-Buddy · GitHub
Python tools: Prusa-Firmware-Buddy/utils/phase_stepping at f69118723a41ecef761be9df96205b9cf4571331 · prusa3d/Prusa-Firmware-Buddy · GitHub

They are using the Direct mode / writing the phase currents to the XDirect register. But it’s not clear to me if they are doing that just for calibration or for all stepper movement. That will take a deeper reading of the code to understand how they did it.

(I have an XL and this works very well)

And if I got the video correctly, they seem to use only the accelerometer for calibration, no angle sensors or anything, which means it wouldn’t need extra electronics in many of the Klipper printers!

This is definitely very interesting! I intend to look into their implementation in the coming weeks. I think the most interesting part is tuning and the form of the distortions they introduce and optimize for. Briefly skimming through their implementation, it seems they introduce sinusoidal adjustments, and the optimization performs the amplitude and the phase offset adjustment. If that is true and it works well for Prusa XL, perhaps the same can be implemented in Klipper. FWIW, Klipper does not currently support direct current control mode of TMC drivers, and we may run into performance issues if we try it. But perhaps corrections of similar nature could be introduced to the kinematics position calculations (so that continuous motion would no longer be such, but instead would be oscillating based on the tuned profile). Not sure if that would work well though and be robust enough, but it is probably worth a try.


It wasn’t immediately clear to me if they used the current control only for calibration or for the entire print.

Also I’m not sure how they handle belt skips? I.e. if the phase de-syncs from the position how do they re-acquire sync?

Yes, I’m not quite certain myself yet. However, even if for calibration, Klipper does not support that at all.

If I understood your question correctly, they implement direct current adjustment for each phase. Then it is pretty much self-synchronizing thing: a given current configuration (currents on coil A and B) uniquely identifies a stepper position with a period of 4 steps. Since they tune and adapt their changes to 4 steps maximum (so they do not adjust for belt motion and typical 2mm ripple defects with GT2 belts, or stepper motor irregularities that have longer patterns), they do not need any further synchronization.
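The "self-synchronizing" argument can be illustrated with an idealized model in which the two coil currents are a sine and a cosine of the electrical angle. This is a sketch under that assumption, not Prusa's code; the function name coil_currents, the 256 microsteps per full step, and the unit amplitude are illustrative.

```python
import math

def coil_currents(position_microsteps, microsteps_per_step=256, amplitude=1.0):
    """Idealized coil A/B currents for a position, periodic over 4 full steps."""
    period = 4 * microsteps_per_step  # one electrical revolution
    angle = 2 * math.pi * (position_microsteps % period) / period
    return amplitude * math.sin(angle), amplitude * math.cos(angle)

# Positions that differ by exactly 4 full steps give identical currents...
a = coil_currents(100)
b = coil_currents(100 + 4 * 256)
# ...while every position within one 4-step period has a distinct (A, B) pair.
pairs = {tuple(round(c, 9) for c in coil_currents(p)) for p in range(1024)}
```

In this model the current pair says nothing about which 4-step period the rotor is in, which matches the point that corrections limited to a 4-step period need no further synchronization.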

I skimmed through the Prusa code for the phase_stepping feature and here is what I’ve found:

  • phase_stepping can be implemented with 2 “backends”, both are compile-time options:
    • burst_stepping and
    • quick_tmc_spi
  • phase_stepping is only compiled for the XL printer(s)
  • A special xl-burst config compiles with the burst stepping backend

When phase_stepping is used, the method for controlling the motor changes drastically.
Where for regular step-generation, the motor increments in fixed steps and the timing of those steps is varied, in phase_stepping the time increments are fixed and the motor position (or for burst_stepping increment size) is varied.

  • With burst stepping an output rate of 10kHz is used
    • For each 1/10kHz time-window up to 200 level-changes on each pin can be emitted. These will be spaced out approximately evenly.
  • With quick_tmc_spi the output rate is 40kHz, though as I read the code, communication with the X/Y motors is interleaved so the effective output rate is 20kHz.
    • The motor coil currents are set directly using spi communication based on a lookup table.

The anti-cogging, i.e. the method for reducing motor vibration, is implemented slightly differently for the two backends. I haven’t studied this in detail.

  • With burst_stepping the requested electrical phase (communicated to the stepper driver through step/dir) is modified based on the phase, where an offset from a lookup table is added/subtracted. Basically, within an electric revolution, the electric phase speed will increase and decrease to counteract the forces from cogging.
  • With quick_tmc_spi the coil currents are output directly. Anti-cogging is performed by reading the currents from lookup tables that can be non-sinusoidal. It looks like they use a weighted sum of phase-shifted sinusoids, like a Fourier series.
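To illustrate what a "weighted sum of phase-shifted sinusoids" lookup table could look like, here is a minimal sketch. It is not taken from the Prusa firmware; the function name build_current_table, the table size of 1024, and the harmonic weights are assumptions chosen for the example.

```python
import math

def build_current_table(harmonics, table_size=1024):
    """Build a coil-current lookup table over one electrical revolution.

    `harmonics` is a list of (order, weight, phase_shift_rad) tuples; the
    table entry is the sum of weight * sin(order * theta + phase_shift).
    """
    table = []
    for i in range(table_size):
        theta = 2 * math.pi * i / table_size
        table.append(sum(weight * math.sin(order * theta + shift)
                         for order, weight, shift in harmonics))
    return table

# A pure sine wave, and a variant with a small phase-shifted 3rd harmonic added.
pure = build_current_table([(1, 1.0, 0.0)])
corrected = build_current_table([(1, 1.0, 0.0), (3, 0.05, 0.3)])
```

Calibration would then amount to finding the weights and phase shifts that minimize the measured vibration, which fits the description of amplitude and phase-offset tuning elsewhere in this thread.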

As I read the code, the lookup tables are speed dependent, though I have not checked if it’s only the direction being used.

Both burst_stepping and quick_tmc_spi make heavy use of HW acceleration in the form of timer-triggered DMA transfers, and the code is full of comments about why the code is as it is to ensure best performance. All of which is (and has to be) very target specific.

With the main goal of Klipper being a cross-platform controller that can run on basically all 3d printer controllers, I think it’s going to be difficult to implement this feature in a way that doesn’t require an implementation per board.

@dmbutyugin is probably right that something similar to burst_stepping can be integrated into the kinematics of the printer, but I’m unsure if the bandwidth between host/mcu can handle sending stepper commands at 10kHz, maybe more as I expect Klipper users to move the gantry faster than the Prusa XL.


Thanks for also looking into the code. I agree with your assessments of the two implementations. It also seems that with the burst stepping option, they periodically update the required corrections for the stepper (with this 10 kHz frequency?), so it means that the stepper motor should not rotate too much within that time frame. So, I think both implementations limit the attainable velocities:

  • For burst_stepping, let’s say only 1/4 of 1/4 of sine wave should be traversed max in 0.0001 sec, before the next correction is applied (which is fairly generous), this gives for a typical printer 0.2 mm / 4 / 0.0001 = 500 mm/sec maximum, but in practice it should be smaller than that for corrections to be effective.
  • For quick_tmc_spi, 20 kHz per stepper is 0.2mm / 64 * 20000 = 62.5 mm/sec at 64 microstepping. So you either have to use very coarse microstepping (I’m not sure, but I think interpolation does not work in direct current mode?), or have to live with very low speeds.
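The two estimates above can be reproduced with a few lines of arithmetic. The helper names are made up for this sketch, and the 0.2 mm full-step distance is the "typical printer" figure used in the bullets.

```python
def burst_stepping_limit(full_step_mm, update_rate_hz, max_fraction_of_step):
    """Max speed if at most `max_fraction_of_step` of a full step is traversed per update."""
    return full_step_mm * max_fraction_of_step * update_rate_hz

def spi_limit(full_step_mm, microsteps, update_rate_hz):
    """Max speed if each update advances the motor by exactly one microstep."""
    return full_step_mm / microsteps * update_rate_hz

# 1/4 of a sine wave is one full step (0.2 mm); allow 1/4 of that per 10 kHz update.
v_burst = burst_stepping_limit(0.2, 10000, 0.25)  # mm/s
# 20 kHz effective rate per stepper at 64 microstepping.
v_spi = spi_limit(0.2, 64, 20000)                 # mm/s
```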

FWIW, I’m not sure which mode is used in the production version of their firmware for XL. Also perhaps I messed up the calculations above somewhere.

Since Klipper implements step compression, it does not need to send every individual step pulse. And I’m not sure where the 10kHz limit comes from for Klipper. However, I do agree with you that this will substantially increase the traffic between host and MCU, which will rule out serial connections and perhaps CAN bus. That is unless the corrections are put into the MCU code itself, but that has other performance downsides and substantially increases the load on the MCU CPU. On the other hand, at very high velocities the sine wave through the coils gets severely distorted due to back EMF, as was shown by eddietheengineer (video), so at those speeds it does not make sense to do any phase adjustments - the actual current won’t follow them anyway, and the majority of the noise AFAIU comes from those current distortions and non-continuous application of the torque.


Concerning the hardware-specific implementation: it would be possible to enable this feature only for boards which are capable of handling this additional load (enabling it via menuconfig). This approach needs both a high-level and a low-level implementation.

Agreed.

For what it is worth, it may be possible for the host to use “anti-cogging step scheduling” at slow speeds, and use regular step scheduling at medium/high speeds. That may alleviate the bandwidth and processing limitations. It’s not immediately clear how one would go about coding that though.

Cheers,
-Kevin

To expand on this, I’m 95% sure at this point that those defects are exclusively caused by the motor resonance / vibration interacting with the belts and thus creating resonance matching the belt feature harmonics. That’s also why you’ll see those defects change with speed - even disappear completely. We did quite a bit of analysis a couple of years ago and found that the vast majority of linear motion resonance can be correlated to a 2gt belt / idler / pulley feature, see: worksasintended/klipper_linear_movement_analysis (github.com).

The speed intervals at which 2gt artifacts show up on prints are largely dependent on motor type and driver configuration, belt tension makes little to no difference. Regardless of configuration and motor, there are always intervals which will generate motor resonance - at least from the steppers and config options we’ve tested.


I may have expressed my thoughts not very clearly. I was just pointing out that the ripples are associated with 2mm belt pitch. I have been following some investigations around those ripples, and the conclusions I and some other folks arrived at are that

  • ripples are mostly connected with the belt and motor pulley contact: for example, if you have XY/2 (YX/2) kinematics with 1/2 reduction, the pitch of the ripples is reduced accordingly; so they are not dependent on belt contacting idler pulleys that much, though there are instances where such contact may play some role in inducing the ripples;
  • ripples (at least the large ones) can typically be associated with the printer resonances - if you have ripples at velocity V, you can typically find some resonance at frequency V / 2 (mm) and vice versa (but some of the resonances can be associated with a torsional twist of the axis, for instance);
  • motor and driver configuration may have an effect on VFA and can have some small impact on GT2 ripples, but it is almost never a case that replacing a motor or changing the driver configuration would help getting rid of 2mm ripples completely.
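The velocity/frequency relation in the second bullet is easy to turn into a quick check against a measured resonance. The helper names below are hypothetical and the numbers are examples only.

```python
def ripple_frequency_hz(velocity_mm_s, pitch_mm=2.0):
    """Excitation frequency of a ripple with the given pitch at a given velocity."""
    return velocity_mm_s / pitch_mm

def velocity_for_resonance(freq_hz, pitch_mm=2.0):
    """Velocity at which a ripple of the given pitch would excite `freq_hz`."""
    return freq_hz * pitch_mm

# Ripples seen at 100 mm/s with the 2 mm GT2 pitch suggest looking for a 50 Hz resonance.
f = ripple_frequency_hz(100.0)
# Conversely, a 45 Hz resonance would be excited by 2 mm ripples at 90 mm/s.
v = velocity_for_resonance(45.0)
```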

On the topic of motor de-cogging and reducing GT2 ripples, I went ahead and implemented a simple correction scheme for now:

x_c = x + sum(A[i] * sin(2 * pi / L[i] * (x + offs[i] + v_x * v_offs[i])))

where x is a non-corrected stepper position, v_x is its velocity, A[i] and L[i] are the magnitude and period (in mm, typically 2mm, 0.4mm, 0.2mm and perhaps 0.8mm) of a correction, and offs[i] and v_offs[i] are offset coefficients. Unlike Prusa’s implementation, Klipper cannot have separate and independent ‘forward’ and ‘backward’ correction tables, and this v_offs[i] coefficient seems to be important for the corrections to be different for forward and backward motion, while still being continuous when switching stepper direction and not creating stepper jumps. For now it is an open question whether this simple scheme would suffice, or whether a more elaborate velocity correction than v_x * v_offs[i] is necessary, e.g. instead of just v_x perhaps some saturating function would be necessary, such as atan. Similarly, I thought that speed-aware ‘anti-cogging’ can be implemented by making A[i](v_x) = A[i] * (1 - smoothstep(v_x - v_threshold)). Then above v_threshold the coefficients A[i](v) would quickly decay to 0, effectively disabling the corrections. But we can get there once we test the performance of the code as-is, I think. At the moment, I have somewhat promising results from manual coarse tuning (on just the accelerometer results, no real prints yet), and I’m putting together a more automated procedure for tuning these correction coefficients, and then we’ll see how it goes.
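A minimal sketch of this correction formula follows, including the proposed speed-dependent decay. It is not the actual implementation; the function names, the tuple layout of terms, the use of abs(v_x), and the v_width scaling inside smoothstep are assumptions added to make the example self-contained.

```python
import math

def smoothstep(t):
    """Standard smoothstep: 0 for t <= 0, 1 for t >= 1, smooth in between."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)

def corrected_position(x, v_x, terms, v_threshold=None, v_width=10.0):
    """x_c = x + sum(A * sin(2*pi/L * (x + offs + v_x * v_offs))).

    `terms` is a list of (A, L, offs, v_offs) tuples. If v_threshold is
    given, the correction fades out over `v_width` mm/s above it (the
    width and the use of |v_x| are assumptions of this sketch).
    """
    scale = 1.0
    if v_threshold is not None:
        scale = 1.0 - smoothstep((abs(v_x) - v_threshold) / v_width)
    correction = sum(A * math.sin(2 * math.pi / L * (x + offs + v_x * v_offs))
                     for A, L, offs, v_offs in terms)
    return x + scale * correction

# One 2 mm term with a velocity-dependent offset: forward and backward differ.
terms = [(0.01, 2.0, 0.0, 0.002)]
fwd = corrected_position(0.3, 50.0, terms)
bwd = corrected_position(0.3, -50.0, terms)
# Well above the threshold the correction is disabled entirely.
fast = corrected_position(0.3, 200.0, terms, v_threshold=100.0)
```

Because the velocity enters only through the sine argument, the correction stays continuous as v_x passes through zero, which is the property that matters when the stepper changes direction.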


This is perfectly in line with my experience as well.

I think there is even a higher impact of motor resonances on VFAs.
This is dependent on the print speed used.
There are speed bands where those resonances have their peaks and produce visible VFAs.
Outside of those bands those VFAs are barely visible.
This could correlate with the motor topic of @TheFuzzyGiggler.

However, belt tension shifts the VFAs from GT2-induced ones to motor-resonance-induced ones, and vice versa.
