Strain Gauge/Load Cell based Endstops

The code is not up to date right now, I’m still cleaning things up from my debugging session over the weekend.

The C code supports resetting endstop EMA filters with a dedicated command. In the update I have implemented automatic setpoint initialization and settling. This needs to be enhanced further to use a faster-settling EMA alpha value on the trend filter during the settling period (maybe sample_alpha + 1?). With a data rate like 7K SPS, settling over 100 samples happens in just ~14ms. You could reset and settle for each probing move with no noticeable time lost in the probing cycle. But at 8 SPS… that’s gonna be so slow.
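For reference, a minimal sketch of the kind of shift-based integer EMA I mean here. The function name and the shift-style alpha are illustrative, not the actual code; it also assumes the compiler does an arithmetic right shift on negative differences, as typical MCU toolchains do:

```c
#include <stdint.h>

// Illustrative shift-based integer EMA: a smaller alpha_shift weights new
// samples more heavily, so the filter settles faster. During the settling
// period the trend filter could temporarily run with a smaller shift and
// switch back to the normal value once settled.
static int32_t
ema_update(int32_t ema, int32_t sample, uint8_t alpha_shift)
{
    return ema + ((sample - ema) >> alpha_shift);
}
```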

The Python code needs an update to actually support this resetting: there needs to be a reset, and then an appropriate pause for settling, when a probing move starts.

The “driver” is just one function in C that does sample collection: klipper/sensor_load_cell.c at 417b25d5cd55e65437732b868beb33a5f0364ee7 · garethky/klipper · GitHub
Everything else (configuration, starting/stopping sampling, etc.) is in Python.

And I haven’t tested actual probing yet. i.e. I wouldn’t try this on hardware you don’t want crashed until I validate that part works!

Thanks for the explanations. Can’t you simply initialise the EMA with a single measurement when resetting it?

Let me know when you have updated the code. Don’t worry, I know how to test such things carefully without destroying my printer :slight_smile: I will first implement an absolute maximum raw value as a safeguard which shuts off the printer immediately when it is exceeded. Also, I will of course not test right away with the hotend touching the bed…

Btw: I am now in the process of simplifying my current implementation of the load cell probe. After some discussion with Kevin I decided to reduce the number of configuration parameters by deriving as many as possible from a few physical quantities which can be measured (likely even automated). I now also derive the step size for the initial approach from the maximum acceptable force (with a big safety margin) and the stiffness of the system (force increment per Z distance).

On my printer, I now end up with a step size of 0.16mm, which is a bit more than I had manually configured before. If I assume an 80 Hz sampling rate (like the HX711), this would give me an a-priori speed of around 0.16mm * 80 Hz = 12.8 mm/s. That’s already above my normal homing speed of 7 mm/s on the Z axis.
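The arithmetic above can be written down as a tiny helper; the values are the ones from this post, nothing is taken from an actual configuration:

```c
#include <math.h>

// The toolhead can travel at most one "reaction path" step between two
// consecutive ADC samples, so the a-priori maximum probing speed is
// simply the step size times the sample rate.
static double
max_probe_speed(double step_size_mm, double sample_rate_hz)
{
    return step_size_mm * sample_rate_hz;
}
```

For example, 0.16mm at 80 Hz gives 12.8 mm/s, and conversely a 7 mm/s homing speed at 80 Hz means less than 0.09mm of travel between samples.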

So maybe after all we don’t need such high sampling rates? Even if my computation is a bit simplified (it might need to take into account some processing delays etc.), I could imagine a speed of 7mm/s would be realistic in the end with only an 80 Hz sampling rate, if I am OK with a ~0.16mm “reaction path”. Faster speeds will create other issues anyway, since the motors might overshoot at some point…

I am doing that in the latest code, but the sample values are noisy, so you can get very unlucky with the initialization value. If the trend alpha value is properly selected, a bad initial value takes a long time to settle out. Strictly speaking we want the average of all of the settling points to be the starting value for the trend filter. I’m still looking for a clever way to do that without allocating memory.

code updated: Commits · garethky/klipper · GitHub
gist of the html debugging tool: Temporary load cell data capture and display Tool · GitHub
You need to put the tool in your front-end web server folder on the Pi (e.g. Fluidd or Mainsail), otherwise you will get CORS and mixed content errors.

I solved the settling problem with a moving average function:

static int32_t
average(int32_t avg, int32_t input, int32_t index)
{
    // Running mean: index is the 0-based position of this sample,
    // so (index + 1) is the number of samples averaged so far.
    return avg + ((input - avg) / (index + 1));
}

Requires no arrays of memory, risks no overflow and uses only integers. Now all you have to do is specify a number of settling points and it handles the rest.
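To illustrate how the helper gets used over the settling period (repeating it here so the snippet is self-contained; settle() and its array argument are invented for the example, since in the real code the samples arrive one at a time from the ADC):

```c
#include <stdint.h>

// Running mean: index is the 0-based position of this sample, so
// (index + 1) is the number of samples averaged so far.
static int32_t
average(int32_t avg, int32_t input, int32_t index)
{
    return avg + ((input - avg) / (index + 1));
}

// Fold a fixed number of settling samples into a starting value for the
// trend filter, one at a time, with no history array required. The array
// here is only for the example; real code would call average() as each
// sample arrives.
static int32_t
settle(const int32_t *samples, int32_t count)
{
    int32_t avg = 0;
    for (int32_t i = 0; i < count; i++)
        avg = average(avg, samples[i], i);
    return avg;
}
```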

I won’t have a lot of time to work on this until next week. The next major topics are all in Python:

  • validate that load_cell_endstop triggers the trsync and stops an axis.
  • develop a calibration routine to pick values for alpha, deadband and so on.

FWIW, I’d be curious to see the results of the ADS1100 in 128 samples/s mode. I would guess that more samples would be more important than a few bits of resolution in this context.

Cheers,
-Kevin

Looking at this a little closer, the ADS1100 is a delta-sigma ADC. That’s a very good indication that a sample at 8 samples/s is the same as averaging 16 samples at 128 samples/s. (At its core, a delta-sigma ADC can be thought of as a 1-bit ADC that is oversampled at a high rate.)

-Kevin


The EMA sample filter is there to sort out the noise. Give us all the sample rate you have.

One topic I want to discuss before this becomes a PR is what data we want to return to Klippy by default. It would be cheaper to return only a maximum number of samples per second (say 100); only filter calibration would need every sample. Also, after we are convinced the EMA filter works, we don’t really need it returned to Klippy except for debugging. Hopefully that will reduce load on both the MCU and Klippy.

Good points. Even without thinking about averaging and noise filtering, I see potential to increase the sampling rate in my particular setup. The manufacturer of my 3D printer seems not to have put a lot of thought into how to configure the ADC optimally. They simply used the maximum resolution (which comes with the slowest frequency) at the lowest gain.

This does not even allow using the full ADC range without violating the maximum acceptable forces. I would usually consider ~8000 ADC counts the maximum acceptable force: much higher forces have the potential of damaging the load cells. This means 13 bits are enough in principle. If I increase the gain to 8 (which is the maximum), I should end up with the same force resolution.

Unfortunately, 13 bits cannot be configured. I think a resolution loss of one bit is even OK (12 bits give me 128 SPS). I am currently achieving a repeatability of 2 micrometers or so, which is already a lot better than the repeatability of the end switches. I am not even sure if losing a bit of precision has any impact on that at all. So I guess using full speed at 128 SPS with a maximum gain of 8 is definitely worth a try, even if I don’t (yet) use an EMA filter.

Of course, these considerations are quite specific to my printer model, but the general message is: the ADC resolution is not super critical, and 12 bits are probably fine (if the gain is properly chosen).

I will try that out soonish.

PS: Even with delta-sigma ADCs, averaging is not the same as more precise conversions if the analogue signal has low noise (much lower than the quantisation noise). You will often read the same value and won’t gain any information by averaging. The load cell readout of my printer is actually quite low noise, since the bit resolution is already very coarse. I have roughly 1 ADC count per gram of weight.

PROBE_ACCURACY

… and then it shut down. The load cell encountered an error and the endstop immediately shut down the MCU. The error is a bug, but the shutdown is expected safety behavior. So it can probe! :smiley: But it’s totally unreliable :weary:

The good news is it can pick up a collision event long before the frame of the printer shows any signs of deformation. This is with a 1kg load cell; I have a 5kg one that I need to print a test fixture for. The code is updated to fix some shameful bugs in the C code. I also implemented a basic calibration routine that can pick parameters for the load cell endstop.

Bad news is there are bugs I need to investigate:

  • Sometimes QUERY_PROBE reports TRIGGERED and stays that way until the endstop is reset. The graph shows that it is for sure not triggered. That’s a bug.
  • sensor_load_cell occasionally throws errors. That’s OK most of the time, but I made it a fatal error during probing. I need to track down the source of the errors. It’s worse the higher the read frequency.
  • sensor_load_cell is reporting a lot of duplicate readings. That means the time between reads is slightly less than the interval between measurements being prepared by the sensor. I see patterns that look like rolling shutter, which is classic for a mismatch between the frequency of the sensor and the signal.

I’m going to rebase on top of the latest code and take a second look at sensor_angle for any tricks I missed.

I’m not sure how I would do this safely, but if I could read the miso_pin on the SPI bus, that effectively tells us if there is a new sample ready. If I could do that, I could sample the pin at say 4x the sample rate and just exit the sample function if the pin is high.

The SPI interface doesn’t give you the miso_pin, that seems to be MCU code specific. I could pass the pin id in from klippy, but I’m not sure if bad things will happen when I call gpio_in_setup on the pin.
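For what it’s worth, the check itself would be trivial once the pin can be read. A sketch of the idea, with the pin read reduced to a plain level variable, since how to get at the real gpio_in API is exactly the open question (everything here is illustrative):

```c
#include <stdint.h>

// Many load-cell ADCs hold their data-out line high while a conversion
// is in progress and drop it low when a fresh sample is ready. If the
// MISO level could be sampled, duplicate reads could be skipped like
// this. A pointer to a level variable stands in for a real GPIO read.
static int
sample_ready(const uint8_t *miso_level)
{
    // high == still converting, low == new sample available
    return *miso_level == 0;
}
```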

  • Errors == duplicates. Duplicate readings were not being ignored in the C code.
  • TRIGGERED == a bad electrical connection on my part.

Testing with: PROBE_ACCURACY SAMPLE_RETRACT_DIST=3 PROBE_SPEED=5

@100 SPS
// probe accuracy results: maximum 27.325139, minimum 27.263160, range 0.061979, average 27.290785, median 27.288247, standard deviation 0.019441
@400 SPS
// probe accuracy results: maximum 27.297396, minimum 27.254306, range 0.043090, average 27.275969, median 27.276441, standard deviation 0.014316

Trying a slower speed with: PROBE_ACCURACY SAMPLE_RETRACT_DIST=2 PROBE_SPEED=1
@400 SPS
// probe accuracy results: maximum 27.352292, minimum 27.323958, range 0.028333, average 27.340250, median 27.338715, standard deviation 0.008303

More SPS == smaller range, slower speed helps but not as much as SPS. And my test fixture is plastic, so it has too much flex.

At 1200 SPS and higher the printer starts behaving strangely. It stops during the retract move as if the trsync was triggered. Then it puts the nozzle down and just stays there, frozen, recording samples. :exploding_head: Cutting the rate at which the MCU reports samples to Klippy helps, but it’s not a complete fix. My guess is there is some kind of watchdog on the trsync that I am violating at higher sample rates, causing it to trigger. Not quite sure what I can do about that.

That looks like good progress. Beware that not all errors will show in the standard deviation of the PROBE_ACCURACY test. You basically have to add at least the distance moved between two sample points. This is not shown in the test, because it always starts from the same Z position, so you are always probing over the same distance. There will be other effects, too, e.g. thermal dependencies, variations in the bed surface stiffness, latencies between steps and ADC readout, etc. I also don’t have a good way to determine the influence of all these effects on the accuracy (though my method might suffer from fewer problems at least ;-)).

Also I am wondering how we can allow the use of your code with different ADC drivers. Have you already put some thought into that?


My test fixture holding the load cell is a block of plastic. It’s far from ideal and I need to eliminate it as a source of error. Ideally I want to see 0.001mm repeatability, as I know switch-based probes can do that.

I had the SPI speed set to 8MHz, but that’s incorrect; it should be 25MHz. SPI transfer was taking up 60% of the runtime before fixing that.

Anything beyond a 2KHz sample rate makes my Pi 3 very unhappy. Reducing the report rate (reporting every n samples) saves the Pi, but then I can’t see where triggering might have happened.

Maybe I’m just seeing phantom triggering at higher rates. I’ll try to get to the bottom of that. Until it’s gone I can’t trust the results fully.

Short answer is yes pretty easily. Let me get you some proper developer documentation.


Well… I guess it was bound to happen…

It crashed, smashed the load cell and bent the cross beam. Not sure why it happened. I actually have a graph of the event that clearly shows the measurement leaving the deadband which means it should have triggered. I got no logs because I hit the power button.

I learned that the probe code doesn’t drain the move queue before it issues the homing command to the endstop. So I added a toolhead.wait_moves() in my probe_prepare() so that the retract move completes and the load cell has a chance to cleanly exit the trigger state before the trsync is enabled again.

With higher sample rates I can trigger in fewer microsteps, so the overshoot is less. This put the trigger setpoint right on the boundary of the deadband, which in turn caused the deadband to drift upward while the toolhead was stopped. This is a pretty classic debouncing problem. I re-used the settling count to set up a debounce variable: it counts up for successive in-band samples and resets to 0 on the first out-of-band sample. This means it won’t shift the deadband unless the signal has been inside the deadband for the settling time.
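The counting scheme might look roughly like this (the struct and function names are invented for the sketch, not taken from the actual code):

```c
#include <stdint.h>

// Debounce for deadband re-centering: the deadband may only drift after
// the signal has stayed in-band for a full settling period. Any single
// out-of-band sample resets the counter.
struct debounce {
    int32_t count;        // consecutive in-band samples seen so far
    int32_t settle_count; // samples required before trusting the signal
};

// Returns 1 when it is safe to let the deadband shift, 0 otherwise.
static int
debounce_update(struct debounce *db, int in_band)
{
    if (!in_band) {
        db->count = 0; // first out-of-band sample resets the counter
        return 0;
    }
    if (db->count < db->settle_count)
        db->count++;
    return db->count >= db->settle_count;
}
```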

The best I got was a range of 0.017 @ 5mm/s @ 14K samples/s. The endstop switch has a range of 0.00059. I believe on this z-axis that’s 1 microstep, so the load cell range is ~28 microsteps wide. Not entirely encouraging.

I need to take a step back and build a test rig before I break this printer.

Ouch, I hope the rest of your printer is ok.

As a safeguard, I would also recommend implementing a maximum absolute force value which will immediately shut down the firmware if exceeded. This absolute value should not be affected by any offset compensation or averaging etc., so that a single ADC measurement exceeding the value is already enough. Also I would recommend checking the abs() value of the ADC measurement, just in case someone mounts the load cell flipped around… :wink: I have exactly this safeguard in my code, and it has probably saved me a couple of times. (Sorry, I should have probably mentioned this earlier, I just didn’t think of it…)

So I have crash_min and crash_max settings (see here) that implement the crash detection barrier. They are absolute values, and the calibrator sets them to +/- 5x the width of the noise in the signal from the average at calibration time. The crash check is always on, but it only stops the printer during homing moves via the trsync (not a shutdown). There are shutdown calls if the load_cell reports an error or reports that it stopped collecting data. Both of those work (they have been tested and have saved the printer).
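The check itself is just a raw-sample comparison; schematically (the function name is illustrative, and the limits are raw ADC counts as described above):

```c
#include <stdint.h>

// Absolute crash barrier: compares the raw ADC count against fixed
// limits, deliberately before any offset compensation or filtering,
// so that one wild sample is enough to trip it.
static int
is_crash(int32_t raw_sample, int32_t crash_min, int32_t crash_max)
{
    return raw_sample < crash_min || raw_sample > crash_max;
}
```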

Endstops in Klipper only work during a homing move (they have to be activated with a trsync). Commanding a normal move that hits the endstop won’t stop the printer. We expect the extrusion force during a print (1-2kg) to greatly exceed reasonable crash limits (500g? less?), and we can’t have a shutdown then. So if you want shutdowns as insurance, you either need to be able to change the threshold at print time or turn the check off entirely. I don’t like either of those, as they carry their own risks.

Not sure why the crash barrier did not kick in this time. Of course, I had just changed several bits of code, so it’s probably a bug I introduced.

width of the noise in the signal

Beware, the width of the noise observed at calibration time can be 0 (and hence 5x that width will still be 0). My printer with its ADS1100 typically has very little noise; chances are high that 10 consecutive samples are identical. I am adding the equivalent of 1 LSB for similar considerations.

I think using a narrow crash limit is not the best approach here. Think about it the other way round: the forces required to permanently deform a load cell exceed its rated weight force significantly. Hence you can conclude after this crash that your load cell was the weakest point. This is good, because you know its limit. I would set the crash limit such that this limit cannot be exceeded, unconditionally, at all times. This is similar to the temperature limits of heaters: it is not safe to go beyond a particular point, no matter how we got there, so we must shut down everything once we see a value outside the safe range.

Such a limit has the big advantage of being simple and stupid, which makes it more likely to be effective when needed. Load cells are (as far as I know) normally quite “symmetric” around 0, so under idle conditions (without weight) you will measure absolute values somewhat close to 0. In a real printer setup you will typically measure roughly the weight of the hotend under idle conditions (as a positive number, if the load cell is not mounted upside down). The contact force between hotend and bed will at first reduce that positive value, because it works against the weight of the hotend. Still, a hotend weighs around 100g, and even small load cells can take 1kg.

I am pretty sure that if you had such an absolute and unconditional limit in place, set to 1kg or slightly below, the damage would not have occurred. And you would not suffer any negative consequences from having it, because exceeding this limit is actually problematic: if you were running in such conditions regularly, the load cells would not be dimensioned properly. (That said, I do consider 1kg too small if the load cell sees the extrusion force; you should use at least 5kg in that case.)

PS: From experience I can tell that an E3D V6 hotend can easily take forces of 5 to 10kg without being damaged; that’s the range where my safety kicks in. I have tried that out unintentionally a couple of times :wink: Also, the original heat bed surface of my printer is not exactly the most robust material: it is ceramic, and others have already managed to rip chips out of the surface when printed material was too sticky. I only heard once of someone damaging his bed surface in a crash; he was moving at full speed, and our ADC originally reads out at only 8 Hz, so it took way too long for the safety to react. So I think it is not a big problem to have a relatively high safety threshold. It is much more important that the safety reacts fast and unconditionally.

I’ll hold off on code updates until I build the test rig and root cause the issue. Parts have been ordered…