Strain Gauge/Load Cell based Endstops

This error is something I need to work on with @kevin. Something else caused the printer to shut down. If you look further up in the log file you will find the root-cause exception message, e.g. it could be a timer too close.

Details:

Unfortunately users look in the log file from the tail up and report the first exception they see. This worked fine until sensor_bulk came into the picture. Now whenever something causes a shutdown in a printer with a bulk sensor you get this error at the end of the file, which isn’t helpful.

What’s happening here is that, during the shutdown, the process that’s expecting to receive bulk sensor data from the MCU fails to get that data and logs an exception. There needs to be a check for the shutdown condition, and then the error should be suppressed. If the MCU has shut down, not getting data is expected, i.e. not exceptional.
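A minimal sketch of that check, in Python. The class and callables here (`BulkSensorReader`, `query_fn`, `is_shutdown_fn`) are hypothetical stand-ins, not Klipper's actual sensor_bulk API; the point is only that a failed query after a shutdown returns quietly instead of logging a second exception.

```python
import logging


class BulkSensorReader:
    """Hypothetical stand-in for the host code that polls bulk sensor data."""

    def __init__(self, query_fn, is_shutdown_fn):
        self._query = query_fn              # asks the MCU for pending samples
        self._is_shutdown = is_shutdown_fn  # returns True once the printer shut down

    def poll(self):
        try:
            return self._query()
        except Exception:
            if self._is_shutdown():
                # Missing data after a shutdown is expected. Stay quiet so the
                # root-cause exception remains the last error in the log.
                return None
            logging.exception("bulk sensor query failed")
            raise
```

With this shape, the only exception left at the tail of the log is the one that caused the shutdown.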

Yes @garethky, it’s just the last log entry and not the root cause. My issue is a host TTC and shutdown; nothing is wrong with the load cell stuff.

I’m wondering if the output of a cheap Module Flex Sensor could be sent to the bltouch input on the motherboard.

Thanks for your work on load cells so far, @garethky! I’m working on Klipper on my Prusa MK4S, and am interested in getting reports from the B channel of the HX717 to use the filament runout sensor. I’m able to get sensible data out of it if I just configure a load_cell with gain B-8 (but then I obviously lose the actual load cell).

I’ve dug through a lot of the code in load-cell-probe-community-testing, and I think I understand most of the flow. I’m wondering if you have any thoughts as to how an alternative channel could be implemented, both in the data transfer and configuration. It’s also not clear to me where the best place to have this discussion would be.

I do recall you mentioning somewhere that adding the secondary sensor would be tricky from a Klipper architecture point of view. The “obvious” solution to me would be to just interleave samples in the bulk sensor data. I’m not sure if this stream is consistent enough to just synchronize on sample number, but I can imagine a scheme where a few of the top 8 bits (since the sample data is 24 bit) can be used for the channel/gain information. For my high-level sensibilities this seems pretty hacky; I’m not sure if it’s acceptable.
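To make the bit-tagging idea concrete, here is a small Python sketch. It assumes each sample travels as a 32-bit word with the signed 24-bit reading in the low bits and a channel id in the top 8 bits; the function names and the id numbering are made up for illustration and are not part of the existing bulk sensor protocol.

```python
CHANNEL_SHIFT = 24      # the reading occupies bits 0..23
SAMPLE_MASK = 0xFFFFFF


def pack_sample(channel_id, counts):
    """Pack a signed 24-bit reading and a channel id (0..255) into 32 bits."""
    if not 0 <= channel_id <= 0xFF:
        raise ValueError("channel_id must fit in 8 bits")
    if not -(1 << 23) <= counts < (1 << 23):
        raise ValueError("counts must fit in signed 24 bits")
    return (channel_id << CHANNEL_SHIFT) | (counts & SAMPLE_MASK)


def unpack_sample(word):
    """Return (channel_id, counts), sign-extending the 24-bit reading."""
    channel_id = (word >> CHANNEL_SHIFT) & 0xFF
    counts = word & SAMPLE_MASK
    if counts & 0x800000:
        counts -= 1 << 24
    return channel_id, counts
```

Any consumer that ignores the top byte has to mask and sign-extend before using the value, which is part of why this feels hacky.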

For configuration, my first thought was to manually spell out the list of channel/gain configurations that the sensor should collect:

gain: A-128, A-128, A-128, A-128, B-8

But that seems tedious, and prone to user mistakes.

My second thought was to allow something like

gain: A-128
alternate_gain: B-8
alternate_gain_frequency: 32  # Gather B-8 every 32nd sample
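As a sketch of how the second form could expand into a per-sample schedule (the option names come from the proposal above; the function itself is hypothetical and written in Python for illustration):

```python
def gain_for_sample(index, gain="A-128", alternate_gain="B-8",
                    alternate_gain_frequency=32):
    """Return the gain setting for the sample at 0-based `index`.

    Every `alternate_gain_frequency`-th sample uses the alternate gain.
    """
    if (index + 1) % alternate_gain_frequency == 0:
        return alternate_gain
    return gain


# With a frequency of 5 this reproduces the hand-written list above.
schedule = [gain_for_sample(i, alternate_gain_frequency=5) for i in range(5)]
```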

Any thoughts or pointers welcome!

Well I would be really excited to see someone klipperize an MK4!

Interleaving has another more subtle problem: it destroys the timing information of the samples. Because we implemented timing as linear regression, this assumes the timing between samples is consistent. When you command the sensor to switch channels it restarts sampling on the new channel and you have to wait for an entire sample period to get the next sample. Any work on the current sample is lost, along with the time taken. We don’t use interrupts, so we can’t switch inputs exactly as the sensor is done with a sample. There is always some delay before the switch. This kind of switching would result in longer gaps between some samples and invalidate the timestamps calculated for all of them. So, if you are probing, you can’t do interleaving.
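A toy illustration of the timing problem, in Python. This is not Klipper's clock synchronization code; it is a plain least-squares fit of sample time against sample index, with an assumed sample period, to show how one extra gap skews the timestamps of every sample rather than just the late one.

```python
def fit_clock(times):
    """Least-squares fit of time = intercept + period * index."""
    n = len(times)
    mean_i = (n - 1) / 2
    mean_t = sum(times) / n
    sxx = sum((i - mean_i) ** 2 for i in range(n))
    sxy = sum((i - mean_i) * (t - mean_t) for i, t in enumerate(times))
    period = sxy / sxx
    return mean_t - period * mean_i, period


def max_timestamp_error(times):
    """Largest gap between a true sample time and its fitted timestamp."""
    intercept, period = fit_clock(times)
    return max(abs(intercept + period * i - t) for i, t in enumerate(times))


PERIOD = 0.0025  # assumed sample period in seconds, for illustration only
steady = [i * PERIOD for i in range(20)]
# Same stream, but a channel switch after sample 9 costs one extra period.
switched = [t if i < 10 else t + PERIOD for i, t in enumerate(steady)]

steady_error = max_timestamp_error(steady)      # effectively zero
switched_error = max_timestamp_error(switched)  # a sizeable fraction of PERIOD
```

In the steady stream the fit recovers each sample time almost exactly. After a single lost period the fitted line no longer passes through any of the samples, so all of the computed timestamps are off.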

A simpler approach might be to recognize that you don’t need the load cell while printing and you don’t need the filament sensor while probing. A GCode command could tell the sensor to restart and send back data from either the A or B channel. Something like ADC_SET ADC=my_hx717 CHANNEL=B. In klipper we could break out the data streams so one goes to the filament sensor and the other goes to the load cell.

The config could look like:

gain: A-128, B-8

Prusa, in the Buddy firmware, implements this silently. When you probe it turns off the filament sensor and all the associated code for switching channels. They can do this because they are using interrupts and tag each sample with an exact timestamp. They can do the switch in a single sample window. But in klipper we would have to switch inputs and wait for the linear regression clock to stabilize back on the host, so maybe up to 1s of time. In practice you wouldn’t notice this because you could switch to the load cell at print start, long before the first homing move.

Super long term, maybe we want to do both things while printing so we can have data about extrusion force for some future use cases. That might require the interleaving, with the caveat that the time info is not absolutely correct. But I hope we can avoid crossing that bridge for now

Thanks for the detailed reply, lots for me to process…

Well I would be really excited to see someone klipperize an MK4!

It has happened :) https://www.youtube.com/watch?v=PESbOHXqyIw

I’ve sent some PRs to your load-cell-probe-community-testing branch that were required to make probing work, and I still have work to do on improving macros etc, but it’s working well enough that I don’t think I’m going to need to be switching back and forth between the Prusa firmware and Klipper.

Oh WOW! I think you win the race for the first klipper converted MK4! Thank you so much for proving that out! I’m stoked to hear that you don’t think you need to switch back.

I have some specific suggestions for bed mesh config with this probe:

# Use aggressive move splitting to fight high / low spots. High values may cause streaking in the first layer
move_check_distance: 3.0
# I paid for 1 micron accuracy, I want 1 micron adjustments. The minimum is 10 microns. Having this too high can also contribute to streaking.
split_delta_z: 0.01
# use 10 x 10 probes
probe_count: 10
# more mesh PPS allows more exact bicubic curve expression. You want this math to result in a number less than 2x the move_check_distance. Adjust mesh_pps and probe_count to achieve this.
# e.g. 285mm / 10 probes / 5 pps = 5.7mm mesh resolution
mesh_pps: 5, 5
# bicubic: this is the only option for high probe counts
algorithm: bicubic
bicubic_tension: 0.2  # the default, I've tried other settings and there is no clear winner

I have a long draft post about this, but that’s the upshot. Because the probe is more accurate/consistent you can configure the mesh to more aggressively follow what the probe says. Less aggressive settings cause the mesh to smooth or ignore the probe data. This could be beneficial if your probe is unreliable, but is detrimental here.
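The sizing rule from the config comments can be checked with a few lines of Python. This just repeats the arithmetic from the example (285mm bed, 10 probes, 5 pps); the function name is made up for illustration.

```python
def mesh_resolution(bed_size_mm, probe_count, mesh_pps):
    """Mesh resolution as computed in the config comment above."""
    return bed_size_mm / probe_count / mesh_pps


move_check_distance = 3.0
resolution = mesh_resolution(285.0, 10, 5)  # about 5.7mm
# The rule of thumb: resolution should be less than 2x move_check_distance.
within_rule = resolution < 2 * move_check_distance
```

With these numbers the resolution lands just under the 6.0mm limit, so dropping either probe_count or mesh_pps would break the rule.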

I didn’t get notifications about your PRs, I’ll have to look at why GitHub didn’t let me know.

Thanks, those settings all make sense. This is my first time using Klipper and the options are pretty overwhelming, so these pointers are appreciated! I’d figured out the higher number of probe points, but hadn’t looked at all the other options yet :)

Still not perfect, but this is an improvement over what I had yesterday! Thanks!

It’s pretty rare that anyone ever attempts to print complete sheets. I suspect that lots of ‘good’ printers would fail to do that well. But still, I think that can be improved.

Prusa has a delay of 9 minutes (?) on the XL for the bed to thermally stabilize. I don’t know if they did the same for the MK4. But it could be worth trying that before doing the bed mesh. I have a macro on my Voron that measures a secondary thermistor and waits for that to stabilize before printing.
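The macro itself isn’t shown in this thread. As one possible way to decide that a temperature has stabilized, here is a Python sketch that looks at the spread of the most recent readings. The window size and threshold are arbitrary assumptions, not values from the Voron macro or from Prusa’s firmware.

```python
def is_stable(readings, window=30, max_spread=0.2):
    """True when the last `window` readings vary by at most `max_spread` degrees."""
    if len(readings) < window:
        return False  # not enough history yet
    recent = readings[-window:]
    return max(recent) - min(recent) <= max_spread
```

A wait loop would sample the secondary thermistor at a fixed interval, append each reading, and start the bed mesh once this returns True.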

The other thing I worked out was homing the Z axis with probing:

[gcode_macro _HOME_Z]
gcode:
    SET_GCODE_OFFSET Z=0
    G90  # absolute move
    G1 Y140 X130 F12000
    G28 Z  # home
    # remember it can't be 1.0 because of the overshoot!
    G1 Z2 F{5 * 60}  # move up to 2mm
    PROBE SAMPLES=3  # take 3 probe samples
    # Dragons:
    # A move needs to be issued so the kinematics get updated position info!!
    # without this move it's always off by the length of the pullback move!!
    G1 Z5 F{5 * 60}  # lift to 5mm
    M400
    _HOME_Z_FROM_LAST_PROBE # has to be a sub-macro call to see probing results in printer state

[gcode_macro _HOME_Z_FROM_LAST_PROBE]
gcode:
    {% set z_probed = printer.probe.last_z_result %}  # absolute position of the probe
    {% set z_position = printer.toolhead.position[2] %}  # absolute position of the kinematics
    # the probed position is going to be a positive number, like 0.05
    # and this means that the "actual" 0 position is higher than the homed position
    {% set z_actual = z_position - z_probed %}
    {action_respond_info("Probed z: %.6fmm, Z Position: %.6fmm, Corrected Z Position: %.6fmm" % (z_probed, z_position, z_actual))}
    SET_KINEMATIC_POSITION Z={z_actual}

I thought that this would do nothing because the bed mesh overrides the z position, but this turns out to have an impact. Homing moves don’t do the high precision pullback move. Homing tends to overshoot and be too low. Not sure if this would resolve the issues in your test print, but my guess is it looks too low.

The last thing I have going on is per-build sheet z tweaking. I found I didn’t need it for a smooth sheet, but some additional squish was helpful on a textured one. The Prusa sheets don’t have as aggressive a texture as the aftermarket ones, and I suspect you won’t need that adjustment on the MK4.

There are some localized high spots, but overall no one is going to cancel a print like this:

It’s pretty rare that anyone ever attempts to print complete sheets

Hah, yes! I don’t think I’ve ever actually done it on my MK4 with stock firmware, so I don’t even know what target I’m really chasing here. I have a TODO list going of things to try/check for a last switch back to it, though, so I’ll put this test on that.

Prusa has a delay of 9 minutes (?) on the XL for the bed to thermally stabilize. I don’t know if they did the same for the MK4.

Nope. But I agree it’s worth letting things thermally settle in general, and I have been doing that manually most of the time before my tests.

The other thing I worked out was homing the Z axis with probing

Interesting. The homing vs probing paths are on my list of things to understand; I’m not thrilled about how the homing works at the moment, and you may have provided the solution to me :)

my guess is it looks too low. The last thing I have going on is per-build sheet z tweaking. I found I didn’t need it for a smooth sheet, but some additional squish was helpful on a textured one.

Yes, I think it is too low. I did it on a cheap AliExpress smooth plate which I tend to use for low value stuff, or, currently, in cases where I think I might make a mess of things. The stock Prusa smooth sheet has some “give” to it while this one doesn’t, which probably amplifies the need for a bit of offset. I have also been doing some per-build-sheet z offsets in the slicer with the stock firmware, so I’m not sure I’d consider what you’re doing a downside in any way.

Super long term, maybe we want to do both things while printing so we can have data about extrusion force for some future use cases. That might require the interleaving, with the caveat that the time info is not absolutely correct. But I hope we can avoid crossing that bridge for now

Having both filament runout and jam detection during printing would definitely be my goal. I definitely see the benefit of an incremental approach, but I’m wary of doing something which paints me into a corner, even if that is just having to make backward incompatible config specification changes.

When you command the sensor to switch channels it restarts sampling on the new channel and you have to wait for an entire sample period to get the next sample. Any work on the current sample is lost, along with the time taken.

I’m struggling to find concrete information on these ADCs. To be fair, I haven’t tried translating the datasheets yet. Is there any specific English resource which spells this stuff out, or should I get comfy with translating the Chinese and/or experimenting?

I understand the timing constraint during probing, so I don’t want to interfere with the single channel mode as it currently stands. I imagine that jam detection and runout mode would be fine at even 1Hz (but I’m guessing we could do at least an order of magnitude better than that).

My assumption was that I could just write the next channel/gain at the end of the sample read like the Arduino libraries do, and that that would be the only thing that determined what the next sample should be. It’s not clear to me what you mean by switching channels (the only thing I see about commanding the sensor which channel to read is that write at the end of the read), and how I’d determine how long I’d need to wait before reading the next sample.

The HX711 datasheet is available in English. The HX717 is just a 711 with a higher clock speed and slightly different config.

This gets very hairy without an ISR to watch the DRDY pin. Klipper doesn’t do ISRs.

For this to “just work” the sensor has to keep working on the current sample being prepared and not reset or restart its internal state. Time between samples has to be a constant even when channels are switched. Let’s assume that’s the way it works. The sequence of events would have to look like this:

Start condition: sensor is reading from channel A
1 -  read sample - data from channel A returned. Command switch to Channel B
2 -  read sample - data from channel A returned. Command switch to Channel A
3 -  read sample - data from channel B returned.
4 -  read sample - data from channel A returned.

The reads are not triggered by an interrupt. They happen at some unknown time that’s probably well after the sensor starts preparing the next sample. So switching channels doesn’t change the immediate next sample returned, it changes the next time the sensor starts to prepare a new sample.

There is additional complexity around what happens in rare cases where the code gets very lucky and reads the sensor right at the time when the sample is ready, right when an ISR would have been triggered. We don’t have any way to know that because: no ISR. So, is it possible to switch channels at that moment, before the next sample starts being prepared? I.e. the switch is instant.
Then the timing diagram looks like this:

Start condition: sensor is reading from channel A
1 -  read sample - data from channel A returned. Command switch to Channel B (this read has lucky precise timing)
2 -  read sample - data from channel B returned. Command switch to Channel A
3 -  read sample - data from channel B returned.
4 -  read sample - data from channel A returned.

That means I can’t tell what is in sample #2 after a channel switch, and I’d have to discard it.
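Both timing diagrams can be reproduced with a tiny Python model. It assumes, as above, that the sensor never restarts a conversion: a switch command normally lands while the next conversion is already running, and only a “lucky” read lands before it starts. The function and parameter names are made up for illustration.

```python
def simulate_reads(commands, lucky_reads=()):
    """Return the channel of each sample read back from the sensor.

    commands[i] is the channel commanded right after read i+1 (None = no
    command). lucky_reads holds the 1-based read numbers whose command
    arrives before the next conversion has started.
    """
    selected = "A"  # channel the sensor will use for conversions it starts
    finished = "A"  # channel of the conversion that has just completed
    returned = []
    for number, command in enumerate(commands, start=1):
        returned.append(finished)
        if number in lucky_reads:
            # The command lands first, so the very next conversion uses it.
            if command is not None:
                selected = command
            finished = selected
        else:
            # The next conversion already started with the old selection.
            finished = selected
            if command is not None:
                selected = command
    return returned


normal = simulate_reads(["B", "A", None, None])
lucky = simulate_reads(["B", "A", None, None], lucky_reads={1})
```

The two runs differ only in sample #2 (`A` in the normal case, `B` in the lucky case), and the host has no way to tell which run it is in.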

I think we probably have to fork off a discussion about this and get Kevin’s opinion. I’ll start a thread…

I’ve pushed an update to the algorithm for everyone to test. It’s pretty nerdy, but the details are here. I made a small mountain of test prints this week to validate all this:


The fact that I am spitting out full sheets at-will has me feeling confident that this is an improvement.

Other stuff in this update:

  • The default pullback move distance was increased from 0.1mm to 0.2mm. Some users reported problems with taps that were too short, and it’s better to be slow and safe vs go fast and fail. You can still change this if you find that it’s too long.
  • There is now a check that the pullback elbow does not fall within the last 1/4 of the pullback move time. When the elbow is this late it’s not possible to be certain that the move was long enough.
  • The pullback_extra_time option was removed. This was adding an extra 300ms to all taps to gather additional data. With the longer default move distance this should not be necessary.
  • The PROBE_ACCURACY tool now reports a new metric called average delta. I wrote about this here. Hopefully this will help to spot cases where PROBE_ACCURACY is used in adverse conditions and try to separate what is the probe’s fault and what is the environment’s fault.
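The elbow check in the second bullet amounts to a simple time comparison. A Python sketch, with hypothetical names rather than the ones used in the tap analysis code:

```python
def elbow_too_late(elbow_time, pullback_start, pullback_end):
    """True if the elbow falls within the last quarter of the pullback move."""
    duration = pullback_end - pullback_start
    return elbow_time > pullback_start + 0.75 * duration
```

For a pullback lasting from t=10.0s to t=10.4s, an elbow at 10.35s would be flagged, while one at 10.2s would pass.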

Debugging tool updates:

  • The debugging tool will now report the time it took for the python code to run the tap analysis. On a Pi4 I’m seeing around 130ms to 200ms.
  • All of the JSON for the tap is copied to the clipboard (before it was only time and force)

The next thing that will happen is an update to surface the cause of bad taps in the debugging tool. I’ll be adding error codes to the websocket output so you can know why a tap was rejected.

The testing branch has been updated to capture all errors that happen inside the TapAnalysis code and report them both in the console and over the web socket. This means:

  • You’ll get a short text description of the error in the gcode console if a single tap fails. This won’t stop the whole probing sequence.
  • The debugging tool now shows if the tap is valid or not
  • If the tap is invalid an error code is shown. The intention of the error codes is to later allow front ends to provide support that can be translated.
  • You can always copy the JSON of the failed probe to pass along when reporting a bug. This will include, at a minimum, the Time/Force data and the raw contents of the Trapezoidal Movement Queue. These are really the only inputs to the algorithm.

Other Updates:

The settling_time option was deleted. It’s not in use anywhere in the codebase. If you needed settling time the same quality can be achieved by enabling the continuous tare feature.

The continuous_tare_trigger_force option was deleted. It was confusing to have 2 options that did kinda the same thing while also having one totally supplant the other. The original intention was to have this option enable the continuous tare feature. But this is accomplished now by specifying the continuous_tare_highpass option (though I suspect I will rename that option because it’s not clear that it is the magic option). The trigger_force option now is the trigger force for both operating modes.

It is now possible to flip the polarity of the force graph with a reverse option. This allows the tap graphs to have any polarity you like. The default is to plot force based on the output data from the ADC. This is driven by the attached strain gauge and which force causes resistance to go up or down. My graphs being “upside down” is just a function of the default polarity of the Nextruder. They built it to have extrusion force be positive and collision force be negative. If you set reverse=True the polarity is flipped. All of the tap code is polarity agnostic and can handle load cells built and installed in either orientation.

I made some updates to address comments on the PR: #6792

The two impacts for users:

  • The GCode Command names have changed. They are all now prefixed with LOAD_CELL_. So LOAD_CELL_TARE, LOAD_CELL_READ, LOAD_CELL_DIAGNOSTIC and LOAD_CELL_CALIBRATE. The win here is, on front ends like Fluidd, you just type LOAD and hit tab to see all the related commands.
  • The raw counts graph in the debugging tool is blank. The separate socket feed from the sensor was turned off so that there would be just 1 data feed. I need some time to update the debugging tool to have that data in the graph again.

rcloran reported a bug and got to the root cause for me! Over long periods of time, like a day, the time conversion from 64 bit to 32 bit started to incur rounding errors. As more bits got used to store the integer part of the time there were fewer bits to store the decimal part of the time, which in this case is pretty much all that matters. So the fix, at least temporarily, is to use the float64 type to store time.
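The size of the effect is easy to see with the standard library alone. This Python sketch assumes times are stored in seconds and round-trips values through a 32 bit float to measure the spacing between representable values; the helper names are made up for illustration.

```python
import struct


def to_float32(value):
    """Round a Python float (64 bit) to the nearest 32 bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def float32_step(value):
    """Gap between `value` and the next larger 32 bit float (positive values)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    next_value = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_value - to_float32(value)


step_after_a_minute = float32_step(60.0)   # 2**-18 s, under 4 microseconds
step_after_a_day = float32_step(86400.0)   # 2**-7 s, about 7.8 milliseconds
```

After one day of uptime a 32 bit timestamp can only move in steps of roughly 7.8ms, so a value like 86400.001 collapses to 86400.0.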

When I changed the code to use Numpy’s float32 type for data processing I assumed that I was seeing a speedup due to 32 bit instructions being faster on the FPU on the RPI chips. Now I think what I was seeing was due to optimizing for not repeatedly converting from Python to Numpy data types. Doing the conversion once was a major speedup.

But we both tested this fix on various Pi flavors and came out with a head scratcher: 64 bit types are faster than 32 bit types. Perhaps there are some vectorization optimizations that only apply for 64 bit? Whatever it is, on a Pi4 I’m seeing about a 50% improvement in overall execution time. P99 times went from ~250ms to ~150ms and minimum times are now around 90ms. This is now in the realm of being undetectable to the naked eye.

I also pushed a change to surface errors to the console if the probe could exceed the safety limits when probing. Previously this caused an MCU shutdown and left a cryptic message. Now it will report Probe aborted, trigger force exceeds safety range.

Kevin merged the gram scale PR: Load cell gram scale by garethky · Pull Request #6729 · Klipper3d/klipper · GitHub

The next PR, “Load Cell Endstop”, is in draft now: Load Cell Endstop by garethky · Pull Request #6871 · Klipper3d/klipper · GitHub

I’ve been working for the past week on a long delayed re-structuring of the commit stack so the next PR can be submitted. I want a clear stopping point where homing works but high accuracy probing isn’t included. That is now done.

Short version is I made changes to the firmware so you will have to re-compile and flash your MCUs. I also changed the names of some config options to try and make things clearer:

Old Name                   New Name
safety_limit               force_safety_limit
continuous_tare_highpass   drift_filter_cutoff_frequency
continuous_tare_lowpass    buzz_filter_cutoff_frequency
notch_filter               notch_filter_frequencies
pullback_dist              pullback_distance
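If you have several config files to update, the renames can be applied mechanically. A small Python sketch: the mapping is taken from the table above, while the helper itself is a hypothetical convenience and not something shipped with the branch. It only rewrites option names at the start of a line, so already-renamed options and comments are left alone.

```python
import re

RENAMED_OPTIONS = {
    "safety_limit": "force_safety_limit",
    "continuous_tare_highpass": "drift_filter_cutoff_frequency",
    "continuous_tare_lowpass": "buzz_filter_cutoff_frequency",
    "notch_filter": "notch_filter_frequencies",
    "pullback_dist": "pullback_distance",
}


def rename_options(config_text):
    """Rewrite old option names in `config_text` to their new names."""
    lines = []
    for line in config_text.splitlines():
        match = re.match(r"^(\w+)(\s*[:=].*)$", line)
        if match and match.group(1) in RENAMED_OPTIONS:
            line = RENAMED_OPTIONS[match.group(1)] + match.group(2)
        lines.append(line)
    return "\n".join(lines)
```

Review the result before restarting Klipper, since this does not check which config section each option belongs to.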

One change that you might notice is that the drift filter is used for triggering when testing with QUERY_ENDSTOPS and QUERY_PROBE. Before this update this used a different codepath based on the absolute force. So perhaps a test with QUERY_PROBE might have worked and then probing could have failed because of a bad filter configuration. Now it’s all one codepath, so if it triggers with QUERY_PROBE it will trigger the same way when homing & probing.

If you have time, please read the documentation and send feedback! Writing technical documentation is hard.

I have a FLSUN T1 Pro printer with the community testing branch installed. This is a newish delta printer and comes from the factory with 3 underbed load cells for probing. The stock system has a separate board using an HX717 and STM32F103 to toggle a conventional probe signal to the main MCU. The load cells are connected in parallel to the HX717 and located on opposite sides of the bed from the towers. The STM32F103 did nothing more than read/process the HX717 data and toggle the probe pin wired to the main motherboard.

The performance has been fantastic, here was before and after for 10x probe_accuracy at 0,0:

before: range 0.046875, average -0.420110, stDev 0.010482, average delta: 0.011671
after: range 0.002557, average -0.304737, stDev 0.000692, average delta: 0.000532

After messing with filters a bit and delta_calibration, and adding some shielding to cables, I went on a quest for dragons. A script was used to do a probe_accuracy with 5 samples at 200 points in a hexagon pattern with 125mm radius (130mm printable radius). At each point it probes twice and discards that data, then performs the sample. I ran the script 5 times, homing the printer in between each trial. The order of sample locations was randomized after each home. There was one outlier point near the strain gauge opposite the B tower (-100,70). So there is still some work to do to characterize problem areas, but it is sure looking promising.
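The script isn’t included in the post. As a rough idea of how such a point set could be generated, here is a Python sketch that lays out a hexagonal lattice, clips it to a circle, and shuffles the order for each trial. The lattice spacing, the circular clipping, and the function names are all assumptions, so the point count will not necessarily match the 200 used above.

```python
import math
import random


def hex_grid_points(radius, spacing):
    """Points of a hexagonal lattice centred on (0, 0) within `radius` mm."""
    points = []
    row_height = spacing * math.sqrt(3) / 2
    rows = int(radius / row_height)
    cols = int(radius / spacing) + 1
    for row in range(-rows, rows + 1):
        y = row * row_height
        # Odd rows are shifted by half a step to form the hexagonal pattern.
        x_offset = spacing / 2 if row % 2 else 0.0
        for col in range(-cols, cols + 1):
            x = col * spacing + x_offset
            if math.hypot(x, y) <= radius:
                points.append((round(x, 3), round(y, 3)))
    return points


def randomized_trial(points, seed=None):
    """Return the same points in a shuffled order for one trial."""
    order = list(points)
    random.Random(seed).shuffle(order)
    return order
```

Each trial would home the printer, call `randomized_trial` on the point list, and run the probing sequence at every location in the returned order.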

(Plots: Probe z, Range, SD)

raw data.csv.txt (102.1 KB)

Hi all,

I want to bring up some hardware that can be used for experimenting with load cells: Creality K2 Plus hotend and Strain Gauge.

Of course a BYO ADC is still needed. I’m looking to adapt this hotend into the EVA 3 toolhead ecosystem.
