Experimental mesh leveling changes

After receiving the latest feedback in the Impossible Bed Mesh Leveling thread I decided to take another look into bed_mesh.py. I am unable to produce a situation where bed_mesh applies an incorrect adjustment based on the mesh; however, I have identified a couple of areas where I believe bed_mesh can be improved. If you aren’t interested in the details and just want to test the new changes, skip down to the testing section.

Bicubic Interpolation

I first decided to take a look at the bicubic interpolation algorithm. While the algorithm is generating mathematically correct results, I believe that the recommended tension value should change to 0.5. In my tests it seems to deliver more consistent results, particularly as the surface variance increases. Below are two visualizations generated on my MK3 for comparison. The interpolated mesh is intentionally dense to help show the difference.

Tension 0.2:

Tension 0.5:

As you can see, the 0.2 tension is too low, resulting in ridges. My MK3 printed okay with this mesh; however, this behavior could very well result in a mesh that is “unprintable” on some machines.
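To illustrate why a low tension can produce ridges, here is a minimal sketch of a one-dimensional cardinal spline segment. It is a generic textbook form written for this example, not the code from bed_mesh.py, and the name `cardinal_segment` is hypothetical. The tangent at each sample is the tension multiplied by the difference between its two neighbours, so on an evenly sloped surface a tension of 0.5 reproduces the straight line, while 0.2 flattens the curve near each sample and steepens it in between.

```python
def cardinal_segment(p0, p1, p2, p3, t, tension):
    """Interpolate between p1 and p2 (0 <= t <= 1) with a cardinal spline.

    p0 and p3 are the neighbouring control points. Hypothetical helper
    for illustration only; not taken from bed_mesh.py.
    """
    t2 = t * t
    t3 = t2 * t
    # Tangents at p1 and p2, scaled by the tension value.
    m1 = tension * (p2 - p0)
    m2 = tension * (p3 - p1)
    # Cubic Hermite basis functions.
    return (p1 * (2 * t3 - 3 * t2 + 1)
            + p2 * (-2 * t3 + 3 * t2)
            + m1 * (t3 - 2 * t2 + t)
            + m2 * (t3 - t2))


# Four samples on a perfectly straight slope.
ramp = (0.0, 1.0, 2.0, 3.0)
for tension in (0.2, 0.5):
    z = cardinal_segment(*ramp, t=0.25, tension=tension)
    print(f"tension {tension}: z(0.25) = {z:.5f} (straight line: 1.25000)")
```

On this ramp the 0.5 case lands on the line at 1.25, while the 0.2 case comes out at about 1.194, which is the kind of stair-step deviation that shows up as ridges in a dense interpolated mesh.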

Additionally I made a change that affects the edges of the interpolated mesh. Bicubic interpolation requires four control points; when interpolating at the beginning or the end of an axis it is necessary to “fake” a control point. Previously I set it to the same value as the point at the edge of the mesh. This PR uses linear interpolation to calculate the value of the “fake” control point. The result is that the edges shouldn’t “flatten” out. In the following illustration the top curve represents the current implementation (no interpolated control point), and the bottom curve represents the updated implementation (interpolated control point).

BedMeshIllustration-Page-6.drawio
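The edge handling can be sketched in the same way. This example assumes that "linear interpolation" of the fake control point means continuing the straight line through the two outermost samples (fake = 2 * edge - next); that reading, and the names `cardinal_segment` and `edge_segment`, are assumptions made for illustration rather than the code in the branch.

```python
def cardinal_segment(p0, p1, p2, p3, t, tension):
    """Generic cardinal spline between p1 and p2 (illustrative helper)."""
    t2 = t * t
    t3 = t2 * t
    m1 = tension * (p2 - p0)
    m2 = tension * (p3 - p1)
    return (p1 * (2 * t3 - 3 * t2 + 1)
            + p2 * (-2 * t3 + 3 * t2)
            + m1 * (t3 - 2 * t2 + t)
            + m2 * (t3 - t2))


def edge_segment(z_edge, z_next, z_after, t, tension=0.5, extrapolate=True):
    """Interpolate the first segment of a row, where one control point is missing."""
    if extrapolate:
        # Assumed new behaviour: extend the line through the first two samples.
        fake = 2 * z_edge - z_next
    else:
        # Previous behaviour as described above: repeat the edge sample.
        fake = z_edge
    return cardinal_segment(fake, z_edge, z_next, z_after, t, tension)


# Three samples on a straight slope; the true midpoint value is 0.5.
print(edge_segment(0.0, 1.0, 2.0, t=0.5, extrapolate=False))  # repeated edge point
print(edge_segment(0.0, 1.0, 2.0, t=0.5, extrapolate=True))   # extrapolated point
```

With the repeated edge point the midpoint comes out at 0.4375, slightly flattened toward the edge value; with the extrapolated point it is 0.5, matching the slope of the surface.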

Move Splitting

The current move splitter checks the z-value along a move segment and splits it when the difference exceeds the configured split_delta_z. When I first PRed bed_mesh Kevin suggested that bed_mesh split moves when they intersect the edge of a triangle in the mesh. At the time it was agreed that this was worth looking into, but not necessary to merge. The experimental branch implements this. This will result in more move segments, with the split frequency scaling with mesh density. Naturally this increases Klippy’s CPU usage; however, in real world tests I haven’t noticed a difference. In synthetic tests using a mesh with a density of 1 sample every 11mm (on both X and Y) it takes Klippy about 3.5% longer to process a gcode file with the new splitter.
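As a rough picture of edge-based splitting, the sketch below cuts a straight XY move wherever it crosses a vertical or horizontal grid line of a uniformly spaced mesh. It is a simplified illustration under stated assumptions, not the splitter in the branch: the diagonal edge that divides each cell into triangles is left out, and `split_at_grid_lines` and its parameters are hypothetical names.

```python
import math


def split_at_grid_lines(start, end, mesh_min, spacing):
    """Return the points of a move from start to end, including every
    point where it crosses an X or Y grid line of the mesh.

    start, end, mesh_min and spacing are (x, y) tuples in mm.
    """
    (x0, y0), (x1, y1) = start, end
    ts = {0.0, 1.0}  # parametric positions along the move
    for a0, a1, origin, step in ((x0, x1, mesh_min[0], spacing[0]),
                                 (y0, y1, mesh_min[1], spacing[1])):
        delta = a1 - a0
        if delta == 0:
            continue  # move is parallel to this set of grid lines
        lo, hi = min(a0, a1), max(a0, a1)
        k = math.ceil((lo - origin) / step)
        while origin + k * step <= hi:
            t = (origin + k * step - a0) / delta
            if 0.0 < t < 1.0:
                # Rounding merges crossings that coincide at a grid corner.
                ts.add(round(t, 9))
            k += 1
    return [(x0 + t * (x1 - x0), y0 + t * (y1 - y0)) for t in sorted(ts)]


# A 20mm move along X over a mesh with 10mm spacing is cut at x=10 and x=20.
print(split_at_grid_lines((5.0, 5.0), (25.0, 5.0), (0.0, 0.0), (10.0, 10.0)))
```

The number of split points grows with the number of grid lines crossed, which is why the segment count scales with mesh density.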

The experimental branch also includes a few other changes:

  • TMC driver queries are disabled when debugoutput has been set. This allows a user to run Klippy in batch mode without removing the TMC sections from a config.
  • debug_input and debug_output are reported in system_stats. I used this to have my PRINT_START macro load a mesh, rather than probe one, when running batch mode. There may be a better place to report this, but system_stats seemed like the best fit after a cursory glance at modules that report status.
  • I added an initial_profile option to bed_mesh. This allows you to select the profile to load at startup, or skip loading a profile if the specified profile name does not exist.
  • The last commit includes the ability to dump mesh adjustments over the unix socket. I am currently working on a script that will use scipy to validate that the adjustment for each move is correct. I’ll add that script to this branch when it’s done. This is going to be a lot of data to serialize and send, so I’m not sure yet how it’s going to impact Klippy when a client is subscribed.

Testing

The branch can be found at: GitHub - Arksine/klipper at dev-bed_mesh-experiment-20220703

I recommend checking out detached as it’s possible that I will force push to this branch. For those unfamiliar with git, ssh into your machine and perform the following commands:

cd ~/klipper
git remote add arksine https://github.com/Arksine/klipper.git
git fetch arksine
git checkout arksine/dev-bed_mesh-experiment-20220703
sudo service klipper restart

Should I update this branch, repeat the fetch, checkout, and restart commands; do not use git pull. When you are done testing and want to go back to the official branch:

git checkout master

It would be helpful to get feedback both from users that are struggling with their first layer and users that have a consistent first layer.

Specific things to look for:

  1. Is your first layer improved? Worse? No visible change?
  2. Any noticeable increase in cpu usage? Any stalls during a print that otherwise would not occur?
  3. Any additional artifacts on a print? Any reduction?
  4. Any obvious regressions? (Klippy crashed, invalid tool movement)

So I tested the experimental version (with 0.5 tension) and here are the answers to your questions:

1 - No visible change.
2 - No stalls, no significant difference in CPU consumption.
3 - I can’t be sure for the moment; I concentrated on the first layer. I will run a bigger print tomorrow.
4 - Not for the moment.

Thank you again for your involvement in solving our problem.

On a somewhat related note @Arksine, after reading the many reports of people struggling with bed mesh, I thought it would be very useful to have a built-in script that can auto-validate the selected mesh. In essence an automated process that would re-run the grid probing of the bed, but with the mesh active, in order to produce a corrected surface error plot. This validation could be done with a somewhat denser grid if required…

On my modified CR-10S Pro the current implementation seems to work very well with a reasonably dense mesh. But I will experiment with the tension value and perhaps try your experimental branch if time allows…

Thank you for your excellent work!

I thought it would be very useful to have a built-in script that can auto-validate the selected mesh. In essence an automated process that would re-run the grid probing of the bed, but with the mesh active in order to produce corrected surface error plot. This validation could be done with a somewhat denser grid if required…

That is an interesting idea. I’m not sure it would detect probe location bias, however it might be able to detect stiction, backlash, or inconsistent probing. I think I should be able to incorporate this in my current script. I might look into creating a moonraker component for this in the future. It could facilitate the validation procedure and report results to clients which can provide visualization.

@Murdock Thanks for testing, it was worth a shot. It’s good to know that thus far it hasn’t introduced any regressions.

bicubic_tension: 0.5 is instantly better in both of my printers. They’re very warped creality beds and have always been difficult.

The bed on VS373 is particularly bad, having a slight ripple in the magnet sheet and a strong saddle shape. I’m now getting near perfect looking centre and corner superslicer calibration chips with the experimental branch rather than settling for somehow getting a print to stick. Extra load is not noticeable. Pi3B/SKR mini e3 2.0

Many moons ago I did this manually to evaluate steady state bed shape differences at different temperatures. I manually subtracted the array height values for different meshes to visualize the delta. This is when my idea first came about - create a reference mesh and have a MESH_VERIFY command (or some such) to only establish the differences at a different time or conditions.

As you pointed out the XY positioning errors would not be detectable, but you’d still be in essence getting a visualization of the bed correction repeatability for the complete printer “system”. It might be helpful with diagnosing various issues such as excessively constrained bed mounts, Z backlash, probing repeatability, etc. As a minimum it would confirm that something untoward is happening on a specific printer.
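The manual subtraction described above can be sketched in a few lines. This assumes both meshes were probed on the same grid and are stored as lists of rows of z heights in mm; `mesh_delta` and `summarize` are hypothetical helper names, and the values in the example are made up purely to show the arithmetic.

```python
def mesh_delta(reference, current):
    """Return current - reference for two meshes probed on the same grid."""
    if len(reference) != len(current) or any(
            len(r) != len(c) for r, c in zip(reference, current)):
        raise ValueError("meshes must have identical dimensions")
    # Round to 1 nm (1e-6 mm) to hide floating point noise in the printout.
    return [[round(c - r, 6) for r, c in zip(ref_row, cur_row)]
            for ref_row, cur_row in zip(reference, current)]


def summarize(delta):
    """Report the extremes of the difference mesh."""
    flat = [v for row in delta for v in row]
    return {"min": min(flat), "max": max(flat),
            "range": round(max(flat) - min(flat), 6)}


# Made-up 2x2 meshes for illustration only.
reference = [[0.00, 0.10], [0.20, 0.30]]
current = [[0.05, 0.10], [0.15, 0.35]]
delta = mesh_delta(reference, current)
print(delta)
print(summarize(delta))
```

A delta that stays close to zero everywhere would suggest the probing is repeatable under both conditions; a large or patterned delta would point toward the kinds of system-level issues listed above.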

Thank you for considering it.

Also I mentioned different mesh density earlier because higher validation density could be used to verify the interpolation algorithm…

But this is probably an overkill for a “production” system.

I wonder if it would work to mount the probe directly where the nozzle is, i.e. X / Y offset = 0, and then do the mesh. Result should be that any location bias, e.g. twisted extrusions etc, is eliminated.

After meshing, put the probe back to its usual location for regular operation.
@Arksine would this cause problems, because essentially two different offsets are used? I would think no, because I could manually do the mesh as well.

I also tested this branch today with the following steps:

  • new bed_mesh tested with old settings and original klipper branch + testprint created with this mesh
  • changed klipper branch
  • new bed_mesh created + new testprint
  • new testprint again with the created mesh and an optimized z_offset for the sample

testprint with clean klipper branch:

testprint with new branch + new bed mesh:

testprint with new branch + new bed_mesh + updated z_endstop_position:

so i see a little change, but not really a good improvement. what i did notice during the print was that the 4 z-steppers have become a little quieter. (i tested it on my V2)

With the dev branch, the difference is not really noticeable.

  1. Printed the left one in the master branch, with 2, 2 pps, and bicubic_tension: 0.2
  2. Fetched and installed the dev branch
  3. Probed a new bed, set bicubic_tension: 0.5
  4. Printed a new test piece (right).

No noticeable increase in CPU load or motor quietness. The z offset remains spot on on the right side of each print but the filament barely adheres on the left.

[bed_mesh]
speed: 80
horizontal_move_z: 5
mesh_min: 20, 20
mesh_max: 190, 195
probe_count: 5, 5
mesh_pps: 2, 2       # Change to 0, 0 for debug (default 2, 2)
algorithm: bicubic
bicubic_tension: 0.5

I will revert back to Marlin 2.1 for testing and cross-checking.

After meshing, put the probe back to its usual location for regular operation.
@Arksine would this cause problems, because essentially two different offsets are used? I would think no, because I could manually do the mesh as well.

Yes, this would work for a twisted axis. You would likely have to calibrate your z-offset each time you move the probe, and it would probably make sense to just make it detachable since it couldn’t be used reliably when it’s offset on xy relative to the nozzle.

Thanks for the feedback @meteyou and @keybored.

so i see a little change, but not really a good improvement. what i did notice during the print was that the 4 z-steppers have become a little quieter. (i tested it on my V2)

It does look like the layer is a little bit low in some parts, and a little bit better on the left side of the bed. Maybe a tension somewhere in between, like .35, would help?

No noticeable increase in CPU load or motor quietness. The z offset remains spot on on the right side of each print but the filament barely adheres on the left.

I suspect that something is causing the probe to trigger too high on the left. The comparison with Marlin will be interesting, some get better results with it.

I forgot to mention it in the OP, but one of the theoretical advantages of the new move splitter is that travels without a hop over infill/skin shouldn’t drag as much. I haven’t tested this yet, but it’s something I plan to look into.

Thanks for the fast reply. On Marlin 2.0.8 with UBL (2.1 gave me some trouble while compiling), the issue is “gone” (in the sense that it is not as exacerbated as in Klipper). I can feel some differences between the left and right side of the print, but visually they’re not noticeable. I ran an M423 (X-axis twist calibration) and the results are within range of a normal X-axis (probe_calibrate and Location Bias Check gave the same results prior). The probe is also ok (both in terms of repeatability as well as structurally/wiring). We can rule out a mechanical issue as the main cause in my case, also because I’m experiencing the same problem on two very different printers.

What I noticed is that the probe Z-offset needed to be changed. With Klipper, I had -2.850 “working”, while with Marlin I had to jog to -2.745 in order to achieve a visually identical result. The motion system was untouched during the firmware change.

Frankly, the more I think about it and go through bed_mesh.py to look for issues, the more I’m convinced that the module is not the issue. I’m under the impression that the debug road I’m on is a dead end.

I see, thanks for the info. I’m currently working on the analysis script. It will be able to perform two types of analysis, a print analysis and a mesh analysis.

I have completed the print analysis functionality. It is initialized with the mesh, then receives every unmodified move, as well as every adjustment, from bed_mesh. It uses scipy’s RegularGridInterpolator to do its own z-lookup and validate that the adjustment is within 10nm (there is some rounding error so they won’t be exact). If there is a bug in the adjustment code this analysis should be able to detect it.
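The idea behind that check can be sketched without any dependencies. The example below is not mesh_analysis.py: it swaps scipy’s RegularGridInterpolator for a hand-written bilinear lookup, and it assumes a uniformly spaced mesh stored as rows along Y and columns along X. The function names and the mesh values are hypothetical. The tolerance of 10nm is written as 1e-5 mm.

```python
def bilinear_z(mesh, mesh_min, spacing, x, y):
    """Independent z lookup on a uniform grid (rows = Y, columns = X)."""
    rows, cols = len(mesh), len(mesh[0])
    # Fractional grid coordinates, clamped to the mesh boundaries.
    gx = min(max((x - mesh_min[0]) / spacing[0], 0.0), cols - 1)
    gy = min(max((y - mesh_min[1]) / spacing[1], 0.0), rows - 1)
    ix = min(int(gx), cols - 2)
    iy = min(int(gy), rows - 2)
    tx, ty = gx - ix, gy - iy
    # Interpolate along X on both rows, then along Y between them.
    z0 = mesh[iy][ix] + tx * (mesh[iy][ix + 1] - mesh[iy][ix])
    z1 = mesh[iy + 1][ix] + tx * (mesh[iy + 1][ix + 1] - mesh[iy + 1][ix])
    return z0 + ty * (z1 - z0)


def adjustment_ok(mesh, mesh_min, spacing, x, y, reported_adj, tol=1e-5):
    """True if the reported adjustment matches the independent lookup
    to within tol (1e-5 mm = 10nm)."""
    return abs(bilinear_z(mesh, mesh_min, spacing, x, y) - reported_adj) <= tol


# Made-up 2x2 mesh with 10mm spacing, for illustration only.
mesh = [[0.0, 0.1], [0.2, 0.3]]
print(bilinear_z(mesh, (0.0, 0.0), (10.0, 10.0), 5.0, 5.0))
print(adjustment_ok(mesh, (0.0, 0.0), (10.0, 10.0), 5.0, 5.0, 0.15))
print(adjustment_ok(mesh, (0.0, 0.0), (10.0, 10.0), 5.0, 5.0, 0.16))
```

Running an independent lookup like this against every reported adjustment is what lets a disagreement be flagged, although it cannot by itself tell whether the fault lies in the adjustment code or in the checker.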

The mesh analysis is stubbed out, however it will perform something similar to what Rext3d suggested above.

I have pushed what I have, so if you want to play with the print analysis you can do the following:

  1. Make sure klippy’s unix socket is enabled (if you use Moonraker then it’s already enabled)
  2. Install scipy:
sudo apt update
sudo apt install python3-scipy
  3. Make sure Klippy is running and ready, then ssh in and start the script:
python3 ~/klipper/scripts/mesh_analysis.py -a print /tmp/klippy_uds

This assumes the unix socket is created at /tmp/klippy_uds. It’s also possible to dump the output to a file:

python3 ~/klipper/scripts/mesh_analysis.py -a print -o ~/mesh_analysis.txt /tmp/klippy_uds

Once you are running, start a print and see what happens. If there is an error I just need to determine if the bug is in bed_mesh or the analysis script :laughing:

Also, when the print is going, you will see a spike in cpu usage if you test a print beyond the first layer. There is a lot of data to serialize and deserialize. That said, a Pi 3B shouldn’t have trouble with it.

I re-fetched the branch and installed SciPy. I re-compiled and flashed my Mega2560 with the latest firmware. Launched the script (with dumping) and started the same print job as before.

Z-offset stopped working: the nozzle was too close to the bed and screen/web interface jogging wasn’t effective. The unevenness of the bed was evident, more than during previous attempts. I had to stop the print cause the nozzle was scratching the surface. No mesh_analytics.txt was created (understandably). I saw no significant increase in CPU or RAM load.

If you need any log let me know.

Wow, that is unexpected. No change was made to Klipper’s code, only the addition of the script.

It’s also strange that it didn’t generate mesh_analytics.txt. That file is created upon starting the script. It should have at least dumped the mesh data (it should have shown up in the terminal as well).

Go ahead and attach klippy.log from that run. Thanks.

Here’s the log. Heads up: I have two instances running.

klippy-2.log (576.8 KB)

Thanks. According to the log, the script never connected to the unix socket. I’m confident that mesh_analysis.py wasn’t directly the source of the issue and the host code running would have been identical to the code running in your prior test of this branch.

I do have a few observations/recommendations:

  1. It may be best to work with a single instance while troubleshooting. Dealing with multiple instances adds additional complexity that isn’t desirable when trying to determine what is causing an issue.
  2. Likewise, I recommend starting with Raspberry Pi OS or a derivative. We can rule out DietPi having an undesirable effect that way.
  3. I notice that you are using a bltouch. Anecdotally, this seems like a common factor for users having first layer issues. That said, there are users that successfully use the bltouch, and there are users without a bltouch that have issues, so it isn’t definite that the bltouch is the source of the problem.
  4. I wonder if you have tried configuring a higher step_pulse_duration on Z and Z1? The default of 2us should be sufficient, but I wonder if 4 or 8 us would give you different results. That would be step_pulse_duration: .000004 or step_pulse_duration: .000008.

I’m confident that mesh_analysis.py wasn’t directly the source of the issue and the host code running would have been identical to the code running in your prior test of this branch.

I agree on this point.

  1. I troubleshoot with only one instance running, but there are two installed. Theoretically, the container shouldn’t give any issue (at least, it has never given me any).
  2. I run on an Odroid XU4, so Raspbian isn’t really an option. DietPi is fine: it’s basically a Debian Bullseye on ARM with additional scripting for software management and hardware config. It’s more UI than substance. The socket was created under tmp, but with the name klippy_uds2 (due to the aforementioned multi instances). Here’s the culprit. I’ll edit the references in the script. Any other important folder/file to look after? Otherwise, I’ll just do a clean install of a single instance.
  3. BLTouch is fine: tried two different units, and works well under Marlin. I’ll try an induction sensor when I get it. I would rule this out.
  4. No, I haven’t. I imagined that it wouldn’t be worth it with A4988s. Will do.

I also ruled out anti-backlash nuts from the equation.

Actually I’m starting to wonder whether it’s Klipper that doesn’t really like the bltouch. I have run about 20 more tests and I am almost convinced that the problem comes either from the management of the bltouch or from something related to the z_offset.

Visually it looks like there is overcompensation: where the firmware has to lower the nozzle it lowers it too much, and where it has to raise it, it raises it too much.

But if it applies the correct correction while starting from the wrong reference, it would do the same thing…

I’m not an expert in Python (more in C#, PHP, C++) but the bed_mesh code seems totally logical. What makes me suspect a problem around the z_offset is that when I added a G28 after the bed mesh I had to add 0.10mm to the z_offset (the thickness of the sheet of paper used for the bltouch calibration) to avoid “air printing”.

Reference to Klipper / octoprint / Ender 3 S1: problem with bed mesh leveling
As indicated in the very last posts, it actually was a hardware issue.

IMO the problem description looks quite similar to the discussion here Impossible Bed mesh leveling - Cr10sPro V2