AKA, the thread that will probably have more math and algorithms in it than anyone can stand. I think this discussion will be more productive in its own thread. I’m here for ideas, improvements, suggestions, verification, etc.
The idea is to test the effectiveness of the two main algorithms involved:
- The “Elbow Finder” which tries to find the first point in the data set that contains probe collision data
- The “Collision Finder” which takes the elbow point and tries to compute the exact time that the collision occurred.
I created the test collision data from a straight line at y=0 that runs from 0.0 to 10.0 and these polynomials:
- 4x + 3x^2
- 20x - x^2 + 3x^3
- 25*log(x + 1) + x^3
- 4x^(1/3) + 3x^(1/3)
- 50x
- 30x^2
These run from 10.0 to 11.0.
This is a mix of concave, convex and linear functions. The nice part about using functions is that they are continuous and can be sampled at any interval we want to simulate. I don’t think these functions are in any way ideal, but they give us somewhere to get started. One way this can be improved is for us to pick test functions that better model real-world collision data.
Each function is sampled using a sliding window that sweeps across the collision point. This lets me generate the full range of conditions that I couldn’t replicate on demand in a printer. For 6 polynomials I get a set of 10 samples each, for 60 total test cases.
Noise can then be added to each sample set. So far I have random noise and 60Hz power-line noise. (I haven’t worked out how this noise model maps to the noise on my printer yet.)
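As a sketch of how this test-data generation could look (the function and parameter names here are illustrative, not the actual code; the example uses the first polynomial with its collision at t=10.0):

```python
import numpy as np

def collision_signal(t, poly=lambda x: 4 * x + 3 * x**2, t_collision=10.0):
    """Force is 0 before the collision at t_collision, then follows poly(t - t_collision)."""
    t = np.asarray(t, dtype=float)
    return np.where(t < t_collision, 0.0, poly(np.maximum(t - t_collision, 0.0)))

def sample_window(start, width=2.0, rate=100.0, noise_sd=0.0, mains_amp=0.0, seed=0):
    """Sample one sliding-window position; optionally add random and 60Hz noise."""
    rng = np.random.default_rng(seed)
    t = start + np.arange(int(width * rate)) / rate
    y = collision_signal(t)
    y += rng.normal(0.0, noise_sd, t.shape)        # random sensor noise
    y += mains_amp * np.sin(2 * np.pi * 60.0 * t)  # 60Hz power-line noise
    return t, y

# Sweep the window across the collision point to build one function's test cases
cases = [sample_window(start) for start in np.linspace(8.5, 9.5, 10)]
```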
I started working on the “collision finder” evaluation first. Job 1 in any scientific context is to establish a control: some benchmark that we should easily be able to beat. I have 2 control algorithms:
dumb_collision_finder: This just returns the time at the elbow.
bogo_collision_finder: Named after the cursed bogosort, this takes advantage of the fact that the collision point is somewhere in [elbow-1, elbow], so it guesses a random value in that range. And I am very sorry to say that this is annoyingly effective. In some scenarios it outperforms all other algorithms, which makes it an excellent control.
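The two controls can be sketched roughly like this (the signatures are illustrative, not the actual code):

```python
import numpy as np

def dumb_collision_finder(time, force, elbow_idx):
    # Control 1: just report the time of the elbow sample.
    return time[elbow_idx]

def bogo_collision_finder(time, force, elbow_idx, rng=None):
    # Control 2: the collision is somewhere in [elbow-1, elbow],
    # so guess a uniformly random time in that range.
    rng = rng or np.random.default_rng()
    return rng.uniform(time[elbow_idx - 1], time[elbow_idx])
```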
So then I have some genuine attempts to solve the problem:
Crowd-favorite linear regression does what it says on the tin: it performs linear regression from the elbow to the end of the data set and then solves for y=0 on the regression line. It performs brilliantly on 50x and is absolutely destroyed by bogo_collision_finder on every other function. So this can’t handle data with curves in it.
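A minimal sketch of the regression approach (the name regression_finder is mine, not from the actual code):

```python
import numpy as np

def regression_finder(time, force, elbow_idx):
    # Fit a line to everything from the elbow to the end of the data,
    # then solve m*t + b = 0 for the collision time.
    m, b = np.polyfit(time[elbow_idx:], force[elbow_idx:], 1)
    return -b / m
```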
polyfit_finder: This tries to fit a degree-3 polynomial to the collision data, then selects the root nearest the elbow as its guess. I really had high hopes for this one, but for curves that are concave and shallow (30x^2) it fits a function that flattens out near y=0. This makes the predicted collision point very sensitive to noise, and it generally performs worse than bogo_collision_finder unless noise is low. It has the advantage that it only requires numpy to run.
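Roughly, using numpy’s polyfit and roots (again, names and signature are illustrative):

```python
import numpy as np

def polyfit_finder(time, force, elbow_idx):
    # Fit a degree-3 polynomial from the elbow onward, then pick the
    # (approximately) real root nearest the elbow time as the guess.
    coeffs = np.polyfit(time[elbow_idx:], force[elbow_idx:], 3)
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-8].real
    return real[np.argmin(np.abs(real - time[elbow_idx]))]
```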
estimator_finder: This is the algorithm I came up with that’s in the code right now. It tries to make a guess based on the average rate of force increase during the collision and what fraction of that is seen at the elbow point. Its results are clamped to the range [elbow-1, elbow], so it can’t perform worse than dumb_collision_finder. But unexpectedly it beats all of the algorithms above. It’s not “good” at any particular polynomial, but it’s not horribly bad at any of them due to the clamping. I’m both pleased and dismayed that I did this well with my first guess.
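My reading of that description, as a rough sketch; the real implementation may compute the estimate differently:

```python
import numpy as np

def estimator_finder(time, force, elbow_idx):
    # Average rate of force increase over the collision portion of the data.
    slope = (force[-1] - force[elbow_idx]) / (time[-1] - time[elbow_idx])
    # The force already seen at the elbow tells us roughly how far past the
    # true collision the elbow sample is; back that out of the elbow time.
    guess = time[elbow_idx] - force[elbow_idx] / slope
    # Clamp so it can never do worse than dumb_collision_finder.
    return float(np.clip(guess, time[elbow_idx - 1], time[elbow_idx]))
```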
This fits a b-spline to the collision data and uses it to extrapolate back to the collision. B-splines can’t be solved for y=0 in the same way as a polynomial, so I just evaluate the spline 100 times at an even spacing over the [elbow-1, elbow] range and pick the value where y is minimized. This is the best-performing algorithm so far. At high noise levels it performs on par with estimator_finder, but as the noise drops off it can do about an order of magnitude better. Unfortunately this requires scipy, which uses Fortran code internally, so it won’t be something I can easily port out into pure Python.
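A sketch using scipy’s make_interp_spline, which fits an interpolating b-spline; the names are illustrative, and taking the point closest to zero force is one plausible reading of “where y is minimized”:

```python
import numpy as np
from scipy.interpolate import make_interp_spline

def spline_finder(time, force, elbow_idx):
    # Fit a cubic b-spline to the data from the elbow onward.
    spline = make_interp_spline(time[elbow_idx:], force[elbow_idx:], k=3)
    # B-splines can't be solved for y=0 directly, so evaluate the
    # extrapolation at 100 evenly spaced points over [elbow-1, elbow]
    # and take the time where it comes closest to zero force.
    candidates = np.linspace(time[elbow_idx - 1], time[elbow_idx], 100)
    return candidates[np.argmin(np.abs(spline(candidates)))]
```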
You can see here that if the SD of the noise is less than 0.001 we get a pretty reasonable ranking of the finder functions (lower is better):
If the noise is raised to an SD of 0.1 it’s a different story: bogo_collision_finder is near the top of the results. Also, the noise absolutely confuses polyfit_finder. So a big learning here is that noise is a huge factor, and no algorithm is going to be successful under noisy conditions without some clever filtering. On the electronics side, increasing the signal-to-noise ratio is important.
On the physical side, squishy printers make for gently curving collisions, which are difficult to plot solutions for. We might want to try to characterize what is “too squishy” and warn people about, or reject, such probes.