Barriers to Multi_Mcu Homing on Multi_Mcu Shared Axis?

So, before I realized Klipper doesn’t actually allow this “out of the box”, I converted my Voron 2.4 (CoreXY) to a purely CANBus setup with a separate mcu for each stepper.

My current setup is…
Using the Pi as a secondary “main” MCU so I could alias my Z drives to make it easier to keep track. Plus since the main MCU sets the timing for the rest I thought it might help the issue (it didn’t).

Z Drives (4) - 3 Mellow Fly SHT42 and 1 BTT EBB42 (I was dumb and only bought 3 SHT42 cause I had the EBB42 on hand, I bought another SHT42 to replace it to make it easy to update klipper on all MCUs).

X and Y axis - A Huvud board on each (I did this portion first when I built the printer)

Toolhead - Mellowfly SB2040, controlling the extruder, fans and a Voron Tap probe.

Of course when I fixed up my printer.cfg I immediately got the “Multi-mcu homing not supported on multi-mcu shared axis” error message. For fun I went in and commented out the check for this and just ran it anyways. I now see why it’s not allowed.

It caused my probing tolerance to become unacceptable (but not as high as you’d think, it was just barely over the edge of workable).

My srtt, rttvar and other stats are consistently low. My max srtt I see is .001-.002 so “hypothetically” I should get .015-.03 mm resolution (15 mm/s * .001 (.002) s) but in reality it’s more like .01 - .1
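
As a quick check of that estimate, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and it models only the naive "probing speed times round-trip time" guess, not how Klipper actually timestamps events.

```python
# Back-of-envelope estimate: how far the axis travels while one message
# is in flight. This is the naive model from the post, nothing more.
def travel_during_delay(speed_mm_s: float, delay_s: float) -> float:
    """Distance in mm covered at speed_mm_s during delay_s seconds."""
    return speed_mm_s * delay_s


PROBE_SPEED_MM_S = 15.0  # probing speed quoted in the post

for srtt in (0.001, 0.002):  # srtt values seen in the klippy log
    distance = travel_during_delay(PROBE_SPEED_MM_S, srtt)
    print(f"srtt={srtt:.3f}s -> {distance:.3f} mm")
```

The observed spread of .01 - .1 mm is larger than either figure, which suggests that round-trip latency alone does not explain the result.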

Below is a snapshot of my klippy log when probing showing the timings and stats. They’re all fairly good. Oddly enough Z3 is a little slower (srtt of .002) than the other Z drives even though they’re in close proximity. The furthest away is the sb2040 on the physical toolhead on the gantry.

Stats 22023.6: gcodein=0  
mcu: mcu_awake=0.001 mcu_task_avg=0.000009 mcu_task_stddev=0.000015 bytes_write=1128 bytes_read=5493 bytes_retransmit=0 bytes_invalid=0 send_seq=163 receive_seq=163 retransmit_seq=0 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=50001503 
sb2040: mcu_awake=0.024 mcu_task_avg=0.000012 mcu_task_stddev=0.000019 bytes_write=23310 bytes_read=44222 bytes_retransmit=0 bytes_invalid=0 send_seq=1963 receive_seq=1963 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=11999921 adj=11999595 
HuvudY: mcu_awake=0.001 mcu_task_avg=0.000010 mcu_task_stddev=0.000008 bytes_write=8599 bytes_read=15683 bytes_retransmit=0 bytes_invalid=0 send_seq=629 receive_seq=629 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=72000471 adj=71998558 
HuvudX: mcu_awake=0.001 mcu_task_avg=0.000010 mcu_task_stddev=0.000008 bytes_write=9003 bytes_read=15728 bytes_retransmit=0 bytes_invalid=0 send_seq=638 receive_seq=638 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=72000203 adj=71998483 
Z: mcu_awake=0.026 mcu_task_avg=0.000018 mcu_task_stddev=0.000018 bytes_write=31917 bytes_read=40399 bytes_retransmit=0 bytes_invalid=0 send_seq=2179 receive_seq=2179 retransmit_seq=0 srtt=0.001 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47999140 adj=47996930 
Z1: mcu_awake=0.025 mcu_task_avg=0.000012 mcu_task_stddev=0.000013 bytes_write=32337 bytes_read=40360 bytes_retransmit=0 bytes_invalid=0 send_seq=2191 receive_seq=2191 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63998932 adj=63997553 
Z2: mcu_awake=0.027 mcu_task_avg=0.000018 mcu_task_stddev=0.000018 bytes_write=32011 bytes_read=42735 bytes_retransmit=0 bytes_invalid=0 send_seq=2194 receive_seq=2194 retransmit_seq=0 srtt=0.001 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47998909 adj=47997401 
Z3: mcu_awake=0.026 mcu_task_avg=0.000017 mcu_task_stddev=0.000018 bytes_write=31775 bytes_read=40149 bytes_retransmit=0 bytes_invalid=0 send_seq=2178 receive_seq=2178 retransmit_seq=0 srtt=0.002 rttvar=0.002 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47999196 adj=47997478  
FLY-SB2040: temp=22.9 HuvudY: temp=43.3 HuvudX: temp=47.1 Z: temp=46.8 Z1: temp=38.4 Z2: temp=50.0 Z3: temp=46.7 heater_bed: target=0 temp=18.8 pwm=0.000 sysload=0.45 cputime=103.123 memavail=1678616 print_time=56.226 buffer_time=0.000 print_stall=0 extruder: target=0 temp=19.8 pwm=0.000
probe at 330.000,330.000 is z=-1.517500
Stats 22024.6: gcodein=0 
mcu: mcu_awake=0.001 mcu_task_avg=0.000009 mcu_task_stddev=0.000015 bytes_write=1140 bytes_read=5525 bytes_retransmit=0 bytes_invalid=0 send_seq=165 receive_seq=165 retransmit_seq=0 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=50001492 
sb2040: mcu_awake=0.024 mcu_task_avg=0.000012 mcu_task_stddev=0.000019 bytes_write=23854 bytes_read=45162 bytes_retransmit=0 bytes_invalid=0 send_seq=2007 receive_seq=2007 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=11999921 adj=11999576 
HuvudY: mcu_awake=0.001 mcu_task_avg=0.000010 mcu_task_stddev=0.000008 bytes_write=8633 bytes_read=15805 bytes_retransmit=0 bytes_invalid=0 send_seq=632 receive_seq=632 retransmit_seq=0 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=72000464 adj=71998482 
HuvudX: mcu_awake=0.001 mcu_task_avg=0.000010 mcu_task_stddev=0.000008 bytes_write=9037 bytes_read=15850 bytes_retransmit=0 bytes_invalid=0 send_seq=641 receive_seq=641 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=72000190 adj=71998237 
Z: mcu_awake=0.026 mcu_task_avg=0.000018 mcu_task_stddev=0.000018 bytes_write=32848 bytes_read=41413 bytes_retransmit=0 bytes_invalid=0 send_seq=2239 receive_seq=2239 retransmit_seq=0 srtt=0.001 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47999116 adj=47996901 
Z1: mcu_awake=0.025 mcu_task_avg=0.000012 mcu_task_stddev=0.000013 bytes_write=33257 bytes_read=41370 bytes_retransmit=0 bytes_invalid=0 send_seq=2251 receive_seq=2251 retransmit_seq=0 srtt=0.001 rttvar=0.001 rto=0.025 ready_bytes=28 upcoming_bytes=0 freq=63999032 adj=63997178 
Z2: mcu_awake=0.027 mcu_task_avg=0.000018 mcu_task_stddev=0.000018 bytes_write=32934 bytes_read=43809 bytes_retransmit=0 bytes_invalid=0 send_seq=2254 receive_seq=2254 retransmit_seq=0 srtt=0.001 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47998908 adj=47997354 
Z3: mcu_awake=0.026 mcu_task_avg=0.000017 mcu_task_stddev=0.000018 bytes_write=32706 bytes_read=41149 bytes_retransmit=0 bytes_invalid=0 send_seq=2238 receive_seq=2238 retransmit_seq=0 srtt=0.002 rttvar=0.002 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47999231 adj=47997403  
FLY-SB2040: temp=22.6 HuvudY: temp=43.3 HuvudX: temp=47.1 Z: temp=46.7 Z1: temp=38.2 Z2: temp=50.2 Z3: temp=46.8 heater_bed: target=0 temp=18.8 pwm=0.000 sysload=0.41 cputime=103.391 memavail=1678740 print_time=57.286 buffer_time=0.000 print_stall=0 extruder: target=0 temp=19.9 pwm=0.000
probe at 330.000,330.000 is z=-1.495000
Probe samples exceed tolerance. Retrying...

So my question is… Is the barrier to multi_mcu homing on a shared axis purely due to communication latency concerns? Or is there a deeper issue at play with keeping independent z axis movements synchronized?

@koconnor mentioned in his outlook for Klipper in 2024 redoing the homing/probing code and I feel like this is pretty intimately connected but I’m trying to understand the “big picture” of the factors at play.

I had a few potential thoughts on possible fixes, but I’ll be the first to tell you I don’t fully understand the klipper source code. I’m building up a mental model as I read through and debug it but any deeper insight from those with experience would be very helpful.

1.) Speed up the mcu code/reduce latency - I noticed in the mcu firmware code there are places where code can be streamlined a bit. Some bounds checking and things like that could be sped up by using some fairly easy bit manipulation checks. Mainly I was looking in the command.c and basecmd.c but any place the mcu code could be optimized would potentially reduce communication latency on the mcus by speeding up their processing on tasks.

2.) Additional sync communication during homing/probing moves - I’m still wrapping my head around the drip movement portion and the fact that “homing” is spread across like 4 different files (,, and for the drip move portion). But my thought was possibly introducing an additional parameter or flag to tell the mcus “This movement is part of homing, so send continuous updates of your position/steps every x microseconds (100s of microseconds) until the endstop is triggered” and then work out a way on the Pi to keep a reference of them and if a stepper starts lagging/leading send communication along the lines of “WAIT! Z3, You’re x number of steps behind. Here’s your new movement command to adjust for you being so slow.”

Obviously that’s a major undertaking but I feel like it would have the potential for a major upgrade to accurate homing/probing.

3.) Allowing multi_mcu probing on a shared axis if an endstop probe is defined in the stepper portion. For example, if I split the output of my Voron Tap probe and connected it to an endstop pin on all my Z axis drives so they all received the endstop pulse at the same time. Of course I’d have to check the fanout capability on the opto-interrupter, this may require a buffer to keep the signal strong enough. I just realized this is actually possible, I’ll try it out tomorrow.

4.) This isn’t a total solution, but I see that there has been work done on FD CAN. I’ve thought about creating a few custom boards with an FD CAN Transceiver and seeing just how far I can push down any communication latency. I know this isn’t something that would work on the mainline klipper because it isn’t applicable on every printer. Unless we decide it’s okay to do that only if there is something in the printer.cfg that specifies FD CAN is available and active.

I know that’s a lot of info, questions and words in general.

I’d like to do my best to help make Klipper better and more awesome. Any guidance on the current state of affairs on how the code is working at the lowest level during homing, probing or stepper synchronization would help me immensely in getting up to speed.


Klipper doesn’t currently support “multi-mcu homing” on “multi-mcu shared axes” because that setup would require resynchronization of each stepper motor after a homing/probing operation and that code has not been implemented.

Specifically, if one mcu signals an endstop event, Klipper is capable of relaying that signal to the mcus moving the steppers. However, it is known that those stepper mcus will not receive the stop event at the same time. Thus, after each homing/probing operation, if a single axis was driven by steppers on multiple mcus then that axis could get slightly misaligned because each stepper might take 1-2 steps more or less than the others. To avoid this poor result, the code explicitly checks for this type of unsupported configuration and raises an error at startup.

The homing/probing code could, in theory, check for any resulting over/undershoot and move the steppers back into a synchronous position. Alas, that code is a little tricky, and it was never implemented. (I was aware of the limitation when I first wrote the multi-mcu homing code and I did not want to delay the release of general multi-mcu homing support, so I went forward with just the error check for that unusual setup.)

Timing of commands, latency, and round-trip timing have no impact on this issue. Klipper always schedules these events in advance with precise clock timing, and always interprets the results using the event timing. So, normal timing jitter due to communication or scheduling latency does not impact the results.

As you’ve indicated, one workaround for homing is to route the endstop pin to each of the stepper mcus. This isn’t a solution for probing though.


That actually simplifies the issue a bit then I think.

So, let me check if I understand the logic here, the timers should all be synchronized to the main mcu correct?

So if, lets say Stepper_Z gets a probe trigger signal at x time, could I then resync the steppers on Z1,Z2,Z3 by using the “get_past_mcu_position” on the other 3 stepper objects to find out where they were at that “x” trigger time and command them to move back to that position after a probing move?

Expanding beyond that a little, my probe is actually on another board and I’m using the “virtual_z_endstop”, if the times are synchronized can I use the trigger time of the probe on the toolhead pcb and command my z steppers back to that position to compensate for any delays? (The klipper docs say homing/probing compensates for overshoot so maybe it already does this?)

This at least somewhat narrows me down to where the issue occurs.

It makes sense you wouldn’t delay a release, this is such a niche issue compared to most setups out there. But CANBus boards have gotten cheap so the urge to minimize wiring and have a minimal, clean, setup is now more feasible than ever so I can see more and more people run into this.

Thanks for the info.

Edit: Thinking moving between probes would be slow, might be better to just compensate for the probe position taking into account the overshoot from the other steppers. That is a bit more complicated. I’ll keep messing with it.

I appreciate your sense of tinkering and adventure.
TBH, I fail to see where this improves the printer’s layout WRT to z-steppers.

It is running 4 stepper wires to a central location, versus running 2 CAN data + 2 power lines from each stepper to a central location.

IMHO this adds complexity, cost and points of failure where none are needed. YMMV.

I have some wire managing left to do, I literally just got it all back together a day or two ago to test the Z steppers and ran into the issue. But I ran all the wires under the extrusion and used silicone extrusion covers to hide the wires.

It’s not just about the power and the CAN data lines, Now I’m flexible about how I can run my wiring for fans, LEDs, etc.

If a new generation of CANBus toolheads come along, I can replace them one at a time instead of waiting to buy a new board all at once. If I blow a stepper driver I’m less likely to kill an entire board and have to replace everything. There are benefits to more modularity.

If I want to create a custom CAN toolhead board and test it out, I can test it on one stepper with 3 other known good drivers to see the difference in functionality and performance.

I’ll be honest though, my main reason was it just LOOKS nicer. I know, how often am I going to see the underside of my printer? Probably not often, but now I have room to expand if I want to add additional things under there for some reason.

Plus, honestly, I know it’s a “newer” technology regarding use in 3D printers, but CANbus has been a part of safety critical automotive systems for a few decades due to its robustness and simplicity of wiring. I can 100% see this becoming a common setup.

Plus, you never know what the future may hold in regards to new and unique ways to go about fine tuning and control. Lets say there is a new technology that comes about that requires a certain interface. With a CANbus setup I can just replace one of my CAN boards to have that new interface and let that board talk to the rest of them.

I understand your point as well, For a lot of people using a single board is more simple and less complex in terms of setup and maintaining. It works well enough for them and it’s reliable. If that’s the enjoyment they get out of 3d printing, more power to them.

Others like to tinker and see what they can do with their hardware and push the limits. It’s more about hardware and the adventure of learning and discovering than the actual 3d printing itself (which is also useful and fun).

That’s what makes hobbies so great, a wide variety of people can get vastly different enjoyment out of different aspects of it.

I’ve been following this thread with interest as it’s something that I’ve thought about and things really didn’t gel for me until @Sineos reply last night.

If you’ve followed this site for a while, you’ll know that I’m a proponent of using CAN toolheads as the toolhead controllers are quite inexpensive (often just about the same cost as a passive toolhead PCB that provides sockets for toolhead devices to simplify the task of replacing devices like thermistors and heating elements) and a single CAN cable is much easier to work with than the 13+ wires that are required for a typical direct drive toolhead.

One of the things I like about CAN is that it is very tolerant of poor wiring, especially in the (relatively) short distances that are found in a 3D printer.

I’ve also thought about using CAN toolhead controllers at each stepper location, as @TheFuzzyGiggler is doing but I was pretty sure that there would be the timing/precision issues that @koconnor has described.

There is a certain elegance/appeal to the idea but there is a big showstopper that isn’t really being addressed here and that’s cabling.

In the reply @Sineos gave, he notes:

It is running 4 stepper wires to a central location, versus running 2 CAN data + 2 power lines from each stepper to a central location.

When I read that something didn’t seem right and that is @Sineos is portraying CAN as having a star network topology (and, when I look at the image @TheFuzzyGiggler posted of the inside of his printer it seems that he is making the same assumption as well). This is incorrect as CAN has a linear bus network topology (often called a “line topology”) and wiring a CAN network in this way will lead to signal integrity issues.

Just to review, a star network topology looks like:


The line network topology used by CAN looks like:


When wiring CAN, the length of wire from the point where the bus is tapped to the controller is typically called a “stub”, and it radiates like an antenna. The stub also affects the characteristic impedance of the CAN bus, which is a problem, so it needs to be as short as possible.

This is where the star network becomes problematic in this application; the nominal characteristic impedance of a CAN network is 120Ω and, if you have a number of lines connected in parallel (as you do with a star network) you’re simply not going to have 120Ω at the source and you’re going to have a lot of reflections coming back from the various stubs.
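
To put rough numbers on that, here is a minimal sketch using the standard parallel-impedance and reflection-coefficient formulas. It assumes identical, ideal 120Ω branches meeting at a single point and ignores termination resistors, cable length and losses; the function names are illustrative only.

```python
def parallel_impedance(z0_ohms: float, branches: int) -> float:
    """Impedance of `branches` identical lines connected in parallel."""
    if branches < 1:
        raise ValueError("need at least one branch")
    return z0_ohms / branches


def reflection_coefficient(z_load: float, z0: float) -> float:
    """Fraction of an incident wave reflected at an impedance step."""
    return (z_load - z0) / (z_load + z0)


Z0 = 120.0  # nominal CAN characteristic impedance, in ohms

# A wave arriving at the star point on one branch sees all the other
# branches in parallel as its load.
for other_branches in (1, 2, 4):
    z_load = parallel_impedance(Z0, other_branches)
    gamma = reflection_coefficient(z_load, Z0)
    print(f"{other_branches} other branch(es): load {z_load:.0f} ohm, "
          f"reflection {gamma:+.2f}")
```

With a single continuing line nothing is reflected; with four other branches the idealized load drops to 30Ω and more than half of the incident wave is reflected.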

To properly implement a CAN star network, you need a CAN hub, which can be quite expensive and adds another level of complexity:

Now, if you were to properly implement a linear bus for your CAN, you’re going to have an issue with power. If each stepper were to draw 1A, for your four Z axis steppers, you would require 4+A which means a cable running from connector to connector will be fairly hefty - the obvious solution to this is to provide a central power supply and run power and ground from it to the various toolhead controllers.

So, your wiring will be a linear bus for your CAN and a star topology for your power - quite a bit more complex than what I see in the image and I think is being considered.

To net it out, you shouldn’t be doing this without properly wiring the CAN network as well as ensuring appropriate power to each node and, in doing this, wiring a CAN toolhead controller to each stepper in a 3D printer (especially one with Quad Gantry Leveling) becomes impractical.

That might work, I’m not sure. The way I would probably do it would be to record the mcu position of each stepper prior to homing (the code already does this) and note the mcu position after homing stops (again, the code already does this). If any of the steppers controlling a single axis have moved a different distance then issue a series of per-stepper moves so that they ultimately all move the same distance. The actual ending spot isn’t particularly important, but the steppers do need to be in sync on the distance they have travelled. There are many “corner cases” that need to be addressed though (eg, technically each stepper could have a different step distance, need to make sure the movement is always away from the bed, the following correction moves have to be scheduled, …).
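
A minimal standalone sketch of that idea, outside of Klipper, might look like the following. The function name and the numbers are hypothetical, and it assumes positive travel is away from the bed; it covers differing step distances but none of the scheduling corner cases mentioned above.

```python
def equalizing_moves(start_pos, halt_pos, step_dist):
    """Return per-stepper correction distances in mm.

    start_pos / halt_pos: mcu step counters before and after the move.
    step_dist: mm per step for each stepper (may differ per stepper).
    Assumes positive travel is away from the bed, so every returned
    correction is >= 0.
    """
    travelled = {name: (halt_pos[name] - start_pos[name]) * step_dist[name]
                 for name in halt_pos}
    # The stepper that ended up furthest from the bed sets the target.
    target = max(travelled.values())
    return {name: target - dist for name, dist in travelled.items()}


# Hypothetical example: two steppers with different step distances.
start = {"stepper_z": 0, "stepper_z1": 0}
halt = {"stepper_z": -800, "stepper_z1": -402}
dist = {"stepper_z": 0.0025, "stepper_z1": 0.005}

moves = equalizing_moves(start, halt, dist)
for name, mm in moves.items():
    print(f"{name}: move {mm:.4f} mm away from the bed")
```

Here stepper_z travelled 2.00 mm and stepper_z1 travelled 2.01 mm toward the bed, so only stepper_z1 needs a 0.01 mm correction.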

Just to be clear, I’m not aware of any timing or precision issues with running steppers on many different mcus - it’s actually a pretty common setup now. The specific setup of multi-mcu homing combined with a multi-mcu axis was not fully implemented in software, and so the code explicitly prevents that configuration (at least until the code is implemented).



@mykepredko Please forgive my atrocious paint skills. My connections are all in parallel as per the CAN standard. It just looks weird because the cables are hidden. The reason there are wagos mid point is I didn’t realize I could run the wiring under/through the extrusion covers until I had already crimped my connectors so I made do with what I had. The wagos are for power only. The connection at the USB to CAN Board is in parallel. I have two sets of CAN wires crimped at every connector on the chain (except the final board on the toolhead which is where I terminate the CANBus).

The power is run along both sides (the left and right sides are independent in regards to power), and the 24V junction is the set of wagos in the center bottom of the picture. It’s roughly 18 gauge wire in a mostly star bus like you mentioned. Plus it’s open air so I should be well within the boundaries of the insulation rating, especially considering each toolhead isn’t drawing max power constantly. The toolheads on the gantry have their own independent 18 awg power feed per toolhead.

@koconnor I started looking at the l last night and tried to blindly copy this section into the probing section to see what would happen.

            over_steps = {sp.stepper_name: sp.halt_pos - sp.trig_pos
                          for sp in self.stepper_positions}
            if any(over_steps.values()):
                halt_kin_spos = {s.get_name(): s.get_commanded_position()
                                 for s in kin.get_steppers()}
                haltpos = self.calc_toolhead_pos(halt_kin_spos, over_steps)

From what I can tell it essentially just set my stepper positions as if they were in sync even though they really weren’t. So the probing accuracy was literally dead on but the actual z steppers kept backing themselves away until I got a “probe not triggered after full movement.”

Again, that was just a blind shot in the dark messing around. I think we’re thinking along the same lines though. I need to sit down and write out some notes on the homing/probing code to keep a solid mental picture while I’m working on it, because I’m still “figuring it out as I go”. I’ll keep messing with it though; I pretty much have to, since I cut my stepper wires short when putting the CAN toolheads on and going back would mean jumping through hoops.

Oh well, necessity is the mother of all invention.


Okay, So I’m kind of beating my head over where this is going wrong.

Here’s where I’m at… I changed up the code in probing under homing_move to add a few things under the probepos check… namely…

        if probe_pos:
            halt_steps = {sp.stepper_name: sp.halt_pos - sp.start_pos
                          for sp in self.stepper_positions}
            trig_steps = {sp.stepper_name: sp.trig_pos - sp.start_pos
                          for sp in self.stepper_positions}
            haltpos = trigpos = self.calc_toolhead_pos(kin_spos, trig_steps)
            if trig_steps != halt_steps:
                haltpos = self.calc_toolhead_pos(kin_spos, halt_steps)
            #The highest halt position is the one with the least amount of steps (the largest number)
            highest_halt = max(halt_steps.values())
            step_diff = {key: highest_halt - value for key, value in halt_steps.items()}
            steppers = {stepper.get_name(): stepper for stepper in kin.get_steppers()}
            step_dists = {stepper.get_name(): stepper.get_step_dist() for stepper in kin.get_steppers()}
            self.FM = self.printer.lookup_object('force_move')
            for z_diff in step_diff:
                self.FM.manual_move(steppers[z_diff], (step_dists[z_diff] * step_diff[z_diff]), 2)

To make sure I’m moving AWAY from the bed, I based the differences on the largest (most positive) number of steps from the bed. (check my logic on that)

Then I take the difference from that step count for the others, multiply it by the step distance to get the distance difference and then command a force move to move those other steppers so that they’ve moved an equivalent distance.

It starts off well but my probing starts to drift higher and higher, and my variance is still .02 - .06 mm, which is much higher than I was getting on a single board setup.

I’ve tried every which way I can think of to move the steppers but my probing keeps slowly moving higher and higher

Any insight on what I’m missing here?


Edit: Forgot to add, you were right that it’s not a lot of steps that occur with the misalignment. For example below is the halt_steps and the difference from my last probing

halt_steps: {'stepper_z': -2000, 'stepper_z1': -2002, 'stepper_z2': -2012, 'stepper_z3': -2010}

diff: {'stepper_z': 0, 'stepper_z1': 2, 'stepper_z2': 12, 'stepper_z3': 10} 
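
For reference, the diff above follows directly from halt_steps. This is a minimal standalone sketch of the same calculation used in the snippet (the function name is made up for illustration):

```python
def steps_behind_highest(halt_steps):
    """Steps each stepper must move up to match the highest stepper."""
    # Step counts are negative toward the bed, so the highest halt
    # position is the largest (least negative) value.
    highest_halt = max(halt_steps.values())
    return {name: highest_halt - steps for name, steps in halt_steps.items()}


halt_steps = {'stepper_z': -2000, 'stepper_z1': -2002,
              'stepper_z2': -2012, 'stepper_z3': -2010}

diff = steps_behind_highest(halt_steps)
print(diff)
print("worst-case spread in steps:", max(diff.values()))
```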

Another edit! Better news this time. I forgot I was doing the force moves in another section of the code and when I moved it to the homing function I forgot to remove it from the other portion. So I was doubling up on adjustments. No WONDER it was creeping upwards, I was constantly adjusting it upwards.

Anyways… With the code above I’m now at this point…

probe accuracy results: maximum -0.727500, minimum -0.730000, range 0.002500, average -0.728250, median -0.727500, standard deviation 0.001146
probe at 175.000,175.000 is z=-0.727500
probe at 175.000,175.000 is z=-0.727500
probe at 175.000,175.000 is z=-0.730000
probe at 175.000,175.000 is z=-0.730000
probe at 175.000,175.000 is z=-0.730000
probe at 175.000,175.000 is z=-0.727500
probe at 175.000,175.000 is z=-0.727500
probe at 175.000,175.000 is z=-0.727500
probe at 175.000,175.000 is z=-0.727500
probe at 175.000,175.000 is z=-0.727500
PROBE_ACCURACY at X:175.000 Y:175.000 Z:4.270 (samples=10 retract=5.000 speed=20.0 lift_speed=20.0)
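
The summary line can be reproduced from the ten samples with Python's statistics module. This is only a sketch for checking the numbers, not Klipper's own code; the reported standard deviation matches the population form (pstdev), which is what is used here.

```python
import statistics

# The ten probe samples from the console output above.
samples = [-0.7275, -0.7275, -0.730, -0.730, -0.730,
           -0.7275, -0.7275, -0.7275, -0.7275, -0.7275]

summary = {
    "maximum": max(samples),
    "minimum": min(samples),
    "range": max(samples) - min(samples),
    "average": statistics.mean(samples),
    "median": statistics.median(samples),
    # Population standard deviation (divides by N, not N - 1).
    "standard deviation": statistics.pstdev(samples),
}

for name, value in summary.items():
    print(f"{name} {value:.6f}")
```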

I will need to test to make sure I’m not merely telling the printer I’m that accurate and that it physically is, since I changed the “set toolhead position” a bit. But it’s a literal order of magnitude better than where I was.

It’s still not back to single board level, but it’s progress and I’m happy with that so far.

As always Kevin - Any insight or guidance would be invaluable as there are large portions of the code I still don’t understand the low level workings of.


Update to an update, I realized my original idea of getting the “previous_mcu_position” at the trigger time was the same as the existing trig_pos in the homing_move function.

So I took a much more straight forward approach.

        if probe_pos:
            # Steps each stepper had taken when it halted / when the probe triggered
            halt_steps = {sp.stepper_name: sp.halt_pos - sp.start_pos
                          for sp in self.stepper_positions}
            trig_steps = {sp.stepper_name: sp.trig_pos - sp.start_pos
                          for sp in self.stepper_positions}
            # Per-stepper overshoot past the trigger point, in steps
            difference = {key: trig_steps[key] - halt_steps[key]
                          for key in halt_steps}
            steppers = {stepper.get_name(): stepper
                        for stepper in kin.get_steppers()}
            step_dists = {stepper.get_name(): stepper.get_step_dist()
                          for stepper in kin.get_steppers()}
            self.FM = self.printer.lookup_object('force_move')
            # Move each stepper back up to its own trigger position
            for z_diff in difference:
                self.FM.manual_move(steppers[z_diff],
                                    step_dists[z_diff] * difference[z_diff], 5)
            haltpos = trigpos = self.calc_toolhead_pos(kin_spos, trig_steps)

Literally just taking the difference between where the steppers halted and where the trigger position was, moving the steppers back up to the trigger position, and then setting the toolhead at that location.

Since as far as I can logic, the halt position will ALWAYS be further than the trigger position (that’s pretty much the definition of an endstop, if it halts before triggering there are other issues)… Then it will always move “up” and away from the bed.
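
That assumption can be made explicit. Below is a minimal standalone sketch, with made-up step counts and a hypothetical function name, that computes the per-stepper distance back to the trigger position and refuses to continue if a stepper appears to have halted before the trigger.

```python
def moves_back_to_trigger(trig_steps, halt_steps, step_dist):
    """Distance in mm each stepper must move up to reach its trigger position.

    Step counts are relative to the start of a downward probing move,
    so they grow more negative toward the bed.
    """
    moves = {}
    for name, halted in halt_steps.items():
        overshoot = trig_steps[name] - halted  # steps past the trigger
        if overshoot < 0:
            raise ValueError(f"{name} halted before the trigger position")
        moves[name] = overshoot * step_dist[name]
    return moves


# Hypothetical values for two Z steppers on different mcus.
trig = {"stepper_z": -1996, "stepper_z1": -1996}
halt = {"stepper_z": -2000, "stepper_z1": -2002}
dist = {"stepper_z": 0.0025, "stepper_z1": 0.0025}

moves = moves_back_to_trigger(trig, halt, dist)
for name, mm in moves.items():
    print(f"{name}: move up {mm:.4f} mm")
```

Raising an error on a negative overshoot is a design choice for the sketch; it keeps a bad reading from turning into a move toward the bed.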

It’s working pretty well so far and seems fairly realistic in probe variance.

There might be a more efficient way to move the steppers than using “FORCE_MOVE” but I couldn’t wrap my head around how to make stepper movements the low level way.

I’ll try it out for a few days to shakeout the bugs and if it doesn’t have anything major wrong with it, or Kevin doesn’t see any logic issues I’ll post it on the Github or in Discord for some others to test.

Don’t mean to make this a blog, but just updating for anyone following.

Did my first print since this experimental change and WOW, It’s some of the best printing I’ve ever done. I know there was always some misalignment caused by the issue described here

But this change effectively eliminates that gap I believe since it commands all steppers back to the “trigger_pos”. So as far as I can tell they’re all aligned to that ACTUAL trigger_pos and it shows.

This is mostly anecdotal for now, I will continue to test. But promising results.

What is to be gained? Saving wire is unimportant. But optimal control of high performance motors is, or will be, important once we start using them. Today the bottleneck is control of the extrusion process. (If wire saving is a goal, use Bluetooth. But it’s a silly goal until you have 6 or 8 tool heads.)

But the OP’s question is “what is the barrier?” I think it is because of the way Klipper uses CAN. In a robotics application, using one MCU per motor works well. The robot has 12 axes. They all stay perfectly in sync and we can do a 1KHz PID control loop with CAN in the loop. The trick is to use CAN to move data and not as a transport layer for another protocol.

Klipper is unique in the way it uses the CAN bus.

The root of the problem is illustrated by a story about how a computer programmer makes tea. He has perfected a recipe: First, fill the kettle with cool tap water, place it on the stove until the whistle indicates the water is boiling, then pour the water over a tea bag you have placed in the cup and wait. This has been working well for years. Then one day the programmer wants tea, walks into the kitchen and finds the tea kettle is already filled with boiling water and the whistle is blowing. What does he do? He thinks “No problem” and dumps out the boiling water so he can apply his tested recipe. He then adds cool tap water to the kettle and so on.

This behavior leads to tall stacks of layered software where each layer converts protocol from what we have to what is required. In our case we have “klippy to MCU” over serial over USB over CAN. The robot has just “CAN” with nothing else.

Protocol stacks save huge amounts of thinking, debugging and work. For example, if the tea drinking programmer did not dump out the boiling water he would have to invent some way to measure the quantity of water in the kettle to know if it was enough to fill a teacup. It was conceptually simpler to dump it out and start over. Plus he knew the result would be what he wanted with no other surprises.

But dumping the water was slower. If you want to pick up the speed you have to give up the savings in thinking, and debugging and re-invent a better way to make tea.

What the robot controller does is, at (say) 2,000 times per second, send a “target point” for each motor to the CAN bus. A target point has a position, velocity and optionally acceleration; this is about 40 bits. For 12 motors this is about 500 bits per update, or about 1,000,000 bits per second at 2,000 updates per second (the figure scales down linearly with the update rate). The CAN bus runs at 1Mb/sec, so the update rate has to be chosen with that budget in mind. CAN handles prioritization at the hardware level so there is no harm at all in also sending housekeeping and status information. Remember that CAN has zero software delay. The Pi3 pulls a pin low and all nodes see the low pin with only a speed of light delay. Bus arbitration is done in hardware. It is very good real-time performance.

Notice I am not saying “You should do this.” It is a LOT of work. Just that it is certainly possible to synchronize up to 12 motors well enough that the Cheetah robot can do a back flip. The robot commands its moving parts to move fast enough to propel the entire machine up into the air, do a rotation and land on the ground. Printers don’t have to move anywhere near that speed and have three times fewer motors to synchronize. The robot uses about the same computation and communications hardware, the same Linux OS, and even the same kind of CAN bus and types of MCUs.

The inefficiency in Klipper’s use of CAN happens because the MCU serial protocol also handles timing and scheduling in its queue. In “pure CAN” this is not needed. The CAN bus operates in real-time. The bus hardware will prioritize the most important data and drop the less important data to be resent later. Using CAN is like finding that the water is already boiling: you should be able to skip the filling and waiting, but it is easier not to.

But is 1 or 2 kHz fast enough? It depends on what you are sending to the motor. You would need to send position, velocity and acceleration, not step and direction data. As it turns out, when you send position and its first two derivatives, in general you don’t have to update the commands so frequently.

But is this needed for a 3D printer? Printers are way-slow compared to say a beer can filling machine at a Budweiser plant. We just chug along at 200mm/sec and it seems to work well enough.

The bigger problem to be solved is finding a way to measure the plastic flow rate out of the nozzle. The best we do today is to use filament movement as a proxy. Solving this is a better use of limited brain power.

Two months later edit: I wrote .02 layer height multiple times in the comments after this. I don’t know what I was smoking. The general idea is right, the math is way off. Just a note for anyone that reads and thinks “This guy is an idiot.” Yes, I know.

What’s to be gained is that the implementation already in place in Klipper has a difference between where the endstop is triggered and where the stepper motor halts in multi_mcu setups. It’s not often very much; I think the max I saw was 8-10 steps and an average of about 5. For a 1.8 degree stepper with 16 microsteps and a rotation distance of 40, that is .0125mm per step, so it averages out to about .0625mm.
But then you’re probing ~5 times per point… So .3125mm then you probe at multiple points when doing things like Quad gantry leveling.
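The arithmetic above can be reproduced with a short sketch. The function name is made up for illustration; the parameter names mirror Klipper's stepper config options, and the step counts (about 5 on average, 10 at worst) are the observations quoted above, not measurements of any other machine.

```python
import math

def step_distance_mm(rotation_distance, full_steps_per_rotation=200, microsteps=16):
    """Travel per microstep. A 1.8 degree stepper has 200 full steps per turn."""
    return rotation_distance / (full_steps_per_rotation * microsteps)

per_step = step_distance_mm(40.0)      # 40 / 3200 = 0.0125 mm
avg_overshoot = 5 * per_step           # average of ~5 steps between trigger and halt
max_overshoot = 10 * per_step          # worst case of ~10 steps
summed_over_probes = 5 * avg_overshoot # the post's figure for ~5 probes per point

print(f"per step:        {per_step:.4f} mm")
print(f"avg overshoot:   {avg_overshoot:.4f} mm")
print(f"max overshoot:   {max_overshoot:.4f} mm")
print(f"5 probes summed: {summed_over_probes:.4f} mm")
```

Changing `rotation_distance` or `microsteps` shows how the same step count maps to a different distance on other machines.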

If wire saving is a goal, use BlueTooth

Bluetooth latency is on the order of 10s of milliseconds at its absolute best, and it’s very variable, making it nearly impossible to keep clocks synced.

For example… The “aptX Low Latency codec” from Qualcomm is for low latency Bluetooth and their “advertised low latency” is 40ms.

.04 s * 10 mm/s movement = .4 mm variance at its BEST. Or 20x the layer height of a .02mm layer height. Printing would be a mess.

But it’s a silly goal until you have 6 or 8 tool heads

You can have 1 toolhead and it be an issue due to the trigger and halt difference described above. The issue just gets WORSE when you add multiple toolheads. I have 7 by the way, but I’m definitely an outlier which is why this came up.

finding a way to measure the plastic flow rate out of the nozzle

I honestly don’t think we’ll ever fully get there in off the shelf hardware at a cost reasonable enough for a general home user due to the physics of pushing a viscous semi-liquid through a small diameter nozzle. There are too many factors to model.

I don’t just mean the basic ones… Let’s list them out…

The “Normal” Factors affecting extrusion:
• Filament diameter and quality
• Nozzle Diameter and condition
• Extrusion Temps
• Print Speed
• Cooling
• Layer Height
• Filament type

The physical factors influencing extrusion:
• Thermal conductivity of the filament, heater and nozzle
• Viscosity of the semi liquid thermoplastic
• Pressure applied to the filament (which is impacted by everything listed here and the actual extruder)
• Gravity (yes, Gravity does in fact impact extrusion)
• Shear Rate (because of the last two and other factors)
• Filament consistency
• Type of Filament and its properties and additives (ABS is an amorphous polymer, PLA is a semi-crystalline material. The difference affects how the polymer chains are ordered and plays a factor in the above issues)
• Nozzle geometry
• Friction between the plastic and the nozzle walls and the nozzle tip

And I’m sure I’m missing others.

Plus, work can be done on improving multiple aspects of 3d printing concurrently. It’s not an all or nothing ordeal. Plus, I’m 99% certain I already fixed the issue and I’ve submitted a pull request.

What is the requirement for MCU clock sync? I know “better is better”, but how close does it really need to be? I would not suggest a method unless I knew it would meet a requirement. Are we looking at a few usec or a few nanoseconds?

Yes, all those things affect plastic extrusion. That is why it would be ideal to measure it. I used to know an engineer who was a specialist in measuring fluid flow in pipes. He did stuff with ultrasound. The trouble is the cost.

Closed loop X,Y,Z where the position loop is closed with linear scales really does solve the problem and it is not expensive. I use it on a milling machine. Closed loop extrusion will be harder.

The other thing we are seeing in robotics but not in printing is MPC. This is a predictive control where we try to minimize a future cost. In printing that would be defined as defects. Time horizons for “The Future” can be up to a few seconds. But with a g-code driven machine running open loop, look ahead gets you the same thing. But when you close the loop, you can’t look ahead as well.

Klipper is driven by synchronized clocks, you can’t have effective motion control if you don’t have a shared timebase. How do I tell two steppers where to move, how far and for how long if they don’t share a timebase?

I agree on closed loop, that’s the obvious next step in 3d printing.

The issue I bring up is solely regarding probing/homing. The time base example I gave above is still true, but as long as you start the movement at the same time on both steppers and they start their clocks at the same time you’re fine.

With probing and homing, you’re relying on a sensor, so the message from that sensor telling each mcu that it was triggered usually chains across multiple mcus.

To the “how close does the timing have to be” question, let’s look at it by homing/probing speed.


When you’re trying to hold a .02mm layer height (or even .04), anything over 5ms and you’re already off by multiples of that height. You’ll never get your bed level, your first layer will never be right (without a ton of tweaking), and your quad gantry leveling will always be skewed.
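The relationship used throughout this thread is simply position error = probing speed × timing delay. A minimal sketch that tabulates it is below; the speeds and delays are illustrative picks (15 mm/s with 1-2 ms and 10 mm/s with 40 ms appear earlier in the thread, the rest are assumptions), and the function name is made up for the example.

```python
import math

def position_error_mm(speed_mm_s, delay_s):
    """Distance the axis travels while the trigger message is still in flight."""
    return speed_mm_s * delay_s

speeds_mm_s = [5, 10, 15]               # homing/probing speeds (illustrative)
delays_s = [0.001, 0.002, 0.005, 0.040] # trigger-to-halt delays (illustrative)

for speed in speeds_mm_s:
    cells = ", ".join(
        f"{delay * 1000:g} ms -> {position_error_mm(speed, delay):.3f} mm"
        for delay in delays_s
    )
    print(f"{speed:>2} mm/s: {cells}")
```

Reading across a row shows how quickly the error grows with delay at a fixed speed, which is why slower probing hides the problem without removing it.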

So it’s extremely important for anyone that uses a CANBus toolhead with a probe attached to it, or really any multi_mcu printer. XY homing is not such a HUGE deal (although I implemented the same fix for that in my PR cause why not?).

Shared time base? There are several ways. #1 is what (I think) Klipper does: time tag each command, then queue them until that time and execute. #2: you can send a time, a position and a velocity. The time might be in the past or future and the MCU figures out where to be based on applying the velocity over the time difference. Ideally all the times are in the future so it works like Klipper, but this system can accommodate delays.
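The second approach can be sketched in a few lines. This is only an illustration of the idea described above, not Klipper code or the robot's firmware; the `Target` class and `position_at` function are hypothetical names, and a constant velocity between updates is assumed.

```python
import math
from dataclasses import dataclass

@dataclass
class Target:
    time_s: float         # timestamp on the shared timebase
    position_mm: float    # commanded position at that timestamp
    velocity_mm_s: float  # commanded velocity at that timestamp

def position_at(target, now_s):
    """Where the motor should be right now, given a time-tagged target.

    dt is positive when the message arrived late (timestamp in the past)
    and negative when the timestamp is still in the future.
    """
    dt = now_s - target.time_s
    return target.position_mm + target.velocity_mm_s * dt

target = Target(time_s=1.000, position_mm=10.0, velocity_mm_s=5.0)
late = position_at(target, now_s=1.002)   # message is 2 ms old
early = position_at(target, now_s=0.998)  # timestamp is 2 ms ahead
print(f"late: {late:.3f} mm, early: {early:.3f} mm")
```

Because the receiver corrects for the age of the message, a delayed packet costs accuracy only to the extent that the velocity changed in the meantime.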

With no time sync you need a fast way to send commands, and each is executed immediately. This is what the robot cheetah does. All 12 motors work together because there is so little communications overhead with CAN. There are no queues or buffers. There would be a ripple as each motor starts at a different time, but with fast enough communication it does not matter.

You only need about 1 millisecond? That is easy. There are solutions that do sub-microsecond synchronization with no specialized hardware.

If you allow specialized hardware the sky is the limit. But if all you have is a serial data link you can do better than one microsecond. The secret is to send a “sync” message and then immediately, with no delay, a “followup”, and from there the receiving end can deduce the offset between the clocks and the length of the communications delay. Then over time, hundreds of these messages are sent and the client can use the mean offset and mean delay.

PTP is the protocol most people would use for this. People are getting offset uncertainties of a few hundred nanoseconds using software-only implementations.

If you look up PTP, you might think the standard is overwhelmingly complex. It can be, because it was designed to synchronize hundreds of clocks over a network with complex topology. In the simple case we have, of two clocks, one master clock and one slave clock and one data line, the protocol is very simple.
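For the two-clock case, the core arithmetic is small enough to sketch. This uses the standard PTP delay request-response calculation, which adds a return message (Delay_Req/Delay_Resp) to the sync and followup described above so that offset and delay can be separated. It assumes the path delay is the same in both directions, and it is not how Klipper's own clock sync works. The function names and sample timestamps are made up for illustration.

```python
import math
from statistics import mean

def offset_and_delay(t1, t2, t3, t4):
    """One PTP-style exchange.

    t1: master sends Sync        (master clock, reported in the followup)
    t2: slave receives Sync      (slave clock)
    t3: slave sends Delay_Req    (slave clock)
    t4: master receives it       (master clock, reported in Delay_Resp)
    Returns (slave clock minus master clock, one-way delay).
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2.0
    delay = ((t2 - t1) + (t4 - t3)) / 2.0
    return offset, delay

def averaged(exchanges):
    """Mean offset and mean delay over many exchanges, to smooth out jitter."""
    results = [offset_and_delay(*e) for e in exchanges]
    return mean(r[0] for r in results), mean(r[1] for r in results)

# Made-up timestamps: slave runs 250 us ahead, one-way delay is 100 us.
exchange = (10.00000, 10.00035, 10.00100, 10.00085)
offset, delay = offset_and_delay(*exchange)
print(f"offset: {offset * 1e6:.1f} us, delay: {delay * 1e6:.1f} us")
```

In practice the result is only as good as the timestamps, so the closer to the wire they are taken, the less jitter ends up in the averaged offset.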

eating popcorn and reading walls of theoretical advice with no practical benefit :popcorn: :beer: