Klipper 400MHz limitations

According to this post, the microcontroller cannot run faster than 400 MHz.

I was curious about why and where these restrictions appear. And is it possible to fix it somehow?

The first thing I realized is that the STM32H7 microcontrollers do not have a 64-bit hardware timer. That is not a problem in itself: some microcontrollers don’t even have 32-bit timers, and Klipper has a solution for them.

Secondly, as far as I understand, the Klipper source code always uses the uint32_t data type for time. Is it safe to change this type to 64-bit? Will there be any problems with 8-bit architectures? And would the host code need any changes?

@koconnor mentioned a limit of about 10 seconds, but I don’t understand where this number comes from. Is there any way we can change this?

Klipper at its core relies on strict timing between all sections (separate mcus, stepper pulses, etc.).

I don’t have an example of exactly where the 10-second limit he’s referring to comes from, as there is a lot of timing-related code and checks in various locations.

But 400 MHz = 2.5 ns per clock tick
10 seconds / 2.5 ns = 4 x 10^9 ticks
uint32 can hold 4.294967296 × 10^9 values

Hence a clock much faster than 400 MHz would overflow the counter within that expected 10-second timing window.
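
The arithmetic above can be checked with a short C sketch. The helper name `wrap_period_ms` is made up for illustration and is not part of Klipper; it simply divides the 2^32 tick range of a uint32 counter by the clock frequency.

```c
#include <stdint.h>

// Hypothetical helper (not Klipper code): milliseconds until a free-running
// 32-bit tick counter wraps around. clock_hz must be non-zero.
uint64_t
wrap_period_ms(uint32_t clock_hz)
{
    // 2^32 ticks converted to milliseconds; 64-bit math avoids overflow
    return (((uint64_t)1 << 32) * 1000) / clock_hz;
}
```

Under these assumptions, `wrap_period_ms(400000000u)` gives 10737, i.e. about 10.7 seconds, while any clock above roughly 429.5 MHz wraps in under 10 seconds.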

Will there be any problems with 8-bit architectures?

No, because they stay within the timing window specified above.

When the timer hits its maximum value on an mcu running faster than 400 MHz, it rolls over to zero and starts again (a simplistic view; it depends on the mcu, which timer is used, etc.).

When Klipper sees discrepancies in the timing sync, it throws an error and shuts down.

Is it safe to change this type to 64-bit?
Is there any way we can change this?

What are you trying to do? Keep in mind that everything in Klipper is about driving stepper motors to move things around. That’s what it’s built for, that’s what it does.

As stated in the other post

which is very close to the minimum supported by the trinamic drivers (~103ns)

Of course there is some overhead in the mcu, so it isn’t one stepper pulse per clock cycle, but some of the faster mcus are already at the limit of the stepper drivers. If you’re curious, you can see the existing processor benchmarks here.

https://www.klipper3d.org/Benchmarks.html#micro-controller-benchmarks

One of the best is the RP2040, and it’s a 133 MHz processor.

rp2040 ticks
1 stepper 5
3 stepper 22

So really, it all comes down to this… What are you trying to do?

A faster mcu isn’t going to result in faster prints because you can’t drive the stepper motors faster than certain step speeds. Your physical printer setup will be a barrier long before the software/processor is any kind of impediment.

Edit: Also see here for more details about the timing. I forgot the code does handle timer roll over (it would have to), and it converts 32 bit timers to 64 bits and vice versa.

https://www.klipper3d.org/Code_Overview.html#time
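
The 32-bit to 64-bit conversion mentioned above can be sketched roughly as follows. This illustrates the general technique and is not a copy of Klipper’s code; the function name `extend_clock32` is hypothetical, and the sketch assumes the new 32-bit sample lies within 2^31 ticks of the last known 64-bit clock.

```c
#include <stdint.h>

// Hypothetical sketch: widen a 32-bit counter sample to 64 bits using the
// last known 64-bit clock as a reference. Assumes the sample lies within
// 2^31 ticks (on either side) of last_clock64.
uint64_t
extend_clock32(uint64_t last_clock64, uint32_t clock32)
{
    // Signed distance from the reference; stays correct across a rollover
    int32_t diff = (int32_t)(clock32 - (uint32_t)last_clock64);
    return last_clock64 + (int64_t)diff;
}
```

For example, with a reference of 0xFFFFFFF0 and a new sample of 0x00000010 (the counter has just rolled over), the result is 0x100000010 rather than a jump backwards.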

I think this was mostly answered above. But to repeat, the micro-controllers use a signed 32-bit integer to track times. Currently heaters program a maximum “heater pin on time” of 5 seconds in the mcu. So, the mcu has to be able to schedule an event at least 5 seconds into the future without rolling over a signed 32-bit integer. At 400 MHz, five seconds is 2,000,000,000 ticks, which fits below 2^31 (2,147,483,648). Indeed, one could go up to about 429 MHz and still fit. If using a clock rate faster than that, then an event scheduled 5 seconds in advance would appear to be in the past and result in errors.
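
As a rough check of these numbers, here is a small C sketch. The helper `horizon_fits` is hypothetical (it is not a Klipper function); it only tests whether a scheduling horizon, expressed in ticks, stays below 2^31.

```c
#include <stdint.h>

// Hypothetical helper (not Klipper code): returns 1 if an event scheduled
// `seconds` ahead still fits in a signed 32-bit tick difference.
int
horizon_fits(uint32_t clock_hz, uint32_t seconds)
{
    uint64_t ticks = (uint64_t)clock_hz * seconds;
    return ticks < ((uint64_t)1 << 31);
}
```

With a 5-second horizon this returns 1 for 400 MHz and 429 MHz, and 0 for 430 MHz, which matches the “about 429 MHz” figure.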

Changing to something other than 32-bit signed integers in the mcu code is unlikely to be a solution, as it would likely decrease code efficiency significantly. It may be possible to audit all the code and never use a timer that far in advance (eg, limit to 3 seconds in the future), but that would be a bit of work.

In most cases, we don’t need to run the micro-controllers at over 400 MHz as they are already “more than fast enough” at 400 MHz.

Cheers,
-Kevin

@TheFuzzyGiggler @koconnor thanks for the detailed answer!

However, there is one thing I did not understand. Kevin mentioned a 32-bit signed integer. Why not a 32-bit unsigned integer?

  • Consistency in calculations
  • Unambiguous
  • The ability to count down as well as up

Times in the mcu are tracked as 32-bit numbers. The code can’t do a direct comparison between times (eg, now_time == scheduled_time) because one can’t guarantee the comparison will be at just the right tick (the desired time may have been a few ticks earlier). So, the code needs to check whether the desired time has elapsed. However, directly doing that does not work (eg, now_time >= scheduled_time) because of 32-bit counter rollovers (eg, now_time might be less than scheduled_time only because of a counter rollover). So, the code performs comparisons by checking time differences (eg, (int32_t)(now_time - scheduled_time) >= 0). This is efficient to calculate and works reliably even during counter rollovers, but it does limit the maximum schedule time to no more than 2^31 ticks in the future.
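
A minimal standalone sketch of that comparison, written here for illustration (the function name `time_has_elapsed` is hypothetical; the expression is the one quoted above):

```c
#include <stdint.h>

// Illustration only: returns 1 once now_time has reached scheduled_time.
// Only meaningful while the two times are less than 2^31 ticks apart.
int
time_has_elapsed(uint32_t now_time, uint32_t scheduled_time)
{
    // The unsigned subtraction wraps; the signed cast recovers the sign
    return (int32_t)(now_time - scheduled_time) >= 0;
}
```

The check still works when the counter rolls over between the two times (now_time = 5, scheduled_time = 0xFFFFFFF0 reports elapsed). It breaks once the event is more than 2^31 ticks ahead: with now_time = 0 and scheduled_time = 0x80000001, the event wrongly appears to be in the past.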

-Kevin

I looked at /src/generic/timer_irq.c.
If I understood correctly, this is a kind of trick: A is treated as being before B, but only as long as the distance between the two events is less than half of the virtual timeline. Right?
MCU timeline

Again, this is one of those situations where, if you said what you were trying to DO, you’d probably get a better, more direct answer.

Are you just curious about how Klipper works and why it’s set up like that?

Are you trying to modify something for a specific end goal?

Help us understand what the underlying question is to give you the answer you’re actually looking for.

Again, the question might really just be how the timers work and why, but it seems like you’re trying to achieve something because you keep asking about changing things.

As you said, I’m really interested in understanding how it works.

And yes, I would like to use more than 400 MHz just because the MCU can.

However, unfortunately, I still do not see a simple and elegant way to do this. 64-bit timestamps would greatly reduce the performance of weak MCUs. Using 64-bit timestamps only for the H7 MCU would mean supporting two host-to-MCU communication APIs, which is bad. We could use the same trick as on the RP2040 and give the timer a separate clock source, even one 10 times slower than the MCU frequency. If I understand correctly how it works, this would do no harm and would actually increase the performance of the MCU and the maximum step rate. But that solution still does not look quite right.

P.S. I understand that it really won’t be possible to use all this performance for one driver. The change would mostly matter for AWD systems, where 5 drivers are used at high speeds (X, Y, X1, Y1, E), and for complex IDEX assemblies, where there can be 10 motors on one MCU that may all be in use simultaneously (X, U, Y, Y1, Z, Z1, Z2, Z3, E, E1).

FYI, if one were to increase the mcu frequency above 400 MHz, then one would likely need to disable the CONFIG_HAVE_STEPPER_BOTH_EDGE optimization. Currently that optimization is used to improve the performance of Trinamic stepper motor drivers. Those drivers need 100ns between steps (40 clock ticks at 400 MHz). The optimization takes advantage of the fact that the mcu can’t complete the Klipper step pulse and scheduling work in less than 100ns.

However, if the mcu frequency is notably increased then the optimization would need to be disabled. The optimization roughly doubles the number of steps per second that the mcu can schedule. So, it’s unlikely a faster mcu speed would increase stepping performance unless one were to go over 800 MHz.
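
The “40 clock ticks at 400MHz” figure is a plain unit conversion, sketched below. The helper name `ns_to_ticks` is hypothetical and not taken from the Klipper source.

```c
#include <stdint.h>

// Hypothetical helper: number of whole clock ticks that fit in `ns`
// nanoseconds at the given clock frequency.
uint32_t
ns_to_ticks(uint32_t clock_hz, uint32_t ns)
{
    // 64-bit intermediate so clock_hz * ns cannot overflow
    return (uint32_t)((uint64_t)clock_hz * ns / 1000000000u);
}
```

Here `ns_to_ticks(400000000u, 100)` is 40, while a 550 MHz clock would have 55 ticks available in the same 100ns window.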

Cheers,
-Kevin

P.S. - check out Benchmarks - Klipper documentation for the measurements of the number of clock ticks per Klipper stepper pulse on each mcu.

Kevin, thank you so much for your reply.

I found the discussion that has information on this optimization. The only thing I did not understand is whether the 100ns timing is violated only when the stepper motors are moving at high speeds, or whether the problem arises regardless. I mean, can we change the step pin output more often than at the speed at which we want to rotate the stepper motor?

P.S. About benchmarks. Are the RP2040 results in processor ticks or in clock cycles of the 12 MHz counter? Apparently they are counter cycles, which is somewhat confusing when viewing these results.

The 100ns is built into the chip itself. Not to mention that stepper motors can physically only be driven so fast.

From a cursory reading it seems like any faster than that the chip will consider it spurious noise and try to filter it out.

100ns is 10 million steps per second, or for a 1.8 degree stepper motor (200 steps per rotation), that’s 50,000 RPS (per second, not per minute).

Not that I’ve ever tried or will ever try but for one, I don’t think a stepper motor could turn anywhere near that fast due to the physics of their construction and two, if they ever did spin that fast they’d almost certainly tear themselves apart from centrifugal force and be an extremely deadly high speed shrapnel weapon if the case didn’t contain it.

I’m sure the 100ns is due to micro-stepping, but that’s still nearly 200 RPS. Far faster than a stepper could feasibly turn and even if it tried to approach that you’d likely have zero torque so it would be useless.
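
The figures in this post can be reproduced with a short sketch. The helper name is hypothetical, and the 256 microsteps per full step used to arrive at “nearly 200 RPS” is an assumption, since the post does not state the microstep setting.

```c
#include <stdint.h>

// Hypothetical helper: shaft revolutions per second implied by a minimum
// step interval, for a motor with `full_steps` per revolution driven at
// `microsteps` microsteps per full step (integer result, rounded down).
uint32_t
max_rev_per_sec(uint32_t step_interval_ns, uint32_t full_steps,
                uint32_t microsteps)
{
    uint32_t steps_per_sec = 1000000000u / step_interval_ns;
    return steps_per_sec / (full_steps * microsteps);
}
```

With a 100ns interval and 200 full steps, this gives 50000 in full-step mode and 195 at the assumed 256 microsteps.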

can we change the step pin output more often than at the speed at which we want to rotate the stepper motor?

Why would you want to? And per the datasheet, no; it would see it as noise and try to filter it (if I’m reading that right).

@TheFuzzyGiggler I think you completely misunderstood me (or I explained it badly).

  1. All the calculations about 50,000 RPS do not make any sense, because no one controls the motors in full-step mode.
  2. No one is going to spin a stepper motor to the speed of light. Kevin said that CONFIG_HAVE_STEPPER_BOTH_EDGE is only stable because the MCU cannot schedule events less than 100ns apart. So my question was: why would we need to control the step pin at intervals of less than 100ns if there is no intention to turn the stepper motor at these speeds?

Thus, is this limit on MCU performance a “natural” protection mechanism against violating the 100 ns limit, or is it an important “feature” without which the optimization will not work at all?

It’s a general limitation even if not stepping at those rates. As an example, if the motor is stepping once every 50us, but the mcu gets busy for a few 100us, then the klipper mcu will “catch up” by running all the pending timers as fast as it can - thus the code needs the timing enforcement regardless of desired scheduling.
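
One way to picture the timing enforcement described above is as a clamp on the next wake time. This is only a sketch with hypothetical names, not the actual Klipper implementation: if the scheduled time is already in the past (the mcu is catching up), the next edge is pushed out to at least `min_ticks` after the current time.

```c
#include <stdint.h>

// Illustration only (hypothetical names): choose the wake time for the next
// step edge so that edges are never closer together than min_ticks, even
// when `scheduled` is already in the past and the mcu is catching up.
uint32_t
next_waketime(uint32_t now, uint32_t scheduled, uint32_t min_ticks)
{
    uint32_t earliest = now + min_ticks;
    // Signed difference keeps the comparison valid across counter rollover
    if ((int32_t)(scheduled - earliest) < 0)
        return earliest;
    return scheduled;
}
```

For example, with now = 1000, a stale scheduled time of 500 and min_ticks = 40, the edge is deferred to tick 1040, while a scheduled time of 5000 is left unchanged.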

Yes, the numbers are in “scheduling ticks” and not “processor ticks”.

-Kevin

You made me curious.

Turns out, assuming a standard GT2 pulley with a diameter in the grooves of 12mm, the linear speed at a point on that circumference for a stepper turning at 50,000 RPS would be ~1885 m/s.

Doing the math, that works out to a centripetal acceleration of 6.04 x 10^7 g.

To get up to the speed of light at a point on the circumference, you’d need a pulley with a diameter of ~2km, or a little over a mile.
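
For anyone who wants to redo these back-of-the-envelope numbers, here is a small sketch. All names are hypothetical, and it assumes g = 9.81 m/s^2 and the classical formulas v = pi * d * f and a = v^2 / r.

```c
// Hypothetical helpers for the pulley estimates above (classical physics).
static const double PULLEY_PI = 3.14159265358979;

// Rim speed in m/s for a pulley of the given diameter at rev_per_sec
double
rim_speed(double diameter_m, double rev_per_sec)
{
    return PULLEY_PI * diameter_m * rev_per_sec;
}

// Centripetal acceleration at the rim, in multiples of g (9.81 m/s^2)
double
rim_accel_g(double diameter_m, double rev_per_sec)
{
    double v = rim_speed(diameter_m, rev_per_sec);
    return v * v / (diameter_m / 2.0) / 9.81;
}

// Pulley diameter in metres at which the rim would move at v_m_s
double
diameter_for_speed(double v_m_s, double rev_per_sec)
{
    return v_m_s / (PULLEY_PI * rev_per_sec);
}
```

With a 0.012 m pulley at 50,000 RPS these give roughly 1885 m/s and 6.04 x 10^7 g, and `diameter_for_speed(299792458.0, 50000.0)` comes out at about 1.9 km.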

Don’t think that’ll fit in my printer, nor do I think my NEMA 17 steppers have the torque to turn it.

Plus, I’m pretty sure that would be considered creating a WMD. Which is generally frowned upon.

No relativistic 3d printing speeds for me. :cry:


Edit: For fun I went a little further…

Assume a 1/8" ball bearing attached somehow at a point on the circumference of our imaginary pulley.

A 1/8" ball bearing has a mass of 0.131 grams or 0.000131 kg.

Using the relativistic kinetic energy equation (γ - 1)mc^2, where γ is the Lorentz factor (1/sqrt(1 - v^2/c^2))…

Linear speed of that pulley would be 2.78x10^8 m/s, just baaaaarely under light speed.

Lorentz factor in that case would be 2.67… Plug it into the equation along with the ball bearing mass and c and you get…

1.97x10^13 joules or a 4.7 kiloton nuke
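
Written out, the calculation goes as follows, taking v = 2.78 x 10^8 m/s and m = 1.31 x 10^-4 kg from the post, and assuming c ≈ 2.998 x 10^8 m/s and 1 kiloton of TNT = 4.184 x 10^12 J:

```latex
\begin{align*}
\gamma &= \frac{1}{\sqrt{1 - v^2/c^2}}
        = \frac{1}{\sqrt{1 - (2.78/2.998)^2}} \approx 2.67 \\
E_k &= (\gamma - 1)\, m c^2
     \approx 1.67 \times \left(1.31 \times 10^{-4}\,\mathrm{kg}\right)
       \times \left(2.998 \times 10^{8}\,\mathrm{m/s}\right)^2
     \approx 1.97 \times 10^{13}\,\mathrm{J} \\
\frac{E_k}{4.184 \times 10^{12}\,\mathrm{J/kt}} &\approx 4.7\,\mathrm{kt}
\end{align*}
```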

Definitely a WMD for a 1/8" ball bearing

Could wipe LAX off the map
NukeMap

I think this topic is just missing the part where someone wants to remove the 400 MHz limit precisely in order to hit the limits.
This is not very important, of course, but it may have value, e.g. for mad deltas with 256 microstepping, or AWD.
Let’s ignore the practical side here.

RP2040 is a bad example because its numbers are in internal timer ticks, not processor clocks.
Actually, it is 5 times slower than the STM32H723 (550 MHz), where we can hit that limit.
For reference, the RP2040 runs at 150 MHz but has 2 cores (I think only one is used).

About the TMC limit: if the TMC has an external clock reference faster than 12 MHz, it can support “faster” timings, but that’s one for the hardware guys; I don’t think anyone here has actually tried that.

About Klippy, it is tricky.
But at least Edge optimization is fairly easy - a patchwork solution:

// patch just in case
--- a/src/stepper.c
+++ b/src/stepper.c
@@ -77,6 +77,8 @@ stepper_load_next(struct stepper *s)
     s->interval = m->interval + m->add;
     if (HAVE_SINGLE_SCHEDULE && s->flags & SF_SINGLE_SCHED) {
         s->time.waketime += m->interval;
+        if (CONFIG_CLOCK_FREQ > 400000000)
+            s->next_step_time += m->interval;
         if (HAVE_AVR_OPTIMIZATION)
             s->flags = m->add ? s->flags|SF_HAVE_ADD : s->flags & ~SF_HAVE_ADD;
         s->count = m->count;
// UnOptimized step function to step on each step pin edge
uint_fast8_t
stepper_event_edge(struct timer *t)
{
    struct stepper *s = container_of(t, struct stepper, time);
    gpio_out_toggle_noirq(s->step_pin);
    uint32_t count = s->count - 1;
    if (likely(count)) {
        s->count = count;
        s->interval += s->add;
        if (CONFIG_CLOCK_FREQ <= 400000000) {
            s->time.waketime += s->interval;
            return SF_RESCHEDULE;
        }
        // Above 400 MHz, enforce the minimum pulse time explicitly
        uint32_t curtime = timer_read_time();
        uint32_t min_next_time = curtime + s->step_pulse_ticks;
        s->next_step_time += s->interval;
        if (unlikely(timer_is_before(s->next_step_time, min_next_time))) {
            // The next step event is too close - push it back
            s->time.waketime = min_next_time;
        } else {
            s->time.waketime = s->next_step_time;
        }
        return SF_RESCHEDULE;
    }
    return stepper_load_next(s);
}

It will still be faster than dedge=0, but slower than the optimized version; I am not sure how much slower.
*For reference, with dedge=0, stepper_event_full() has 51.
Going by the assembly instruction count (20 → 30), it is slower by 33%.
400 * 1.33 = 532 MHz
Taking branching into account, slightly less.

*I’m not sure which part of the code can schedule it if the timer has passed; I still do not fully understand how everything works together underneath.