Klipper 400MHz limitations

D4SK · February 1, 2025, 2:41am

550 MHz may require different voltage scaling or flash latency; you can search for VOS or flash latency in the stm32h7.c file.

Also, it could upset some other part of the clock scaling system, where the frequency of a peripheral is now too high, so it could require different clock dividers etc.

nefelim4ag · February 1, 2025, 5:22pm

D4SK:

550 MHz may require different voltage scaling or flash latency; you can search for VOS or flash latency in the stm32h7.c file.

IDK, from the high-level view there is a handling for that case (higher frequency).
It is switched to VOS0 (unconditionally), changes flash latency and for some reason enables Power ByPass.
That is the reason, why I easily can just crank up the frequency and it still boots and mostly works.

Honestly, I’m not experienced with that to say something meaningful or try to debug it.
I can only add to anamnesis:

All the errors were related to Software SPI; for some time it was simply faster than hardware SPI on stm32h7. Even with software rate limiting, I got IO errors.
I2C can be fixed by recalculating the scalers.
HW SPI looks like it will be better off with a lower main frequency: 510_000_000 / 128 = 3984375.0, because of TMC limitations.
It looks like GPIO gets weaker at higher frequency, and with that my 1 meter I2C line sometimes starts to produce errors (weak pull-ups). This is speculation, because it worked fine before and works fine now, as long as I do not try to overload the board.
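The HW SPI point can be checked with a little arithmetic. This is a minimal sketch, assuming the SPI clock is the base clock divided by a power-of-two prescaler from 2 to 256 and that the TMC drivers want at most 4 MHz. `best_spi_rate` is a hypothetical helper, not Klipper code, and it follows the post in dividing the main frequency directly.

```python
# Assumption: SPI clock = base clock / power-of-two prescaler (2..256).
PRESCALERS = [2 ** n for n in range(1, 9)]  # 2, 4, ..., 256


def best_spi_rate(base_hz, limit_hz=4_000_000):
    """Return (prescaler, rate) for the fastest rate not above limit_hz."""
    for div in PRESCALERS:
        rate = base_hz / div
        if rate <= limit_hz:
            return div, rate
    raise ValueError("no prescaler is slow enough")


for mhz in (480, 510, 550):
    div, rate = best_spi_rate(mhz * 1_000_000)
    print(f"{mhz} MHz: /{div} -> {rate / 1e6:.3f} MHz")
```

Under these assumptions 510 MHz still fits under 4 MHz with /128, while 550 MHz has to drop to /256 and ends up at roughly half the rate.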

Or there should be changes to switch the peripheral clock source to a different PLL, and/or OSPEED increase, I’m not sure.

Also, this code is used for the STM32H743, which has a limit of 480 MHz.
That can also be worked around with more ifdefs, but I’m not sure it is worth it.

For now, I think there should be some goals.
For me, it is not like going as high as possible, it is more like unlocking some limits:

Like for pulse stepping which is substantially slower right now
It was interesting to unlock the edge event, and I hoped in my heart, to get over 10000k with 3 steppers
Higher overall step rate, like for some corner cases (crazy deltas with 256 microstepping & awd).

I truly love Octopus Pro for the possibility and freedom it gives.

As a matter of bonus: It is really cool to query motor encoder with 50kHz (or higher). It can be sampled faster than an accelerometer, but it is niche and can only matter for developing/testing stuff.

FWIW, I think, that I found all the places where klipper needs to be changed to support higher frequencies, if there is an interest in that - they can be upstreamed and may be changed to work slightly differently. They are mostly tested.
That only matters if there is a will to allow STM32H7* to go higher and the request for that.

So, this can be only profitable right now, if there are setups like AWD:

1000 mm/s limit, awd, 256 micro steps.
(1000/40) * 200 * 256 * 4 = 5120000 steps per second peak.
(1440/40) * 200 * 256 * 2 = 3686400 steps for CoreXY and diagonal movements.

Or for good enough servos, which will use pulse infrastructure and will get a much lower step rate, because of software limitations.

1000/40 * 80000 = 2000000 steps per second
one servo is enough to get 'Stepper too far in past'
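The step rate figures above all come from the same formula. A minimal sketch that reproduces them, assuming 200 full steps per rotation and 40 mm of travel per rotation as in the posts; `steps_per_second` is a hypothetical helper name:

```python
def steps_per_second(speed_mm_s, rotation_distance_mm, steps_per_rotation, motors=1):
    """Peak step rate summed over `motors` motors moving at speed_mm_s."""
    rotations_per_second = speed_mm_s / rotation_distance_mm
    return rotations_per_second * steps_per_rotation * motors


FULL_STEPS = 200   # 1.8 degree stepper
MICROSTEPS = 256

# AWD: four motors at 1000 mm/s
awd = steps_per_second(1000, 40, FULL_STEPS * MICROSTEPS, motors=4)
# CoreXY diagonal: two motors at the 1440 mm/s used in the post
corexy_diag = steps_per_second(1440, 40, FULL_STEPS * MICROSTEPS, motors=2)
# One servo with 80000 steps per rotation
servo = steps_per_second(1000, 40, 80_000)

print(awd, corexy_diag, servo)
```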

Alas, from my experience, people sometimes get confused and think that TTC events come from slow MCU, and not from slow SBC or because of connection issues.

mykepredko · February 1, 2025, 6:30pm

Can I ask, what problem are you trying to solve here?

As I understand it, FDM 3D printing is limited by material and mechanical considerations.

With main controller boards running at significantly less than 400MHz, you can get 64 or more microsteps running on multiple steppers without any issues. Other than that, you don’t have any significant cycle sinks that you have to provide for - even with more complex architectures (ie IDEX or CoreXY-UV) I haven’t heard of any problems that additional main controller MCU MIPs will fix.

It’s interesting that people really don’t appreciate how much computing power we have at a remarkably low cost. Don’t forget that Klipper runs very well on ATMega 8 bit systems running at 16MHz. ARM Cortex processors (which is what is being discussed here) are significantly more efficient at processing data (due to optimized instruction sets, pipelining and 32bit data sizes) as well as being many times faster than the basic main controller board MCUs.

I get that it’s cool to push processors as fast as possible but if they’re driving mechanical systems that are limited by physics, there’s no obvious benefit to “crank up the frequency” especially when you end up with something that “mostly works”.

This is really where you lose me.

nefelim4ag · February 1, 2025, 7:12pm

My apologies, looks like I should prefer rewriting posts, instead of appending information to them, which in the end creates some confusion.

I will quote myself.

Prehistory: a buddy of mine runs servos with really high microstepping because of some servo controller limitation (like 80000 per rotation). Servo controllers use pulses instead of edges for stepping, so it is easy to get 'Stepper too far in past'.

So, for reference, with “normal” stepper drivers (pulse limit is 44 ticks ~ 100ns) the limit is:

171 ticks or 2339K (400Mhz) → 2544K (435Mhz)

361 ticks or 2216K → 2410K

535 ticks or 2242K → 2439K
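These figures can be reproduced with the convention used on the Klipper Benchmarks page, where the total step rate is the stepper count times the MCU frequency divided by the tick interval. The sketch below assumes the three tick counts were measured with one, two, and three steppers; the post does not state that, it is simply the reading that matches the quoted rates.

```python
def total_step_rate(freq_hz, ticks, steppers):
    """Total steps per second across all steppers, benchmark-style."""
    return steppers * freq_hz / ticks


# (ticks, stepper count); the stepper counts are an assumption
MEASURED = [(171, 1), (361, 2), (535, 3)]

for ticks, steppers in MEASURED:
    at_400 = total_step_rate(400_000_000, ticks, steppers) / 1000
    at_435 = total_step_rate(435_000_000, ticks, steppers) / 1000
    print(f"{ticks} ticks: {at_400:.0f}K -> {at_435:.0f}K")
```

The results agree with the posted numbers to within 1K of rounding.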

And quote again:

Or for good enough servos, which will use pulse infrastructure and will get a much lower step rate, because of software limitations.

1000/40 * 80000 = 2000000 steps per second
one servo is enough to get 'Stepper too far in past'

Alas, I can’t explain why servos may prefer higher microstepping levels. I do not own even one.

as well as being many times faster than the basic main controller board MCUs.

If I understood you correctly, this is not factually correct.
SBC can be faster, by a high margin

I get that it’s cool to push processors as fast as possible but if they’re driving mechanical systems that are limited by physics, there’s no obvious benefit to “crank up the frequency” especially when you end up with something that “mostly works”.
This is really where you lose me.

Fair enough, I expected someone would stick to that mostly works phrase.
If you spend a little more time reading the posts above, you will notice that I described my initial reasons. I also described where I had problems, and that at 435MHz, where timing issues should appear, it works, because they are pretty simple to fix.
Same with 480Mhz, it works, because it looks like the current initialization code is suitable for that frequency.
And finally, with 550Mhz, there are issues.

Alas, I have my own knowledge limitations, so I can’t provide a one-line fix for that.
And for sanity’s sake, I said that going further may not be worth the time investment, for the same reason you mentioned: it is already “fast enough”, plus the peripheral clock issues etc.

mykepredko · February 1, 2025, 8:51pm

Is this an actual problem or a theoretical one? It’s not clear from what you’ve written.

What is the expected speed of the printer? What is the speed at which you’re trying to move the toolhead at?

The approach you’re taking seems to be taken from here:
https://www.klipper3d.org/Benchmarks.html

Maybe it’s buried above but I don’t see a correlation between ticks and stepper/servo turn rate. Ideally, I’d like to understand what speeds are we talking about with a 2GT and a 16/20 tooth pulley.

nefelim4ag:

And quote again:

Or for good enough servos, which will use pulse infrastructure and will get a much lower step rate, because of software limitations.
1000/40 * 80000 = 2000000 steps per second
one servo is enough to get 'Stepper too far in past'

So, what does this mean in the real world? What will be the speed of the toolhead running at this speed?

I don’t think you do - I’m talking about the difference between a main controller board running an ATMega and one that is running an STM32F446.

Again, what is the real world problem that you’re trying to solve?

If it’s your buddy running a printer with 80k step/rotation servos can’t run at 200mm/s (which I believe is the practical maximum speed for FDM) then I get it but right now I don’t see anything that tells me that there is a practical application for this work versus it just being a thought exercise.

nefelim4ag · February 1, 2025, 10:36pm

This completely depends on your point of view. On CoreXY, where we limit the toolhead motion but not the motor, the motors can go faster, by up to a factor of sqrt(2) = 1.41.

I can agree on the point, like if you can’t go that speed - do not use it. 500-700 mm/s is still a pretty high limit.

So, it is practical in the sense that you can encounter it. Calculations are provided below.

Yes, but it is for TMC drivers with step-on edge enabled.
There are small modifications needed to test the general pulse code.
invert_step=0 step_pulse_ticks=44

General widespread gt2 with 20 teeth pulley, will give you 40 mm per rotation.
Then you work with your servos and try to overcome servo-specific issues (like salmon skin, lags & etc), crank up resolution, and get “Stepper too far in past”, because you tried to move the toolhead too fast, like 1000 mm/s mentioned above.

Got confused, checked benchmarks, did calculations, realized something was off, and you didn’t get close to theoretical board limits.

This is where I come into play, because I got confused too and started to dig into what causes that and what can be done about it (like speeding up the stepper code and speeding up the MCU).

My mistake. Got it.

This is where I can’t agree. I don’t think this is the right topic to argue about printing speeds, in the sense like “This is practical and this is not, because I think so”.
If there is a will to argue about that in any sense, I can suggest moving to the general category.

In the current 3D printing epoch, running faster than 200mm/s is a general recommendation, because it allows us to completely avoid issues with VFA and motor resonances, which mostly happen at speeds below that. I should note, though, that this makes sense only for steppers.

Just as a personal example, I’m printing much faster than 200mm/s, up to 500 mm/s, with a travel speed of 1000 mm/s. It is totally normal and works fine on my pretty heavy/large 400 Ratrig. I like 0.1 layer thickness, nice overhangs, a perfect surface at that speed, and large prints. I like that, and can suggest that anyone try it if their situation allows.

So, I suggest avoiding arguing in that sense, there is obviously someone who will say 30mm/s is fine for everyone. The same is true for the other side of the spectrum.

From a practical perspective, we know there are hot ends with high flow capacity like >50 mm^3/s. It is pretty easy to use 0.1 layers, whereas, with a 0.4 nozzle, you get 50 / 0.4 / 0.1 = 1250 mm/s. (Mine is ~25mm^3/s btw).
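The flow-limited speed above is just volumetric flow divided by the cross-section of the extruded line. A minimal sketch, assuming a rectangular cross-section whose width equals the nozzle diameter (a simplification); `max_speed_mm_s` is a hypothetical helper name:

```python
def max_speed_mm_s(flow_mm3_s, line_width_mm, layer_height_mm):
    """Speed at which a line of the given cross-section uses up the flow."""
    return flow_mm3_s / (line_width_mm * layer_height_mm)


# 50 mm^3/s hot end, 0.4 nozzle, 0.1 layers
print(round(max_speed_mm_s(50, 0.4, 0.1)))
# the ~25 mm^3/s hot end mentioned in the post
print(round(max_speed_mm_s(25, 0.4, 0.1)))
```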

So, I think 1000 mm/s is a pretty reasonable limit if we want to define one.

Hope it helps to understand what I’m talking about.

mykepredko · February 1, 2025, 10:52pm

I don’t think there’s any reason to discuss printer speeds but I did (and do) think it’s important to talk about what kind of speeds we are talking about here.

Thank you for giving me that background.

It sounds like you have a pretty impressive printer - you should post specs/photos of it.

Sineos · February 2, 2025, 9:41am

I appreciate this initiative, because the MagnetoX suffers from the same effect.

Its linear motors are also driven by a step pulse interface with a step_pulse_duration: 0.0000002. If you command a movement between 15k and 20k acceleration, it is quite easy to run into a "Stepper too far in past" error.

It seems exactly like @nefelim4ag explained: What we commonly know may be true for the “traditional” setup and TMCs with their stepping both on rising and falling edges. As soon as “uncommon” setups come into play, things may look different.

D4SK · February 2, 2025, 7:31pm

480 MHz should work; it’s just that the 550 MHz variants didn’t exist when the stm32h7.c code was written.

koconnor · February 3, 2025, 2:38am

FWIW, we can certainly update the code to avoid the 400 Mhz limitation (or more precisely, the 2**32/10 hz limitation). There are a few things I’m aware of:

There are various places in the code that attempt to schedule timers in the mcu up to 5 seconds in the future - they would need to be reworked to a time that would fit in a signed 32-bit integer.
Any really fast mcus that could conceivably step more than ~10Million times per second would need to disable the CONFIG_HAVE_STEPPER_BOTH_EDGE optimization.
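Both numbers behind the first point follow from 32-bit arithmetic. A minimal sketch, assuming the timer is a 32-bit counter that should take at least 10 seconds to wrap and that scheduled delays are held as signed 32-bit tick counts; the helper names are hypothetical:

```python
INT32_MAX = 2 ** 31 - 1


def max_timer_freq_hz(wrap_seconds=10):
    """Highest frequency at which a 32-bit counter takes >= wrap_seconds to wrap."""
    return 2 ** 32 / wrap_seconds


def max_schedule_seconds(freq_hz):
    """Longest delay whose tick count still fits in a signed 32-bit integer."""
    return INT32_MAX / freq_hz


def fits_in_int32(freq_hz, seconds):
    """True if `seconds` worth of ticks fits in a signed 32-bit integer."""
    return freq_hz * seconds <= INT32_MAX


print(max_timer_freq_hz())  # the "2**32/10 hz" limit, about 429.5 MHz
for mhz in (400, 480, 550):
    f = mhz * 1_000_000
    print(mhz, fits_in_int32(f, 5), round(max_schedule_seconds(f), 2))
```

At 400 MHz a 5 second delay still fits; at 480 MHz and 550 MHz it does not, which is why those 5 second timers would need to be shortened.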

FWIW, it may not actually be an overall improvement for the majority of users, so we’d have to weigh the overall pros/cons prior to merging in the mainline Klipper branch. It might still be worthwhile though.

Cheers,
-Kevin

nefelim4ag · February 3, 2025, 4:59am

So, then, if it may be considered worthwhile,
I opened PR with the proposed changes and things that I’m aware of.

github.com/Klipper3d/klipper

RFC: Added support for fast MCU

master ← nefelim4ag:fast-mcu-support

opened 03:59AM - 03 Feb 25 UTC

nefelim4ag

+53 -6

This is the patch set from the discussion: https://klipper.discourse.group/t/kli…pper-400mhz-limitations/15807

This PR is a group of patches that fix or move all the limitations known to me which can cause issues with an STM32H7 running with a timer frequency that violates the current requirement: `2**32/10` hz. In short, it allows Klipper to work on an MCU where the timer overflows in less than 10s.

My personal opinion: It feels more right to recompute constants instead of just changing them for everyone.

List of required changes, because implementation may change before merge:
- Stats are reported every 5s, which causes computation overflow.
- Heaters expect a max duration of 5s.
- PWM Tools expect a max duration of 5s.
- Step on edge will violate 100ns limit between edges.
- Software SPI will run faster than 4Mhz - breaks TMC SPI communication.

My short self-review here:
- Software spi can be limited in different ways, with a wait loop it can run slower than before or maybe enabled only for "fast" MCUs.
- Stats limited to 3s for simplicity (initially I just recomputed it based on timer frequency).
- ClockSync changes added "just to be sure", it should work without them.
- PWM max duration reduced to 3.5 because `2000_000_000 / 550_000_000 = 3.63`
- PWM recomputed based on MCU frequency as PoC, to not alter behavior for normal ones.
- Step on edge limited in the most non-intrusive way that I can find.

Step on edge numbers:
550Mhz: 1 stepper is 49 ticks, so formally there are 11224K. 3 steppers as on the benchmark page - 195 ticks. On the oscilloscope, there are 210ns per pulse - 105 ns per step ~ 57 ticks.
480Mhz: 1 stepper is 46 ticks, so formally there are 10435K.
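The step on edge numbers in the PR text can be converted to time to see why the 100ns limit is violated. A minimal sketch using only the tick counts and frequencies quoted there; the helper names are hypothetical:

```python
MIN_EDGE_INTERVAL_NS = 100  # limit between edges quoted in the PR text


def ticks_to_ns(ticks, freq_hz):
    """Duration of `ticks` timer ticks in nanoseconds."""
    return ticks * 1e9 / freq_hz


def formal_rate_k(ticks, freq_hz):
    """Step rate in thousands of steps per second for one stepper."""
    return freq_hz / ticks / 1000


for mhz, ticks in ((550, 49), (480, 46)):
    f = mhz * 1_000_000
    ns = ticks_to_ns(ticks, f)
    print(f"{mhz} MHz, {ticks} ticks: {ns:.1f} ns, "
          f"{formal_rate_k(ticks, f):.0f}K, ok={ns >= MIN_EDGE_INTERVAL_NS}")
```

Both intervals come out below 100 ns, so at these frequencies the one-stepper benchmark rate is faster than the edge spacing allows.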

I humbly argue against going that way:

That will break data on the benchmarks page.
It is substantially slower than the current approach. Ticks do not scale linearly and decrease with the amount of steppers.
Like 400->480Mhz (stm32h743) +20% (total ticks).
Disabling edge optimization (3 steppers) 171 → 535 -70% (ticks per step).
Enable inlining +25% (ticks per step).
Which probably leaves us with ~ 480_000_000 / (535 * 0.75) = 1196K per stepper (in total 1196 * 3 = 3588K). That will break someone’s setup (and mine, if I enable 256 microsteps).
I hope the disabling of edge inlining (which may add enough overhead) or just a conditional busy loop (which I proposed above and in the PR) can be an acceptable solution here.

Thanks.

With that said, I may suggest leaving this topic mostly for stm32h7-related changes, if someone is willing to participate and change the code to support >480Mhz, I may also provide some assistance.

BTW, STM32H7 because of the architecture of the chip and architecture of the Klipper will be theoretically limited to ~24Mhz pins toggle, then there is ODR overhead and then scheduler.

So, this is why I saw ~5.6Mhz * 4 = 22.4Mhz pins access for SW SPI on 550 MHz. This is high, but it is a known limit now.

nefelim4ag · March 30, 2025, 9:08pm

I spent some time trying to find what is wrong with STM32H723 at 550 MHz.
Everything is still working at 512 MHz.

What is wrong? At least Software SPI is broken; there are too many zeroes:

// GCONF:      00000004 en_pwm_mode=1
// GSTAT:      0000000c uv_cp=1(Undervoltage!) register_reset=1
// IOIN:       0000101c encb=1 enca=1 drv_enn=1 output=1 version=0x0
// DRV_CONF:   00000000
// GLOBALSCALER: 000000e1 globalscaler=225
// IHOLD_IRUN: 00020f0f ihold=15 irun=15 iholddelay=2
// TPOWERDOWN: 00000000
// TSTEP:      0007ffff tstep=524287
// TPWMTHRS:   00000000
// TCOOLTHRS:  00000000
// THIGH:      00000000
// ADC_VSUPPLY_AIN: 00e60003 adc_vsupply=0x0003(0.029V) adc_ain=0x00e6(70.196mV)
// ADC_TEMP:   000000c8 adc_temp=0x00c8(-238.7C)
// MSCNT:      000001e2 mscnt=482
// MSCURACT:   007000c8 cur_a=200 cur_b=112
// CHOPCONF:   20000030 hstrt=3 mres=0(256usteps) dedge=1
// COOLCONF:   00000000
// DRV_STATUS: 80000000 cs_actual=0(Reset?) stst=1
// PWMCONF:    c004000c pwm_ofs=12 pwm_autoscale=1 pwm_lim=12
// PWM_SCALE:  00000000
// PWM_AUTO:   0000000c pwm_ofs_auto=12
// SG4_THRS:   00000000
// SG4_RESULT: 00000000
// SG4_IND:    00000000

This is with all the patches needed to make klipper work at higher timer frequencies.
So TMC SCK is correct <5MHz (I’m testing 5MHz for a few weeks).
I checked with an oscilloscope that it is true.
It does work with lower MCU clock, but does not with 550Mhz.

With 4Mhz as the target it is working, which also seems strange.

I was unable to find a fix.

What is probably wrong so far, but has no effect on this issue:

H723 has a maximum VCO frequency of 836Mhz vs 960Mhz on H743. If I understood the code correctly, right now the VCO frequency is equal to 2x the PLL1 frequency.
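A quick check of that suspicion, as a minimal sketch. The limits are the ones quoted in the post and should be verified against the reference manuals; the 2x relation is the post's reading of the code, and the helper names are hypothetical:

```python
# Limits as quoted in the post; check them against the reference manuals.
VCO_MAX_HZ = {"H723": 836_000_000, "H743": 960_000_000}


def vco_hz(clock_hz, multiplier=2):
    """VCO frequency if it runs at `multiplier` times the core clock."""
    return clock_hz * multiplier


def vco_in_range(chip, clock_hz, multiplier=2):
    """True if the resulting VCO frequency is within the quoted limit."""
    return vco_hz(clock_hz, multiplier) <= VCO_MAX_HZ[chip]


for chip, mhz in (("H743", 480), ("H723", 512), ("H723", 550)):
    print(chip, mhz, vco_in_range(chip, mhz * 1_000_000))
```

By this calculation the H723 is already over the quoted limit at 512 MHz, where everything still works, which fits the observation that this item has no effect on the 550 MHz issue.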

~~I2C buses has a limit of 125Mhz peripheral frequency. Right now peripheral clock is 1/4 of PLL1. So, 120Mhz at 480Mhz and 128Mhz at 512Mhz.~~ My bad, reference manuals are too similar.


So, I tried to make the VCO frequency equal to PLL1, no effect.

    uint32_t pll_freq = CONFIG_CLOCK_FREQ * 2;
    // H723 has a different VCO limit: 836 MHz
    if (CONFIG_MACH_STM32H723 && pll_freq > 836000000) {
        pll_freq = CONFIG_CLOCK_FREQ;
    }

Tried to increase the dividers x2, so all peripherals and memory buses should have a lower base clock.
No effect.

#define PDIV (CONFIG_CLOCK_FREQ > 500000000 ? 8 : 4)
#define FREQ_PERIPH (CONFIG_CLOCK_FREQ / PDIV)
...
    MODIFY_REG(RCC->D1CFGR, RCC_D1CFGR_HPRE,    RCC_D1CFGR_HPRE_3);
    MODIFY_REG(RCC->D1CFGR, RCC_D1CFGR_D1PPRE,  RCC_D1CFGR_D1PPRE_DIV4);
    MODIFY_REG(RCC->D2CFGR, RCC_D2CFGR_D2PPRE1, RCC_D2CFGR_D2PPRE1_DIV4);
    MODIFY_REG(RCC->D2CFGR, RCC_D2CFGR_D2PPRE2, RCC_D2CFGR_D2PPRE2_DIV4);
    MODIFY_REG(RCC->D3CFGR, RCC_D3CFGR_D3PPRE,  RCC_D3CFGR_D3PPRE_DIV4);
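The effect of the modified `PDIV` macro on the peripheral clock can be tabulated quickly. A minimal sketch that mirrors the two macros above in Python; it models only the arithmetic, not the register writes:

```python
def pdiv(clock_hz):
    """Mirror of the PDIV macro: divide by 8 above 500 MHz, else by 4."""
    return 8 if clock_hz > 500_000_000 else 4


def freq_periph(clock_hz):
    """Mirror of FREQ_PERIPH: core clock divided by PDIV."""
    return clock_hz / pdiv(clock_hz)


for mhz in (480, 512, 550):
    print(f"{mhz} MHz -> {freq_periph(mhz * 1_000_000) / 1e6:.2f} MHz")
```

With this change, anything above 500 MHz gets a peripheral clock well below the 120 MHz that 480 MHz produces with a divider of 4.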

So, that is it for now.

Regardless, HW SPI appears to work.
HW I2C with a lower peripheral clock also looks correct.

Okay, I made a few more attempts; it works at ~4.7Mhz.
Not sure why the MCU base frequency affects that.

https://www.st.com/resource/en/reference_manual/rm0433-stm32h742-stm32h743753-and-stm32h750-value-line-advanced-armbased-32bit-mcus-stmicroelectronics.pdf
https://www.st.com/resource/en/reference_manual/rm0468-stm32h723733-stm32h725735-and-stm32h730-value-line-advanced-armbased-32bit-mcus-stmicroelectronics.pdf
