Klipper 400MHz limitations

550 MHz may require different voltage scaling or flash latency; you can search for VOS or flash latency in the stm32h7.c file.

It could also upset some other part of the clock scaling system, where the frequency of a peripheral ends up too high, so it could require different clock dividers, etc.

550 MHz may require different voltage scaling or flash latency; you can search for VOS or flash latency in the stm32h7.c file.

IDK, from a high-level view there is handling for that case (higher frequency).
The code switches to VOS0 (unconditionally), changes the flash latency and, for some reason, enables power bypass.
That is why I can simply crank up the frequency and it still boots and mostly works.

Honestly, I’m not experienced enough with that to say something meaningful or to debug it.
I can only add to the anamnesis:

  • All the errors are related to software SPI; for some time it was simply faster than hardware SPI on stm32h7. Even with software rate limiting, I got IO errors.
  • I2C can be fixed by recalculating the scalers.
  • HW SPI looks like it will be better off with a lower main frequency because of TMC limitations: 510_000_000 / 128 = 3984375.0.
  • It looks like the GPIO output gets weaker at higher frequency, and with that my 1 meter I2C bus sometimes starts to produce errors (weak pull-ups). This is speculation, because it worked fine before and works fine now if I do not try to overload the board.

Or there should be changes that switch the peripheral clock source to a different PLL and/or increase OSPEED; I’m not sure.

Also, this code is used for the STM32H743, which has a limit of 480 MHz.
That can also be worked around with more ifdefs, but I’m not sure it is worth it.

For now, I think there should be some goals.
For me, it is not like going as high as possible, it is more like unlocking some limits:

  • Pulse stepping, which is substantially slower right now.
  • It was interesting to unlock the edge event, and I hoped in my heart to get over 10000k with 3 steppers :smiley:
  • A higher overall step rate for some corner cases (crazy deltas with 256 microstepping & AWD).

I truly love Octopus Pro for the possibility and freedom it gives.

As a bonus: it is really cool to query a motor encoder at 50 kHz (or higher). It can be sampled faster than an accelerometer, but it is niche and only matters for developing/testing stuff.

FWIW, I think I have found all the places where Klipper needs to be changed to support higher frequencies. If there is interest in that, they can be upstreamed, and may be changed to work slightly differently. They are mostly tested.
That only matters if there is a will to allow STM32H7* to go higher and a request for that.

So, right now this can only pay off for setups like AWD:

1000 mm/s limit, AWD, 256 microsteps.
(1000/40) * 200 * 256 * 4 = 5120000 steps per second peak.
(1440/40) * 200 * 256 * 2 = 3686400 steps for CoreXY and diagonal movements.
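
For anyone who wants to plug in their own numbers, the two lines above can be sketched as a small helper. This is only an illustration of the arithmetic; the function names are made up here and are not part of Klipper. It assumes a 40 mm rotation distance and 200 full steps per revolution, as in the calculation above.

```c
#include <stdint.h>

// Hypothetical helper (not a Klipper function): steps per second for one
// motor = revolutions per second * full steps per revolution * microsteps.
uint64_t
steps_per_second(uint32_t speed_mm_s, uint32_t rotation_distance_mm,
                 uint32_t full_steps_per_rev, uint32_t microsteps)
{
    // Multiply first so the integer division does not lose precision.
    return (uint64_t)speed_mm_s * full_steps_per_rev * microsteps
        / rotation_distance_mm;
}

// Aggregate rate the MCU has to generate when `motors` motors step at once.
uint64_t
total_steps_per_second(uint32_t speed_mm_s, uint32_t rotation_distance_mm,
                       uint32_t full_steps_per_rev, uint32_t microsteps,
                       uint32_t motors)
{
    return steps_per_second(speed_mm_s, rotation_distance_mm,
                            full_steps_per_rev, microsteps) * motors;
}
```

With these, `total_steps_per_second(1000, 40, 200, 256, 4)` gives the 5120000 figure and `total_steps_per_second(1440, 40, 200, 256, 2)` gives 3686400.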

Or for good enough servos, which will use pulse infrastructure and will get a much lower step rate, because of software limitations.

1000/40 * 80000 = 2000000 steps per second
one servo is enough to get 'Stepper too far in past'

Alas, from my experience, people sometimes get confused and think that TTC events come from a slow MCU, and not from a slow SBC or connection issues.

Can I ask, what problem are you trying to solve here?

As I understand it, FDM 3D printing is limited by material and mechanical considerations.

With main controller boards running at significantly less than 400MHz, you can get 64 or more microsteps running on multiple steppers without any issues. Other than that, you don’t have any significant cycle sinks that you have to provide for - even with more complex architectures (i.e. IDEX or CoreXY-UV) I haven’t heard of any problems that additional main controller MCU MIPS will fix.

It’s interesting that people really don’t appreciate how much computing power we have at a remarkably low cost. Don’t forget that Klipper runs very well on ATMega 8 bit systems running at 16MHz. ARM Cortex processors (which is what is being discussed here) are significantly more efficient at processing data (due to optimized instruction sets, pipelining and 32bit data sizes) as well as being many times faster than the basic main controller board MCUs.

I get that it’s cool to push processors as fast as possible but if they’re driving mechanical systems that are limited by physics, there’s no obvious benefit to “crank up the frequency” especially when you end up with something that “mostly works”.

This is really where you lose me.


My apologies, it looks like I should rewrite posts instead of appending information to them, which in the end creates some confusion.

I will quote myself.

Prehistory: a buddy of mine runs servos with really high microstepping because of some servo controller limitation (like 80000 steps per rotation). Servo controllers use pulses instead of edges for stepping, so it is easy to get “Stepper too far in past”.

So, for reference, with “normal” stepper drivers (pulse limit is 44 ticks ~ 100ns) the limit is:

  1. 171 ticks or 2339K (400Mhz) → 2544K (435Mhz)
  2. 361 ticks or 2216K → 2410K
  3. 535 ticks or 2242K → 2439K
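
The rates in the list above appear to follow from clock / ticks * number of steppers, reading the list index as the stepper count (that reading is my assumption, but it reproduces the numbers). A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

// Hypothetical helper: aggregate steps per second when `steppers` motors
// together need `ticks` MCU clock ticks for one round of steps
// (one step on each motor).
uint64_t
aggregate_step_rate(uint32_t clock_hz, uint32_t ticks, uint32_t steppers)
{
    return (uint64_t)clock_hz * steppers / ticks;
}

// Duration of a pulse of `ticks` clock ticks, in nanoseconds.
uint64_t
ticks_to_ns(uint32_t ticks, uint32_t clock_hz)
{
    return (uint64_t)ticks * 1000000000u / clock_hz;
}
```

At 400 MHz, `aggregate_step_rate(400000000, 171, 1)` lands at roughly 2339K and `aggregate_step_rate(400000000, 535, 3)` at roughly 2242K, and `ticks_to_ns(44, 400000000)` gives 110 ns for the 44-tick pulse.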

And quote again:

Or for good enough servos, which will use pulse infrastructure and will get a much lower step rate, because of software limitations.

1000/40 * 80000 = 2000000 steps per second
one servo is enough to get 'Stepper too far in past'

Alas, I can’t explain why servos may prefer higher microstepping levels. I do not own even one.


as well as being many times faster than the basic main controller board MCUs.

If I understood you correctly, this is not factually correct.
An SBC can be faster, by a wide margin.


I get that it’s cool to push processors as fast as possible but if they’re driving mechanical systems that are limited by physics, there’s no obvious benefit to “crank up the frequency” especially when you end up with something that “mostly works”.
This is really where you lose me.

Fair enough, I expected someone would latch onto that “mostly works” phrase.
If you spend a little more time reading the posts above, you will notice that I described my initial reasons. I also described where I had problems, and that at 435 MHz, where timing issues should appear, it works, because they are pretty simple to fix.
The same goes for 480 MHz: it works, because the current initialization code appears to be suitable for that frequency.
And finally, at 550 MHz, there are issues.

Alas, I have my own knowledge limitations, so I can’t provide a one-line fix for that.
And for sanity reasons, I said that going further may not be worth the time investment, for the same reason you mentioned: it is already “fast enough”, plus the peripheral clock issues, etc.

Is this an actual problem or a theoretical one? It’s not clear from what you’ve written.

What is the expected speed of the printer? What is the speed at which you’re trying to move the toolhead at?

The approach you’re taking seems to be taken from here:
https://www.klipper3d.org/Benchmarks.html

Maybe it’s buried above but I don’t see a correlation between ticks and stepper/servo turn rate. Ideally, I’d like to understand what speeds are we talking about with a 2GT and a 16/20 tooth pulley.

So, what does this mean in the real world? What will be the speed of the toolhead running at this speed?

I don’t think you do - I’m talking about the difference between a main controller board running an ATMega and one that is running an STM32F446.

Again, what is the real world problem that you’re trying to solve?

If it’s that your buddy, running a printer with 80k step/rotation servos, can’t run at 200mm/s (which I believe is the practical maximum speed for FDM), then I get it, but right now I don’t see anything that tells me that there is a practical application for this work versus it just being a thought exercise.

This completely depends on your point of view. On CoreXY, where we limit the toolhead motion but not the motor, the motors can go higher, by up to sqrt(2) ≈ 1.41 times.

I can agree on the point, like if you can’t go that speed - do not use it. 500-700 mm/s is still a pretty high limit.

So, it is practical in the sense that you can encounter it. Calculations are provided below.

Yes, but that is for TMC drivers with stepping on both edges enabled.
Small modifications are needed to test the general pulse code:
invert_step=0 step_pulse_ticks=44

A common GT2 belt with a 20-tooth pulley gives you 40 mm per rotation.
Then you work with your servos and try to overcome servo-specific issues (like salmon skin, lag, etc.), crank up the resolution, and get “Stepper too far in past”, because you tried to move the toolhead too fast, like the 1000 mm/s mentioned above.

You get confused, check the benchmarks, do the calculations, and realize something is off: you didn’t get close to the theoretical board limits.

This is where I come into play, because I got confused too and started to dig into what causes that and what can be done about it (like speeding up the stepper code and speeding up the MCU).

My mistake. Got it.

This is where I can’t agree. I don’t think this is the right topic to argue about printing speeds, in the sense like “This is practical and this is not, because I think so”.
If there is a will to argue about that in any sense, I can suggest moving to the general category.

In the current 3D printing epoch, running faster than 200 mm/s is a general recommendation, because it lets us completely avoid issues with VFA and motor resonances, which mostly happen at speeds below that. I should note, though, that this makes sense only for steppers.

Just as a personal example, I’m printing much faster than 200 mm/s, up to 500 mm/s, with a travel speed of 1000 mm/s. It is totally normal and works fine on my pretty heavy/large 400 Ratrig. I like a 0.1 mm layer thickness, nice overhangs, a perfect surface at that speed, and large prints. I like that, and can suggest anyone try it if your situation allows.

So, I suggest avoiding arguing in that sense, there is obviously someone who will say 30mm/s is fine for everyone. The same is true for the other side of the spectrum.

From a practical perspective, we know there are hot ends with a high flow capacity, like >50 mm^3/s. It is pretty easy to use 0.1 mm layers, and with a 0.4 mm nozzle you then get 50 / 0.4 / 0.1 = 1250 mm/s. (Mine is ~25 mm^3/s, btw.)
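
As a rough sketch of that estimate (my own simplification: the extruded line is treated as a rectangle whose width equals the nozzle diameter, and the function name is hypothetical):

```c
// Toolhead speed (mm/s) at which a line of the given width and layer
// height consumes `flow_mm3_s` cubic millimetres of plastic per second.
// Assumes a rectangular line cross-section: width * height.
double
speed_for_flow(double flow_mm3_s, double line_width_mm,
               double layer_height_mm)
{
    return flow_mm3_s / (line_width_mm * layer_height_mm);
}
```

`speed_for_flow(50.0, 0.4, 0.1)` comes out at about 1250 mm/s, and a ~25 mm^3/s hot end at about 625 mm/s with the same line geometry.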

So, I think 1000 mm/s is a pretty reasonable limit if we want to define one.

Hope it helps to understand what I’m talking about.

I don’t think there’s any reason to discuss printer speeds but I did (and do) think it’s important to talk about what kind of speeds we are talking about here.

Thank you for giving me that background.

It sounds like you have a pretty impressive printer - you should post specs/photos of it.

I appreciate this initiative, because the MagnetoX suffers from the same effect.

Its linear motors are also driven by a step pulse interface with a step_pulse_duration: 0.0000002. If you command a movement between 15k and 20k acceleration, it is quite easy to run into a "Stepper too far in past" error.

It seems to be exactly as @nefelim4ag explained: what we commonly know may be true for the “traditional” setup and TMCs with their stepping on both rising and falling edges. As soon as “uncommon” setups come into play, things may look different.


480 MHz should work; it’s just that the 550 MHz variants didn’t exist when the stm32h7.c code was written.

FWIW, we can certainly update the code to avoid the 400 Mhz limitation (or more precisely, the 2**32/10 hz limitation). There are a few things I’m aware of:

  1. There are various places in the code that attempt to schedule timers in the mcu up to 5 seconds in the future - they would need to be reworked to a time that would fit in a signed 32-bit integer.
  2. Any really fast mcus that could conceivably step more than ~10Million times per second would need to disable the CONFIG_HAVE_STEPPER_BOTH_EDGE optimization.
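
One way to read the 2**32/10 Hz figure (this is my interpretation, not something taken from the Klipper source): 5 seconds of clock ticks must stay below 2**31 so that the difference fits in a signed 32-bit integer, and 2**31 / 5 is the same as 2**32 / 10. A small sketch with hypothetical names:

```c
#include <stdint.h>

// Largest clock frequency (Hz, rounded down) at which `horizon_s` seconds
// worth of ticks still fits in the positive half of a signed 32-bit value.
uint32_t
max_clock_for_horizon(uint32_t horizon_s)
{
    return (uint32_t)(((uint64_t)1 << 31) / horizon_s);
}

// Nonzero when a timer `horizon_s` seconds ahead is representable
// at `clock_hz` without overflowing a signed 32-bit tick difference.
int
horizon_fits(uint32_t clock_hz, uint32_t horizon_s)
{
    return (uint64_t)clock_hz * horizon_s < ((uint64_t)1 << 31);
}
```

Under that reading, 400 MHz fits a 5 second horizon, 480 MHz only fits 4 seconds, and 550 MHz only fits 3 seconds.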

FWIW, it may not actually be an overall improvement for the majority of users, so we’d have to weigh the overall pros/cons prior to merging into the mainline Klipper branch. It might still be worthwhile though.

Cheers,
-Kevin


So, since it may be considered worthwhile,
I opened a PR with the proposed changes and the things that I’m aware of.

I humbly argue against going that way:

  1. That will break the data on the benchmarks page.
  2. It is substantially slower than the current approach. Ticks do not scale linearly and decrease with the number of steppers.
    Like 400->480 MHz (stm32h743): +20% (total ticks).
    Disabling edge optimization (3 steppers): 171 → 535, -70% (ticks per step).
    Enabling inlining: +25% (ticks per step).
    Which probably leaves us with ~ 480_000_000 / (535 * 0.75) = 480_000_000 / 401.25 ≈ 1196K per stepper (1196 * 3 = 3588K in total). That will break someone’s setup (and mine :sweat_smile:, if I enable 256 microsteps).
  3. I hope that disabling edge inlining (which may add enough overhead) or just a conditional busy loop (which I proposed above and in the PR) can be an acceptable solution here.

Thanks.


With that said, I suggest keeping this topic mostly for stm32h7-related changes. If someone is willing to participate and change the code to support >480 MHz, I may also provide some assistance.

BTW, because of the architecture of the chip and the architecture of Klipper, the STM32H7 will be theoretically limited to ~24 MHz pin toggling; then there is ODR overhead, and then the scheduler.

So, this is why I saw ~5.6 MHz * 4 = 22.4 MHz pin access for SW SPI at 550 MHz. This is high, but it is a known limit now.


I spent some time trying to find out what is wrong with the STM32H723 at 550 MHz.
Everything still works at 512 MHz.

What is wrong? At least software SPI is broken; there are too many zeroes:

// GCONF:      00000004 en_pwm_mode=1
// GSTAT:      0000000c uv_cp=1(Undervoltage!) register_reset=1
// IOIN:       0000101c encb=1 enca=1 drv_enn=1 output=1 version=0x0
// DRV_CONF:   00000000
// GLOBALSCALER: 000000e1 globalscaler=225
// IHOLD_IRUN: 00020f0f ihold=15 irun=15 iholddelay=2
// TPOWERDOWN: 00000000
// TSTEP:      0007ffff tstep=524287
// TPWMTHRS:   00000000
// TCOOLTHRS:  00000000
// THIGH:      00000000
// ADC_VSUPPLY_AIN: 00e60003 adc_vsupply=0x0003(0.029V) adc_ain=0x00e6(70.196mV)
// ADC_TEMP:   000000c8 adc_temp=0x00c8(-238.7C)
// MSCNT:      000001e2 mscnt=482
// MSCURACT:   007000c8 cur_a=200 cur_b=112
// CHOPCONF:   20000030 hstrt=3 mres=0(256usteps) dedge=1
// COOLCONF:   00000000
// DRV_STATUS: 80000000 cs_actual=0(Reset?) stst=1
// PWMCONF:    c004000c pwm_ofs=12 pwm_autoscale=1 pwm_lim=12
// PWM_SCALE:  00000000
// PWM_AUTO:   0000000c pwm_ofs_auto=12
// SG4_THRS:   00000000
// SG4_RESULT: 00000000
// SG4_IND:    00000000

This is with all the patches needed to make Klipper work at higher timer frequencies.
So the TMC SCK is correct, <5 MHz (I’ve been testing 5 MHz for a few weeks).
I checked with an oscilloscope that this is true.
It does work with a lower MCU clock, but does not at 550 MHz.

With 4 MHz as the target it works, which also seems strange.

I was unable to find a fix.

What is probably wrong so far, but has no effect on this issue:

  1. H723 has a maximum VCO frequency of 836 MHz vs 960 MHz on H743. If I understood the code correctly, right now the VCO frequency is equal to 2x the PLL1 frequency.

  2. I2C buses have a limit of 125 MHz peripheral frequency. Right now the peripheral clock is 1/4 of PLL1. So, 120 MHz at 480 MHz and 128 MHz at 512 MHz. My bad, reference manuals are too similar.
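
To make the two limits above easier to check for a given target frequency, here is a small sketch. It only encodes the numbers quoted in this post and my reading of the code (VCO at twice the core clock, peripheral clock at one quarter of it); the macro and function names are hypothetical and are not taken from stm32h7.c.

```c
#include <stdint.h>

// Limits as quoted in this post (Hz), not re-checked against the manuals.
#define VCO_MAX_H723   836000000u
#define VCO_MAX_H743   960000000u
#define I2C_PERIPH_MAX 125000000u

// Assumption: the VCO runs at twice the configured core clock.
uint32_t vco_freq(uint32_t clock_hz) { return clock_hz * 2; }

// Assumption: the peripheral clock is one quarter of the core clock.
uint32_t periph_freq(uint32_t clock_hz) { return clock_hz / 4; }

// Nonzero when the resulting VCO frequency does not exceed `vco_max`.
int
vco_within_limit(uint32_t clock_hz, uint32_t vco_max)
{
    return vco_freq(clock_hz) <= vco_max;
}

// Nonzero when the peripheral clock does not exceed the I2C limit.
int
i2c_clock_within_limit(uint32_t clock_hz)
{
    return periph_freq(clock_hz) <= I2C_PERIPH_MAX;
}
```

With these assumptions, 480 MHz passes the I2C check (120 MHz) and exactly meets the H743 VCO limit, while 512 MHz fails the I2C check (128 MHz) and 550 MHz exceeds both VCO limits.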

So, I tried to make the VCO frequency equal to PLL1: no effect.

    uint32_t pll_freq = CONFIG_CLOCK_FREQ * 2;
    // H723 has a different VCO limit: 836 MHz
    if (CONFIG_MACH_STM32H723 && pll_freq > 836000000) {
        pll_freq = CONFIG_CLOCK_FREQ;
    }

I tried doubling the dividers, so that all peripherals and memory buses should have a lower base clock.
No effect.

#define PDIV (CONFIG_CLOCK_FREQ > 500000000 ? 8 : 4)
#define FREQ_PERIPH (CONFIG_CLOCK_FREQ / PDIV)
...
    MODIFY_REG(RCC->D1CFGR, RCC_D1CFGR_HPRE,    RCC_D1CFGR_HPRE_3);
    MODIFY_REG(RCC->D1CFGR, RCC_D1CFGR_D1PPRE,  RCC_D1CFGR_D1PPRE_DIV4);
    MODIFY_REG(RCC->D2CFGR, RCC_D2CFGR_D2PPRE1, RCC_D2CFGR_D2PPRE1_DIV4);
    MODIFY_REG(RCC->D2CFGR, RCC_D2CFGR_D2PPRE2, RCC_D2CFGR_D2PPRE2_DIV4);
    MODIFY_REG(RCC->D3CFGR, RCC_D3CFGR_D3PPRE,  RCC_D3CFGR_D3PPRE_DIV4);

So, that is it for now.


Regardless, HW SPI appears to be working.
HW I2C with a lower peripheral clock also looks correct.


Okay, I made a few more attempts: it works at ~4.7 MHz.
Not sure why the MCU base frequency affects that.


https://www.st.com/resource/en/reference_manual/rm0433-stm32h742-stm32h743753-and-stm32h750-value-line-advanced-armbased-32bit-mcus-stmicroelectronics.pdf
https://www.st.com/resource/en/reference_manual/rm0468-stm32h723733-stm32h725735-and-stm32h730-value-line-advanced-armbased-32bit-mcus-stmicroelectronics.pdf