MCU 'mcu' shutdown: Rescheduled timer in the past - Host Buffer 100%

Basic Information:

Printer Model: PrincoreONE
MCU / Printerboard: SKR3
klippy.log
klippy (1).zip (1.6 MB)

I had a “timer in the past” mcu shutdown. regarding the load graph i assume it was the 100% peak of the host buffer

Q: What is the host buffer and how / where can i increase this buffer?
Q2: Is it possible to increase the bandwith? What is meant with beandwith?

Thank you!

Run the load graph with -s to also see your host’s CPU load. I rather suspect this will be pretty high and causing this error.

Side note:
Personally I find, for such a printer, your acceleration and velocity settings quite strange.

indeed, the host load was very high, thank you!

regarding the acceleration and velocity settings:

1400mm/s² acceleration is for print moves. i print all lines with 80mm/s
6000mm/s² acceleration is for travel moves, 400mm/s velocity is also only for travel moves
i use my macro for “detecting” travel moves and change the acceleration settings for that travel move, then switch back to printer settings acceleration ( 1400mm/s² ) if the next task is a print move

with these acceleration / velocity settings, it prints pretty good so far:

all printed with a 0.6 nozzle and 0.6 lw

Nice prints indeed. :+1: but …
A printer with this price tag and build quality (at least judging from the pics), I would slot in somewhere in the range of 20k accel.
Note that:

  • Acceleration contributes to print time much more than velocity
  • High velocities rather lead to issues with the extrusion system, since you will not push enough filament through the nozzle (unless you use something like Volcano or CHT nozzles)
  • With your 256 native microstepping you are putting undue strain on your system. This likely is shown in your first graph with the high buffer saturation. This resolution will give you no benefit unless maybe a bit in stepper noise. I’d not go higher than 128 (Resolution of 0.00078125 mm)
  • System load for a 4 core CPU like the RPi 3 is not an issue according to your graph. This can go as high as 400%
  • With such a high potential acceleration (at least in my opinion) you should work on the resonance. 3hump_ei at 40 Hz is really bad and unworthy for such a printer. Will also lead to extreme smoothing even at low accelerations. Makes no sense to print at 256 microstepping with a theoretical resolution of 0.000390625 mm and have it all smoothed away (but would not make sense in any other case as well)

actually im playing around to find the perfect settings.i know that 3hump is really bad, but with these setting i can actually get rid of such surface print artefacts:

if i go higher with the print acceleration, i get more ghosting / ringing.
1400 is fine for me for printing, and travel moves have already a really fast acceleration.
actually, unfortenatelly i need the 3hump_ei to get rid of print artefacts on the surface but im testing further to get rid of the artefacts - actually im testing the tmc5160 with fullstep ( chm1 ) instead of the tmc2209 and i seems the surface artefacts are tmc2209 related.

if the surface artefacts are gone with the tmc5160 the next step would be to go back to the suggested input shaper values with damping ratio of 0.1 or 0.125


If the host Load Was also Not the Problem, what caused the timer in the past error?

Most likely you are creating more steps than your board can cope with. This is caused by the extreme microstepping setting combined with 0.9 steppers.
If and when this error will hit you is of course depending on what you print. Long straight lines will only cause a few steps (especially on 45° angles where only one stepper is active or a CoreXY), fine details a lot of steps

I know I am aware of these Things. But the log aka graph cant Tell US why it happens, right?

As far as I’m aware, no. It just provides hints where to search.
Well, OK, if the graph would show 400% to 500% CPU load on a 4 core CPU things are pretty clear. But even then you do not know which process was causing it.

A “rescheduled timer in the past” error indicates the micro-controller become overloaded. Host buffers would not result in that kind of error (and your host buffers actually look okay - the single spike is likely due to a “one off” event).

The bandwidth is also not an indicator of a problem here - the graphs use a simple heuristic that assumes one is using 250000 baud serial - you are using USB which has 10+ times the capacity.

It seems something went awry in the micro-controller - possibly too high a step rate - although the rate at the time of the failure (350 clock ticks between steps with two steppers running) is not particularly high. Unfortunately the log doesn’t seem to provide an indication of the root cause of the issue.

No one has yet published step rate benchmarks (nor command benchmarks) for the stm32h7 mcus. If you have time and are technically inclined, it may be helpful to run those benchmarks to see how they compare to the step rate you had during the fault. The benchmarks are described at Benchmarks - Klipper documentation

Otherwise, you could try reducing the microstep rate and/or try to see if the issue reproduces (to gather more logs that may indicate the root cause).

-Kevin

I already did it in another thread:

allocate_oids count=5
config_stepper oid=0 step_pin=PD4 dir_pin=PD3 invert_step=-1 step_pulse_ticks=0
config_stepper oid=1 step_pin=PA15 dir_pin=PA8 invert_step=-1 step_pulse_ticks=0
config_stepper oid=2 step_pin=PE2 dir_pin=PE3 invert_step=-1 step_pulse_ticks=0
config_stepper oid=3 step_pin=PD11 dir_pin=PD10 invert_step=-1 step_pulse_ticks=0
config_stepper oid=4 step_pin=PD15 dir_pin=PD14 invert_step=-1 step_pulse_ticks=0
finalize_config crc=0

SET start_clock {clock+freq}
SET ticks 900

reset_step_clock oid=0 clock={start_clock}
set_next_step_dir oid=0 dir=0
queue_step oid=0 interval={ticks} count=60000 add=0
set_next_step_dir oid=0 dir=1
queue_step oid=0 interval=3000 count=1 add=0

reset_step_clock oid=1 clock={start_clock}
set_next_step_dir oid=1 dir=0
queue_step oid=1 interval={ticks} count=60000 add=0
set_next_step_dir oid=1 dir=1
queue_step oid=1 interval=3000 count=1 add=0

reset_step_clock oid=2 clock={start_clock}
set_next_step_dir oid=2 dir=0
queue_step oid=2 intEval: SET start_clock 58989182113
erval={ticks} count=60000 add=0
set_next_step_dir oid=2 dir=1
queue_step oid=2 interval=3000 count=1 add=0

reset_step_clock oid=3 clock={start_clock}
set_next_step_dir oid=3 dir=0
queue_step oid=3 interval={ticks} count=60000 add=0
set_next_step_dir oid=3 dir=1
queue_step oiEval: reset_step_clock oid=0 clock=58989182113
d=3 interval=3000 count=1 add=0

reset_step_clock oid=4 clock={start_clock}
set_next_step_dir oid=4 dir=0
queue_step oid=4 interval={ticks} count=60000 add=0
set_next_step_dir oid=4 dir=1
queue_step oid=4 interval=3000 count=1 add=0


ECHO Test result is: {"%.0fK" % (5. * freq / ticks / 1000.)}

Eval: queue_step oid=0 interval=900 count=60000 add=0
Eval: reset_step_clock oid=1 clock=58989182113
Eval: queue_step oid=1 interval=900 count=60000 add=0
Eval: reset_step_clock oid=2 clock=58989182113
Eval: queue_step oid=2 interval=900 count=60000 add=0
Eval: reset_step_clock oid=3 clock=58989182113
Eval: queue_step oid=3 interval=900 count=60000 add=0
Eval: reset_step_clock oid=4 clock=58989182113
Eval: queue_step oid=4 interval=900 count=60000 add=0
Eval: ECHO Test result is: 2222K
Error: Unknown command: ECHO
169.915: stats count=117 sum=451500 sumsq=50549677
175.015: stats count=62 sum=85896 sumsq=731521
180.115: stats count=62 sum=85740 sumsq=728163
185.193: stats count=62 sum=89630 sumsq=817191
190.215: stats count=63 sum=88160 sumsq=755611
195.315: stats count=62 sum=85892 sumsq=734045
200.315: stats count=61 sum=84784 sumsq=724506

Thanks. Alas, I’m not understanding the information in your previous post. To gather the benchmarks, I need to know the mcu model, the mcu configuration sequence, the gcc version (run arm-none-eabi-gcc --version), the software version (run git describe --tags --dirty), and the smallest value you were able to successfully set TICKS to for the 1 stepper case and for the 3 stepper case.

So, it should look something like the information at https://www.klipper3d.org/Benchmarks.html#stm32f103-step-rate-benchmark

-Kevin

board:
skr3 - 32-Bit 480 MHz ARM Cortex-M7 STM32H743VIT6

image

commit:
v0.10.0-608-g638303b3

gcc:
arm-none-eabi-gcc (15:8-2019-q3-1+b1) 8.3.1 20190703 (release) [gcc-8-branch revision 273027]

1 steppers testcode:

SET start_clock {clock+freq}
SET ticks 118

allocate_oids count=1
config_stepper oid=0 step_pin=PD4 dir_pin=PD3 invert_step=-1 step_pulse_ticks=0
finalize_config crc=0


reset_step_clock oid=0 clock={start_clock}
set_next_step_dir oid=0 dir=0
queue_step oid=0 interval={ticks} count=60000 add=0
set_next_step_dir oid=0 dir=1
queue_step oid=0 interval=3000 count=1 add=0

ECHO Test result is: {"%.0fK" % (1. * freq / ticks / 1000.)}

lowest ticks:
1 stepper: 118
117 gives me timer in the past error

result:
1 stepper: Eval: ECHO Test result is: 3390K

3 steppers testcode:

SET start_clock {clock+freq}
SET ticks 570

allocate_oids count=3
config_stepper oid=0 step_pin=PD4 dir_pin=PD3 invert_step=-1 step_pulse_ticks=0
config_stepper oid=1 step_pin=PA15 dir_pin=PA8 invert_step=-1 step_pulse_ticks=0
config_stepper oid=2 step_pin=PE2 dir_pin=PE3 invert_step=-1 step_pulse_ticks=0
finalize_config crc=0

reset_step_clock oid=0 clock={start_clock}
set_next_step_dir oid=0 dir=0
queue_step oid=0 interval={ticks} count=60000 add=0
set_next_step_dir oid=0 dir=1
queue_step oid=0 interval=3000 count=1 add=0

reset_step_clock oid=1 clock={start_clock}
set_next_step_dir oid=1 dir=0
queue_step oid=1 interval={ticks} count=60000 add=0
set_next_step_dir oid=1 dir=1
queue_step oid=1 interval=3000 count=1 add=0

reset_step_clock oid=2 clock={start_clock}
set_next_step_dir oid=2 dir=0
queue_step oid=2 interval={ticks} count=60000 add=0
set_next_step_dir oid=2 dir=1
queue_step oid=2 interval=3000 count=1 add=0

ECHO Test result is: {"%.0fK" % (3. * freq / ticks / 1000.)}

lowest ticks:
3 stepper: 570
569 gives me timer in the past error

result:
3 stepper: Eval: ECHO Test result is: 2105K

Okay, thanks for providing the test results.

It does explain the issue you are having though. Your board can’t keep up with the requested step rate.

Your benchmark indicates that you can drive 3 steppers at a minimum interval of 570. That is basically equivalent to a minimum interval of 380 with two steppers. During the failure, there were two steppers running at an interval as low as 347. That overloaded the mcu.

It is odd that the stm32h7 board is slower than an stm32f4 board. I’m not sure why that would be. I didn’t develop the stm32h7 code; I’ll try to ping the authors.

In any case, you’ll need to use a slower maximum velocity or reduce microsteps.

Cheers,
-Kevin

EDIT: Just to word the above a little differently, at the failure your board was requested to step at a rate as high as 2305K steps/s, which is higher than your multi-stepper benchmark rate of 2105K steps/s.

1 Like

i also noticed that the mcu “only” runs at 400 mhz instead of 480mhz

regarding the doc, revision “V” can handle 480 mhz, and this is revision “V”

Do i have to set the clock reference to internal ( 64mhz oscillator ? ) clock instead of the 25mhz external oscillator to setup the mcu for 480 mhz?

image

as i read the comment in the startup code, he tries to setup for 480 mhz, but runs / reported ? only 400 mhz:

No, Klipper doesn’t support an MCU speed of 480Mhz today.

What you can do is try the code at PR 2.9x speedup for stm32h7 by enabling cpu cache by D4SK · Pull Request #5832 · Klipper3d/klipper · GitHub . It seems enabling the cache on the h7 improves performance notably.

Cheers,
-Kevin

thanks for the info.

with the instruction + data cache enabled and these compiler flags, im able to achieve following results:

optimized compiler flags for cortex-m7 arm cpu + O3 optimizations:
-mtune=cortex-m7 -march=armv7e-m+fp.dp -mcpu=cortex-m7 -O3

1 stepper:
lowest ticks: 42
stepper rate: 9524K

3 stepper:
lowest ticks: 192
stepper rate: 6250K

/edit: Is there any reason to limit the frequency to 400mhz?

1 Like

Thank you for the Information :+1:

Thanks. What are the benchmarks if just the cache is enabled and no other changes?

Just 42 ticks per stepper is troublesome as that is only 105ns for time between steps - which is very close to the minimum supported by the trinamic drivers (~103ns). It may be necessary to force Klipper to implement explicit stepper timing checks on the stm32h7 (which will reduce the total steps per second but avoid lost steps between mcu and driver).

As indicated by Jake, various Klipper code assumes a 32-bit counter will not overflow within 10s, which would not be true with a 480Mhz mcu.

-Kevin

EDIT: if you want to test with explicit Klipper step timing (slower but potentially safer) it is the following change (rerun make menuconfig, make, and make flash to apply):

--- a/src/stm32/Kconfig
+++ b/src/stm32/Kconfig
@@ -13,7 +13,7 @@ config STM32_SELECT
     select HAVE_GPIO_BITBANGING if !MACH_STM32F031
     select HAVE_STRICT_TIMING
     select HAVE_CHIPID
-    select HAVE_STEPPER_BOTH_EDGE
+    select HAVE_STEPPER_BOTH_EDGE if !MACH_STM32H7
     select SERIAL_BOOTLOADER_SIDECHANNEL
 
 config BOARD_DIRECTORY
1 Like