Move queue overflow - Only With Some Models

Basic Information:

Printer Model: Peopoly Magneto X
MCU / Printerboard: BTT Octopus Pro V1.1 STM32H723
Host / SBC: Orange Pi Zero2 / Debian Bookworm Linux 5.16.17-sun50iw9
klippy.log

Describe your issue:

I am unable to print long/large items on my Magneto X. They consistently cause a “MCU ‘mcu’ shutdown: Move queue overflow” at a specific part of the print.

Other discussion threads on this issue are locked

Thoughts

Searching other topics, the BTT Octopus board seems to be a common factor. However, it could also be a speed / model size issue.

@Sineos and anyone else with a Magneto X, can you replicate this behavior?

The attached 3MF file should contain the profile settings and model in question. I have also included the raw gcode which triggers this error.

Attachments

mcu.json contains information about all the devices Klipper connects to, above and beyond what the logs provide.

Line 7447 of Klippy.log shows the MCU responding with the error.

Further Information

Edits with more information.

MCU Data

Obtained by: ~/klippy-env/bin/python3 ./console.py /dev/serial/by-id/usb-Klipper_stm32h723xx_0C003A001751313431393536-if00

get_config
048.882: config is_config=1 crc=3679341702 is_shutdown=1 move_count=1024

move_free

Something about this command just seems off, but I can’t pinpoint what.

False Trail (IRQ)

The first thing which stood out to me was the comment “Caller must disable irqs.”

That seems to be called in the different _event functions, which are in turn called by sched_timer_dispatch. However, all of those seem to be reached through functions that properly disable IRQs first.

Separate move queues / configurable total queue size.

Random thoughts, but I don’t have anything concrete at this point.

Interesting.
This does not seem to be a complete Klipper log; there is no Klipper version.
Also, judging from the version string of the MCU, the firmware seems to be modified.

I can only guess that some changes ported to the MCU disable the “Stepper too far in past” error, which in the end leaves you with the Move queue overflow instead.

queue_step 18: t=719684377358 p=-16236 i=473 c=775 a=0
queue_step 19: t=719684743934 p=-15461 i=474 c=341 a=0
queue_step 20: t=719684905567 p=-15120 i=473 c=774 a=0
...
queue_step 34: t=719684591390 p=-126689 i=323 c=710 a=0
queue_step 35: t=719684820719 p=-125979 i=322 c=209 a=0
queue_step 36: t=719684888018 p=-125770 i=323 c=709 a=0

1/400_000_000 * 473 = 0.000_001_183
1 / 0.000001183 = 845_308
1/400_000_000 * 323 = 0.000_000_807
1 / 0.000000807 = 1_239_157
845_308 + 1_239_157 = 2_084_465

That should cause errors for this specific configuration, or at least come pretty close.
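For anyone who wants to repeat that estimate, here is a minimal Python sketch of the same arithmetic. It assumes, as the post does, that the timer runs at 400_000_000 ticks per second, that the i= field is the step interval in ticks, and that the two intervals belong to steppers moving at the same time. The function name is made up for illustration, and a= is ignored because it is 0 in the quoted lines.

```python
# Timer frequency taken from the thread (ticks per second); an assumption
# for any other MCU or clock setting.
MCU_FREQ_HZ = 400_000_000


def steps_per_second(interval_ticks, freq_hz=MCU_FREQ_HZ):
    """Step rate implied by a queue_step interval (the i= field).

    Hypothetical helper; ignores the a= (add) field, which is 0 in the
    quoted log lines.
    """
    return freq_hz / interval_ticks


# Intervals copied from the two groups of queue_step lines above.
rate_a = steps_per_second(473)
rate_b = steps_per_second(323)
total = rate_a + rate_b

print(f"{rate_a:,.0f} + {rate_b:,.0f} = {total:,.0f} steps/s")
```

Dividing directly gives slightly different digits than the hand calculation, which rounds the intermediate values, but the total still lands just under 2.1 million steps/s.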

Well, if it is a correct assumption:

  • You want the latest Klipper code flashed to the STM32H7 (You don’t want to remove the sanity checks this time. Leave them in place, please.).
  • You want to run make menuconfig to update the configuration.

Otherwise, it is hard to guess without knowing your local modifications.

Hope that helps.

I probably cycled the log file at a strange time, and truncated several days worth of extraneous logs.

Good catch on the “Stepper too far in past” error. I thought I was running with that code enabled, but it looks like I was incorrect. Here’s the code that was running on the MCU. It’s related to this feature.

How do you perform those calculations? The docs weren’t quite clear, and even seem to be misleading. Specifically, the hard-coded 1024 move_nodes are shared between all the move_queue_head objects.

I’ve been going down the path of allowing more run-time debugging, just in case GPIO or PWM outputs are consuming far too many slots for some reason.

@Sineos I remember you mentioning there was a good reason the Magneto X has the “Stepper too far in past” error disabled. Would you care to comment?

@nefelim4ag is already the perfect contact for this.
In a nutshell, the Octopus step generation is too slow for the high requirements of this printer. If you go at full acceleration and high velocity, you will inevitably run into this error. It only rarely happens during printing, but for “unconstrained” moves, it occurs more frequently.
@nefelim4ag has already introduced some great improvements.

Thanks! That’s a great explanation.

Sometimes the obvious is staring me in the face, and I miss it.

Well, 400_000_000 is the timer frequency (ticks per second) from the log. i and a are expressed in the same unit (ticks): i is the interval between steps, c is the count of steps, and a is the amount added to the interval after each step.
We are talking about the same MCU here, so STM32H7.
If I know the available ticks per second and the expected interval between steps, I can estimate the steps per second from the queue data.

And it is available in the docs:

There is too much context behind my estimates to explain all of it here.
This one may be useful: Klipper 400MHz limitations - #17 by nefelim4ag
There is also a GH link above with plenty of related information.

It should be enough. MCU only stores less than 300 ms of future data.
PWM and GPIO use even less, probably 1 item per 300ms interval.

Link to the docs: It is the responsibility of the host to ensure that there is available space in the queue before sending a queue_step command. The host does this by calculating when each queue_step command completes and scheduling new queue_step commands accordingly.

If you want to invest time here, it makes sense to do that on pristine firmware, or at least on firmware without the sanity checks disabled.

Most probably, you will get “Stepper too far in past”.

Now, you probably want to update (as suggested above), compile, and disable the TMC “step on both edges” optimization.

[ ] Optimize stepper code for 'step on both edges'

That should make it fast enough.

Then, if you are still seeing that “Stepper too far in the past”.

(I doubt that)
[printer]
kinematics = cartesian
max_velocity = 1500

[stepper_x]
microsteps = 16
rotation_distance = 3.2
step_pulse_duration = 0.0000002

1500 / 3.2 = 468.75 rot/s
200 * 16 * 468.75 = 1_500_000 steps/s

For diagonal moves: 1_500_000 * 2 * 0.707 = 2_121_000
I think the currently available performance should be enough.
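The same estimate can be reproduced with a short Python sketch. It assumes a 200 full-step motor (as the calculation above does) and uses the thread's approximation that a 45 degree diagonal loads both axes at 0.707 of the full rate. The function names are hypothetical and not part of Klipper.

```python
import math


def axis_step_rate(max_velocity, rotation_distance, microsteps,
                   full_steps_per_rotation=200):
    """Steps per second for one axis moving at max_velocity (mm/s)."""
    steps_per_mm = full_steps_per_rotation * microsteps / rotation_distance
    return max_velocity * steps_per_mm


def diagonal_total_rate(per_axis_rate):
    """Combined rate of two axes on a 45 degree diagonal move.

    Each axis runs at cos(45 deg) (about 0.707) of the full rate, and
    there are two of them, matching the "* 2 * 0.707" in the post.
    """
    return per_axis_rate * 2 * math.cos(math.radians(45))


# Values from the [printer] and [stepper_x] sections quoted above.
per_axis = axis_step_rate(max_velocity=1500, rotation_distance=3.2, microsteps=16)
diagonal = diagonal_total_rate(per_axis)

print(f"per axis: {per_axis:,.0f} steps/s, diagonal total: {diagonal:,.0f} steps/s")
```

Using the exact cosine instead of 0.707 shifts the diagonal figure by a few hundred steps/s, which does not change the conclusion.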

There are some additional things that can be done, which are not yet merged.

Hope that helps.

You’ve been more than helpful. Thanks! I especially appreciate you taking the time to explain the calculations.

After some digging during my breaks at work today, I will probably switch over to your fast MCU branch. The clue bat this morning was what I needed.

Edit: And the GPIO optimization and the other modifications needed.

Edit2: That worked!!!

Rambling About The Magneto X Motion System Follows

Here’s some interesting tidbits I’ve uncovered about the Magneto X motion system. Feel free to ignore if you’d like.

Steps are just driving a motion control loop, which works with a magnetic linear encoder with 1 micron accuracy, along with some tuning parameters. Interestingly, the position control loop only runs at 10 kHz. Changing that value is a bit painful too.

My understanding could be wrong, but that seems to mean you can have accuracy, speed, or a “fun” time modifying firmware parameters using mostly translated Chinese software.
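To put rough numbers on that trade-off, here is a small Python sketch of how far the head travels between two position-loop updates. It is only arithmetic on the figures in this thread: the 10 kHz loop rate from above, and 1000 steps/mm (200 * 16 / 3.2 from the earlier config). The helper name is invented for illustration, and how the drive actually handles steps inside a loop period is not something I know.

```python
import math

# Position loop rate mentioned above; an assumption for any other drive.
LOOP_HZ = 10_000


def per_loop_cycle(velocity_mm_s, steps_per_mm, loop_hz=LOOP_HZ):
    """Return (travel in microns, step pulses) per position-loop period.

    Hypothetical helper: plain arithmetic, not a model of the drive.
    """
    travel_um = velocity_mm_s / loop_hz * 1000
    steps = velocity_mm_s * steps_per_mm / loop_hz
    return travel_um, steps


# 1500 mm/s and 1000 steps/mm, taken from the config discussed earlier.
travel_um, steps = per_loop_cycle(velocity_mm_s=1500, steps_per_mm=1000)

print(f"{travel_um:.0f} microns and {steps:.0f} steps per loop period")
```

At full speed the commanded position moves by many encoder counts between loop updates, which is one way to read the accuracy versus speed tension described above.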

Pushing Things Further

Of course, not being willing to leave well enough alone, I’ll be trying to bump the speed even further! Going to the rated max should give an even value of 2_000_000 steps/s, or 2_828_000 for diagonal moves.

Probably terrible, yet hilarious, settings
[printer]
kinematics: cartesian
max_velocity: 2000

[stepper_x]
# 20000 step resolution / 20mm magnet distance = 1000 steps/mm
microsteps: 1000
rotation_distance: 20
full_steps_per_rotation: 20 # Must be divisible by 4

Unfortunately, said system also means the open-loop method to find max acceleration doesn’t work. Overshoots are corrected, and just result in printing artifacts.

FWIW,
the fast MCU branch only makes sense if you are trying to run the STM32H7 faster, for example at 480 MHz.

That is because it only contains the prerequisite changes for that.
It should work, as it does on my machine, but testing is appreciated.

That’s exactly what I’m doing. Thanks for the heads up though.
