Intermittent stepcompress "Invalid sequence" on CoreXY+input_shaper+probe_eddy_current — bed_mesh fade strongly affects MTBF

ChristianK · May 15, 2026, 7:30am

Basic Information:

Printer Model: Ender 5 Max (CoreXY)
MCU / Printerboard: Creality CR4NS200323C10 mainboard (120 MHz STM32) + CR-NOZZLE_V21 (120 MHz) + BTT Eddy USB (RP2040)
Host / SBC: Creality Nebula Pad (Ingenic T31X, MIPS XBurst2, single-core)
Klipper version: v0.13.0-628-g373f200ca + 1 local commit (see note below)
klippy.log: attached (4 crashes consolidated)

Issue summary

Intermittent stepcompress oid=X i=<negative> c=1 a=0: Invalid sequence shutdowns
during long prints, with the same C-level signature across 4 occurrences but
randomized magnitudes. The error always hits a CoreXY stepper (X or Y), never Z
or E. Reducing bed_mesh fade_end from 20 to 10 multiplied mean-time-to-crash
by roughly 5× (from ~3.5 h to 17 h) without eliminating the bug.

Related (now closed) thread:

The other report there is on Voron 2.4 + Cartographer; same C-level signature,
different setup, same oid=8 hint that XY motors are specifically affected.

Setup

Kinematics: corexy
Steppers: TMC2209, interpolate: True, microsteps 16 (reduced from 32 as mitigation)
Input shaper: ZV @ 50.6 Hz on X, MZV @ 41.8 Hz on Y
Probe: [probe_eddy_current] (BTT Eddy USB), tap_threshold: 28
Bed mesh: rapid_scan, 15×15, fade_start: 1.0, currently fade_end: 10
max_velocity: 1000, max_accel: 4700 (reduced from 6500)

“Dirty” disclosure

I’m aware of the policy. My setup deviates from pristine in two ways:

- One local commit on top of master that filters a STEPPER_STEP_BOTH_EDGE
  warning for an older Creality MCU firmware. Pure cosmetics in
  klippy/configfile.py, 4 lines, no functional change.
- One untracked module (klippy/extras/probe_eddy_auto_calibrate.py),
  which is only invoked during calibration commands and is never loaded or
  touched during the prints that crash.

I can rebuild and test without the cosmetic patch on request. The untracked
module is inert during prints by design.

Crash signature

All 4 crashes share the identical stack trace:
b'stepcompress o=X i=<negative> c=1 a=0: Invalid sequence'
b"Error in syncemitter 'stepper_<X|Y>' step generation"
Exception in flush_handler
Traceback (most recent call last):
  File "/usr/data/klipper/klippy/extras/motion_queuing.py", line 198, in _flush_handler
    self._advance_flush_time(0., want_sg_time)
  File "/usr/data/klipper/klippy/extras/motion_queuing.py", line 156, in _advance_flush_time
    raise self.mcu.error("Internal error in stepcompress")
mcu.error: Internal error in stepcompress

| # | Stepper (oid) | `i` (signed) | `i` (uint32 hex) | Print elapsed | Notes |
|---|---|---|---|---|---|
| 1 | x (oid=5) | (negative) | -- | ~7 h | microsteps=32, accel 6500, fade_end=20 |
| 2 | y (oid=8) | -60092 | 0xFFFF1544 | 2 h 41 min | microsteps=16, accel 4700, fade_end=20 |
| 3 | x (oid=5) | -16778133 | 0xFEFFFC6B | 1 h 53 min | microsteps=16, accel 4700, fade_end=20 |
| 4 | y (oid=8) | -2092144 | 0xFFE01390 | 17 h 11 min | microsteps=16, accel 4700, fade_end=10 |
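
The hex column can be cross-checked against the signed values with a few lines of Python. This is only a sketch of the two's-complement conversion; the helper names are made up for illustration, and the signed inputs are the `i` values from crashes 2 to 4 as logged.

```python
def to_uint32_hex(value: int) -> str:
    """Return the 32-bit two's-complement form of a signed int as hex."""
    return f"0x{value & 0xFFFFFFFF:08X}"


def to_int32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= 0x80000000 else value


# Signed `i` values as logged for crashes 2, 3 and 4.
logged = {2: -60092, 3: -16778133, 4: -2092144}

for crash, i in logged.items():
    hex_form = to_uint32_hex(i)
    # Round-trip back to signed to confirm the conversion is lossless.
    assert to_int32(int(hex_form, 16)) == i
    print(f"crash {crash}: i={i} -> {hex_form}")
```

Any value at or above 0x80000000 in the unsigned view prints as negative in the log, which is why every failing `i` is negative.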

Key observations

- Always X or Y, never Z or E (4/4). With CoreXY, X and Y are the two motors
  driven by the input shaper’s joint convolution; Z and E use independent trapqs.
- The i values look like data corruption, not deterministic overflow. The hex
  values 0xFFFF1544, 0xFEFFFC6B, 0xFFE01390 don’t share a modular structure
  or a common bit pattern. They look like a uint32 read that captured a partially
  updated state.
- bed_mesh fade_end appears to be in the causal chain. Reducing fade_end from 20
  to 10 changed average time-to-crash from ~3.5 h (3 crashes) to 17 h (1 crash),
  roughly a 5× improvement (about 7.5× if only crashes 2 and 3, which share
  crash 4's microsteps and accel settings, are used as the baseline), but did
  not eliminate the bug.
- Other mitigations reduce frequency but don’t fix the bug:
  - microsteps 32 → 16
  - max_accel 6500 → 4700
  - jerks lowered to 5

  None eliminated it.
- mcu_awake is consistently 0.000–0.001 across all four crashes, so the MCU is
  nowhere near saturation (host load is reported separately as sysload).
- memavail is healthy at crash time (127 MB+ free). Not a memory issue.
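
The size of the fade_end effect depends on which crashes are used as the baseline. A quick check with the elapsed times from the table above (crash 1 is only known as "~7 h", and there is a single data point at fade_end=10, so these ratios are rough at best):

```python
from statistics import mean


def hours(h: int, m: int = 0) -> float:
    """Convert hours and minutes to fractional hours."""
    return h + m / 60


# Elapsed print time at crash, taken from the crash table.
fade_end_20 = [hours(7), hours(2, 41), hours(1, 53)]  # crashes 1-3
fade_end_10 = hours(17, 11)                           # crash 4

# Baseline A: all three fade_end=20 crashes.
mean_all = mean(fade_end_20)
# Baseline B: only crashes 2 and 3, which share crash 4's
# microsteps=16 / accel 4700 settings.
mean_like_for_like = mean(fade_end_20[1:])

ratio_all = fade_end_10 / mean_all
ratio_like_for_like = fade_end_10 / mean_like_for_like

print(f"mean (crashes 1-3): {mean_all:.2f} h, ratio {ratio_all:.1f}x")
print(f"mean (crashes 2-3): {mean_like_for_like:.2f} h, "
      f"ratio {ratio_like_for_like:.1f}x")
```

The like-for-like baseline gives the larger ratio because it leaves out the ~7 h run that used different microstep and accel settings.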

Hypothesis (offered for discussion, not as a conclusion)

After reading through chelper/stepcompress.c, itersolve.c, kin_shaper.c,
kin_corexy.c, steppersync.c and motion_queuing.py, I couldn’t find a
single-threaded path that produces move.interval >= 0x80000000 in check_line.
The iterative solver maintains low_time monotonically within an invocation,
and last_flush_time is monotone across invocations.
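
To make the failing condition concrete: if the interval is an unsigned 32-bit difference between a step's clock and the previous step clock (the expression quoted in the list below), then a step that lands before the previous one wraps to a value at or above 0x80000000 and is logged as a negative `i`. The following is a minimal Python model of that arithmetic, not the C code itself. The function names are hypothetical, and the 120 MHz figure assumes the X/Y steppers sit on the mainboard MCU.

```python
MCU_FREQ_HZ = 120_000_000  # assumption: X/Y steppers are on the 120 MHz MCU


def interval_u32(step_clock: int, last_step_clock: int) -> int:
    """Unsigned 32-bit difference, as C computes it for uint32_t operands."""
    return (step_clock - last_step_clock) & 0xFFFFFFFF


def is_invalid_interval(interval: int) -> bool:
    """The condition named in the post: move.interval >= 0x80000000."""
    return interval >= 0x80000000


def as_logged(interval: int) -> int:
    """Signed view of the interval, matching how `i` appears in the log."""
    return interval - (1 << 32) if interval >= 0x80000000 else interval


last_step_clock = 1_000_000_000  # arbitrary example clock value

# Magnitudes taken from the logged `i` values of crashes 2, 3 and 4.
for ticks_early in (60092, 16778133, 2092144):
    interval = interval_u32(last_step_clock - ticks_early, last_step_clock)
    ms_early = ticks_early / MCU_FREQ_HZ * 1e3
    print(f"step {ticks_early} ticks early: i={as_logged(interval)}, "
          f"invalid={is_invalid_interval(interval)}, "
          f"about {ms_early:.3f} ms at 120 MHz")
```

Under this reading, each logged `i` is the number of MCU ticks by which a step preceded the previous step clock. That only restates the arithmetic; it does not say why a step would land early.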

The fact that:

- the issue is statistical and only triggers on shared-trapq steppers (X/Y on
  CoreXY)
- Z and E (separate trapqs) never trigger
- fade_end (more shared bed_mesh state per stepper-frame) strongly affects MTBF
- the corruption pattern looks like a partial read of pos->clock32 - (uint32_t)last_step_clock

…is consistent with a race in the per-stepper step generation introduced by
PR #6992 (a89694ac6, Sep 3, 2025). The follow-up fix 3c01f71d9
(Sep 24, 2025) is present in my build, plus 220 days of additional commits.

I freely admit this is speculation. The Voron+Cartographer case in the closed
thread points the same way (oid=8, same stack, post-multi-thread Klipper),
and Sineos there did mention “it could be some race condition that may hit
home or not, depending on circumstances not fully understood”.

What I’d find most useful

- Diagnostic instrumentation. Is there a recommended way to instrument
  check_line() or compress_bisect_add() to dump the queue state and the
  most recent moves at error time? The crash reproduces reliably on my setup
  (one crash per roughly 2–17 h of printing), so capturing a structured dump on
  the next occurrence would be much more informative than further code reading.
- Sequential step generation as a diagnostic. A build flag (or even a
  one-off patch) that interleaves se_start_gen_steps and se_finalize_gen_steps
  per syncemitter in steppersyncmgr_gen_steps() would conclusively answer the
  multi-thread-race vs other-cause question. Has anyone tried this?
- Cross-check from another reporter. If anyone else here is hitting this
  on a CoreXY + input_shaper + probe_eddy_current/cartographer/beacon setup,
  I’d love a confirmation that fade_end also affects the MTBF on their side.
  That would strongly support the trapq-sharing hypothesis.

Happy to provide additional logs, extracts, run any test (including a vanilla
rebuild to drop the dirty flag) and report back. Targeting the underlying issue,
not asking for a hand-holding workaround.

cc @nefelim4ag @koconnor

ChristianK · May 15, 2026, 7:32am

crash4_extract.log (63.9 KB)

nefelim4ag · May 15, 2026, 11:14am

Alas, AI is not helpful.
If you would like, I can generate a longer answer for you without a word of meaning.

Otherwise, we need a full, unmodified log that contains the error/reproduction in the first place,
covering everything from the start of the machine to the crash.
You can zip it if it is too large.

-Timofey

ChristianK · May 15, 2026, 11:40am

Thank you for answering

I understand AI is not always the way, and you are right. But it is useful sometimes ^^

My English is not always good enough to discuss technical things, so I used Claude to translate and summarize.

klippy_log.zip (2.5 MB)

The full log is attached as a zip.

nefelim4ag · May 15, 2026, 2:48pm

Hmmm, if I ignore all of the modifications,
I guess the only clue that I have is the frequent updates of SCV (square corner velocity).
IIRC, I’ve seen a similar log and a similar issue.

For now, I can only suggest eliminating frequent SCV updates.
It is possible that there is a bug, but I’m unable to reproduce it even if I do SCV updates every other line of G-code.

ChristianK · May 15, 2026, 3:26pm

Thank you for taking the time!

I checked the source G-code: SET_VELOCITY_LIMIT occurs 110717 times, with only 5 distinct values (5/7/9/15/20).

Does that sound normal to you?

Do you think a cached wrapper macro for SET_VELOCITY_LIMIT would make a difference?
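
As an alternative to a wrapper macro, repeated values could also be filtered out of the G-code file before printing. Below is a minimal sketch, assuming the slicer emits lines of the form `SET_VELOCITY_LIMIT SQUARE_CORNER_VELOCITY=<value>` with no other parameters and that nothing else (a macro, for example) changes the square corner velocity in between. The function name is made up for illustration.

```python
import re
from typing import Iterable, Iterator

# Matches commands that set ONLY the square corner velocity, with an
# optional trailing ';' comment. Commands carrying other parameters are
# deliberately left untouched.
SCV_ONLY = re.compile(
    r"^\s*SET_VELOCITY_LIMIT\s+SQUARE_CORNER_VELOCITY=([0-9]*\.?[0-9]+)"
    r"\s*(?:;.*)?$",
    re.IGNORECASE,
)


def drop_repeated_scv(lines: Iterable[str]) -> Iterator[str]:
    """Yield G-code lines, skipping SCV commands that repeat the last value."""
    last_scv = None
    for line in lines:
        match = SCV_ONLY.match(line)
        if match:
            scv = float(match.group(1))
            if scv == last_scv:
                continue  # same value as the previous SCV command: redundant
            last_scv = scv
        yield line


sample = [
    "SET_VELOCITY_LIMIT SQUARE_CORNER_VELOCITY=5\n",
    "G1 X10 Y10 E0.5\n",
    "SET_VELOCITY_LIMIT SQUARE_CORNER_VELOCITY=5\n",
    "G1 X20 Y10 E0.5\n",
    "SET_VELOCITY_LIMIT SQUARE_CORNER_VELOCITY=9\n",
    "G1 X20 Y20 E0.5\n",
]
filtered = list(drop_repeated_scv(sample))
print("".join(filtered), end="")
```

This only removes a command that repeats the value already in effect. If the slicer really alternates between the 5 values from one move to the next, almost nothing would be dropped.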

May I also raise another point related to this issue: reducing bed_mesh fade_end from 20 to 10 let the print run for 17 h instead of ~3.5 h. Could that be an additional clue for you?

ChristianK · May 18, 2026, 9:45am

Thanks, by the way: your commit 4cc47cf56 validates the approach we had in our custom module.

koconnor · May 21, 2026, 7:03pm

Alas, these types of errors are quite hard to debug. There have been sporadic reports of individuals running into the issue, but I’ve not seen any widespread reports of problems.

It’s possible that a “race condition” could corrupt memory like that, but I don’t see any issues with the code. In particular the time updating looks okay to me.

I’ve added a new debugging PR, “Improved debugging on "Internal error in stepcompress"” (Klipper3d/klipper Pull Request #7271), for additional debugging during one of these events (it should dump the trapq and past stepper moves).

Other than that, one can add errorf() calls to any part of the C code.

Well, I guess you could change steppersyncmgr_gen_steps() to call se_finalize_gen_steps() immediately after se_start_gen_steps(). It’d still be multi-threaded, but at least it would be sequential.
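
The ordering change can be pictured with a small Python analogy. This is not Klipper code (the real functions are C). It assumes se_start_gen_steps() hands work to a worker thread and se_finalize_gen_steps() waits for that work, and every name below is a hypothetical stand-in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def generate_steps(name: str, flush_time: float) -> str:
    """Hypothetical stand-in for one syncemitter's step generation."""
    return f"{name} flushed to {flush_time:.3f}"


def gen_steps_parallel(pool: ThreadPoolExecutor, emitters: List[str],
                       flush_time: float) -> List[str]:
    # "start" every emitter first, then "finalize" them all:
    # the workers may run at the same time.
    futures = [pool.submit(generate_steps, n, flush_time) for n in emitters]
    return [f.result() for f in futures]


def gen_steps_sequential(pool: ThreadPoolExecutor, emitters: List[str],
                         flush_time: float) -> List[str]:
    # "finalize" immediately after each "start": the work still runs on
    # worker threads, but only one emitter is in flight at any moment.
    results = []
    for name in emitters:
        future = pool.submit(generate_steps, name, flush_time)
        results.append(future.result())
    return results


emitters = ["stepper_x", "stepper_y", "stepper_z", "extruder"]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = gen_steps_parallel(pool, emitters, 1.25)
    sequential = gen_steps_sequential(pool, emitters, 1.25)

print(parallel == sequential)
```

If the error disappeared with the sequential ordering, that would point toward an interaction between concurrently running emitters. If it persisted, a cause outside the threading would look more likely.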

For what it is worth, it seems very unusual that you can repeatedly get this error, when so many other machines don’t ever get the error. If I had to guess, it’d be due to something on your host machine (maybe something quirky in the architecture, something in gcc exposing an issue, or similar).

I can’t think of any way that a fade_end or many SET_VELOCITY_LIMIT commands would cause an internal step compress error.

Finally, if you can produce a log using pristine code then I’ll try to take a look at it. (It’d have to be pristine for me to look at it, because errors like this require many hours of debugging and I can’t afford to do that if there is any possibility of unknown code.)

Maybe that helps a little,
-Kevin
