Basic Information:
Printer Model: Ender 5 Max (CoreXY)
MCU / Printerboard: Creality CR4NS200323C10 mainboard (120 MHz STM32) + CR-NOZZLE_V21 (120 MHz) + BTT Eddy USB (RP2040)
Host / SBC: Creality Nebula Pad (Ingenic T31X, MIPS XBurst2, single-core)
Klipper version: v0.13.0-628-g373f200ca + 1 local commit (see note below)
klippy.log: attached (4 crashes consolidated)
Issue summary
Intermittent stepcompress o=X i=<negative> c=1 a=0: Invalid sequence shutdowns
during long prints, with the same C-level signature across 4 occurrences but
a different magnitude of i each time. The error always hits a CoreXY stepper
(X or Y), never Z or E. Reducing bed_mesh fade_end from 20 to 10 increased mean
time-to-crash roughly 4.5× against all three earlier crashes (about 7.5× against
the two with otherwise identical settings), without eliminating the bug.
Related (now closed) thread:
The other report there is on Voron 2.4 + Cartographer; same C-level signature,
different setup, same oid=8 hint that XY motors are specifically affected.
Setup
- Kinematics: corexy
- Steppers: TMC2209, interpolate: True, microsteps 16 (reduced from 32 as mitigation)
- Input shaper: ZV @ 50.6 Hz on X, MZV @ 41.8 Hz on Y
- Probe: [probe_eddy_current] (BTT Eddy USB), tap_threshold: 28
- Bed mesh: rapid_scan, 15×15, fade_start: 1.0, currently fade_end: 10
- max_velocity: 1000, max_accel: 4700 (reduced from 6500)
“Dirty” disclosure
I’m aware of the policy. My setup deviates from pristine in two ways:
- One local commit on top of master that filters a STEPPER_STEP_BOTH_EDGE
  warning for an older Creality MCU firmware. Pure cosmetics in
  klippy/configfile.py, 4 lines, no functional change.
- One untracked module (klippy/extras/probe_eddy_auto_calibrate.py),
  which is only invoked during calibration commands and is never loaded or
  touched during the prints that crash.
I can rebuild and test without the cosmetic patch on request. The untracked
module is inert during prints by design.
Crash signature
All 4 crashes share the identical stack trace:
b'stepcompress o=X i=<negative> c=1 a=0: Invalid sequence'
b"Error in syncemitter 'stepper_<X|Y>' step generation"
Exception in flush_handler
Traceback (most recent call last):
  File "/usr/data/klipper/klippy/extras/motion_queuing.py", line 198, in _flush_handler
    self._advance_flush_time(0., want_sg_time)
  File "/usr/data/klipper/klippy/extras/motion_queuing.py", line 156, in _advance_flush_time
    raise self.mcu.error("Internal error in stepcompress")
mcu.error: Internal error in stepcompress
| # | Stepper (oid) | i (signed) | i (uint32 hex) | Print elapsed | Notes |
|---|---|---|---|---|---|
| 1 | x (oid=5) | (negative) | -- | ~7 h | microsteps=32, accel 6500, fade_end=20 |
| 2 | y (oid=8) | -60092 | 0xFFFF1A04 | 2 h 41 | microsteps=16, accel 4700, fade_end=20 |
| 3 | x (oid=5) | -16778133 | 0xFEFFFC2B | 1 h 53 | microsteps=16, accel 4700, fade_end=20 |
| 4 | y (oid=8) | -2092144 | 0xFFE01650 | 17 h 11 | microsteps=16, accel 4700, fade_end=10 |
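For anyone who wants to tabulate the same fields from their own klippy.log, here is a minimal Python sketch. It assumes the error line has exactly the shape quoted above; the helper names are hypothetical, and the sample line is made up rather than taken from the attached log.

```python
import re

# Matches the host-side error line quoted above. The named groups simply
# mirror the letters in the message (o, i, c, a).
INVALID_SEQ = re.compile(
    r"stepcompress o=(?P<oid>\d+) i=(?P<i>-?\d+) c=(?P<c>-?\d+) a=(?P<a>-?\d+): "
    r"Invalid sequence"
)


def to_uint32_hex(value: int) -> str:
    """Reinterpret a signed 32-bit integer as unsigned and format it as hex."""
    return f"0x{value & 0xFFFFFFFF:08X}"


def parse_invalid_sequence(line: str):
    """Return the fields of an 'Invalid sequence' line as a dict, or None."""
    m = INVALID_SEQ.search(line)
    if m is None:
        return None
    fields = {name: int(text) for name, text in m.groupdict().items()}
    fields["i_hex"] = to_uint32_hex(fields["i"])
    return fields


# Made-up sample line, for illustration only.
sample = "b'stepcompress o=8 i=-1 c=1 a=0: Invalid sequence'"
print(parse_invalid_sequence(sample))
# {'oid': 8, 'i': -1, 'c': 1, 'a': 0, 'i_hex': '0xFFFFFFFF'}
```

Running it over every line of a log and keeping the non-None results gives one row per crash, with the signed and hex columns derived from the same number.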
Key observations
- Always X or Y, never Z or E. 4/4. With CoreXY, X and Y are the two motors
  driven by the input shaper's joint convolution; Z and E use independent trapqs.
- i values look like data corruption, not deterministic overflow. The hex
  values 0xFFFF1A04, 0xFEFFFC2B, 0xFFE01650 don't share a modular structure
  or a common bit pattern. They look like a uint32 read that captured a partially
  updated state.
- bed_mesh fade_end is in the causal chain. Reducing fade_end from 20 to 10
  changed mean time-to-crash from ~3.9 h (crashes 1–3) to 17 h (crash 4, a
  single sample), roughly a 4.5× improvement; against crashes 2 and 3 alone,
  which share crash 4's microsteps and max_accel, it is about 7.5×. It did not
  eliminate the bug.
- Other mitigations reduce frequency but don't fix the bug:
  - microsteps 32 → 16
  - max_accel 6500 → 4700
  - jerks lowered to 5
  None eliminated it.
- mcu_awake is consistently 0.000–0.001 across all four crashes, so the MCU
  is nowhere near saturation (host load is reported separately as sysload).
- memavail is healthy at crash time (127 MB+ free). Not a memory issue.
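The time-to-crash comparison can be recomputed from the elapsed times in the table. This is a minimal Python sketch of that arithmetic; crash 1 is entered as 7 h even though the table only gives it approximately, and with a single crash at fade_end=10 the ratios are indicative at best.

```python
from statistics import mean


def minutes(hours: int, mins: int = 0) -> int:
    """Convert an 'H h MM' elapsed time to minutes."""
    return hours * 60 + mins


# Elapsed print time per crash, from the table (crash 1 is approximate).
fade_end_20 = [minutes(7), minutes(2, 41), minutes(1, 53)]  # crashes 1-3
fade_end_10 = [minutes(17, 11)]                             # crash 4


def ratio(baseline, changed):
    """Ratio of mean time-to-crash after the change to before it."""
    return mean(changed) / mean(baseline)


all_three = ratio(fade_end_20, fade_end_10)
like_for_like = ratio(fade_end_20[1:], fade_end_10)  # same microsteps/accel

print(f"mean of crashes 1-3: {mean(fade_end_20) / 60:.2f} h")  # 3.86 h
print(f"vs crashes 1-3: {all_three:.1f}x")                     # 4.5x
print(f"vs crashes 2-3: {like_for_like:.1f}x")                 # 7.5x
```

The second ratio drops crash 1 because it ran with microsteps=32 and accel 6500, so it differs from crash 4 in more than fade_end.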
Hypothesis (offered for discussion, not as a conclusion)
After reading through chelper/stepcompress.c, itersolve.c, kin_shaper.c,
kin_corexy.c, steppersync.c and motion_queuing.py, I couldn’t find a
single-threaded path that produces move.interval >= 0x80000000 in check_line.
The iterative solver maintains low_time monotonically within an invocation,
and last_flush_time is monotone across invocations.
The fact that:
- the issue is statistical and only triggers on shared-trapq steppers (X/Y on
  CoreXY)
- Z and E (separate trapqs) never trigger
- fade_end (more shared bed_mesh state per stepper-frame) strongly affects MTBF
- the corruption pattern looks like a partial read of
  pos->clock32 - (uint32_t)last_step_clock
…is consistent with a race in the per-stepper step generation introduced by
PR #6992 (a89694ac6, Sep 3, 2025). The follow-up fix 3c01f71d9
(Sep 24, 2025) is present in my build, plus 220 days of additional commits.
I freely admit this is speculation. The Voron+Cartographer case in the closed
thread points the same way (oid=8, same stack, post-multi-thread Klipper),
and Sineos there did mention “it could be some race condition that may hit
home or not, depending on circumstances not fully understood”.
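For readers less familiar with the arithmetic behind the interval >= 0x80000000 condition, here is a minimal Python sketch of how a 32-bit unsigned subtraction behaves when its operands are out of order. It is not Klipper code: the function names are hypothetical, and it assumes only that the interval comes from a uint32 subtraction like the expression quoted above.

```python
MASK32 = 0xFFFFFFFF


def interval_u32(next_clock: int, last_step_clock: int) -> int:
    """32-bit unsigned difference, as C yields for uint32_t operands."""
    return (next_clock - last_step_clock) & MASK32


def as_int32(value: int) -> int:
    """Reinterpret a uint32 as a signed 32-bit integer (what %d would print)."""
    return value - (1 << 32) if value & 0x80000000 else value


last = 1_000_000
ok = interval_u32(last + 500, last)       # next step after the last one
wrapped = interval_u32(last - 500, last)  # next step "before" the last one

print(ok, as_int32(ok))                 # 500 500
print(hex(wrapped), as_int32(wrapped))  # 0xfffffe0c -500
print(wrapped >= 0x80000000)            # True
```

Any path that leaves the two operands inconsistent, whether through ordering or a torn read, produces a value with the high bit set, so the negative sign by itself does not tell the two cases apart.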
What I’d find most useful
- Diagnostic instrumentation. Is there a recommended way to instrument
  check_line() or compress_bisect_add() to dump the queue state and the
  most recent moves at error time? The crash reproduces reliably on my setup
  (one crash per 2–17 h of printing), so capturing a structured dump on the next
  occurrence would be much more informative than further code reading.
- Sequential step generation as a diagnostic. A build flag (or even a
  one-off patch) that interleaves se_start_gen_steps and se_finalize_gen_steps
  per syncemitter in steppersyncmgr_gen_steps() would conclusively answer the
  multi-thread-race vs other-cause question. Has anyone tried this?
- Cross-check from another reporter. If anyone else here is hitting this
  on a CoreXY + input_shaper + probe_eddy_current/cartographer/beacon setup,
  I'd love a confirmation that fade_end also affects the MTBF on their side.
  That would strongly support the trapq-sharing hypothesis.
Happy to provide additional logs and extracts, run any test (including a vanilla
rebuild to drop the dirty flag), and report back. I'm after the underlying issue,
not a workaround.