MCU 'mcu' shutdown: Timer too close during BED_MESH_CALIBRATE

Basic Information:

Printer Model: Custom CoreXY
MCU / Printerboard: Custom STM32H7
klippy.log (118.4 KB)

Describe your issue:

We updated our controller board from one with an Atmel ATSAM3 to a new board based on an STM32H723 running at 480 Mhz and we started getting MCU 'mcu' shutdown issues when executing BED_MESH_CALIBRATION.

The general configuration of the printer is unchanged (same speed, same acceleration, same Klipper config), we just updated the pinout definitions.

The host is is a Celeron PC, which should have plenty of power. We run Repetier server and the latest klipper version cloned from Github (master branch).

Time to time we get more than 200% of CPU load, but even at lower load (such as about 80% as in the attached log) we get MCU 'mcu' shutdown.

The board is connected by USB (USB-CDC) and we do not get any USB error from the linux Kernel.

With the previos board we had no issues until now.

Any ideas?


Do you use a VM?

Apart from that, refer to Timer too close

No VM, checked the checklist: what we found (as written in original message) is abnormal activity.
Usb: we tried ferrite, slowing down speed and accels, no other USB device…no way.
SBC is supplied with a good industrial PSU (24vdc).
As soon as mesh calibration starts system goes amok: timer too close.
Brought mesh points to minimal (6x4) but no change…
Running out of ideas.
That’s why we are seeking help.

What does this mean?

Does not seem so:

Git version: 'v0.11.0-257-ged66982b-dirty'
Modified files: src/stm32/gpio.c, src/stm32/stm32f0_i2c.c, src/stm32/stm32h7.c

I have never seen a log like this:

cu 'mcu': Starting serial connect

Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.

mcu 'mcu': got {'#receive_time': 3963.727281821, u'next_clock': 1147228928, u'oid': 33, u'value': 30703, '#name': u'analog_in_state', '#sent_time': 3963.681299046}
Resetting prediction variance 3963.733: freq=400000000 diff=4129047 stddev=400000.000

Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.

mcu 'mcu': got {'#receive_time': 3963.982026986, u'next_clock': 1267228928, u'oid': 33, u'value': 30720, '#name': u'analog_in_state', '#sent_time': 3963.938483521}
Loaded MCU 'mcu' 107 commands (v0.11.0-257-ged66982b-dirty-20230802_121848-3ntrdhcp / gcc: (15:9-2019-q4-0ubuntu1) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] binutils: (2.34-4ubuntu1+13ubuntu1) 2.34)
MCU 'mcu' config: BUS_PINS_i2c2_PF1_PF0=PF1,PF0 BUS_PINS_spi3a=PC11,PC12,PC10 BUS_PINS_i2c1_PB6_PB7=PB6,PB7 BUS_PINS_spi2a=PC2,PC3,PB10 BUS_PINS_spi5a=PH7,PF11,PH6 RESERVE_PINS_USB=PA11,PA12 CLOCK_FREQ=400000000 BUS_PINS_i2c1_PB8_PB9=PB8,PB9 BUS_PINS_spi1a=PB4,PB5,PB3 STATS_SUMSQ_BASE=256 BUS_PINS_i2c2_PB10_PB11=PB10,PB11 BUS_PINS_spi5=PF8,PF9,PF7 STEPPER_BOTH_EDGE=1 BUS_PINS_spi6=PG12,PG14,PG13 ADC_MAX=4095 BUS_PINS_spi4=PE13,PE14,PE12 PWM_MAX=255 BUS_PINS_spi2=PB14,PB15,PB13 BUS_PINS_spi1=PA6,PA7,PA5 RESERVE_PINS_crystal=PH0 MCU=stm32h723xx
Configured MCU 'mcu' (1024 moves)

Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.


Printer is not ready
The klippy host software is attempting to connect.  Please
retry in a few moments.

bed_mesh: generated points
Index |  Tool Adjusted  |   Probe
  0   | (23.0, 84.0)    | (10.0, 40.0)
  1   | (63.0, 84.0)    | (50.0, 40.0)

....

Without wanting to offend you, I think this is a pretty home-made problem.

MCU: we have been working on a custom motherboard, based on a STM32H7.

Home made problem: can you elaborate ? Not offended, just trying to understand what is going on.

Brgds

Well, we moved from

The general configuration of the printer is unchanged (same speed, same acceleration, same Klipper config), we just updated the pinout definitions.

to: We have stuffed in a (probably untested) custom board and modified the Klipper sources and now it is not working like expected.

I cannot provide any meaningful advice here: It could be anything from mistakes in your code changes over broken cables to design errors in your custom board.

1 Like

to: We have stuffed in a (probably untested) custom board and modified the Klipper sources and now it is not working like expected.

The hardware as been tested validated, and we are able to print if we skip BED_MESH_CALIBRATE.

The source changes are rather simple and we can exclude any issue caused by them:

  • gpio.c: we need a couple of pin to be configured as open drain to drive some PMOS, as this can’t be done in the Klipper config we hardcoded it. Next board release will add level shifter NMOS so don’t need open drain outputs
  • stm32f0_i2c.c: we added support for I2C2 on PF0 and PF1
  • stm32h7.c: we added support for HSE_BYPASS as we use a clock generator and not an oscillator, and we needed one more pin on the MCU
    As you can see non of the above can have any role in the error we get when running BED_MESH_CALIBRATE.

I cannot provide any meaningful advice here: It could be anything from mistakes in your code changes over broken cables to design errors in your custom board.

We can safely exclude code changes and design errors. The USB cable is 50cm long, runs away from stepper cables and mains, we tried adding ferrites, and EMI issues are generally reported by the Linux kernel, so this is unlikely too. Also, I would expect the host buffer to increase if there are communication issues, but this does not seem the case.

We went trough the list in the knowledge base, the only two things which we can’t really exclude are:

  • High system load of the host
  • High disk activity of the host

Given that the printer configuration has the same speeds and accelerations as before, when we were using a much slower MCU, and the host is the very same, we can’t understand why now we get these errors.

Open questions:

  • is it possible that a (much) higher clock MCU leads to the host to be overloaded, even with the very same step rates?
  • can you confirm from the log that this is actually a “MCU timer too late” (i.e., received commands are in the past or too close to current time) and there are no prior issues?
  • why do you say “I have never seen a log like this”? What is unusual?
  • is there anything else that can cause “MCU timer too late”, apart what is listed in the knowledge base, that we can investigate?

Thanks for your support.

You can try to set the TRSYNC_TIMEOUT in mcu.py to .050 from .025, line 218, its a hack but appears to do the job.

1 Like

Thank you Gruntt. Sadly, no change- BED_MESH_CALIBRATE still crashes the party.
:frowning:

See the paste in my above comment. I have never seen such a pattern before

If this would be the case the error would be Rescheduled timer in the past

What do you want me to confirm? See the log.
This type of error occurs when the host schedules a command that is (from the MCU perspective) already in the past. Thats why the typical reason is an overloaded / too slow host.

In addition you have a

mcu 'mcu': got {'#receive_time': 3963.727281821, u'next_clock': 1147228928, u'oid': 33, u'value': 30703, '#name': u'analog_in_state', '#sent_time': 3963.681299046}
Resetting prediction variance 3963.733: freq=400000000 diff=4129047 stddev=400000.000

which indicates the host software was unable to make accurate predictions of the micro-controller clock. Usually only a warning but in combination with a shutdown points to some serious issues like failing hardware.

Personally I would expect that in your case the timing is rather messed up on the MCU side but as stated before, I’m neither a hard- nor software developer.

Found the problem: 480mhz is too fast for Klipper. Slowing down the board to 400Mhz fixed it for good.

1 Like

Might be the reason why the Octopus Pro with H7 CPUs are capped at 400 MHz by default.

2 Likes

True, this is a known limitation. Glad you found it.

1 Like