We updated our controller board from one with an Atmel ATSAM3 to a new board based on an STM32H723 running at 480 Mhz and we started getting MCU 'mcu' shutdown issues when executing BED_MESH_CALIBRATION.
The general configuration of the printer is unchanged (same speed, same acceleration, same Klipper config), we just updated the pinout definitions.
The host is is a Celeron PC, which should have plenty of power. We run Repetier server and the latest klipper version cloned from Github (master branch).
Time to time we get more than 200% of CPU load, but even at lower load (such as about 80% as in the attached log) we get MCU 'mcu' shutdown.
The board is connected by USB (USB-CDC) and we do not get any USB error from the linux Kernel.
With the previos board we had no issues until now.
No VM, checked the checklist: what we found (as written in original message) is abnormal activity.
Usb: we tried ferrite, slowing down speed and accels, no other USB device…no way.
SBC is supplied with a good industrial PSU (24vdc).
As soon as mesh calibration starts system goes amok: timer too close.
Brought mesh points to minimal (6x4) but no change…
Running out of ideas.
That’s why we are seeking help.
cu 'mcu': Starting serial connect
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
mcu 'mcu': got {'#receive_time': 3963.727281821, u'next_clock': 1147228928, u'oid': 33, u'value': 30703, '#name': u'analog_in_state', '#sent_time': 3963.681299046}
Resetting prediction variance 3963.733: freq=400000000 diff=4129047 stddev=400000.000
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
mcu 'mcu': got {'#receive_time': 3963.982026986, u'next_clock': 1267228928, u'oid': 33, u'value': 30720, '#name': u'analog_in_state', '#sent_time': 3963.938483521}
Loaded MCU 'mcu' 107 commands (v0.11.0-257-ged66982b-dirty-20230802_121848-3ntrdhcp / gcc: (15:9-2019-q4-0ubuntu1) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] binutils: (2.34-4ubuntu1+13ubuntu1) 2.34)
MCU 'mcu' config: BUS_PINS_i2c2_PF1_PF0=PF1,PF0 BUS_PINS_spi3a=PC11,PC12,PC10 BUS_PINS_i2c1_PB6_PB7=PB6,PB7 BUS_PINS_spi2a=PC2,PC3,PB10 BUS_PINS_spi5a=PH7,PF11,PH6 RESERVE_PINS_USB=PA11,PA12 CLOCK_FREQ=400000000 BUS_PINS_i2c1_PB8_PB9=PB8,PB9 BUS_PINS_spi1a=PB4,PB5,PB3 STATS_SUMSQ_BASE=256 BUS_PINS_i2c2_PB10_PB11=PB10,PB11 BUS_PINS_spi5=PF8,PF9,PF7 STEPPER_BOTH_EDGE=1 BUS_PINS_spi6=PG12,PG14,PG13 ADC_MAX=4095 BUS_PINS_spi4=PE13,PE14,PE12 PWM_MAX=255 BUS_PINS_spi2=PB14,PB15,PB13 BUS_PINS_spi1=PA6,PA7,PA5 RESERVE_PINS_crystal=PH0 MCU=stm32h723xx
Configured MCU 'mcu' (1024 moves)
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
Printer is not ready
The klippy host software is attempting to connect. Please
retry in a few moments.
bed_mesh: generated points
Index | Tool Adjusted | Probe
0 | (23.0, 84.0) | (10.0, 40.0)
1 | (63.0, 84.0) | (50.0, 40.0)
....
Without wanting to offend you, I think this is a pretty home-made problem.
The general configuration of the printer is unchanged (same speed, same acceleration, same Klipper config), we just updated the pinout definitions.
to: We have stuffed in a (probably untested) custom board and modified the Klipper sources and now it is not working like expected.
I cannot provide any meaningful advice here: It could be anything from mistakes in your code changes over broken cables to design errors in your custom board.
to: We have stuffed in a (probably untested) custom board and modified the Klipper sources and now it is not working like expected.
The hardware as been tested validated, and we are able to print if we skip BED_MESH_CALIBRATE.
The source changes are rather simple and we can exclude any issue caused by them:
gpio.c: we need a couple of pin to be configured as open drain to drive some PMOS, as this can’t be done in the Klipper config we hardcoded it. Next board release will add level shifter NMOS so don’t need open drain outputs
stm32f0_i2c.c: we added support for I2C2 on PF0 and PF1
stm32h7.c: we added support for HSE_BYPASS as we use a clock generator and not an oscillator, and we needed one more pin on the MCU
As you can see non of the above can have any role in the error we get when running BED_MESH_CALIBRATE.
I cannot provide any meaningful advice here: It could be anything from mistakes in your code changes over broken cables to design errors in your custom board.
We can safely exclude code changes and design errors. The USB cable is 50cm long, runs away from stepper cables and mains, we tried adding ferrites, and EMI issues are generally reported by the Linux kernel, so this is unlikely too. Also, I would expect the host buffer to increase if there are communication issues, but this does not seem the case.
We went trough the list in the knowledge base, the only two things which we can’t really exclude are:
High system load of the host
High disk activity of the host
Given that the printer configuration has the same speeds and accelerations as before, when we were using a much slower MCU, and the host is the very same, we can’t understand why now we get these errors.
Open questions:
is it possible that a (much) higher clock MCU leads to the host to be overloaded, even with the very same step rates?
can you confirm from the log that this is actually a “MCU timer too late” (i.e., received commands are in the past or too close to current time) and there are no prior issues?
why do you say “I have never seen a log like this”? What is unusual?
is there anything else that can cause “MCU timer too late”, apart what is listed in the knowledge base, that we can investigate?
What do you want me to confirm? See the log.
This type of error occurs when the host schedules a command that is (from the MCU perspective) already in the past. Thats why the typical reason is an overloaded / too slow host.
which indicates the host software was unable to make accurate predictions of the micro-controller clock. Usually only a warning but in combination with a shutdown points to some serious issues like failing hardware.
Personally I would expect that in your case the timing is rather messed up on the MCU side but as stated before, I’m neither a hard- nor software developer.