Timer too close shutdown on 1000mm+ moves

Basic Information:

Printer Model: Qidi Q1 Pro / Armbian, Klipper 12 via KIAUH
MCU / Printerboard: MKS SKIPR / Qidi custom STM32F401xC
Host / SBC: MKS SKIPR / Qidi custom Cortex-A53 @ 1.5GHz
klippy.log attached
klippy (90).log (119.6 KB)

I am battling MCU shutdowns on this system. Through a lot of trial and error, I have pinpointed them to move commands that exceed 1000mm in length. This thread in the Armored Turtle Discord has extensive testing results (Discord). As pointed out in (Improve Clock Synchronization in Multi-MCU Environments by mfeldheim · Pull Request #6753 · Klipper3d/klipper · GitHub), my real issue seems to be dropped USB packets.

The attached klippy.log shows a fresh boot with nothing more than a FORCE_MOVE STEPPER=extruder DISTANCE=1500 VELOCITY=150 ACCEL=250 command given to cause the shutdown. The shutdown occurs the instant the command is given.

Some information about the system: MCU th is the toolhead RP2040 board. It is connected to the host/mainboard via a USB connection, serial: /dev/ttyS2 to be specific. I can also cause this failure when using Boxturtle and commanding a move of more than 1000mm. Boxturtle uses an external MCU (stm32h723xx) connected via USB to the host/mainboard. I have tried different USB ports and cables, and have reduced the devices on the USB bus to the minimum required for operation. I never see the host at greater than 45% load.

My question is, and maybe this is more Armbian related: can I get help diagnosing the dropped USB packets, if that truly is the root cause? Is there something else I could be looking for? Advice and input welcome.

Edit: Here is a dmesg printout. I see stuff that seems strange but I really have no idea what I’m looking at.
dmesg.txt (105.1 KB)

  1. The known reasons for this kind of error are described in Timer too close, but
  2. With your “rooting,” you might have introduced the error yourself, as it is not guaranteed that you can simply update to the latest Klipper and bring over the OEM changes. For a bit more detail on this, see 3D Printers with Preinstalled and Modified Klipper Versions.
  3. FORCE_MOVE STEPPER=extruder DISTANCE=1500 VELOCITY=150 ACCEL=250 does not seem like something that happens during printing. I did not test it, but it may well be that this command is blocking for so long that the host errors out.
  4. The MMU extensions are known to cause or at least contribute to TTC errors.

Overall, this means that with your setup and extensive modifications, you are pretty much on your own to diagnose and find the main contributors to this instability.

Thanks. I totally get that this setup, mainly the Armbian image, is likely the cause. I guess what I'm asking is to leverage others' knowledge: what commands can I run in a terminal to see things that will point me in a direction? Maybe it's a hardware issue with the physical USB on the board, but what do I look at to determine that? What about Klipper asking for a long move like the example above could cause a shutdown, and is there some way to log things that would allow me to look back and see details? The ONLY issue I have at all is when any extrusion move is greater than 1000mm in length, be it a force move or a toolhead load from the MMU. I can produce this error with no MMU connected. So what about that length of move causes an issue? Is it too much to compute, or too long a string of data, so that the timing gets off? Is there any way to log and see this? I guess what I'm saying is: how can I find the proof, not the solution?
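On the question of looking back through logs for link problems, here is a minimal sketch rather than a definitive tool. It assumes the periodic `Stats` lines in klippy.log carry a `bytes_retransmit` counter per MCU section (the sample lines below are made up for illustration and are not taken from the attached logs). It reports the moments when that counter grows, which would hint at retransmissions on that MCU's serial link. The function names are hypothetical.

```python
def parse_stats_line(line):
    """Split one 'Stats' line into {section: {key: value_string}}.

    Tokens ending in ':' start a new section (for example 'mcu:' or 'th:').
    Keys before the first section go under 'host'. Trailing host-wide
    fields end up in the last section, which is fine here because only
    per-MCU counters are read.
    """
    if not line.startswith("Stats "):
        return None
    _, _, rest = line.partition(": ")
    sections = {}
    current = "host"
    for token in rest.split():
        if token.endswith(":"):
            current = token[:-1]
        elif "=" in token:
            key, _, value = token.partition("=")
            sections.setdefault(current, {})[key] = value
    return sections


def retransmit_events(lines):
    """Return (timestamp, section, delta) whenever bytes_retransmit grows."""
    last_seen = {}
    events = []
    for line in lines:
        if not line.startswith("Stats "):
            continue
        stamp = float(line.split()[1].rstrip(":"))
        for name, fields in parse_stats_line(line).items():
            if "bytes_retransmit" not in fields:
                continue
            count = int(fields["bytes_retransmit"])
            previous = last_seen.get(name)
            if previous is not None and count > previous:
                events.append((stamp, name, count - previous))
            last_seen[name] = count
    return events


# Illustrative lines only (assumed format, invented numbers).
SAMPLE = [
    "Stats 100.0: gcodein=0  mcu: mcu_awake=0.002 bytes_retransmit=9 "
    "bytes_invalid=0 th: mcu_awake=0.001 bytes_retransmit=0 bytes_invalid=0 sysload=0.21",
    "Stats 101.0: gcodein=0  mcu: mcu_awake=0.002 bytes_retransmit=9 "
    "bytes_invalid=0 th: mcu_awake=0.001 bytes_retransmit=45 bytes_invalid=0 sysload=0.25",
]

for stamp, name, delta in retransmit_events(SAMPLE):
    print(f"t={stamp}: {name} retransmitted {delta} more bytes")
```

To try it on a real log, the same function can be fed an open file, for example `retransmit_events(open("klippy.log"))`, with the path adjusted to wherever the log lives. A counter that stays flat right up to the shutdown would argue against dropped packets as the trigger.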

Edit: And I think maybe you misunderstand. The “rooting” is simply flashing an MKS Pi Armbian image to the SoC; Klipper isn’t modified in any way. Klipper was installed via KIAUH in the normal fashion.

No, I have not misunderstood and no, this is not a standard Klipper install due to:

This printer is using a modified Klipper version (source: GitHub - QIDITECH/QIDI_Q1_Pro) which is about 7 months old. As it is absolutely possible that you cannot just “copy over” these changes to the current Klipper sources, I pointed to:

As indicated, commands like FORCE_MOVE STEPPER=extruder DISTANCE=1500 VELOCITY=150 ACCEL=250 are likely to fail, since they will block Klipper’s queue for so long that the host eventually errors out.

Likely you will have to break them into shorter segments.
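A rough sketch of what breaking a long move into shorter segments could look like. The 500mm segment length and both function names are assumptions picked for illustration, and whether a given segment length actually avoids the shutdown on this hardware is untested.

```python
def split_move(distance_mm, max_segment_mm=500.0):
    """Split one long move into segments no longer than max_segment_mm."""
    if max_segment_mm <= 0:
        raise ValueError("max_segment_mm must be positive")
    sign = -1.0 if distance_mm < 0 else 1.0
    remaining = abs(distance_mm)
    segments = []
    while remaining > 1e-9:
        step = min(remaining, max_segment_mm)
        segments.append(sign * step)
        remaining -= step
    return segments


def force_move_commands(stepper, distance_mm, velocity, accel, max_segment_mm=500.0):
    """Build one FORCE_MOVE line per segment, using the parameters from this thread."""
    return [
        f"FORCE_MOVE STEPPER={stepper} DISTANCE={seg:g} "
        f"VELOCITY={velocity:g} ACCEL={accel:g}"
        for seg in split_move(distance_mm, max_segment_mm)
    ]


for command in force_move_commands("extruder", 1500, 150, 250):
    print(command)
```

Each segment accelerates and decelerates on its own, so the segmented move takes somewhat longer overall than a single 1500mm move would.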

Unlikely. Often the Armbian images are more stable than the original OEM images.

If you follow these reasons, you will notice that the TTC errors are “symptom errors” and not “root cause” errors, i.e. they describe an unwanted consequence, whose reason might be somewhere totally different. As such, there is unfortunately no way to easily determine it.

That is not what I use. From OpenQ1 I use only the Armbian image, not the Klipper provided with it. I purposefully removed all Klipper from OpenQ1. Klipper was installed with KIAUH and is not modified; I made a specific choice to do that so there would never be a point where my Klipper differed from mainline. And just for clarity, OpenQ1 doesn’t change Klipper; it just bundles the printer.cfg for the Q1, Fluidd, Moonraker, etc. into a prebuilt image for people who may not know how to do all that. The extra files you reference are for MMU use only, are not related to OpenQ1, and are currently commented out in the config and unused. I will remove them from the klippy/extras folder, though, and test again to see if just their presence on the system could be causing an issue; I didn’t think of that. Will update later today on that.

Why? I have another very custom printer on mainline Klipper and it can perform that same command with no issues. It has more MCUs and even CAN bus to deal with, and all the MCUs go into a USB hub and then to the host, which honestly cannot be ideal. The difference I see is that it has a much more powerful host, an Intel NUC i5 running Linux Mint. Is “blocking the queue for so long” simply a matter of processing power? I really feel like that’s the issue, but I want to prove it and just don’t know how.

I just feel like if I know exactly what command can cause the issue, and no other command causes the issue, surely there must be a way to watch the system and determine what about that command the system doesn’t like? Finding that cause may lead to better optimization to prevent future issues.

I removed the MMU-related extras and it still crashes when trying the force move mentioned above.


I did find this utility armbianmonitor -m to watch system loads. In the screenshot at timestamp 15:09:39 I commanded the force move. Klipper TTC shutdown instantly. At timestamp 15:09:50 is when I restarted klipper. So it doesn’t seem at first glance to be a cpu load issue.

Is there any way to watch this queue? To see what it’s doing?

And just as proof that my klipper is not modified, I am reinstalling everything from the OS image to klipper. Here is proof that I am not using a prebuilt modified klipper, because I’m honestly tired of the “you are using modified klipper” line I get every time I talk about this printer. No, I am not. I AM using a modified armbian image. Currently reinstalling everything and putting in minimal config to test as bare bones as possible.

Fully wiped and reinstalled everything as minimally as possible.
Still TTC on FORCE_MOVE STEPPER=extruder DISTANCE=1500 VELOCITY=150 ACCEL=250.
klippy (99).log (128.7 KB)

So, something interesting. Let me explain this picture. I queued a bunch of force moves at 650mm and watched the monitor as they ran. The CPU stayed around 408MHz. I queued a bunch of force moves at 680mm and the CPU went to its maximum of 1296MHz and stayed there while running the 680mm moves. At 700mm I get a TTC crash. So at a 680mm travel distance the CPU runs at maximum frequency, but the load is still very low. At 670mm the CPU stays low at 408MHz.

I think this tells me that this is the limit of what the host can do without errors.

Generally:

  • Every change to the Klipper code can lead to or foster TTC errors. There are numerous examples.
  • MMU code is known to cause this, especially Happy Hare. There are also numerous reports.
  • The Q1 seems to run with some Klipper modifications. I do not know if they are necessary for proper hardware support or if they are of proper quality.

Not sure if this test really allows the conclusions you are drawing.

  • The switching of the CPU frequency is done by the respective CPU governor at the OS level
  • At no point did the CPU seem to even break a sweat
  • Looking at the overall CPU load is inconclusive, since single cores could max out
  • The relationship between CPU load and TTC is not clear in my opinion (if it exists at all)
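To illustrate the point about single cores: a small sketch, assuming a Linux host, that computes per-core load from two snapshots of /proc/stat instead of the overall average. The function names and the sample numbers are made up for illustration.

```python
def parse_proc_stat(text):
    """Map 'cpuN' -> (busy_jiffies, total_jiffies) from /proc/stat content."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only per-core lines ('cpu0', 'cpu1', ...), skip the aggregate 'cpu'.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        values = [int(v) for v in parts[1:9]]  # user .. steal
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values)
        cores[parts[0]] = (total - idle, total)
    return cores


def per_core_load(before_text, after_text):
    """Return the busy fraction (0.0 to 1.0) per core between two snapshots."""
    before = parse_proc_stat(before_text)
    after = parse_proc_stat(after_text)
    loads = {}
    for core, (busy1, total1) in after.items():
        busy0, total0 = before.get(core, (0, 0))
        elapsed = total1 - total0
        loads[core] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return loads


# Invented snapshots: cpu0 is fully busy, cpu1 is fully idle in between.
BEFORE = "cpu  100 0 100 800 0 0 0 0 0 0\ncpu0 50 0 50 400 0 0 0 0 0 0\ncpu1 50 0 50 400 0 0 0 0 0 0\n"
AFTER = "cpu  200 0 100 900 0 0 0 0 0 0\ncpu0 150 0 50 400 0 0 0 0 0 0\ncpu1 50 0 50 500 0 0 0 0 0 0\n"

for core, load in sorted(per_core_load(BEFORE, AFTER).items()):
    print(f"{core}: {load:.0%}")
```

On a real system the two snapshots would come from reading /proc/stat twice, a short interval apart, around the moment the command is issued. In the invented sample, the average over both cores is 50% even though one core is saturated, which is the effect that an overall load figure can hide.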

FWIW, I have tried the command FORCE_MOVE STEPPER=extruder DISTANCE=1500 VELOCITY=150 ACCEL=250 on 3 printers and all error out with a TTC

I’m not sure you’re understanding. Sineos is trying to point out that the modifications that the manufacturer made to Klipper might be necessary for proper functioning of the printer.

While it’s a frustrating situation to be in, buying a printer that ships with a modified version of Klipper means that you can’t rely on community support for the use of Klipper with that printer, and will probably get pushback no matter what you do. If you use the Klipper that comes with the printer people will tell you that we can’t support a modified version and you should contact the manufacturer, while if you install vanilla Klipper people will tell you that the issues you encounter might be because you’re not using the modified version that comes with the printer. Both things are true.

Your log indicates that the FORCE_MOVE command overloaded the host CPU. That command is intended as a debugging command - if you send in extreme commands you’ll likely get unexpected behavior. In particular, by requesting the extruder to move 1500mm, it required the host CPU to attempt to schedule ~1.6 million steps, which causes significant CPU overhead.
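As a back-of-the-envelope sketch of where a step count of that size can come from: the relation below follows the usual definition of `rotation_distance` (millimetres moved per full motor rotation). The `rotation_distance` of 3.0 is a hypothetical placeholder, not read from the poster's config; it was picked only because it lands near the figure quoted above. The function names are likewise made up.

```python
def steps_per_mm(rotation_distance, full_steps_per_rotation=200,
                 microsteps=16, gear_ratio=1.0):
    """Microsteps the driver must receive per millimetre of travel."""
    return full_steps_per_rotation * microsteps * gear_ratio / rotation_distance


def move_step_estimate(distance_mm, velocity_mm_s, rotation_distance, **stepper):
    """Estimate total steps and cruise step rate for a single move."""
    spm = steps_per_mm(rotation_distance, **stepper)
    return {
        "steps_per_mm": spm,
        "total_steps": distance_mm * spm,
        "cruise_steps_per_s": velocity_mm_s * spm,
    }


# Hypothetical extruder values (assumption, not from the attached logs).
estimate = move_step_estimate(distance_mm=1500, velocity_mm_s=150,
                              rotation_distance=3.0)
print(f"{estimate['steps_per_mm']:.1f} steps/mm")
print(f"{estimate['total_steps']:.0f} steps in total")
print(f"{estimate['cruise_steps_per_s']:.0f} steps/s at cruise speed")
```

The total scales linearly with distance, so under these assumed values a 700mm move would need a bit under half the steps of the 1500mm one.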

To avoid that error, I recommend not issuing commands like that.

-Kevin

Thanks. For context, I am using the FORCE_MOVE as a diagnostic because it creates the same condition as the script that also causes the issue. I am no Python expert and could be way off, but I think the ultimate goal is to be able to perform this line:
CUR_LANE.move(CUR_HUB.afc_bowden_length, self.long_moves_speed, self.long_moves_accel, True) where the afc_bowden_length is 1000mm or greater.
Further testing on other printers has led me to be fully convinced that it is just a processing power issue. Printers using a host that is not an SBC/Pi can perform this action, while every printer I’ve tried with an SBC/Pi host fails. That is admittedly a limited sample size: two with an SBC/Pi host and three with a traditional PC-based host.
