/extras/fan.py adds 100ms between fan commands if they’re too close together, which in extreme cases can lead to fan commands being several hours out of sync with move commands, and the fan cycling up and down at 100ms intervals well after the print has ended. This impacts print quality because cooling doesn’t happen where expected.
Timer too close errors
The much more concerning issue is that this leads to some sort of clock error (rollover/race condition?) eventually causing a “Timer too close” error on the EBB, killing the print.
There seems to be some randomness to it, with failures taking 12-18 hours in real-world prints, though always in a region of dense moves with fan changes.
I’ve attached a synthetic gcode file that always fails instantly for me, I’m wondering if anyone else can reproduce the issue.
Workarounds
Obviously, getting the slicer to calm down with the fan commands will mask the problem.
Setting FAN_MIN_TIME = 0.0 doesn’t seem to fix the “Timer too close” error, but it does resolve the fan scheduling weirdness.
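To illustrate the drift mechanism, here’s a toy model of how a per-command minimum spacing accumulates lag. The name FAN_MIN_TIME mirrors the constant in klippy/extras/fan.py, but the simulation itself is illustrative and not Klipper’s actual scheduling code:

```python
# Toy model: each fan command is pushed to at least FAN_MIN_TIME after
# the previous one, so a burst of commands arriving faster than that
# spacing accumulates unbounded lag behind the moves they belong to.
FAN_MIN_TIME = 0.100  # seconds; mirrors the constant in fan.py

def schedule_fan_commands(request_times, min_time=FAN_MIN_TIME):
    """Return the times the fan updates would actually be applied."""
    applied = []
    last = -min_time
    for t in request_times:
        apply_at = max(t, last + min_time)
        applied.append(apply_at)
        last = apply_at
    return applied

# 1000 fan commands requested 10ms apart (slicer churn):
requests = [i * 0.010 for i in range(1000)]
applied = schedule_fan_commands(requests)
lag = applied[-1] - requests[-1]  # ~90 seconds of drift after ~10s of gcode
```

With 10ms request spacing, the last update lands roughly 90 seconds after the move it belongs to; over many hours of dense fan churn this is how the hours-long desync above builds up.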
That’s a bit of an oversimplification.
Let’s first discuss whether it’s a good idea for any slicer to be changing the fan speed this often (it’s not hard to imagine that this could improve print quality).
Even if it’s decided that this slicer behaviour is unneeded or incorrect, this test case is still a good opportunity to improve the current code.
@mykepredko Dang I was shooting for a perfect score! Right now I can reproduce this in PrusaSlicer and PantheonSlicer, haven’t tried with OrcaSlicer. Profiles are here.
@Sineos Thanks for pointing me to that issue. Seems like it could be the same thing.
@Mard0 I agree this fan cycling has no benefit to the parts, it’s probably making support interfaces worse.
I totally agree that the slicer is the immediate fix.
My bigger concern is that this timer issue kills prints randomly, and isn’t one of the usual host/hardware-related causes. It could be behind a lot of the mystery “Timer too close” failures that pop up.
If there is an underlying issue in serialqueue or similar I’d definitely think we should address it.
There are surely a lot of areas where code can be improved. Ultimately, it comes down to the question of whether to spend developer resources to mitigate questionable gcode.
Looking at an offending gcode file, you will find a lot of areas that look like:
I seriously doubt that this has any positive effect on print quality. On the contrary, it seems rather pointless, and IMO the slicer is simply not doing its job: simplify an STL into gcode commands that have a practical meaning. Instead, it just pumps out commands that stress the entire system, including the fan hardware.
Edit:
Just to translate this snippet: for a printing move of 0.017 mm, the fan speed is changed six times.
The gcode file is from the above-mentioned GH issue and sliced with latest Orca Slicer.
To me, it looks like there is a conflict between the minimum (or maximum) fan_speed and the “overhang” fan_speed, which I’m guessing is the fan speed when you need support (ie for the “overhang”).
I don’t know PantheonSlicer, but it seems to me that this is where you should be looking: I think you have parameters that are in conflict, along with a software bug that adds the gcode for the default fan speed before changing to the required fan speed for the overhang/support.
IMHO the nozzle moves faster than the fan can accelerate to the higher RPM…
This is a nice feature but it is quite useless for such tiny segments.
It does not make sense to work on that just to say it is resolved.
This is exactly the point:
Typically, the slicer’s job is to “optimize” / simplify the gcode in a way that the hardware can follow the changes. It is comparable to gcode at a resolution of 0.0001 mm: just a lot of traffic that will make the command queue collapse, without any benefit, as no hardware can follow such segments.
I just saw your shout-out to Kevin on this issue - I’m not sure it’s appropriate at this time, as there’s still information required about the printer and the setup that @Alex-Pantheon is running.
I think we’re in agreement that PantheonSlicer is providing gcode which is sub-optimal, and it seems like that is the case in a number of areas. However, I don’t think @Alex-Pantheon is doing anything unusual in terms of using the printer, and when I look at the Pantheon printer web page, a couple of things activate my Spidey-Sense:
What I found interesting on the page is:
We use a fork of SuperSlicer/PrusaSlic3r that include tuned profiles for our materials in a range of print settings.
The printer is also advertised to use Klipper, but the company doesn’t say whether or not it is also forked. My questions back to @Alex-Pantheon: what modifications have you made to the Klipper that was installed in the printer; are you using the base Klipper that came with the printer? Have you made any other changes to the software that you are using?
I suspect this is a case where the slicer and the firmware have been tweaked to work together and accommodate each other’s quirks but @Alex-Pantheon has changed something which doesn’t include the accommodations. To be fair, maybe @Alex-Pantheon hasn’t changed anything but has updated something in the software chain that wasn’t properly checked out by Pantheon.
There needs to be a bit more understanding of the system before bringing in development.
EDIT:
Sorry, one more thing: @Alex-Pantheon have you contacted Pantheon about this issue?
@mykepredko lol I am Pantheon, I’m asking for input on an issue a bunch of our customers have run into.
Good catch on the profile stuff, I’m working on finding a permutation of settings that reduces the fan weirdness.
We’re running stock Klipper, I’ll let marketing know they should be more clear on that. At this point the slicer behavior is unchanged aside from where profile updates get pulled from, and some UI stuff.
The fan weirdness definitely exists upstream in PrusaSlicer, and apparently in OrcaSlicer as noted by @Sineos in the linked GitHub issue.
@mykepredko Definitely will work on the slicer side.
I’ve brought this issue up because I think this slicer behavior has exposed an underlying issue in multi-MCU timing/scheduling in Klippy, which I think affects all users’ reliability.
As far as developer resources go, I’m happy to contribute to fixing the issue with our team. I’m here to reach out to the community and see if anyone else has experienced this and start talking about how to fix it.
This is intentional. It is due to the way Klipper schedules events on the micro-controller. Fan updates (and pwm pins in general) are not currently queued on the micro-controller. So, in order to ensure each pin update has time to be sent from the host to the micro-controller, the code needs to ensure each pin update has a minimum application time.
It’s possible to rework the code to “collate the fan updates”, but it would be a lot of work and there would be no real-world gain from doing that work. Real-world fans don’t change speed in less than 100ms, so sending many fast fan updates doesn’t provide any value.
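Conceptually, the “collate the fan updates” idea could mean keeping only the most recent requested speed per minimum-time window. This is a sketch of the concept only, not a proposal against Klipper’s actual scheduling code; the function name and window value are illustrative:

```python
def collate_updates(updates, window=0.100):
    """Collapse (time, speed) requests so at most one update is emitted
    per `window` seconds, keeping the most recent requested speed.
    `updates` must be sorted by request time."""
    collated = []
    for t, speed in updates:
        if collated and t - collated[-1][0] < window:
            # Overwrite the pending update instead of queueing a new one
            collated[-1] = (collated[-1][0], speed)
        else:
            collated.append((t, speed))
    return collated

burst = [(0.00, 0.8), (0.02, 0.5), (0.05, 1.0), (0.20, 0.3)]
result = collate_updates(burst)  # -> [(0.0, 1.0), (0.2, 0.3)]
```

A burst of requests inside one window collapses to a single update carrying the last requested speed, which is all a physical fan could follow anyway.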
Another possibility would be to change the code so that it raises an error if fan updates become wildly out of sync with the toolhead movement (eg, 30+ seconds out of sync). That’s probably a good “sanity check” and would provide better debugging information to users, but I fear users may not like getting the resulting errors. I’ll have to think about that.
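The sanity check described above could be as simple as comparing the scheduled pin time against the toolhead’s current print time. Everything here is illustrative (the names, the 30-second threshold, and the warn-vs-raise split are assumptions, not Klipper code):

```python
MAX_FAN_DRIFT = 30.0  # seconds; illustrative threshold from the discussion

class FanDriftError(Exception):
    """Raised when a fan update drifts too far from toolhead movement."""

def check_fan_drift(scheduled_time, toolhead_print_time,
                    max_drift=MAX_FAN_DRIFT, strict=False):
    """Return a warning message (or raise, if strict) when a fan update
    is scheduled more than max_drift seconds ahead of the toolhead."""
    drift = scheduled_time - toolhead_print_time
    if drift > max_drift:
        msg = "Fan update scheduled %.1fs ahead of toolhead" % drift
        if strict:
            raise FanDriftError(msg)
        return msg  # caller could log this as a warning instead of aborting
    return None
```

The strict flag captures the tradeoff raised here: a warning gives users debugging information without killing the print, while an error makes the failure mode explicit.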
Your log indicates that the host CPU got overloaded:
Your log is missing the version and cpu information (I’m not sure why), so I don’t have any guesses as to why the host got overloaded. It could be the result of many small commands that don’t consume much “toolhead movement time”. The log doesn’t seem to indicate anything wrong with the fan logic per se (other than the possibility that processing all those fan commands overloaded the host).
Separate from this log, I have seen cases where scheduling of fan commands way in the future (due to mandatory fan delays) could potentially cause timing issues because the host/mcu time synchronization code is not intended to schedule events a long time into the future. However, the log does not indicate that occurred here (the log indicates the host cpu got overburdened).
Finally, I’d recommend you update the slicer to avoid sending the command sequences causing this issue. Ultimately, if the slicer is sending crazy commands to Klipper then Klipper is going to struggle to get good results - the best fix is likely to get a sane command stream into Klipper.
I would be hesitant to agree with a hypothesis that states there could be any issue with multi-MCU operations based on what’s happening here.
Personally, I have never experienced a “Timer too close” error, but based on what I have seen here on this list server, it indicates a problem in the system that doesn’t involve Klipper. It’s usually an overloaded host system (often one with less computational power than a Raspberry Pi 3+), serial communications (USB, NRZ serial, CAN, etc.) that don’t have the necessary bandwidth, or accessories added to the host (a webcam being a big culprit).
This pattern of sending many fan and other gcode commands seems to be a new error source and needs to be ruled out before looking to see if there are any issues in Klipper itself.
I would second this as I had similar behaviors where I was wondering why the fan was so late on changing.
But I had the other way round as well that the fan already changed RPM before it should.
Yeah, it was pretty clear to me that was the intention. I totally agree with that choice.
I’d want a warning maybe, but no shutdown. Killing prints sucks, especially when it’s caused by a recoverable problem that isn’t a safety issue.
The version info is my error; I trimmed the log down to start at machine reboot but evidently overshot. This test was run on master at commit d725dfd on the host and both MCUs; I’ll grab the complete log when I have a moment.
Thanks for the design insight @koconnor.
Seems like this file doesn’t actually test the issue. Organic, free-range gcode is the way to go! I’ll keep an eye on that going forward.
Yep, at this point the problem feels like “hand hurts when grabbing thorns, devs pls patch”.
I’ll put my energy into fixing the slicer, thanks for clearing that up!