Not everything is happening as you described, it have other architecture and function flows, you can read about how it’s working with MCU here and here
This statement is almost completely wrong, Klipper doesn’t send any useless commands to MCU, klipper host is fully controling MCU and telling it to do some job in exact future moment, MCU will get that request and schedule future execution of it. When time will come - MCU will execute command and report results. Results will be analyzed and interpreted by klipper host. There is a posibility to overload MCU but the lowest MCU hardware can execute 18k commands per second, Atmega2560 is capable to execute 23k commands
@gaolst “ Klipper doesn’t send any useless commands to MCU”
So you are saying that klipper analyzes every Gcode command to determine if it really needs to be processed like M107 multiple times in different locations in your Gcode but never any M106 commands, so Klipper only uses one?
I really think you are giving the programmers more credit of designing a defense. I am a programmer, and I try to figure out how issues can cause my software to malfunction and defend against it happening.
But spending time on something that should never happen don’t waste your time on.
config_digital_out oid=%c pin=%u value=%c default_value=%c max_duration=%u : This command creates an internal micro-controller object for the given GPIO ‘pin’. The pin will be configured in digital output mode and set to an initial value as specified by ‘value’ (0 for low, 1 for high). Creating a digital_out object allows the host to schedule GPIO updates for the given pin at specified times (see the queue_digital_out command described below). Should the micro-controller software go into shutdown mode then all configured digital_out objects will be set to ‘default_value’. The ‘max_duration’ parameter is used to implement a safety check - if it is non-zero then it is the maximum number of clock ticks that the host may set the given GPIO to a non-default value without further updates. For example, if the default_value is zero and the max_duration is 16000 then if the host sets the gpio to a value of one then it must schedule another update to the gpio pin (to either zero or one) within 16000 clock ticks. This safety feature can be used with heater pins to ensure the host does not enable the heater and then go off-line.
config_pwm_out oid=%c pin=%u cycle_ticks=%u value=%hu default_value=%hu max_duration=%u : This command creates an internal object for hardware based PWM pins that the host may schedule updates for. Its usage is analogous to config_digital_out - see the description of the ‘set_pwm_out’ and ‘config_digital_out’ commands for parameter description.
Setting up a timer takes anywhere between 4-6 clock cycles in an IDEAL situation. But there is multiple things going on at once while 3d printing so that’s very much an ideal.
So 73 calls to turn on and off a fan is at a MINIMUM 292 clock cycles which granted is next to nothing but thats without taking into account the communication overhead, other bus transactions, other code things I’m not taking into account, signal delays, bus contention etc. etc.
All while Klipper tries to maintain a 25ms sync.
I think the important point here though is…
The GCODE was erroneous and shouldn’t have generated like that.
Coupled with the fact that adding in code to analyze WHY a shutdown happened involves so many variables you’d be coding forever and introduces potential safety questionable delays.
The Klipper mindset is “Fail safe and figure it out later”
I agree there could be some EXTERNAL tools that help with that. I’m actually working on a few for that very reason.
Edit: Gaolst is right, I forgot that SET_FAN_SPEED only changes if there is a sufficiently large change.
Yes, if you analyze klipper code you will see that vanilla M107, M106 and SET_FAN_SPEED all of them will use same function set_speed_from_command() from fan.py
In this exact case user did customized M107 and M106 but in the end they still call vanilla SET_FAN_SPEED command.
if you go deeper in set_speed_from_command you will see that speed logic will compare new value with last one, and if they match - it will just return - ignore it, because speed was not changed.
Issue doesn’t have direct relation to MCU at all, you can check Dump log and you will see there are no pwm commands in MCU Sent logs.
I think the issue (klipper host delay) is caused by callback call itself which can flush toolhead lokahead buffer and do print_time recalculation in some cases.
Gaolst is correct in this circumstance. Klipper will ignore that because the Fan was already set to 0.
See the code below:
I marked the relevant areas with >>
def set_speed(self, print_time, value):
if value < self.off_below:
value = 0.
value = max(0., min(self.max_power, value * self.max_power))
if value == self.last_fan_value:
return
print_time = max(self.last_fan_time + FAN_MIN_TIME, print_time)
>> if self.enable_pin:
>> if value > 0 and self.last_fan_value == 0:
>> self.enable_pin.set_digital(print_time, 1)
>> elif value == 0 and self.last_fan_value > 0:
>> self.enable_pin.set_digital(print_time, 0)
if (value and value < self.max_power and self.kick_start_time
>> and (not self.last_fan_value or value - self.last_fan_value > .5)):
# Run fan at full speed for specified kick_start_time
self.mcu_fan.set_pwm(print_time, self.max_power)
print_time += self.kick_start_time
self.mcu_fan.set_pwm(print_time, value)
self.last_fan_time = print_time
self.last_fan_value = value
If the the value is 0 and the last value wasn’t greater than 0 it ignores it.
It also has a hysteresis around the around the set speed value to not trigger for small changes.
With a lot of other GCode commands I’m with you and my intuition is you’d be right, but in the case of Fan Speed it does have code built in to prevent rapid changes.
Just for clarity, a “Timer too close” error occurs when the mcu attempts to schedule a timer at a target time that is in the past (see sched_add_timer() in src/sched.c). It almost always occurs as a result of a message from the host that requests an action at a time that is now in the past. (And that is what occurred in both of the logs that I looked at in this thread.) Typically, the host schedules actions at least 100ms in the future (and often 2+ seconds in the future), so getting a request to schedule something in the past is a severe error.
In the first log above, the host cpu got delayed by about 4 seconds (you can look at the timing of the Stats messages - they should appear once a second but at the time of the event there are multi-second gaps). In the second log, there were severe processing issues over about 7 seconds. It appears in both cases the delay is due to processing time within the klipper process itself (as the cputime statistic in the Stats messages during the event increase by multiple seconds). Indeed the provided graphs indicate a severe issue, as the process_time stat should typically be no more than around 20% - but the graphs show it regularly spiking above that. Certainly a process_time spike near 100% is a severe issue.
From the troubleshooting done earlier, it seems that something with the M107 messages caused high load within Klipper. It is not immediately clear to me why. It is possible it is related to the elaborate macros being used to override the default M107 behavior. Certainly, Klipper should be able to handle many thousands of M107 requests per second (Python isn’t that slow). I suspect it will take a developer to profile the code to see what caused high internal processing time.
I guess it’s also possible there was some odd interaction between the gcode processing and operating system (or system hardware) on that particular machine, and the M107 changes tickled the behavior.
I’d say the best short term workaround (as has already been done) is to eliminate the excessive M107 commands and to continue to monitor the process_time reports.