Klippy.log - Statistics and Interpretation

klippy.log - Statistics and Interpretation

Table of Contents

Introduction

Klipper’s log file (klippy.log) periodically outputs a “Stats” line containing a wealth of information about the printer’s performance and state. This information is key to any troubleshooting or diagnostic initiatives. In particular, the lines directly preceding an unexpected and unwanted event can contain valuable information to understand the root cause and derive appropriate corrective actions.

  • Each “Stats” line begins with a timestamp indicating the uptime of the Klipper host computer in seconds (e.g., Stats 1670.9:).

  • These statistics are typically generated and logged every second during printing (and at longer intervals during idle phases).

    :warning: Critical Indicator: If there are gaps larger than one second between consecutive “Stats” lines during printing, it may indicate serious performance issues with the Klipper host software or an overloaded host system, warranting further investigation.

  • The “Stats” lines provide a snapshot of various subsystems:

    • MCU (Microcontroller Unit) Data
    • Host Data
    • Print Process Data
    • Temperature Sensor Data
    • CAN-bus Data

Statistics Explained

The available statistics are explained below. In the log, the statistics need to be reviewed carefully to properly determine which “main” subsystem they belong to. The following article attempts to group them logically but not sequentially (as appearing in the log).
The effective content of the “Stats” line depends on the reporting devices. Some elements are optional and are only displayed when the respective hardware is present and configured.

Classification

In the following text, and wherever possible, certain ranges or limits are given. These are not necessarily the ultimate truth and should be understood considering the big picture:

  • Are there constant changes over time or sharp (de-)increases?
  • Is an out-of-bound value a one-off event?
  • Are certain critical values present before an unexpected or unwanted event occurs?
  • Is the suspicious value connected to the event or rather coincidental?

1. Host System Statistics

These values reflect the status of the computer running the Klipper host software (e.g., Raspberry Pi).

  • sysload: The 1-minute system load average on the host computer. For example, a value of 1.0 on a single-core system means the CPU was fully utilized on average over the past minute. On a multi-core system, you can assess the relative load by considering the number of CPU cores; e.g., a sysload of 2.0 on a 4-core CPU indicates an average utilization of 50% (2.0/4 cores=0.5).

    :light_bulb: Note: Consistently high values (e.g., approaching or exceeding the number of CPU cores) might indicate the host is overloaded, potentially impacting print performance.

  • cputime: Cumulative CPU time consumed by the Klipper host process (klippy.py) in seconds. It shows how much processing time Klipper itself is using.

  • memavail: Available memory on the host system in kilobytes (KB), which includes buffers and cache that can be freed by the kernel if needed. This is a more practical measure of available memory than just “free” memory.

    :light_bulb: Note: Persistently low available memory (e.g., consistently <25 MB or 25000 KB) might risk Out-of-Memory (OOM) errors, leading to Klipper instability or crashes.


2. Print Progress Statistics

These statistics relate to the execution of a print job.

  • gcodein: Reports the number of bytes read from the /tmp/printer pseudo-tty input. It will grow when using OctoPrint, but for most other uses, it will be zero.

  • print_time: This time represents the last scheduled action that was processed using the “lookahead queue”. It is a timestamp relative to the main MCU’s ([mcu]) last restart time. It is normal for this value to increment during a print. It typically does not increment when not in a print.

  • buffer_time: The duration, in seconds, of future moves or commands that are fully planned out, and ahead of the current action being processed.

    :right_arrow: Healthy values: Generally, intervals between 2 to 5 seconds during a print are considered normal. If this interval were to drop to zero during normal printing, it would indicate that the microcontrollers are processing actions faster than the host software can schedule them.

  • print_stall: A counter indicating the number of times the printer had to pause because Klipper’s buffer (see buffer_time) ran empty.

    :right_arrow: Healthy values: Should ideally be 0 throughout a print. Any stalls can cause blobs or other print quality artifacts. Note that stalls can also occur due to excessive host system latency (e.g., high sysload preventing timely command processing).


3. MCU Specific Statistics

For each microcontroller connected to Klipper (e.g., [mcu], [mcu toolboard], etc.), a set of statistics is reported.

MCU Load & Task Timing

  • mcu_awake: The time in seconds that the microcontroller’s scheduler was busy executing tasks during the last 5 seconds, rather than idling in its main loop. For example, a value of 0.05 means the MCU spent 50 ms of the 5-second sample period executing tasks.

  • mcu_task_avg: The average time, in seconds, that the microcontroller spends executing each internal “task” or command. This value is also relative to a 5-second sample period.

  • mcu_task_stddev: The standard deviation of the time spent per microcontroller task. This indicates the variability in task execution times (relative to a 5-second sample period). A low standard deviation is preferred.

Communication Health & Sequencing (Host to MCU)

  • bytes_write: The total number of bytes written (sent) by the host to this MCU.

  • bytes_read: The total number of bytes read (received) by the host from this MCU.

  • bytes_retransmit: The number of bytes that had to be resent from the host to the MCU because an acknowledgment was not received in time.

    :warning: Critical Indicator: This value should ideally be 0 or extremely close to it. While some retransmissions are normal during the start-up phase, a consistently climbing bytes_retransmit count indicates an unstable communication link.

  • bytes_invalid: The number of invalid or corrupted data bytes received by either the host or the MCU.

    :warning: Critical Indicator: This value should ideally be 0 or extremely close to it. While some invalid bytes are normal during the start-up phase, a substantially increasing value signifies severe data corruption in the communication link and requires investigation. Outdated CAN adapter firmware or bugs in the Linux kernel (below v6.6) are common causes of increasing bytes_invalid counters for CAN connections.

  • send_seq: The sequence number of the last message sent from the host to the MCU.

  • receive_seq: The sequence number of the last message sent from the host to the MCU that has been acknowledged by the MCU.

    :light_bulb: Note: send_seq and receive_seq should track each other closely. If receive_seq consistently lags send_seq, or if retransmit_seq increments, it points to communication problems.

  • retransmit_seq: The sequence number related to retransmissions. An increasing value shows that messages are actively being resent.

  • srtt (Smoothed Round-Trip Time): An estimate of the average time it takes for a message to travel from the host to the MCU and for its acknowledgment to return. This is a key indicator of communication link latency and health.

    :right_arrow: Healthy values: Should be low and stable.

    • USB and UART are typically the most stable connections with the least jitter.
    • Klipper retransmits the packet if the round-trip time exceeds a threshold around 25 ms.
  • rttvar (Round-Trip Time Variance): A measure of the variation or jitter in the round-trip times. A stable connection has very low variance.

    :right_arrow: Healthy values: Typically around 1 ms (0.001 s) or less. Increased or constantly fluctuating values can indicate an unstable connection.

  • rto (Retransmission Timeout): The calculated timeout value for retransmissions. If an acknowledgment for a sent message isn’t received within this duration, Klipper attempts to retransmit the message. This value is dynamically adjusted based on srtt and rttvar.

Command Buffering (Host to MCU Transmit Queues)

  • ready_bytes: These are command bytes that Klipper has fully processed and are ready for immediate transmission to the MCU, currently residing in the host’s send queue for that specific MCU. Small, fluctuating values are normal as data is batched and sent. However, if this value grows large and stays elevated, it suggests the communication link cannot keep up with the data Klipper wants to send, or the MCU is not processing commands quickly enough to clear its own internal buffers. These are commands that, from Klipper’s host software perspective, could be sent immediately.

  • upcoming_bytes: These are command bytes that are scheduled for transmission at a specific future time, primarily due to available space in the MCU buffer or other mechanisms. These commands are just not yet due for transmission. It is normal for this value to grow and it does not represent an issue.

MCU Clock Frequency

  • freq: The clock frequency of the microcontroller, in Hertz (Hz), as measured by the Klipper host software relative to the host clock.

    :light_bulb: Note: This should align closely with the nominal clock speed of the given controller, as per its datasheet or product description, and be set internally during the compilation of the firmware. At start-up, Klipper will issue the warning MCU xxx configured for yyy MHz but running at zzz MHz! if the deviation exceeds 1%.

  • adj (Adjusted Frequency): The adjusted clock frequency of the microcontroller based on Klipper’s internal clock calibration and synchronization mechanisms. The “reference clock” is the main MCU [mcu].

    :warning: Critical Indicator: This value should be very close to the nominal freq. Significant deviations or fluctuations could indicate MCU clock stability or synchronization issues.


4. CAN-bus Interface Statistics (Optional)

If the Klipper setup uses CAN-bus for communication with one or more MCUs, the stats show a canstat_DEVICE_NAME: section (e.g., canstat_TOOLBOARD:). This reports statistics for the CANbus interface itself (often on the host or a primary MCU acting as a CAN gateway), not the Klipper application-level communication to the CAN-connected MCU.

:light_bulb: Note: These statistics are from the perspective of the relevant CAN-connected microcontroller and may vary from the host-side Linux interfaces (like can0).

  • bus_state: The current operational state of the CAN bus.

    • active: Normal operation. This is the desired state.
    • warn: Error counters on the CAN interface have reached a warning level.
    • passive: The CAN node has detected significant errors and will operate in a restricted mode (e.g., not actively driving error flags).
    • off: The CAN node has detected severe errors and has taken itself off the bus to prevent disruption. Essentially, only a USB to CAN-bus Bridge can report this, as other nodes simply drop off the bus.

    :warning: Critical Indicator: Any state other than active indicates CAN bus problems (e.g., wiring issues like bad crimps or incorrect termination, faulty device, bus overload, EMI).

  • rx_error: A counter for receive errors detected by this specific CAN interface hardware. Should ideally be 0.

  • tx_error: A counter for transmit errors detected by this specific CAN interface hardware. Should ideally be 0.

  • tx_retries: The number of times CAN frames had to be retransmitted at the low-level CAN bus hardware layer by this interface.

    :right_arrow: Healthy values: While occasional, very infrequent retries might occur on a noisy bus, this value should be 0 or extremely low. Consistently high or rapidly increasing tx_retries points to significant CAN bus communication issues, potentially caused by hardware, configuration or wiring issues.

    :light_bulb: Note:

    • This tx_retries for the canstat interface is different from an MCU’s bytes_retransmit statistic. canstat tx_retries refers to CAN-frame retransmissions on the physical bus layer, while an MCU’s bytes_retransmit refers to Klipper protocol message retransmissions over its established communication link layer.
    • Certain microcontrollers report only some of these stats. The STM32 MCUs don’t report tx_retries, and the RP2040 MCUs don’t report tx_error.

5. Temperature Sensor Statistics

This section shows data from configured heaters and various temperature sensors.

  • For heaters or sensors associated with a control loop (e.g., heater_bed:, extruder:):

    • target: The target temperature set for the heater in Celsius (°C).
    • temp: The current measured temperature in Celsius (°C).
    • pwm: The current PWM (Pulse Width Modulation) duty cycle being applied to the heater, ranging from 0.0 (0%, off) to 1.0 (100%, full power). This shows how hard the heater is working. A pwm value of 0.0 indicates 0% power is being applied at that specific instant. However, during active PID control, brief 0% PWM cycles are normal as the heater coasts to the target temperature or as the PID algorithm makes fine adjustments (including temporarily cutting power) to maintain the setpoint accurately.
  • For other temperature sensors without control loop (e.g., [temperature_sensor], host temperature sensor, MCU temperature sensors):

    • temp: The current measured temperature in Celsius (°C). These can be MCU internal temperatures, ambient temperatures in enclosures, or other monitored points relevant to the printer’s operation.
4 Likes