Advanced Trouble-Shooting / Graphing Klipper
The klippy.log
is key for trouble-shooting and diagnosing errors, containing an abundance of information and also low-level details. In Klipper’s tool-box are additional tools to facilitate such a diagnosis.
They are often targeted at developers with an in-depth knowledge about Klipper’s internal working, but the graphstats.py tool is also very valuable to diagnose common errors.
General Idea
- Sharp spikes in such graphs in one or the other direction could point to an issue or “event”, especially when at the same time-stamp an error is recorded in the log
- Exceeding given limits shortly or even over a longer period is a more strong indication that something is going on
- To facilitate the graphing we have created a little webpage that allows uploading a
klippy.log
and creating the graphs: https://sineos.github.io/- Currently quite basic, e.g. no error handling etc.
- If you are a HTML guru, any improvement is of course highly welcome
Graphing the Host
This section will demonstrate how to graphically evaluate the information about the Host that Klipper collected in its log.
Command
~/klipper/scripts/graphstats.py /tmp/klippy.log -o loadgraph_host.png
- The
klippy.log
file either is in~/printer_data/logs/klippy.log
or/tmp/klippy.log
- The
loadgraph_host.png
is generated in the same folder from which you are calling the script, when no dedicated path is provided, e.g.-o /tmp/loadgraph_host.png
Result
Interpretation
-
System Load
or CPU load is shown on the left y-axis of the graph. It is relative to one CPU core, this means a 4 core CPU like the Raspberry Pi 3 is overloaded when reaching / exceeding 400% -
System Memory
is the free RAM of the host and shown on the right y-axis -
Process Time
is the CPU usage of the Klipper host process alone and is also relative to the left y-axis. For example 5% means that the Klipper process is using 5% of one CPU core
Graphing the MCU
This section will demonstrate how to graphically evaluate the information about the MCU that Klipper collected in its log.
Command
~/klipper/scripts/graphstats.py /tmp/klippy.log -o loadgraph_mcu.png -s
- The klippy.log file either is in
~/printer_data/logs/klippy.log
or/tmp/klippy.log
- The
loadgraph_mcu.png
is generated in the same folder from which you are calling the script, when no dedicated path is provided, e.g.-o /tmp/loadgraph_mcu.png
Result
Interpretation
-
Bandwidth
denotes the saturation of the communication line between Host and MCU. Scaling is based on a 250000 baud serial line. This means 100% here, when using a much faster USB connection, is not necessarily an indication for problems -
MCU load
is the processor load of the MCU. Values around 100% or even over do indicate an issue, for example a too high step rate -
Host Buffer
is an indication of the amount of data buffered in the serial buffer or the MCU buffer. Klipper tries to buffer movement data 2 seconds ahead. So, if during a print the buffer time dropped to only 1 second of data, it would show up as a spike to 50%. Most often this occurs with OctoPrint when it is not able to send commands fast enough. -
Awake Time
is the time the MCU is spending in “task handlers”. This is a very low level information targeted at developers
Graphing a Heater
This section will demonstrate how to graphically evaluate the information about the different heaters that Klipper collected in its log.
Command
~/klipper/scripts/graphstats.py /tmp/klippy.log -o loadgraph_extruder.png -t extruder
- The klippy.log file either is in
~/printer_data/logs/klippy.log
or/tmp/klippy.log
- The
loadgraph_extruder.png
is generated in the same folder from which you are calling the script, when no dedicated path is provided, e.g.-o /tmp/loadgraph_extruder.png
- The
-t
argument takes the name of the heater to graph, e.g. extruder or heater_bed
Result
Interpretation
-
XX Temp
is the actually measured temperature of the heater. Printed on the left y-axis of the graph. Sudden drops, wild swings or gradual drops might indicate an issue with the heating system. If the temperature change follows a target change, then it is no issue but wanted control behavior.- Sudden drops often related to wiring issues or dead heating cartridges
- Wild swings might point to improper PID tuning
- Gradual drops could indicate insufficient heating power or high heat loss, e.g. through fans
-
XX Target
is the commanded target temperature at the given time. Also printed on the left y-axis of the graph. -
XX PWM
is the PWM value with which Klipper is controlling the heating power to achieve the target temperature. Printed on the right y-axis of the graph.
Graphing MCU Frequency
This section will demonstrate how to graphically evaluate the information about the MCU frequency that Klipper collected in its log.
This already is a very low level information and usually not needed for generic trouble-shooting. For multi-MCU systems it provides an indication, if there are synchronization issues between the different MCUs.
Command
~/klipper/scripts/graphstats.py /tmp/klippy.log -o loadgraph_freq.png -f
- The klippy.log file either is in
~/printer_data/logs/klippy.log
or/tmp/klippy.log
- The
loadgraph_freq.png
is generated in the same folder from which you are calling the script, when no dedicated path is provided, e.g.-o /tmp/loadgraph_freq.png
Result
Interpretation
-
freq
is the nominal clock frequency of the MCU, as defined in the[mcu]
setting (not to be confused with the CPU MHz rating / clock speed) and the graph shows how much the MCU is deviating from this nominal clock speed (from the perspective of the Klipper host and his clock). Spikes of a few hundred microseconds in less than a minute may indicate a problem. -
z freq
is the same as above but for a second MCU that is defined in the config as[mcu z]
. It clearly shows a much higher jitter than the main MCU but if there are no spikes of a few hundred microseconds in less than a minute no issue is to be expected -
z adj
is the Klipper Host’s adjustment for this MCU to make it aligned / synchronized with the main MCU