KB: Klipper systemd service. CPU Nice/Weight, etc.

These are just my thoughts. I think that, in general, there should be no problem with CPU congestion on a modern Linux system.
Still, there are tools to let Klippy use the CPU as it needs, regardless of other services on the SBC.

Test setup: RPI 5, Debian 13 (trixie).
Dummy service:

# /etc/systemd/system/klipper-test.service
[Unit]
After=network-online.target
Wants=udev.target

[Install]
WantedBy=multi-user.target

[Service]
Type=simple
User=user
WorkingDirectory=/tmp
ExecStart=/usr/bin/stress-ng --cpu 1 -t 0
SystemCallFilter=@known
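
To load and start it, something like this should do (a sketch, assuming the unit file is saved at the path above):

sudo systemctl daemon-reload
sudo systemctl enable --now klipper-test.service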

So, when I run it, I expect it to use 100% of one CPU core.

Next, I run a CPU-intensive process as the SSH user.

systemd assigns CPU weights to the slices involved, so we could expect that an occasional make -j4 should not crash Klipper with a TTC ("Timer too close") error.

Let’s run it as a systemd service; assume it is a webcam stream, KlipperScreen, or anything else.

$ sudo systemd-run stress-ng --cpu 4
Running as unit: run-p16511-i16811.service

Now they share CPU time.

Let’s renice: sudo renice -n 5 -p 16513 (repeat for each process)
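
To hit every process of that transient unit in one go instead of repeating renice per PID, the PIDs can be read straight from its cgroup (a sketch; the path assumes the cgroup v2 layout that systemd-cgtop shows later):

sudo renice -n 5 -p $(cat /sys/fs/cgroup/system.slice/run-p16511-i16811.service/cgroup.procs)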

I’m a little surprised that it works, but okay.

Surprised because it may not work, depending on the autogroup decision:

/proc/16416/autogroup:/autogroup-4154 nice 0
/proc/16417/autogroup:/autogroup-4154 nice 0
/proc/16512/autogroup:/autogroup-4178 nice 0
/proc/16513/autogroup:/autogroup-4178 nice 0
/proc/16514/autogroup:/autogroup-4178 nice 0
/proc/16515/autogroup:/autogroup-4178 nice 0
/proc/16516/autogroup:/autogroup-4178 nice 0

The last time I worked around this, renicing almost never worked predictably
without disabling autogroups: sysctl kernel.sched_autogroup_enabled=0
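
If someone wants that to survive a reboot, it can go into a sysctl drop-in (the file name here is just my example):

echo 'kernel.sched_autogroup_enabled = 0' | sudo tee /etc/sysctl.d/99-disable-autogroup.conf
sudo sysctl --system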

According to the autogroup design, CPU time should be split equally between groups.
But that does not happen here.
Also, renice is only supposed to matter relative to other processes inside the same autogroup.
So nothing should change, yet right now it does.

That sort of thing. On a desktop it is much trickier: more services, more processes, more groups.

Let’s try another approach. Now, we will use the CGroups integration of systemd.

[Service]
...
CPUWeight=1000

The basic idea is simple: services have weights, and CPU time is shared in proportion to those weights; the default is 100.
If the CPU is congested, the scheduler basically sums all the weights and gives each service a slice proportional to its share of that sum within the cgroup tree.
If not congested, a process can use whatever it wants (there is no CPUQuota).
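
As a rough worked example (my numbers, only to show the math): with CPUWeight=1000 on klipper-test.service and the default 100 on a second busy service in the same slice, under congestion they would get about 1000/1100 ≈ 91% and 100/1100 ≈ 9% of the CPU time available to that slice. Whether the weight actually landed in the cgroup can be checked directly (path assumes the unified cgroup v2 hierarchy):

cat /sys/fs/cgroup/system.slice/klipper-test.service/cpu.weight   # should print 1000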

sudo systemctl stop run-p16511-i16811.service
<edit> /etc/systemd/system/klipper-test.service
sudo systemctl daemon-reload
sudo systemctl restart klipper-test.service
sudo systemd-run stress-ng --cpu 4
Running as unit: run-p17082-i17382.service

And it again has enough time to do its own job.
To get an idea of what the tree looks like, systemd-cgtop provides a view:

CGroup                                                           Tasks   %CPU   Memory  Input/s Output/s
/                                                                  223  400.6   445.4M        -        -
system.slice                                                        58  398.9        -        -        -
system.slice/run-p17082-i17382.service                               5  297.9        -        -        -
system.slice/klipper-test.service                                    2   99.9        -        -        -
user.slice                                                          11    0.7        -        -        -
user.slice/user-1000.slice                                          11    0.7        -        -        -
user.slice/user-1000.slice/session-84.scope                          9    0.7        -        -        -
system.slice/klipper.service                                        14    0.7        -        -        -
system.slice/moonraker.service                                      11    0.4        -        -        -
system.slice/klipper-mcu.service                                     1    0.0        -        -        -
system.slice/NetworkManager.service                                  3    0.0        -        -        -
init.scope                                                           1      -        -        -        -

So, we can expect that services in system.slice will share their CPU time according to weight.
The system slice has the same weight as the user slice (where the SSH user runs commands).
So, if system.slice tries to use more than 50% of the CPU while the user slice tries to do the same, time will be split equally between them:

$ grep . /sys/fs/cgroup/{user,system}.slice/cpu.weight
/sys/fs/cgroup/user.slice/cpu.weight:100
/sys/fs/cgroup/system.slice/cpu.weight:100
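
If needed, the slice weights themselves can be changed on the fly, too (the value here is arbitrary, just to show the mechanism):

sudo systemctl set-property system.slice CPUWeight=300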

Because in the first scenario our dummy service is trying to use only 25% of the CPU (one core out of four), it does not experience any congestion.
I expect Klipper to use less than one core under normal circumstances, so I hope it is a fair test.

My summary: I think it would be useful to set a CPU weight on klipper.service in SBC setups where several heavy things can use all available resources.
And if one specialized SBC per printer is the expected setup, so there is Klipper plus other lower-priority processes, it can make sense to set the weight to some high value by default.
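
For example, a drop-in created with sudo systemctl edit klipper.service could look like this (the value is only a starting point, not a tested recommendation):

# /etc/systemd/system/klipper.service.d/override.conf
[Service]
CPUWeight=500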

Hope that helps someone.

You can set the priority, and it works very well currently, but it requires the kernel parameter isolcpus to cooperate.
The two commented lines are more aggressive and need to be used with a real-time kernel.

[Service]
Type=simple
CPUAffinity=3
Nice=-18
IOSchedulingPriority=2
# CPUSchedulingPolicy=fifo
# CPUSchedulingPriority=99

Thanks for the suggestion, I appreciate that.

Well, I suggest ignoring IO priorities here.
Under normal circumstances there should be no disk-IO-related issue with Klipper, AFAIK.
Maybe only if it is unable to read G-code, but then its reactor would probably stall anyway?
virtual_sdcard.py

About CPU nice, as I said above, it is a little broken.
Let’s do another simple test:

[Service]
...
ExecStart=/usr/bin/stress-ng --cpu 3 -t 0
Nice=-10

And I run stress-ng --cpu 3 in the SSH terminal.

One would expect the service with the higher priority to get a larger share of the CPU.

But it does not.

If I instead run the load in the system slice with sudo systemd-run stress-ng --cpu 3, it does.
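
When comparing these runs, it helps to look at the nice value and the cgroup placement side by side (a sketch using procps ps; pgrep -d, joins the PIDs with commas):

ps -o pid,ni,cgroup,comm -p $(pgrep -d, stress-ng)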

I think CGroups override autogroup behaviour in some circumstances.

I do not like how Nice has behaved over the past years, or how it behaves now. It feels wrong to have to guess its behaviour under different circumstances.

There is no need to use priorities if we bind the process to the core and isolate that core from the scheduler.

About CPUAffinity, a simple check, similar to above:

[Service]
...
ExecStart=/usr/bin/stress-ng --cpu 3 -t 0
CPUAffinity=3

It does behave as expected by binding the process to the CPU; with isolcpus, it would get one core completely to itself.
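
To confirm where the threads actually ended up, taskset can report the affinity per PID (a sketch, matching by process name):

for pid in $(pgrep stress-ng); do taskset -cp "$pid"; done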

Klipper has a low thread count, but the threads do exist. It would be wrong to make them fight each other on one CPU core while others are still available, and at the same time remove a core from the system with isolcpus.

I agree there is a place for that solution, like in the PR for Linux MCU MMAP GPIO.
If, at some point in the future, there is an SBC that can work instead of the MCU, and there is a way to make GPIO fast from userspace (like it is done with MMAP),
then it would definitely make sense to isolate one core specifically for that process, so it works as a sort of coprocessor.
(I forget the details from the last time I had to touch this, but it can also make sense to play with preemption and IRQs in the kernel to further isolate that CPU.)
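
For that coprocessor scenario, the isolation itself is only a kernel command line change (a sketch; the path assumes a Raspberry Pi style boot layout, core 3 is an arbitrary pick, and irqaffinity covers the optional IRQ part mentioned above):

# append to the kernel command line, e.g. /boot/firmware/cmdline.txt, then reboot
isolcpus=3 irqaffinity=0-2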

FIFO does not require the RT patchset.
It has already been used in the Linux MCU since 2017:

static int
realtime_setup(void)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO) / 2;
    int ret = sched_setscheduler(0, SCHED_FIFO, &sp);

It is just a different, very high-priority process queue.
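
The resulting policy can be checked on a running process with chrt (assuming the Linux MCU binary runs under the name klipper_mcu):

chrt -p $(pidof klipper_mcu)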

Thanks.


Thinking about it further, it could be interesting if Klippy could use SCHED_DEADLINE,
since it technically has timing deadlines and a sort of expected CPU usage.

But it feels like too much for what can mostly be accomplished with CPU weights.
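
For anyone curious, experimenting does not even need code changes; chrt can start a process under SCHED_DEADLINE (a sketch: the runtime/deadline/period values are made-up nanoseconds, and some_command is a placeholder):

# runtime <= deadline <= period, all in nanoseconds; the priority argument must be 0 for --deadline
sudo chrt --deadline --sched-runtime 2000000 --sched-deadline 5000000 --sched-period 10000000 0 some_command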