Tune TIMER_MIN_TRY_TICKS

Following up on the neopixel discussion in "Not all WS2812Bs light up under certain configurations (not a power problem)" (post #14 by nefelim4ag):

I took another look at the timer dispatch code and realized there are AVR-specific optimizations in klipper/src/avr/timer.c (Klipper3d/klipper on GitHub, master branch).

Namely, a more “precise” min try ticks time.
I would expect the actual timer enter/exit time on most MCUs to be less than 2us.
If it is significantly less, it may be useful to adjust the value. (If it is larger, I would be a little puzzled.)

I sketched some timer measurements in the mcu-timer-measurment branch of nefelim4ag/klipper on GitHub.

RP2040 ~750ns
$ python3 ./klippy/console.py /dev/serial/by-id/usb-Klipper_rp2040_E66368254F36922C-if00
...
measure_timer
measure_timer
measure_timer
009.529: #output: Starting measure
009.529: #output: enter_time: 4, exit_time: 5, total: 9
009.529: #output: Starting measure
009.529: #output: enter_time: 4, exit_time: 5, total: 9
009.529: #output: Starting measure
009.529: #output: enter_time: 4, exit_time: 4, total: 8

9 / 12_000_000 = 0.000000750 ~= 750ns

I’ve tried to copy the vector table to RAM (for STM32G0), like in the STM32F0, but it only shaves 2 cycles.

STM32G0B1 ~2800ns
~/klippy-env/bin/python3 ./klippy/console.py /dev/serial/by-id/usb-Klipper_stm32g0b1xx_4D003A000C50425539393020-if00
...
measure_timer
measure_timer
measure_timer
measure_timer
017.287: #output: Starting measure
017.287: #output: enter_time: 79, exit_time: 99, total: 178
017.287: #output: Starting measure
017.287: #output: enter_time: 79, exit_time: 99, total: 178
017.287: #output: Starting measure
017.287: #output: enter_time: 79, exit_time: 99, total: 178
017.287: #output: Starting measure
017.287: #output: enter_time: 79, exit_time: 99, total: 178
...

178 / 64_000_000 = 0.000002781 ~= 2.8us

STM32F042 ~3.5us
~/klippy-env/bin/python3 ./klippy/console.py /dev/serial/by-id/usb-Klipper_stm32g0b1xx_4D003A000C50425539393020-if00
...
measure_timer
006.892: #output: Starting measure
006.892: #output: enter_time: 65, exit_time: 104, total: 169
measure_timer
009.520: #output: Starting measure
009.520: #output: enter_time: 65, exit_time: 104, total: 169
measure_timer
012.095: #output: Starting measure
012.095: #output: enter_time: 65, exit_time: 104, total: 169
measure_timer
014.372: #output: Starting measure
014.372: #output: enter_time: 65, exit_time: 104, total: 169
...

169 / 48_000_000 = 0.000003521 ~= 3.5us

STM32H7 ~220ns
~/klippy-env/bin/python3 ./klippy/console.py /dev/serial/by-id/usb-Klipper_stm32h723xx_0C0024001951313434373135-if00
...
measure_timer
measure_timer
measure_timer
measure_timer
measure_timer
035.576: #output: Starting measure
035.576: #output: enter_time: 44, exit_time: 71, total: 115
035.576: #output: Starting measure
035.576: #output: enter_time: 39, exit_time: 53, total: 92
035.576: #output: Starting measure
035.576: #output: enter_time: 41, exit_time: 69, total: 110
035.576: #output: Starting measure
035.576: #output: enter_time: 41, exit_time: 69, total: 110

035.838: #output: Starting measure
035.838: #output: enter_time: 44, exit_time: 71, total: 115

115 / 520_000_000 = 0.000000221 ~= 221ns

Unfortunately, that’s all the MCUs that I have.
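
For convenience, the conversions above can be reproduced with a small script. This is only a sketch: the tick totals and timer frequencies are copied from the logs above, the 2us budget is the expectation stated at the top of this post, and the names `MEASUREMENTS`, `BUDGET_NS`, `ticks_to_ns` and `summarize` are made up for the example.

```python
# Tick totals and timer frequencies copied from the console logs above.
MEASUREMENTS = {
    "RP2040": (9, 12_000_000),
    "STM32G0B1": (178, 64_000_000),
    "STM32F042": (169, 48_000_000),
    "STM32H7": (115, 520_000_000),
}

# The "less than 2us" expectation, in nanoseconds (illustrative name).
BUDGET_NS = 2000


def ticks_to_ns(ticks, freq_hz):
    """Convert a tick count measured at freq_hz into nanoseconds."""
    return ticks * 1_000_000_000 / freq_hz


def summarize(measurements=MEASUREMENTS, budget_ns=BUDGET_NS):
    """Return {mcu: (nanoseconds, True if under the budget)}."""
    result = {}
    for mcu, (ticks, freq_hz) in measurements.items():
        ns = ticks_to_ns(ticks, freq_hz)
        result[mcu] = (ns, ns < budget_ns)
    return result


if __name__ == "__main__":
    for mcu, (ns, under) in summarize().items():
        verdict = "under" if under else "over"
        print(f"{mcu:10s} {ns:7.0f} ns  ({verdict} 2us)")
```

By these numbers the RP2040 and STM32H7 land well under 2us, while the STM32G0B1 and STM32F042 land over it.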

I was somewhat surprised by the STM32G0. I suspect it is nothing critical, but it probably tries to exit the timer code and then immediately receives the next timer.

Thanks.

The idea of TIMER_MIN_TRY_TICKS is to allow multiple “timers” to run back-to-back when multiple timers are pending.

It has to be large enough to cover the time needed to check and register the next timer. That is, we need to make sure the timer_set(next) code always registers a time in the future. If it registered a time in the past, the timer execution would stall and we’d get a watchdog reboot.

We also don’t want it to be too large, as otherwise we could waste cpu cycles waiting for the next timer when we would be better off using those cycles running “task” code.

My thinking when I wrote that code was that the ideal TIMER_MIN_TRY_TICKS would be just a little larger than the time it takes to register the next timer, exit the timer irq handler, and reenter the timer irq handler. This would be mostly pointless cpu work, but as long as we ran a few “task” instructions then it would be a net positive over just wasting instructions in “busy” looping.
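
To make the trade-off concrete, here is a toy Python model of that decision. It is a simplified sketch and not the actual firmware code: `signed_diff32` and `dispatch_decision` are hypothetical names, and the model assumes a 32-bit free-running tick counter.

```python
def signed_diff32(a, b):
    """Return a - b as a signed 32-bit value, so counter wrap-around is handled."""
    d = (a - b) & 0xFFFFFFFF
    return d - 0x100000000 if d & 0x80000000 else d


def dispatch_decision(now, next_wake, min_try_ticks):
    """Decide what to do after one timer has run (toy model).

    If the next timer is further away than min_try_ticks, it is worth
    programming the hardware timer and leaving the irq handler so that
    "task" code can run. Otherwise stay in the handler and busy-wait,
    so the next timer runs back-to-back.
    """
    diff = signed_diff32(next_wake, now)
    if diff > min_try_ticks:
        return ("exit_irq", diff)
    # Next timer is in the past or the near future: wait for it here.
    return ("busy_wait", max(diff, 0))


if __name__ == "__main__":
    # 128 ticks is 2us at 64MHz; the value is only illustrative.
    print(dispatch_decision(1000, 1500, 128))
    print(dispatch_decision(1000, 1100, 128))
```

In this model a larger `min_try_ticks` moves more cases into the busy-wait branch (wasted cycles), while a value smaller than the real register/exit/reenter cost risks programming a wake time that has already passed.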

All that said, I’d be surprised if tuning it improved overall performance. It definitely can’t be so small that it causes reboots, but other than that I’d be surprised if it had much impact. After all, it should not be common to schedule many timers in close proximity to each other.

As for the stm32g0 - the results do look surprising and I’m not sure why that is. One thing to consider is that timer_read_time() on cortex-m0 processors can be slow. So, I guess it’s possible that your measurement code itself is inflating the results. In the past, I’ve done cpu measurements of timer_read_time() itself and found large variances.

Cheers,
-Kevin

We are on the same page here.
Yes, this test code would show a slightly “pessimistic” result. I just wanted to have an estimate.

My general thought here was that maybe it would make sense to decrease the value or set it per MCU.

But the main reason was 2 reports of issues with long neopixel chains (tens of pixels) on the STM32G0B1, and I thought that maybe MIN_TRY_TICKS combined with a slow enough MCU is what causes that. (Like 2 close timers + busy loop = large pause inside a long pulse.)
And the thing is, the 2+us per interrupt would explain why Neopixel chains could update unreliably.
More sophisticated solutions to make Neopixels shine seem like overkill.

All other MCUs have enough horsepower, so decreasing it would provide little practical benefit.

Also, I suspect, because all of those controllers are armcm_*, they all would work fine. On exit they store the diff and as long as diff is positive everything should be fine. It seems to just make the next interrupt pending and waste cycles here, and maybe add a little lag to the next interrupt.

So, to sum up: tuning the value per MCU, or autotuning it at init, would be “cool”, but it would not solve the STM32G0's weak performance.
It might be possible to shave some cycles by using interrupt priorities (no assembly irq_en/dis calls). But that would break the logic that allows other interrupts to run during the busy loop. So, suboptimal.
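
Just to illustrate what a per-MCU or init-time value could look like, here is a rough Python sketch. Everything in it is hypothetical (the function names, the safety margin, the idea of taking the worst measured total), and the measured totals are known to be somewhat pessimistic, so treat it as a thought experiment only.

```python
def us_to_ticks(us, freq_hz):
    """Convert whole microseconds into timer ticks at freq_hz."""
    return us * freq_hz // 1_000_000


def suggest_min_try_ticks(totals, margin_ticks=8):
    """Pick a value a little above the worst measured enter+exit total.

    totals: measured "total" tick counts from measure_timer.
    margin_ticks: arbitrary safety margin, purely an assumption.
    """
    if not totals:
        raise ValueError("need at least one measurement")
    return max(totals) + margin_ticks


if __name__ == "__main__":
    # Totals from the STM32H7 log above, timer at 520MHz.
    h7_totals = [115, 92, 110, 110, 115]
    print("suggested:", suggest_min_try_ticks(h7_totals))
    print("2us in ticks:", us_to_ticks(2, 520_000_000))
```

On the STM32H7 numbers this gives a value far below 2us worth of ticks, which matches the point above: the fast MCUs have headroom to spare, and the slow ones would not get faster from tuning.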

Hope I was able to explain myself.
- Timofey
