Well, there are two caveats I can think of:
- This is generic code; it would be overkill to reimplement neopixel for every MCU type.
- Klipper is 3D printer firmware, not LED driver firmware.

So, ultimately, someone could hack their MCU to offload the work to a hardware controller, and that would “fix” the problem.
But every time I think about it, I end up arguing with myself, and the argument goes like this:
“Let’s mess with the architecture and add complexity so that a long neopixel string works better on a 3D printer!”
It sounds a bit off to me.
I can imagine how important or useful they could be. I have seen examples such as using them to indicate machines on a print farm.
The generic fixes I can imagine without fancy HW accelerators:
Extend irq_poll() so that we can disable IRQs in the time-sensitive section (I imagine we have roughly 5us of width per bit and probably a 40us pause between bits). Then we can call irq_poll() inside the busy-wait loop, servicing one timer per iteration. I think that would generally decrease the likelihood of timeouts and increase overall throughput, as long as a single irq_poll() call takes less than 5us.
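A minimal host-side simulation of that idea, assuming a hypothetical irq_poll_one() that runs at most one pending timer and has a known worst-case cost (POLL_COST_TICKS). The tick values, send_bits(), and the counters are made up for illustration; this is not Klipper's neopixel code. The point it shows is that the busy-wait only polls when the remaining slack exceeds the poll cost, so bit timing never slips:

```c
#include <assert.h>
#include <stdint.h>

/* Simulated tick counter; all timing values are hypothetical. */
static uint32_t now_ticks;

#define BIT_PULSE_TICKS   2  /* time-critical pulse, sent with IRQs off */
#define BIT_PERIOD_TICKS 10  /* hypothetical slot length per bit */
#define POLL_COST_TICKS   3  /* assumed worst-case cost of irq_poll_one() */

static int pending_timers;   /* timers waiting to run */
static int serviced_timers;  /* timers run from inside the busy-wait */
static uint32_t max_gap;     /* largest distance between two bit starts */

static void irq_disable(void) { /* stand-in: no real IRQs in simulation */ }
static void irq_enable(void)  { /* stand-in: no real IRQs in simulation */ }

/* Hypothetical extended irq_poll(): run at most ONE pending timer. */
static void irq_poll_one(void)
{
    if (pending_timers > 0) {
        pending_timers--;
        serviced_timers++;
    }
    now_ticks += POLL_COST_TICKS;
}

static void send_bits(int nbits)
{
    uint32_t next_bit = now_ticks, last_start = now_ticks;
    for (int i = 0; i < nbits; i++) {
        /* Busy-wait for this bit's slot. Poll one timer per iteration,
         * but only if the poll is guaranteed to finish before the slot. */
        while ((int32_t)(next_bit - now_ticks) > 0) {
            if (pending_timers > 0
                && (int32_t)(next_bit - now_ticks) >= POLL_COST_TICKS)
                irq_poll_one();
            else
                now_ticks++;             /* plain spin */
        }
        assert(now_ticks == next_bit);   /* polling never made us late */
        if (i > 0 && now_ticks - last_start > max_gap)
            max_gap = now_ticks - last_start;
        last_start = now_ticks;

        irq_disable();                   /* only the pulse is IRQ-free */
        now_ticks += BIT_PULSE_TICKS;
        irq_enable();

        next_bit = last_start + BIT_PERIOD_TICKS;
    }
}
```

On real hardware the hard part is knowing POLL_COST_TICKS, which is the same "irq_poll() < 5us" condition stated above.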
A similar idea, implemented differently: signal to the timer dispatch code that we need control back for a short period before time t.
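A rough sketch of this variant, assuming a hypothetical sched_reserve_until(t) call that the task makes before its time-critical section. The dispatcher here (sim_timer_dispatch(), a simplified loop over an array, not Klipper's real scheduler) only starts a timer whose assumed worst-case cost still lets it finish by t; anything else waits until the hypothetical sched_release():

```c
#include <stdint.h>

/* One simulated software timer: when it is due and its worst-case cost. */
struct sim_timer {
    uint32_t due;
    uint32_t cost;
    int ran_at;              /* -1 until the timer has run */
};

/* Hypothetical reservation: "give me the CPU back before time t". */
static int reserve_active;
static uint32_t reserve_t;

static void sched_reserve_until(uint32_t t) { reserve_active = 1; reserve_t = t; }
static void sched_release(void)             { reserve_active = 0; }

/* A timer may start only if it is guaranteed to finish by reserve_t. */
static int dispatch_allowed(uint32_t now, uint32_t cost)
{
    return !reserve_active || (int32_t)(reserve_t - (now + cost)) >= 0;
}

/* Run every due timer that fits before the reservation. Timers that do not
 * fit keep ran_at == -1 and are retried on a later call. Returns the
 * simulated time after the callbacks have run. */
static uint32_t sim_timer_dispatch(struct sim_timer *timers, int n, uint32_t now)
{
    for (int i = 0; i < n; i++) {
        struct sim_timer *t = &timers[i];
        if (t->ran_at >= 0 || (int32_t)(t->due - now) > 0)
            continue;                     /* already ran, or not due yet */
        if (!dispatch_allowed(now, t->cost))
            continue;                     /* would overlap the reserved window */
        t->ran_at = (int)now;
        now += t->cost;
    }
    return now;
}
```

This trades the per-bit polling of the first idea for a per-timer cost estimate, which the dispatcher would need to know in advance.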
Or, probably even more generic, there should be a way to tune
#define TIMER_MIN_TRY_TICKS timer_from_us(2)
so that it is not a constant but the sum of the time needed to exit the timer code, take the interrupt, and re-enter.
If the MCU could do that in less than 2us, the task code (neopixel in this case) would get to run more often and timeouts would be less likely.
This is similar to how it is done for AVR.
I suspect it would take some work to either write an autotune for the scheduler or test each MCU and define a value for it.
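To make the TIMER_MIN_TRY_TICKS idea concrete, here is a sketch where the spin-versus-exit threshold is computed from per-MCU costs instead of being a fixed 2us. The struct irq_costs fields, the example tick counts, and the 72 MHz clock are assumptions, and should_spin() only mirrors the idea behind TIMER_MIN_TRY_TICKS rather than reproducing Klipper's dispatch loop:

```c
#include <stdint.h>

/* Hypothetical per-MCU costs in timer ticks; real values would have to be
 * measured per chip or found by an autotune step. */
struct irq_costs {
    uint32_t exit_ticks;     /* leave timer dispatch and return from the IRQ */
    uint32_t entry_ticks;    /* interrupt latency and handler prologue */
    uint32_t reenter_ticks;  /* get back into timer dispatch */
};

/* Proposed threshold: the round-trip cost instead of a fixed 2us. */
static uint32_t min_try_ticks(const struct irq_costs *c)
{
    return c->exit_ticks + c->entry_ticks + c->reenter_ticks;
}

/* If the next timer is closer than the threshold, spin inside the IRQ;
 * otherwise reprogram the timer and return to task code. */
static int should_spin(uint32_t now, uint32_t next, uint32_t threshold)
{
    int32_t diff = (int32_t)(next - now);  /* wrap-safe tick difference */
    return diff <= (int32_t)threshold;
}
```

For example, with assumed costs of 20 + 30 + 20 = 70 ticks, a timer 100 ticks away is below the fixed threshold of 144 ticks (2us at an assumed 72 MHz), so the dispatcher would keep spinning; with the measured 70-tick round trip it would return and let the neopixel task run instead.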
To sum up, I’m not against changes here; they can probably be done. But I’m leery of blindly using external libraries or overcomplicating the code just to drive LEDs.
Thanks.
