Not all WS2812Bs light up under certain configurations (not a power problem)

Basic Information:

Printer Model: Ender 3 heavily modified
MCU / Printerboard: BTT Manta E3EZ
Host / SBC: Raspberry CM4 / BTT CB1
klippy.log
klippy.log (93.6 KB)

Describe your issue:


The saga is quite long, spanning several days, but I want to save you time, so here’s the short version.
I wanted to use some long “Neopixel” strips on my printer (yes, I know, it’s a printer, not a Christmas tree, but bear with an old child) using Julian Schill’s library, etc. But I noticed that only part of the strip was driven.
I built a similar test setup with different components (same board model, albeit with a CB1 instead of a CM4, similar PSUs, a different brand of LEDs, etc.). The wiring is OK: a separate 5V PSU powers the LEDs at multiple points, with a common ground to the 24V one.
On a bare-minimum configuration with a clean Klipper install, it works. Setting

initial_red = 0.5
initial_green = 0.0
initial_blue = 0.0

the whole strip lights red on boot. Changing the colour, e.g.


SET_LED LED=led_strip GREEN=0 RED=0 BLUE=0.502

changes the whole strip to blue. That’s the configuration from the first klippy.log

But adding either of two configuration sections breaks things. Issuing the same command as above changes only the first 40-50 LEDs to blue; the others stay red.

  1. I had a self-made filament motion sensor, which I don’t use for now, but its configuration remained in printer.cfg. If it is present, the LEDs do not work properly and light only partially:
[filament_motion_sensor filament_sensor]
pause_on_runout = True
detection_length = 1.25
extruder = extruder
switch_pin = PC5
runout_gcode = 
	M117 Runout Detected
insert_gcode = 
	M117 Insert Detected

Here’s a log with this inserted in printer.cfg: klippy_filamentsensor.log (84.9 KB)

  2. The Ender 3 originally had a Z-endstop, which I was not using because I have a BL-Touch. I thought it would be nice to use that connector for an “emergency button”, so I had
[gcode_button ESTOP_BUTTON]
pin = ^PC6
press_gcode = 
	{action_emergency_stop("'Emergency button pressed!'")}
	RESPOND MSG="Button pressed"

in printer.cfg. If it’s there, again, not all LEDs are working.
Again, here’s a log with this activated in printer.cfg:
klippy_estop.log (71.7 KB)

Electrical connections to the board are the usual ones:

I'm wondering why all this happens. Does anyone have an idea?

Could be a timing issue due to the long chain. The second log contains a warning that the NeoPixel update did not succeed.

You can try modifying the klippy/extras/neopixel.py file and playing with the BIT_MAX_TIME parameter, for example, setting it to .00003.
After the modification, you will need to restart the entire Klipper host with
sudo service klipper restart
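
For reference, the edit amounts to changing one constant in that file. This is only a sketch: it assumes the constant is defined at module level and that the stock value is .000004 (the 4us figure quoted elsewhere in this thread), so check your own copy before editing.

```python
# klippy/extras/neopixel.py (excerpt, sketch only)
# Assumption: the stock line reads BIT_MAX_TIME=.000004 (4 us).
BIT_MAX_TIME = .00003  # raised to 30 us, as suggested above
```

A Klipper update may overwrite the file, in which case the change has to be reapplied.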

Mmm, well.
The neopixel code can be preempted by interrupts.
That can mess up the timing, and this is what the message is about.

If it were a dedicated MCU, or if there were no other activity (homing, probing, moving), it should work.

Disabling the filament sensor just decreases the number of interrupts that must be handled per unit of time.

Hmmm…

114 neopixels.
1250ns per bit, ~800kHz
24 bits per chip.
114 * 24 / 800_000 = 0.00342s
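
The same arithmetic as a small Python sketch, in case someone wants to try other strip lengths. It assumes the nominal 1.25us bit period and 24 bits per LED from the figures above; `frame_time_s` is just an illustrative helper name, not anything from Klipper.

```python
BIT_TIME_S = 1.25e-6   # one WS2812B bit period (~800 kHz)
BITS_PER_LED = 24      # 8 bits each for green, red and blue

def frame_time_s(num_leds: int) -> float:
    """Uninterrupted time needed to clock out one full strip update."""
    return num_leds * BITS_PER_LED * BIT_TIME_S

for n in (50, 114):
    print(f"{n} LEDs: {frame_time_s(n) * 1e3:.2f} ms")
```

This ignores the reset gap and any interrupt delays, so the real transfer takes at least this long.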

Hmm… well, it could be problematic.

I think it is possible to hack the MCU code, to rebalance the scheduler code, so it will switch back to the “tasks” sooner, where the neopixel handler is executed, and that could allow it to work under load.

Anyway, right now it should work if there is no other activity on the printer.

@Sineos : At some point in my journey I also saw @koconnor suggesting that, so I tried it. No change.
The “Neopixel update did not succeed” message was probably logged at the moment I issued the SET_LED command.

@nefelim4ag : As you can see, there is no activity on the printer; only the bare minimum is defined in printer.cfg so that Klipper can boot. The only things that are active, in my opinion, are temperature readings (couldn’t start Klipper without an [extruder] section) and CAN-Bus communication.

Try disabling the gcode_button.
It runs timers much more often than the ADC.

Well, because there’s no explanation for the behavior, and I don’t even know whether it’s a bug or a feature, the workaround was to attach a Pi Zero and let it handle the LEDs on its own.

[mcu RP2040]
serial: /dev/serial/by-id/usb-Klipper_rp2040_45533065778AE48A-if00

[neopixel led_strip]
pin: RP2040:gpio12

Well, I did my best above to explain what the problem is.
The code runs, and the protocol is time sensitive.
Interrupt happens, transaction is corrupted.

It is a consequence of the design choices. Not a bug or a feature. It is just not intended to drive ultra-long neopixel strips, which require the MCU to do basically nothing except drive the neopixels for several milliseconds.
To make it support such long transactions here, the code should be reworked, or the scheduler hacked.
But I am, personally, not sure it is easily possible to hack the scheduler in a way that would make it work on the low-end MCUs (STM32G0B1) in a way that would be unnoticeable to other parts of the code.

On RP2040, on the other hand, you probably would experience it less often, just because it is much faster, and a situation where it would not have enough time between interrupts is less likely.

Hope that clarifies something.

That I understood. But why would a single input change that so dramatically? That’s what puzzles me.

If I understood you correctly, you do not grasp why any additional thing on the board would mess with the neopixel data transmission.

Well, basically, it is a single-wire protocol. There is a maximum bit time, and a defined width for the zero and one pulses.
As you may have noticed, one bit takes around 1.25us.
In the klippy code (neopixel.py) you can see a max bit time of 4us (.000004s).

From the Features/Benchmark page, you can see that one timer (step pulse) can be executed ~1.1 million times per second (1100k) on the STM32G0B1.

That basically means that in the best case, with a single highly optimized timer, one execution takes about 0.9us.
For an arbitrary timer it depends on the environment; it takes longer, more than 1us.

So it is a probabilistic thing, but it can happen that at the end of a pulse, or between pulses, a button timer fires (+>1us), then the CAN IRQ (+>1us), then the ADC timer (+>1us).

And then we are already pretty close to the default abort time of 4us.
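
As a rough illustration of that budget, here is a small Python sketch. The per-interrupt costs are hypothetical lower bounds (the figures above only say each is more than 1us), and it assumes the delays simply add to the nominal bit period.

```python
BIT_MAX_TIME_S = 4e-6     # default abort threshold (4 us)
NOMINAL_BIT_S = 1.25e-6   # nominal WS2812B bit period

# Hypothetical costs: each handler is only known to take "more than 1 us".
interrupts_s = {
    "button timer": 1.0e-6,
    "CAN IRQ": 1.0e-6,
    "ADC timer": 1.0e-6,
}

def worst_case_bit_s(costs: dict) -> float:
    """Bit period if every listed handler runs back to back inside one bit."""
    return NOMINAL_BIT_S + sum(costs.values())

total = worst_case_bit_s(interrupts_s)
print(f"bit time with all three handlers: {total * 1e6:.2f} us")
print(f"exceeds abort threshold: {total > BIT_MAX_TIME_S}")
```

Even with these optimistic costs, three handlers landing inside the same bit use up the whole 4us window.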

Buttons are polled at a frequency of at least 500Hz (every 2 ms).
The ADC is sampled at 3.33Hz, and each sample runs a 1000Hz timer 8 times (about 3 times a second, the ADC is queried 8 times in a row with 1 ms pauses).
CAN IRQ: I don’t really know.

In the case of your example, where you have 114 neopixels and around 24 * 114 = 2736 bits/pulses spread over 0.00342s (actually more, since IRQs increase the overall time), the probabilities are high enough.
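
To put rough numbers on "high enough", this Python sketch estimates how many timer events fall inside one full strip update on average. The rates are the ones listed above; the CAN IRQ rate is unknown and left out, and `expected_events` is only an illustrative helper.

```python
FRAME_S = 114 * 24 / 800_000   # 0.00342 s for one update of 114 LEDs

# Average event rates in Hz, taken from the figures above.
rates_hz = {
    "button timer": 500.0,     # at least once every 2 ms
    "ADC timer": 3.33 * 8,     # ~3.33 bursts per second, 8 samples each
}

def expected_events(rate_hz: float, window_s: float = FRAME_S) -> float:
    """Average number of events that fall inside one update window."""
    return rate_hz * window_s

for name, rate in rates_hz.items():
    print(f"{name}: {expected_events(rate):.2f} events per update")
```

Under these assumptions a button timer fires more than once during every update, while an ADC sample overlaps only occasionally.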

Hope that clarifies things a little.

Wow! Thank you for your thorough and extensive explanation, and for your time! I wish my teachers in school had had such patience 🙂
My test setup is still on the table at my office. Tomorrow I will try to activate more inputs to see how it behaves. The current strip caps at 51 neopixels; let’s see if it gets lower with more timers. Just for fun.

I ran the just-for-fun tests. I activated two thermistors and four buttons in the config. No change.
Interestingly enough: I set the strip to be red initially. If I set it to blue, as I said, ~51 LEDs change color. The same happens when setting it to green. But if I set two of the R / G / B values at once, e.g.

SET_LED LED=led_strip GREEN=0 RED=1 BLUE=1

(yellow, magenta, cyan) only about 40 change colours.