You should run a lot more samples and a lot more data if you want any sort of useful numbers. 10000 samples for the latency test seems to yield consistent results; with fewer than that, the run-to-run variation is huge.

Those are both running USB (STM32G0B1 and STM32H743).
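
To illustrate why the sample count matters so much, here is a rough sketch in Python. It does not talk to Klipper at all: the per-sample latencies are drawn from a made-up log-normal distribution (an assumption, chosen only because it has a long tail), and `run_spread` is a hypothetical helper name. It repeats a simulated "benchmark run" several times and reports how much the average moves between runs.

```python
import random
import statistics


def run_spread(samples_per_run, runs=30, seed=0):
    """Return the standard deviation of the per-run mean latency.

    Each run averages `samples_per_run` synthetic latency samples.
    The log-normal distribution is an assumption for illustration,
    not a model fitted to real Klipper measurements.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(runs):
        samples = [rng.lognormvariate(0.0, 0.75) for _ in range(samples_per_run)]
        means.append(statistics.fmean(samples))
    return statistics.stdev(means)


if __name__ == "__main__":
    # Compare the sample counts mentioned in this thread.
    for n in (500, 1000, 10000):
        print(f"{n:>6} samples per run: spread of the mean = {run_spread(n):.4f}")
```

The spread of the average shrinks roughly with the square root of the sample count, so going from 500 to 10000 samples should cut the run-to-run variation by a factor of about 4 to 5 under these assumptions.
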
Similarly, here are the results using 10k for data (I noticed that you used 500 for the rp2040 and 1000 for the others, which makes the comparison even more misleading):

That’s still way slower than it should be, however, since native USB isn’t bandwidth limited (the baud rate is ignored). I’m guessing it’s severely bottlenecked by host-side code (the klipper processes are using 100% CPU on my Pi4 test machine while running these tests), at least when compared to the results posted here: Benchmarks - Klipper documentation (klipper3d.org).
I tried running that, but the test commands didn’t seem to do anything. Suppressing the analog_in_state reports worked fine though, so I’m not sure what I’m doing wrong.