For what it is worth, I attempted the second option in the experimental mechaduino code ( Mechaduino experiment ). That is, I translated the step pulses into a “virtual position” on the micro-controller and then used a fixed frequency loop in the micro-controller to set the actual stepper position based on that “virtual position”.
In the more general case, I think the most important question is - what is the desired response time? Anything under 50ms and the logic likely needs to be written in C code on the micro-controller. A response time between 50-100ms may be possible to implement in C code on the host (eg, the “trsync” code used during “multi-mcu homing” is implemented as a mix of C code on the host and micro-controller - it maintains a maximum of 25ms latency for overshoot). For response times above 100ms there is a good chance regular host Python code can be used.
The lookahead queue currently tries to stay 2 seconds ahead. It should be possible to reduce that - 1 second should be fine, and 500ms is likely achievable with a reasonable host computer (eg, rpi3 or faster).
So, there are options depending on the required response time and the amount of development effort one is willing to invest.
Cheers,
-Kevin