The line between what code should be implemented in the MCU and what should not is somewhat blurry.
It is cumbersome to implement support for every additional accelerometer, angle sensor, display, and so on.
There are roughly two issues:
- Code bloat and limited MCU flash/RAM
- MCU code should be as simple as possible
I’ve thought about this for a while, and it seems that adding any other language to the existing C, Python, and Jinja2 would increase the complexity and grow the toolchain even further.
I initially had hopes for Forth, but it would be too much; it is another family of languages.
It is already hard enough to ask people to work with C and Python.
But there is Linux eBPF and its related infrastructure, including a Clang compiler backend,
where one can write “arbitrary” code in a subset of C.
The only limitation is that it is Linux-only.
However, the ISA is open and fairly simple.
So, I went ahead and implemented a portable version:
From the host side, one can implement basic sensor support like so:
#define timer_read_time() (BPF_CALL_N(1))
#define debug(a) (BPF_CALL_N1(2, a))
#define i2c_dev_write(a, b, c) (BPF_CALL_N3(3, a, b, c))
#define i2c_dev_read(a, b, c, d, e) (BPF_CALL_N5(4, a, b, c, d, e))
// Other shared definitions

__section("prog")
int32_t
task(struct i2cdev_s *i2c)
{
    uint8_t reg_len = 1;
    uint8_t reg[2] = {0x0, 0x1};
    // Fixed size: avoids a variable-length array on the BPF stack
    uint8_t resp[6];
    uint8_t read_len = sizeof(resp);
    int ret = i2c_dev_read(i2c, reg_len, reg, read_len, resp);
    // Do something, call bulk report & etc
    return ret;
}
The Python host code would then compile it:
clang -O0 -Wall --target=bpf -c bpf.c -o bpf.elf
It is possible to debug, disassemble, and run the program on the host; this only requires mocking the platform calls.
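As a rough illustration of what mocking the platform calls could look like, here is a minimal sketch that compiles the same task body natively and swaps the `i2c_dev_read` helper for a fake. Everything besides the `task` body and the helper name is an assumption made for the example: the layout of `struct i2cdev_s`, the fake register file, and the names `mock_i2c_dev_read`, `mock_read_calls`, `last_resp`, and `run_task_on_mock` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real struct; only the mock looks at it. */
struct i2cdev_s {
    uint8_t addr;
};

/* Fake register file served by the mocked device (made-up values). */
const uint8_t mock_regs[8] = {0x10, 0x11, 0x12, 0x13,
                              0x14, 0x15, 0x16, 0x17};
int mock_read_calls;   /* how many times the helper was invoked */
uint8_t last_resp[6];  /* copy of the last response, for inspection */

/* Host-side replacement for helper #4 (i2c_dev_read). */
int32_t
mock_i2c_dev_read(struct i2cdev_s *i2c, uint8_t reg_len, const uint8_t *reg,
                  uint8_t read_len, uint8_t *resp)
{
    assert(i2c != NULL && reg != NULL && resp != NULL);
    mock_read_calls++;
    if (reg_len != 1 || (size_t)reg[0] + read_len > sizeof(mock_regs))
        return -1;
    memcpy(resp, &mock_regs[reg[0]], read_len);
    return 0;
}

/* On the host, the helper macro expands to the mock instead of a BPF call. */
#define i2c_dev_read(a, b, c, d, e) mock_i2c_dev_read(a, b, c, d, e)

/* Same body as the BPF task, compiled natively. */
int32_t
task(struct i2cdev_s *i2c)
{
    uint8_t reg_len = 1;
    uint8_t reg[2] = {0x0, 0x1};
    uint8_t resp[6] = {0};
    uint8_t read_len = sizeof(resp);
    int32_t ret = i2c_dev_read(i2c, reg_len, reg, read_len, resp);
    memcpy(last_resp, resp, sizeof(last_resp)); /* keep result for checks */
    return ret;
}

/* Convenience wrapper: run the task once against the fake device. */
int32_t
run_task_on_mock(void)
{
    struct i2cdev_s dev = {0x68}; /* arbitrary example address */
    return task(&dev);
}
```

Calling `run_task_on_mock()` from a host test then exercises the task logic without any MCU attached; the same idea would apply to the other helpers.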
Then we can allocate memory on the MCU and submit the program.
There are two places where this could/should work:
TX event hook:
- Doing something sophisticated (for example: read TMC SPI, verify, respond)
- Re-encoding display data (see "display: Add gc9a01" by Delsian, Pull Request #7105, Klipper3d/klipper on GitHub)
Task hook:
- Accelerometers or angle sensors could be implemented this way instead of being compiled in
Basically, that is it.
Limitations, caveats:
The original ISA is 64-bit, and so are its registers; every pointer manipulation is 64-bit.
Right now, I lean toward simply ignoring that, because there is no alternative and no way to ask clang to emit only 32-bit ALU pointer arithmetic.
It is possible, though, to map ALU64 operations to ALU32 operations on a 32-bit target, and thereby reduce the register size and overall overhead.
But I guess that for the initial implementation, and for interoperability with the Linux MCU, it is better not to do that at this stage.
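To make the ALU64-to-ALU32 idea concrete, here is a naive sketch of the opcode rewrite. It relies on the eBPF encoding where the low three bits of the opcode are the instruction class (`BPF_ALU` is 0x04, `BPF_ALU64` is 0x07); the function names `alu64_to_alu32` and `rewrite_alu64` are hypothetical. It only swaps the class bits and does not address the semantic differences (for example, shift counts are masked differently, and byte-swap instructions in the ALU64 class would need separate handling), so treat it as an illustration rather than a complete translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BPF_CLASS(op) ((op) & 0x07) /* low 3 bits select the class */
#define BPF_ALU       0x04          /* 32-bit ALU class */
#define BPF_ALU64     0x07          /* 64-bit ALU class */

/* Return the 32-bit ALU opcode for a 64-bit ALU opcode; others pass through. */
uint8_t
alu64_to_alu32(uint8_t opcode)
{
    if (BPF_CLASS(opcode) != BPF_ALU64)
        return opcode;
    return (uint8_t)((opcode & ~0x07) | BPF_ALU);
}

/* Rewrite a program in place; each instruction slot is 8 bytes and the
 * opcode is the first byte of the slot. */
void
rewrite_alu64(uint8_t *code, size_t n_insn)
{
    assert(code != NULL || n_insn == 0);
    for (size_t i = 0; i < n_insn; i++)
        code[i * 8] = alu64_to_alu32(code[i * 8]);
}
```

With this mapping, `0xbf` (ALU64 MOV) becomes `0xbc` (ALU MOV), while load/store opcodes such as `0x63` are left untouched.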
Thanks!
-Timofey
P.S. I think it is possible to re-encode the ISA as a mix of 32-bit and 64-bit instructions, to store the bytecode in memory more efficiently,
because the immediate value often seems to be unused and is therefore zero.
opcode  src_r  dst_r  offset  immediate   assembly
0x63    1      10     0xfffc  0x00000000  STX MEM *(u32 *) (r10 + -4) = r1
0x63    2      10     0xfff8  0x00000000  STX MEM *(u32 *) (r10 + -8) = r2
0x61    10     1      0xfffc  0x00000000  LDX MEM r1 = *(u32 *) (r10 + -4)
0x61    10     3      0xfff8  0x00000000  LDX MEM r3 = *(u32 *) (r10 + -8)
0xbc    1      2      0x0000  0x00000000  ALU MOV r2 = r1
0x0c    3      2      0x0000  0x00000000  ALU ADD r2 += r3
0x04    0      2      0x0000  0x00000001  ALU ADD r2 += 1
0x04    0      1      0x0000  0xffffffff  ALU ADD r1 += 4294967295
0xbc    1      4      0x0000  0x00000000  ALU MOV r4 = r1
0x67    0      4      0x0000  0x00000020  ALU64 LSH r4 <<= ((1 << 64 - 1) & 32
0xc7    0      4      0x0000  0x00000020  ALU64 ARSH r4 >>= ((1 << 64 - 1) & 32
0xbf    10     1      0x0000  0x00000000  ALU64 MOV r1 = r10
...
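For reference, the listing above can be reproduced with a small decoder. The sketch below assumes the standard little-endian eBPF layout (8-bit opcode, 4-bit dst, 4-bit src, 16-bit offset, 32-bit immediate) and then estimates how many bytes a program would take if instructions with a zero immediate were stored in 4 bytes. The names `struct insn`, `decode_insn`, `compact_size`, and `sample_prog` are hypothetical, and the estimate ignores how a decoder would tell short and long forms apart, as well as wide (two-slot) immediate loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of one 8-byte eBPF instruction. */
struct insn {
    uint8_t opcode;
    uint8_t dst;  /* destination register, low nibble of byte 1 */
    uint8_t src;  /* source register, high nibble of byte 1 */
    int16_t off;  /* signed offset */
    int32_t imm;  /* signed immediate */
};

/* Two instructions taken from the listing:
 *   STX MEM *(u32 *)(r10 + -4) = r1   and   ALU ADD r2 += 1 */
const uint8_t sample_prog[16] = {
    0x63, 0x1a, 0xfc, 0xff, 0x00, 0x00, 0x00, 0x00,
    0x04, 0x02, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
};

/* Decode one little-endian instruction slot. */
struct insn
decode_insn(const uint8_t *raw)
{
    struct insn in;
    assert(raw != NULL);
    in.opcode = raw[0];
    in.dst = raw[1] & 0x0f;
    in.src = raw[1] >> 4;
    in.off = (int16_t)(uint16_t)(raw[2] | (raw[3] << 8));
    in.imm = (int32_t)((uint32_t)raw[4] | (uint32_t)raw[5] << 8 |
                       (uint32_t)raw[6] << 16 | (uint32_t)raw[7] << 24);
    return in;
}

/* Bytes needed if zero-immediate instructions were stored in 4 bytes
 * and all others kept their full 8 bytes (hypothetical scheme). */
size_t
compact_size(const uint8_t *code, size_t n_insn)
{
    size_t bytes = 0;
    for (size_t i = 0; i < n_insn; i++) {
        struct insn in = decode_insn(&code[i * 8]);
        bytes += (in.imm == 0) ? 4 : 8;
    }
    return bytes;
}
```

Running `compact_size` over a real compiled program would give a first estimate of whether the mixed encoding is worth the extra decoder complexity.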