Arbitrary width SW SPI interface

I think the ability to read/write 16/24/32/… bit registers over SPI could be useful for many sensors.
In my case, I wanted to get a YHCB2004 display working, and it uses 9-bit SPI.

I started to implement it, and the change to src/spi_software.c was straightforward.

(This is my first attempt at contributing to open source. I created a draft PR in my fork; I hope that is the right way to do it.)

I changed the data type to support up to 32 bits:

- spi_software_transfer(struct spi_software *ss, uint8_t receive_data
-                                     , uint8_t len, uint8_t *data)
+ spi_software_transfer(const struct spi_software *ss, uint8_t receive_data
+                                    ,uint8_t len, uint32_t *data)
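To illustrate the idea, here is a minimal sketch of shifting a single word of arbitrary width (1 to 32 bits) MSB first. It is not the code from my PR: the `sim_*` pin helpers and `soft_spi_xfer_word` are hypothetical names, and the "pins" are a PC-side loopback (MISO returns whatever was last written to MOSI), so it only shows the bit ordering, not real GPIO timing.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical loopback "pins" so the sketch runs on a PC: whatever is
// written to MOSI is read back on MISO. On an MCU these would be GPIO calls.
static uint8_t sim_mosi;
static void sim_write_mosi(uint8_t v) { sim_mosi = v; }
static uint8_t sim_read_miso(void) { return sim_mosi; }
static void sim_toggle_sclk(void) { /* no-op in this simulation */ }

// Shift one word of `bits` width (1..32) out MSB first and return the bits
// sampled on MISO. Ordering assumed here: set data, clock edge, sample,
// clock edge. Bits of `out` above the requested width are ignored.
static uint32_t
soft_spi_xfer_word(uint32_t out, uint8_t bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t in = 0;
    for (uint8_t i = bits; i-- > 0; ) {
        sim_write_mosi((out >> i) & 1);
        sim_toggle_sclk();                  // first clock edge
        in = (in << 1) | sim_read_miso();   // sample the incoming bit
        sim_toggle_sclk();                  // second clock edge
    }
    return in;
}
```

With the loopback, `soft_spi_xfer_word(0x1A5, 9)` hands back the same 9-bit value, which is the word size the YHCB2004 needs.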

However, I'm not sure how best to propagate this change to the Python-MCU side. What is the best way to change spicmds.c without breaking something (e.g. HW SPI) or reducing speed due to higher memory usage? This is the current function:

void
spidev_transfer(struct spidev_s *spi, uint8_t receive_data
                , uint8_t data_len, uint8_t *data)
{
    uint_fast8_t flags = spi->flags;
    if (!(flags & (SF_SOFTWARE|SF_HARDWARE)))
        // Not yet initialized
        return;

    if (CONFIG_HAVE_GPIO_BITBANGING && flags & SF_SOFTWARE)
        spi_software_prepare(spi->spi_software);
    else
        spi_prepare(spi->spi_config);

    if (flags & SF_HAVE_PIN)
        gpio_out_write(spi->pin, !!(flags & SF_CS_ACTIVE_HIGH));

    if (CONFIG_HAVE_GPIO_BITBANGING && flags & SF_SOFTWARE)
        spi_software_transfer(spi->spi_software, receive_data, data_len, data);
    else
        spi_transfer(spi->spi_config, receive_data, data_len, data);

    if (flags & SF_HAVE_PIN)
        gpio_out_write(spi->pin, !(flags & SF_CS_ACTIVE_HIGH));
}
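One option that might leave the HW SPI path untouched is to keep the `uint8_t *data` buffer as it is and carry each wide word in it as whole bytes, unpacking only in the software path. The following is an untested sketch under that assumption, not existing Klipper code; `bytes_per_word`, `pack_word` and `unpack_word` are hypothetical helpers, and the big-endian layout is just one possible choice.

```c
#include <stdint.h>

// Hypothetical: number of buffer bytes used to carry one word of `bits` width.
static uint8_t
bytes_per_word(uint8_t bits)
{
    return (bits + 7) / 8;
}

// Hypothetical: store word `w` into `buf`, most significant byte first.
static void
pack_word(uint8_t *buf, uint8_t bits, uint32_t w)
{
    uint8_t n = bytes_per_word(bits);
    for (uint8_t i = 0; i < n; i++)
        buf[i] = (uint8_t)(w >> (8 * (n - 1 - i)));
}

// Hypothetical: read one word back out of `buf` and mask it to `bits` width.
static uint32_t
unpack_word(const uint8_t *buf, uint8_t bits)
{
    uint8_t n = bytes_per_word(bits);
    uint32_t w = 0;
    for (uint8_t i = 0; i < n; i++)
        w = (w << 8) | buf[i];
    if (bits < 32)
        w &= ((uint32_t)1 << bits) - 1;
    return w;
}
```

A 9-bit word such as 0x1A5 would then occupy two bytes (0x01, 0xA5) in the existing buffer, at the cost of some unused bits per word.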

Any thoughts would be helpful.

(Also, is this the right place to ask, or should I put it in a GitHub issue/PR?)