My pull request broke the ar100 build

I just created my first pull request including a firmware module: https://github.com/Klipper3d/klipper/pull/6553
Now the autobuild in github broke for the ar100 platform: https://github.com/Klipper3d/klipper/actions/runs/8591422603/job/23540172318?pr=6553


2024-04-07T20:32:02.2868306Z   Linking out/klipper.elf
2024-04-07T20:32:02.2878326Z or1k-linux-musl-gcc  out/src/sched.o  out/src/command.o  out/src/basecmd.o  out/src/debugcmds.o  out/src/initial_pins.o  out/src/gpiocmds.o  out/src/stepper.o  out/src/endstop.o  out/src/trsync.o  out/src/spicmds.o  out/src/buttons.o  out/src/tmcuart.o  out/src/neopixel.o  out/src/pulse_counter.o  out/src/ads1118.o  out/src/thermocouple.o  out/src/sensor_adxl345.o  out/src/sensor_angle.o  out/src/sensor_bulk.o  out/src/ar100/main.o  out/src/ar100/gpio.o  out/src/ar100/serial.o  out/src/ar100/util.o  out/src/ar100/timer.o  out/src/generic/crc16_ccitt.o  out/src/generic/timer_irq.o out/compile_time_request.o out/lib/ar100/start.o out/lib/ar100/runtime.o -iquote out/ -iquote src -iquote out/board-generic/ -std=gnu11 -O2 -MD -Wall -Wold-style-definition -Wtype-limits -ffunction-sections -fdata-sections -fno-delete-null-pointer-checks -flto=auto -fwhole-program -fno-use-linker-plugin -ggdb3 -O3 -fno-builtin -fno-pie -ffreestanding -msfimm -mshftimm -msoft-div -msoft-mul -Ilib/ar100 -T src/ar100/ar100.ld -Wl,--gc-sections -static -Wl,--no-dynamic-linker -o out/klipper.elf
2024-04-07T20:32:02.2930824Z /home/runner/work/klipper/klipper/ci_build/or1k-linux-musl-cross/bin/../lib/gcc/or1k-linux-musl/10.2.1/../../../../or1k-linux-musl/bin/ld: /home/runner/work/klipper/klipper/ci_build/or1k-linux-musl-cross/bin/../lib/gcc/or1k-linux-musl/10.2.1/libgcc.a(__udivsi3.o): in function `__udivmodsi3_internal':
2024-04-07T20:32:02.2933297Z (.text+0x0): multiple definition of `__udivsi3'; out/lib/ar100/runtime.o:(.text.__udivsi3+0x0): first defined here
2024-04-07T20:32:02.3041627Z collect2: error: ld returned 1 exit status
2024-04-07T20:32:02.3043492Z make: *** [Makefile:73: out/klipper.elf] Error 1
2024-04-07T20:32:02.3063186Z ##[error]Process completed with exit code 2.
2024-04-07T20:32:02.3151649Z Post job cleanup.

I haven’t done any ar100 specific and all the objects in my source file src/ads1118.c are either static or collision-free prefixed. Only headers are included in the file and I am not using any function with div in its name.

Google finds a couple of issues related to __udivsi3, but neither seems related to my file. In some reports the recommendation is to use shifts instead of the divides, but I use only standard C shifts. So, I have no idea.

Can someone please have a look at this?

I didn’t try to figure out your c file, but I suspect you have an error in there that’s causing a build error.

The ar100 just happens to be at the top of the list in the src directory so is the first that attempts to build during the checks. So the error isn’t ar100 specific, it’s the the first one that broke.

I’d double check your ad1118.c file for any errors and try to do a local build first and see if you get any errors.

Without any relevance to the build issue:

There have been many discussions about implementing ADCs here Strain Gauge/Load Cell based Endstops - #412 by garethky now culminating in a PR by @garethky.

  1. Not sure if there are any synergies between your ADS1118 and the ADS1120
  2. This introduces yet another temp sensor nomenclature. Would it make sense to rather follow the MAXxxxxx temperature sensors scheme?

It’s not clear to me why this causes linking problems (I suspect that’s a bug in the build process), but by eliminating function bodies until compilation worked I was able to track down the culprit:

    uint8_t next_sensor = (current_sensor + 1) % spi->sensor_count;

CI for Hmm, that unexpectedly worked. Maybe this modulus? · flowerysong/klipper@01c0e3e · GitHub still failed, but with a Python issue much later in the process.

See Bulk ADC Sensors by garethky · Pull Request #6555 · Klipper3d/klipper · GitHub for a solution

Just adding a link to my repo where I implemented support for the ADS1118 with multiplex support. GitHub - dockterj/klipper: Adding support for MakerBot Replicator 2

It should be generic enough to use for purposes other than type k thermocouple support with minimal changes. I’m happy to update this to make it generically usable and get into upstream.

Just a thought here… Instead of everyone trying to push functionality for every chip in the world into Klipper… We may need to make a rule/guidelines for it before it gets out of hand.

In other words, If people need that chip to make their one off probe work for testing purposes… That’s fine and make sense but does it need to be in the klipper mainline?

Once they’ve proven that it works I’d think they’d implement it with it’s own custom control mcu (as in, not Klipper, just an mcu that does the work and provides a logic signal to Klipper).

So then it’s just like every other probe.

Klipper shouldn’t be replacing custom mcu functionality to make every possible type of probing solution a reality.

That’s just my two cents though.

Edit: Forgot to add before someone says “But open source!”, The maker can publish their schematics and files to get custom boards made. Or even make them and sell them at a small profit for users who don’t have those kinds of skills. More often than not the 3d printing stores end up making them and selling them as well. Then the bigger names make their own version.

It’s the circle of 3D Printing hardware life.

With respect to issue #2 - This breaks down with some of the printers that use this ADC - they are dual extruders and use type K thermocouples which require a separate measurement for the cold junction temperature. The ADC mux needs to be set for a certain channel then when the conversion is available the value is read out at the same time the next mux channel is set up for a conversion. This could possibly be done entirely in python, but because of the syncronization needed I implemented the mux switching in a C module.

I tried to make the configuration for this as klipper-like as I could and came up with using pins for the two different channels.

So the chip config is:
[ads1118]
sensor_pin: PE6
spi_software_miso_pin: PE7
spi_software_sclk_pin: PE2
spi_software_mosi_pin: PH2

and the extruder can reference this with:
[extruder]
sensor_type: ads1118_typek
sensor_pin: ads1118:pin_0
min_temp: 0
max_temp: 260

Just a note that the original PR and my repo add ADS1118 support for older printers that use type k thermocouples for the hotend. I’m maintaining this for Makerbot Rep2(x) printers (and any clones that can be supported) as there doesn’t seem to be enough demand to put this in upstream.

I didn’t necessarily mean this in particular, sorry for the confusion, I’ve just seen multiple threads were people were trying to add support for specific sensors and such where they would be more usable in a separate board.

I don’t see how that’s relevant? That PR had a completely different failure related to the size of the binary, not a linking problem.

Entirely possible that I overlooked something.

I think that each addition of such hardware is a big plus for tinkerers and hardware manufacturers.

Best example is @dmbutyugin’s input shaper. The functionality was implemented together with support for the ADXL345 and today a total of 3 IMUs are supported.

Also (my naive understanding) is that support for these hardware, be it IMUs, ADCs or whatever, is pretty isolated and does not impact a broader scope.

BUT, they will be the foundation for future features that build upon this.
Again a recent example is the two PRs relating to surface scanning probes.
One is supplying the hardware support for the relevant chip and the other is implementing the needed bed-mesh support.
Once merged different hardware manufacturers (even with different chips) can build upon it.

IMO it is just important that is done:

  • controlled
  • harmonized
  • following a half-way aligned philosphy without reinventing the wheel and adding too many and too different concepts

If you haven’t looked into it, then please don’t give generic hints. I already checked that and I am running the same code on STM32F04.

There are not man synergies in the low-level code. The ADS1x1x familiy consists of I2C and SPI chips which needs different code and the ADS1120 again has a different SPI protocol. Don’t ask me why the keep doing this. I already ripped out the ADS1018 code which has the protocol as the ADS1118, but a lower resolution and would probably be just dead-weight.

Haven’t looked into dockterj’s Code yet, but i seems that the idea on firmware code was convergent evolution.

Honestly, if dockterj’s implementation was in the mainline, I would have not wasted my time with redoing it. And my solution is not limited to thermocouples, if you need Karma sensors, add them in the python file with a few lines. No need to touch the low-level code.
The thermocouple code could be shared, there are probably other devices using that. In that case, refactor and extract the code.

If the policy is to not include every or most or at least common drivers used in 3d printing, then everything fragments into repositories and forks and why even bother to give something back upstream?
I have my repository with my ADS1118 implementation and FFCP2 and Artillery X3 Plus support, dockertj has his own repository with his ADS1118 implementation and Makerbot Rep2(x) support.
And people barely capable of getting MainsailOS running need instructions how to change the repo URL and get a nice warning that this is not the official repo.
Hell, many can’t even apply a patch file.

That’s completly legal C code. I am currently working on getting a standalone container for running the regression tests as I did not even find a donwloadable or10 toolchain to compile the firmware on the MainsailOS Raspi I used for development. But I won’t have time in next 1-2 days.

I know it’s legal; I said that this feels like a bug in the build process. But still, it’s very clear that it’s the source of the error, since (as I demonstrated) getting rid of this division results in the error not occurring.

That function (and similar named ones) are created automatically when gcc has to perform integer division on a chip that does not have hardware support for division. Basically, the __udivXXX functions are software division code.

The above code is likely the spot that introduces the division (as modulo on arbitrary variables is calculated via division on most chips).

I do not know why the ar100 fails to build when the software division code is introduced. We’d probably need to reach out to Elias Bakken to ask. It might be a specific “build canary test” to purposely fail the build when software division is introduced, as software division bloats the code and is rarely intended.

In general, it’s preferable to avoid division on chips that do not have hardware support for division. Is it reasonable to rework the code to avoid arbitrary division?

-Kevin

1 Like

It’s an interesting question and it seems like the obvious answer is “yes” but I would think that research into how pervasive the use of division operations are in Klipper.

Klipper is unique in terms of the number of platforms it supports so I would think that there needs to be some unique coding rules that must be in place to ensure that nothing unexpected happens when it is built on different devices. There are a number of tricks that can be put in place to avoid division operations - in the cited example, if spi->sensor_count was a power of two, the express will use a bitwise operation rather than an arithmetic one.

Sorry, I don’t know your PR acceptance/testing process but do you do a code analysis of submissions? There are a number of free tools available that could be used to look at the code base and see if banning all division operations is feasible/reasonable.

I’m making this suggestion because code analysis tools normally have division operation checks in place for detection of divide by zero security exploits. So, it maybe easier to do the research on how often division is used in the code base than it first seems.

I removed all division from the common code a couple of years back. Last I checked, though, the software i2c PR inadvertently reintroduced division. I’ve been meaning to fix that up and add a build test to catch it going forward. Alas, not sure when I’ll get a chance to look at that.

It’s difficult to catch division by code analysis, and relatively easy to catch it using a build test. The problem is that certain divisions are fine (eg, some_variable / 2) while others are not (eg, some_variable / some_other_variable), but often the compiler will reliably convert the latter into the former - making it hard to be sure.

I’d guess this implementation could do something like next_sensor = current_sensor + 1 >= spi->sensor_count ? 0 : current_sensor + 1. Which would likely be smaller code and faster on all chips. (Which is why we generally try to avoid software division as it’s common that we didn’t intend the compiler to introduce slow and bloated division code.)

Cheers,
-Kevin

2 Likes