Anonymous usage statistics

I would find it incredibly intriguing to see which configurations are used to operate Klipper and approximately how many machines are out there.

To this end, an anonymous statistics collection could be set up that generates a unique ID string for each printer and collects general stats like:

  • Klipper / Moonraker / fluidd / Mainsail / etc. version
  • Type of the MCUs
  • MCU count
  • MCU connection method
  • OS version
  • Max acceleration and velocity
  • Maybe GeoIP?
  • anything else?

Of course, without any information that can be traced back (IP, etc.) or other user-related information like Linux usernames, etc.
And of course, with the option to opt out if you do not want such information to be shared.
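
As a rough illustration of the idea above, the sketch below generates a random per-printer ID and assembles a payload with the kinds of fields listed. Every field name and value is a hypothetical placeholder, not something Klipper or Moonraker produces today; a real implementation would also need to store the ID so it stays stable between restarts.

```python
import json
import uuid


def new_instance_id() -> str:
    # A random UUID4 is not derived from hardware, network, or user data.
    return uuid.uuid4().hex


def build_payload(instance_id: str, opted_out: bool = False):
    # Nothing is produced at all if the user has opted out.
    if opted_out:
        return None
    # Hypothetical field names and example values, for illustration only.
    return {
        "instance_id": instance_id,
        "versions": {"klipper": "v0.0.0-example", "moonraker": "v0.0.0-example"},
        "mcus": [{"type": "stm32f446", "connection": "usb"}],
        "mcu_count": 1,
        "os": "Debian 12",
        "max_accel": 3000,
        "max_velocity": 300,
    }


payload = build_payload(new_instance_id())
print(json.dumps(payload, indent=2))
```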

The collected information shall be publicly available in the form of aggregated statistics.

@Arksine, I’d guess it would rather be the domain of Moonraker instead of Klipper?

Would this be visible for all users here or where would you make that data available?
I second that idea!

Are there any examples considered?

I personally like the way Nabu Casa is doing it with Home Assistant

foosel / Gina is also collecting some analytics if you opt in during first run setup but afaik it is not available to the public.

We have had this with OctoPrint for some time now:

Which marketing firm are you working with @Sineos? Trying to collect all my datas!

Jokes aside, this would be very useful in judging useful improvements to future versions of Klipper.

As much as I was just joking about “metadata”, even things like the number of fans defined can help, for example if we see a rise in fan usage or in certain types of fans.

Or an abundance of a certain type of MCU might warrant going back and making sure that MCU’s implementation is solid and utilizes it fully.

Dual extruder setups, kinematic types, print volume, etc.

Maybe even go as far as reporting gcode macro names, so if it’s seen that a large number of people are using a gcode macro someone put together, it might be something that needs implementing in code (where applicable and doable).

Very nice example, @EddyMI3D. Exactly the direction and representation I had in mind. The only thing I’d change is probably to show just the top 5 to top 10 in the graphs and summarize the rest. The granularity seems a bit overwhelming.
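
To make the “top 5 to top 10 plus the rest” idea concrete, here is a minimal sketch that keeps the N most common values and folds everything else into a single bucket. The function name and the sample data are made up for illustration.

```python
from collections import Counter


def top_n_with_other(values, n=5):
    # Keep the n most common entries and fold the remainder into "other".
    counts = Counter(values)
    top = counts.most_common(n)
    rest = sum(counts.values()) - sum(count for _, count in top)
    if rest:
        top.append(("other", rest))
    return top


# Made-up sample data, not real statistics.
mcus = ["stm32"] * 5 + ["rp2040"] * 3 + ["atmega2560"] * 2 + ["lpc1768"]
print(top_n_with_other(mcus, n=2))
# → [('stm32', 5), ('rp2040', 3), ('other', 3)]
```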

Exactly. Besides quenching my thirst for your data on my way to world domination, this is what I was thinking about.

Thus, trends like the shift from 8bit boards to 32bit boards (ok, this is over I think) or from USB to CAN etc. could be seen in this data.

What kind of technical data to collect would need some careful consideration. Still, privacy should remain a top priority.

If that is guaranteed, I second it.

This brings up an interesting question.

Besides possibly your IP Address, or with some linux voodoo, trying to get your root password on your Pi…

Does anyone keep confidential information in their printer configuration?

Now… Since I’m me… One thing I did think of which would be more funny to me than anything (not recommending we do it obviously)…

Pull the print history too and see if anyone has printed any “questionable” 3d models.
That’s the only thing I can think of that might reveal something you don’t want others to see.

Moonraker could collect telemetry for both itself and Klipper. If we were to add this functionality I think it would have to be opt-in, I don’t want to collect data from anyone unaware. The opt-in could be facilitated through the configuration. Granularity might be worth considering as well.
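
A configuration-based opt-in could look roughly like the sketch below. The `[telemetry]` section and its `enabled` / `level` options are purely hypothetical, and plain `configparser` is used as a stand-in rather than Moonraker’s own config helpers. The point is only that a missing section means nothing is sent.

```python
import configparser

# Hypothetical config fragment; no such section exists in Moonraker today.
EXAMPLE_CONF = """
[telemetry]
enabled: True
level: basic
"""


def read_telemetry_settings(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Opt-in: without an explicit section, collection stays disabled.
    if not parser.has_section("telemetry"):
        return {"enabled": False, "level": "basic"}
    section = parser["telemetry"]
    return {
        "enabled": section.getboolean("enabled", fallback=False),
        "level": section.get("level", fallback="basic"),
    }


print(read_telemetry_settings(EXAMPLE_CONF))
print(read_telemetry_settings(""))
```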

I do think that any data collected should be publicly available.

I agree that obtaining overall usage statistics would be useful to enhance development and overall project management. I’ve thought about it as well.

One challenge is that I think we would need to be transparent on what data is collected and why. I think we would need to be careful not to collect any data that could be sensitive.

I fear an explicit “opt-in” system could limit the utility, because the set of users that explicitly opt in may not be a good representation of the overall user base, thus skewing the gathered results. One thing I was thinking about would be to combine some kind of “future central plugin server” with “overall stats gathering”. Basically, some kind of two-way info exchange with the central server (simple usage statistics sent up, and available plugins/versions sent back). That may make it more likely that gathered statistics are representative of the overall usage. Not sure.

If we publicly provide the statistics, I think we’d want a second layer of anonymization on the server side before making the data public. Just to be extra careful that collected data isn’t used maliciously.
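
One possible form of such a server-side layer, sketched under the assumption that only aggregated counts are ever published: buckets reported by fewer than `k` machines are merged, so a rare configuration cannot single out one printer. The function name, the threshold, and the sample data are illustrative choices, and a real deployment would likely need more than this simple threshold.

```python
from collections import Counter


def publishable_counts(values, k=5):
    # Publish only buckets with at least k reports; merge the rest.
    counts = Counter(values)
    public = {name: count for name, count in counts.items() if count >= k}
    hidden = sum(count for count in counts.values() if count < k)
    if hidden:
        public["(fewer than %d each)" % k] = hidden
    return public


# Made-up sample data, not real statistics.
kinematics = ["cartesian"] * 6 + ["corexy"] * 5 + ["delta"] * 2 + ["polar"]
print(publishable_counts(kinematics, k=5))
```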

Just some high-level ideas.

Cheers,
-Kevin

Are multiple opt-in tiers already considered?
For example:

  • basic: versions only (host os-release, Python runtime, Moonraker, Klipper, MCU firmware, etc.)
  • advanced: basic + the repos enabled in the update manager in moonraker.conf and the features configured in printer.cfg
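
The two tiers above could be expressed as a simple field filter, sketched here. The tier names follow the list; the field names and their assignment to tiers are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    OFF = 0
    BASIC = 1
    ADVANCED = 2


# Hypothetical mapping of field name to the lowest tier that includes it.
FIELD_TIERS = {
    "os_release": Tier.BASIC,
    "python_version": Tier.BASIC,
    "moonraker_version": Tier.BASIC,
    "klipper_version": Tier.BASIC,
    "mcu_fw_versions": Tier.BASIC,
    "update_manager_repos": Tier.ADVANCED,
    "configured_features": Tier.ADVANCED,
}


def filter_for_tier(data, tier):
    # Fields missing from FIELD_TIERS are never sent, whatever the tier.
    return {
        key: value
        for key, value in data.items()
        if key in FIELD_TIERS and FIELD_TIERS[key] <= tier
    }


sample = {
    "klipper_version": "v0.0.0-example",
    "configured_features": ["bed_mesh", "input_shaper"],
    "hostname": "never-sent",
}
print(filter_for_tier(sample, Tier.BASIC))
print(filter_for_tier(sample, Tier.ADVANCED))
```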

From my (technocratic) point of view, there are 3, maybe 5, pieces of information that should either not be collected at all or be anonymized at the server / database level:

  • the external IP address
  • Linux usernames in paths
  • Print history, i.e. gcode filenames
  • (GeoIP: I would be curious about the distribution of Klipper around the globe, but it doesn’t serve any technical purpose, so it can easily be omitted.)
  • (Moonraker’s update manager settings: Not really sensitive, so TBC)
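
A client-side scrubbing pass for the items in this list might look like the following sketch. The key names in `DROP_KEYS` are hypothetical, and the pattern only covers usernames under `/home/`; other path layouts would need their own rules.

```python
import re

# Hypothetical key names for data that must never leave the machine.
DROP_KEYS = {"ip", "external_ip", "print_history", "gcode_files"}
# Matches "/home/<name>" so the username can be masked inside paths.
HOME_RE = re.compile(r"/home/[^/\s]+")


def scrub(value):
    # Recursively drop sensitive keys and mask usernames in strings.
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items() if k not in DROP_KEYS}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, str):
        return HOME_RE.sub("/home/<user>", value)
    return value


raw = {
    "ip": "203.0.113.7",
    "config_path": "/home/pi/printer_data/config/printer.cfg",
    "nested": [{"print_history": ["a.gcode"], "log": "/home/alice/klippy.log"}],
}
print(scrub(raw))
```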

Everything else carries about the same privacy risk as a cfg published in Klipper’s GitHub repository. Even a klippy.log / moonraker.log posted here potentially contains more “sensitive” information, since it includes items from the above list.

Being transparent about what information is collected is key. I like products that allow me to review my information, e.g. by letting me export what is transmitted.

Following the above logic, I agree with Kevin that it should be an “opt-out” for the few that might even disagree with this.

This I agree with, I see no value in collecting “user specific data” outside of their printer configuration.

I do think we need to do the opposite of what Sineos just did though and if this is implemented have a webpage with EXACTLY what data is collected.

Honestly though, there might be things I’m not considering, but most information that I can think of that would be useful for development would be contained in the .cfg files (barring Mainsail.cfg/Moonraker.conf/Fluidd.cfg etc.; only the ones that deal with Klipper).

Because at the top of the printer.cfg you have to add

[include mainsail.cfg]
or
[include fluidd.cfg]

So that tells us what interface they use.
(I don’t use fluidd but I believe that’s correct, dual interface might be different too)
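
Detecting the interface from the include lines could be sketched like this. The mapping of file names to interfaces is an assumption based on the two examples above, and an include only suggests which interface’s macro file is in use, not that the interface itself is installed.

```python
import re

# Matches "[include something]" at the start of a line; commented-out
# lines such as "#[include ...]" do not match.
INCLUDE_RE = re.compile(r"^\[include\s+([^\]]+)\]", re.MULTILINE)
# Assumed mapping, based on the two file names mentioned above.
KNOWN = {"mainsail.cfg": "Mainsail", "fluidd.cfg": "Fluidd"}


def detect_interfaces(cfg_text):
    found = []
    for path in INCLUDE_RE.findall(cfg_text):
        # Use only the file name, so no directory names are reported.
        name = path.strip().rsplit("/", 1)[-1]
        ui = KNOWN.get(name)
        if ui and ui not in found:
            found.append(ui)
    return found


example_cfg = "[include mainsail.cfg]\n#[include fluidd.cfg]\n[printer]\nkinematics: corexy\n"
print(detect_interfaces(example_cfg))
# → ['Mainsail']
```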

Everything else should be defined in the .cfg files that Klipper uses and most, if not all, of that doesn’t have any identifying information barring some ridiculous level of digital forensics (I’m sure people have a “style” when making gcode macros or how they order their configuration, use of spaces, spelling/style of prefix naming etc.)

Granted there might be some use in knowing what type of things people print to see if they have commonalities in features to drill down on in future improvements. As in lots of zig zags, a lot of sharp corners, anything that is “out of the norm” of a straight line move that might make you stop and think “Hmm, I didn’t realize this came up THAT often, maybe I should go back and revisit that implementation”.

But I don’t see how we could gather that and keep it private and anonymized, that’d also require dumping the moves from the move queue constantly and would be a huge amount of data. So probably not a good idea in general.

The other factor is finding a way to let people “opt in” and submit while staying anonymous. As in, the mere fact of uploading requires identification in a myriad of ways, even assuming there is no login or anything required. Communications come FROM an IP address. Plus, we’d have to make sure that if we leave the opt-in path “open”, it isn’t maliciously abused by bad actors.

So… Like anything, this would have to be rigorously defined and the different scenarios considered.

Question 1 in my mind would be… What information are we looking for EXACTLY?

Since klipper mentions “at least 100 users should be interested”, even knowing how many are already using a feature which is not yet introduced into mainline would help a lot.

I’m in for mandatory collection of printer specs and modules loaded.

Macros however should be regarded as sensitive information.

The collected data could be made public, and the source code of the collection module should be concise and linked directly from the main documentation. People should not need an effort to find out what is done.

A GCODE command should also be provided to print out the payload for the printer.
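
Such a command could be provided by a small Klipper extras module along the lines of the sketch below. The command name `TELEMETRY_PREVIEW`, the class, and the payload are hypothetical. The calls `config.get_printer()`, `lookup_object("gcode")`, `register_command()` and `gcmd.respond_info()` follow the pattern used by existing Klipper extras as I understand it, so treat the exact signatures as assumptions to check against the Klipper source.

```python
import json


class TelemetryPreview:
    def __init__(self, config):
        self.printer = config.get_printer()
        gcode = self.printer.lookup_object("gcode")
        # Hypothetical command name; nothing is transmitted by it.
        gcode.register_command(
            "TELEMETRY_PREVIEW",
            self.cmd_TELEMETRY_PREVIEW,
            desc="Print the statistics payload without sending it",
        )

    def build_payload(self):
        # Placeholder; a real module would gather the agreed-upon fields.
        return {"mcu_count": 1, "kinematics": "example"}

    def cmd_TELEMETRY_PREVIEW(self, gcmd):
        # Show the user exactly what would be sent.
        gcmd.respond_info(json.dumps(self.build_payload(), indent=2))


def load_config(config):
    # Entry point Klipper looks for when loading an extras module.
    return TelemetryPreview(config)
```

Running the command from the console would then print the same JSON that would otherwise be uploaded, which ties in with the transparency point raised earlier in the thread.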

I’m curious as to why you say that. I’m sure there are things I’m not thinking of, so made me curious.

I could see the desire if you were a company and had what you considered proprietary technology either in the macro or connected to the printer and uses the macro for interfacing.

However, this would mean that the user of this technology (customer or developer) would not be allowed to seek support from this and other groups as the macros would be exposed in the klippy.log. Of course, Klipper could be modified to prevent showing the macro, but then the user couldn’t get support because they were running a modified version of Klipper…

Again, I can see the desire for wanting this but I don’t see it being very practical with the way Klipper works.

The best course of action would be to protect the macro through copyrights, registered marks or patents.

Think about Klipper as Python, macros as the Python scripts and printed parts as the work done by the Python scripts.

The printer configuration doesn’t involve secrecy, it’s something that is obtained by reading the manual and maybe doing a couple of tests everyone would be able to do (in the context of people knowing 3D printing).
In the IP world we would say it’s not a creative, inventive activity.

The macros however involve coding (unless people copy-paste them, but that’s only shifting the issue to someone else). Also, some print farms may optimise them to reduce filament waste or shorten print time, and in both cases it’s something that may lead to a competitive advantage over other print farms.

Macros are copyrighted and private stuff.
Don’t send them anywhere, or sooner or later a lawyer will kill the project with a lawsuit and damages. The GPL may not protect us in this regard, even if in theory installing Klipper involves accepting the licence and all the features it provides (since the source code is open).

I’m not an expert on litigation but for sure macros are private stuff which must not leave the machine.

People publishing the log files explicitly decide to publish the macros, the issue is non existent in this regard.

So, what happens in your hypothetical print farm when there is a problem?

If they remove the macro then they will have to reset the klippy.log which may or may not eliminate the problem but they may have to run their machines without their enhancements for some length of time before they can put a request here.

I’m not trying to be argumentative and I know where you’re coming from but Klipper isn’t designed to allow what you’re asking for.

I would say that since Klipper is GPL v3, so is its config, and GPL v3 does not allow any combination with copyrighted material, especially not if we are talking about an intrinsic part of the software, and a macro surely is.

In any case, there has to be an opt-out possibility and the super-secret print-farm can make use of it if wanted.