We are a couple of people trying to make a nice and clean setup on the Creality CR-10 Smart Pro, the discussion is over on Reddit. Two different errors keep plaguing the installs, a “Move out of range” error and a “Timer too close” error. We are running both Mainsail and Fluidd on the internal WiFi board (OpenWRT) and we see the same type of errors. We are using the Creality installed 8GB SD card for storage.
For the former error, (at least) one of the coordinates will be completely out of whack, for instance the error may look like this: Move out of range: 122.074 3000.000 69.984 [6018.842]
in the startup script. Once it has happened, only saving&restart of the printer.cfg file will cure it. If only a new attempt at printing is made, it will fail with the same coordinates just after probing is done. A friend of mine with an Ender 3 S1 and an RPi3 also sees the same problem on occasion. This is NOT the same as the configuration mistake typically made when forgetting to account for the probe offset when probing the bed. It will work more than 9 out of 10 times.
The “Timer too close” has happened both when uploading a file from PrusaSlicer and during printing of a Benchy.
The fact that the errors are not exclusive to the Creality WiFi card, but happens more frequently along with the random data corruption suggests to me either a threading problem or a stray pointer somewhere, but I do not have the knowledge of python to quickly find any apparent issues. The two errors may be caused by the same issue or they could be completely separate matters.
Also when you say internal wifi board, is that the same as the Creality wi-fi box? If it is, you’re not going to have a good experience running Klipper on it. That only has a single core 580Mhz mips processor with 128MB of RAM. That’s half the speed of the original Pi Zero which is already too slow to handle Klipper and a webcam at the same time.
It is the same - not running a webcam on it though. The speed shouldn’t cause commanding random coordinates, although it could explain the MCU shutdowns.
Will capture a set of logs the next time I see the problem, could be today, could be in a week.
I guess the post was first and foremost to get reports of the problem out of the woodwork, of the three people I know running Klipper all have seen it. So I think it may be underreported and possibly a hint to a larger underlying problem.
I’m actively running Klipper on 3 different printers, and I don’t have this problem on any of them. I’m also a helper on Discord where we have 12,000+ members and I’m not seeing this reported by anyone running common hardware. I think it’s more likely something specific to your hardware and/or configuration.
With respect to the hardware being slow; yes, by today’s standards it is, 25 years ago the CPU spec on the Creality WiFi Box would have been found on a mid to high-end SGI workstation. It is plenty fast enough to keep up with the demands of a single 250kbps serial port, it is just a matter of priority. And AFAICT Klipper - at least in KlipperWRT form - does not make any priorities. Everything, hard realtime, soft realtime and non-realtime runs at the same system priority. Only klipper.py is hard realtime, everything else is just a nuisance if they are delayed.
I changed /etc/init.d/klipper to invoke klipper with nice --9, just randomly picking a priority higher than default, I could then successfully print with sustained 100% CPU utilization, a load factor >4 and five dashboards. Dashboards were lagging, of course, but the main thing was that the operation of the printer itself was unaffected. Before I did the change, running find / on it while printing would cause an immediate MCU shutdown.
There are some tools you can use to analyze the Klipper log (details at Debugging - Klipper documentation ). For example, your log shows ( scripts/graphstats.py klippy-3.log -s ):
The host cpu is clearly struggling to keep up during the entire duration (blue system load average is regularly above 100%). It’s actually fairly impressive that Klipper ran as long as it did. Ultimately, it looks like you also went into memory swap (note the decreasing yellow line) which finally killed it.
FWIW, low end host cpus aren’t a target for me. I don’t know of anyone currently working on it.
Interesting graph. I will admit to being very fresh to Klipper - like 8 days in…
Given a large enough print, that available memory line could become a problem for any CPU, it could look like a memory leak. It could, of course, also be a garbage collector not having time to run… Load above 1 isn’t a problem as long as the processes waiting are soft realtime or less, IMHO there is something very wrong if a sudden spike of incoming web interface requests will cause a disruption to the hard realtime application, it should have explicit priority. That was what I attempted with my change to the startup script, and it almost worked. I just didn’t foresee the swap issue. The Benchy Gcode is only 2.5MB, so not in anyway massive.
I have a NUC running Ubuntu and a few Atlassian applications sitting next to the printer, I guess the next step is to hook that up to the printer and try again.
Given a large enough print, that available memory line could become a problem for any CPU, it could look like a memory leak. It could, of course, also be a garbage collector not having time to run… Load above 1 isn’t a problem as long as the processes waiting are soft realtime or less, IMHO there is something very wrong if a sudden spike of incoming web interface requests will cause a disruption to the hard realtime application, it should have explicit priority. That was what I attempted with my change to the startup script, and it almost worked. I just didn’t foresee the swap issue. The Benchy Gcode is only 2.5MB, so not in anyway massive.
The graph Kevin posted clearly shows your system is heavily loaded. If you run some number crunching application that doesn’t care about latency and processes getting temporarily no computing time or has to wait on disk thats okay. With software like klipper which needs to calculate and send stuff out with not too much delay, that’s not a really good idea.
You can now spend days and weeks analyzing and improving Linux scheduling, nice, ionice, cgroups, running latencytop and whatnot, compile kernels, change schedulers and options, Hz numbers, ticks, noticks, disk cache settings, vm dirty ratio and whatnot. Maybe you get it right. Maybe not because you run into one of the many Linux scheduling issues, especially with direct attached sdcard or mmc storage (i.e. heavy disk access will stall everything no matter how much you optimize everything you can just mitigate it a bit, not fix it).
I’d do some more basic checks i.e. if it’s swapping etc. but if there’s nothing obvious I wouldn’t spend the time and just use a more powerful machine. Much easier and more reliable.
I have a NUC running Ubuntu and a few Atlassian applications sitting next to the printer, I guess the next step is to hook that up to the printer and try again.
You mean running klipper on the same machine that runs Atlassian applications? Those Atlassian Java memory and CPU hogs? That sounds like asking for trouble again
enderbender, I don´t disagree that it’s heavily loaded, I’m just saying it should be able to handle it because the fraction taken by the hard realtime system is manageable. Unless the chimney on the Benchy is very compute intensive, which I don’t think it is. I could be mistaken.
But I did do a few more checks and I found that logd was using a 51200kiB memory buffer. Reduced that by a factor 10 and instantly had a lot more RAM available. Printing the bench again with exactly the same Gcode now, so we’ll see how we fare. By the looks of things, it was RAM that took us out.
Well. This is an interesting coincidence. Now I just had that out of range error you reported subwooferbone. But I have plenty of procesing power, it’s a Pi4. It also happened inside the startscript. Usually first it homes XY, then goes to the front right. It only went to the right, but not to the front. The next move then gave the out of range error.
Edit:
It happened here:
[gcode_macro START_PRINT]
gcode:
{% set BED_TEMP = params.BED_TEMP|default(70)|float %}
{% set EXTRUDER_TEMP = params.EXTRUDER_TEMP|default(220)|float %}
M140 S{BED_TEMP}; Bett Temp setzen ohne warten auf Ziel Temp
M104 S150; Extruder Temp auf 150 Grad setzen ohne warten auf Ziel Temp
G28 X Y; Home XY Achse
G1 X260 Y0 F5000; Nach ganz vorne Rechts fahren
Instead of going to the front right, it only went to the right, not to the front, then it continued heating, after heating was done, the out of range error occured.
Are you sure the printer is in absolute mode (G90) and not relative (G91)?
Issuing G1 X260 Y0 F5000 while in relative mode would do exactly what you describe, and the subsequent G1 X259.5 Y0 F6000 would result in the printer attempting to move X to 519.5 which matches your error.
Looking at your config, you have G91 in your PAUSE macro, but I don’t see a G90 anywhere to put the printer back into absolute. You might want to add that to your START_PRINT macro if it’s not explicitly added by your slicer.