Printer going unresponsive mid-print without any errors

Basic Information:

Printer Model: Modified Ender7
MCU / Printerboard: Manta M5P
Host / SBC: BTT CB2
klippy-8.log (1.2 MB)

I recently upgraded my machine's mainboard and host to a BTT Manta M5P with a CB1. During a calibration print the machine stops moving and disables the steppers, the WebUI becomes unresponsive, and when I restart the machine the log doesn't show any errors. I have attached the log in case I missed something.

edit1:
I thought it might have something to do with the WiFi, so I switched to a wired connection, but the problem persists. A couple of times I was able to observe that when it stops, it shows the following warning: "Moonraker can't connect to Klipper! Please check if the Klipper service is running and klippy_uds_address is correctly configured in the moonraker.conf." At that point I can issue a firmware restart and it works again, but most of the time it goes completely unresponsive: no web UI and I can't ping it. Here is the latest log in case it helps; I can't find any obvious errors or warnings.
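The Moonraker warning points at Klipper's Unix domain socket. One way to tell "Klipper is down" apart from "the whole host has frozen" is to check that socket yourself over SSH while the printer is stuck. The following is a minimal Python sketch of such a check; the default path `~/printer_data/comms/klippy.sock` is an assumption (it is the usual default for `klippy_uds_address` on recent installs), so substitute whatever your moonraker.conf actually uses, and the function name `klippy_socket_alive` is hypothetical.

```python
import os
import socket

# Assumption: the usual default socket path on recent Klipper/Moonraker installs.
# Replace with the klippy_uds_address value from your moonraker.conf.
DEFAULT_UDS = os.path.expanduser("~/printer_data/comms/klippy.sock")


def klippy_socket_alive(path: str = DEFAULT_UDS, timeout: float = 2.0) -> bool:
    """Return True if something accepts connections on the Unix socket at `path`."""
    if not os.path.exists(path):
        # No socket file at all: Klipper is not running, or the path is wrong.
        return False
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # The file exists but nobody is listening (e.g. a stale socket left behind).
        return False
    finally:
        sock.close()
```

If this returns False while SSH still works, the host is alive and the Klipper service itself is the thing to look at; if SSH does not respond at all, the problem sits below Klipper.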

klippy-17.log (7.1 MB)

edit2:
Connected a screen to the board and it turns out it is a kernel panic… The board is running the latest image from BTT, and Klipper was installed through KIAUH.

edit3:
It has something to do with BTT's workaround to keep the WiFi module from going to sleep. I switched to the Armbian image and it seems to be fixed, but I need to run more tests.

To start with, I am not a specialist in either Klipper or Linux. What I will tell you is the logic that came out of my own past troubleshooting, and I hope it gives you, or others, a push in the right direction. I apologise if you are already aware of what a kernel panic is, but I thought I had to mention it.
First of all, a kernel panic is often a boot problem in Linux or Unix, as described here:

So this happens even before you can print; however, you sometimes managed to get past it, which means it is an intermittent problem.
Secondly, when it starts doing something and simply goes unresponsive, you do not see anything wrong in the klippy log. I think that is because the problem isn't with Klipper at all; it is more likely to be either the Pi PCB, with a dry solder joint that sometimes happens to work and sometimes not, or other hardware like the MCU or the wiring.
You stated that you had just upgraded to a new board, so I would start by looking at the wiring. Disconnect one end and check each wire for continuity while wiggling it. If that looks OK, including whether the wires are truly in their correct spots, inspect the solder joints of the connectors on both ends of the wires, on the Pi as well as on the MCU.
If you are fortunate enough to have another Pi at hand, swap it in and see.
If all this fails, you do have the option to go back to your previous setup and see if the problem persists. If it does not, you can safely say the Pi is OK; if it does, you might have handled the Pi roughly and broken a connection somewhere.
That is it from me, I think. I hope you get to the bottom of it somehow, because this sounds like a stinker to deal with.
My brother, who is a Red Hat engineer, told me that a kernel panic is not something an ordinary user should ever experience. It is highly unusual for a user to see this problem; most of the time it is people who develop or administer systems and who experiment with things. So that already tells you that the cause is also unlikely to be something ordinary. If you are versed in Linux, you can try to find the dump file that the kernel panic may have produced and see if it tells you where to look.
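If a kernel log from the crashed boot did survive, a small script can pull out the lines that usually mark a crash. This is a minimal Python sketch, assuming you have already saved the log to a file, for example from `journalctl -k -b -1` (previous boot; this only works if the journal is persistent) or from `/var/log/kern.log` where that file exists. The marker list and the name `find_crash_lines` are illustrative choices, not an exhaustive or official set.

```python
import re

# Illustrative markers that appear in Linux oops/panic output; not exhaustive.
CRASH_PATTERNS = [
    r"Kernel panic",
    r"Internal error: Oops",
    r"Unable to handle kernel",
    r"BUG:",
    r"Call [Tt]race:",  # arm64 prints "Call trace:", x86 prints "Call Trace:"
]
_CRASH_RE = re.compile("|".join(CRASH_PATTERNS))


def find_crash_lines(log_text: str) -> list:
    """Return every line of a kernel log that matches one of the crash markers."""
    return [line for line in log_text.splitlines() if _CRASH_RE.search(line)]
```

For example, `find_crash_lines(open("kernel.log").read())` returns the matching lines in log order, which is usually enough to find the start of the oops and read the lines around it.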
Good luck and let us know how you get on.

Thank you very much for the link. I am familiar with Linux systems and with what a kernel panic is, even though this is the first time I am dealing with one.


You are right: the reason for the complete lack of errors in the logs is that the system freezes completely. I was able to confirm it by connecting a monitor and keyboard directly to the Manta board. It would show an error message the moment the machine stopped and then be completely unresponsive, requiring a reboot. I did some digging and it seems to have something to do with what BigTreeTech does to stop the WiFi module on the CB1 from going to sleep.

I ended up going for an Arabian release of the OS image instead of the one provided by BTT, and it seems to have resolved the issue, although I only had an hour to test it last night.


Now you are forced to learn Arabic?
:joy: :rofl:
I am happy you got somewhere, let’s hope it stays that way. Thumbs up.

Damn autocorrect :rofl: :rofl:

I meant Armbian.

Fingers crossed the fix works.


Hello, greetings my dears…
A tip, perhaps: Sonar - A WiFi Keepalive daemon
I think this came with KIAUH.
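For context, the idea behind a WiFi keepalive daemon is to generate a little traffic at a fixed interval so the adapter does not sit idle and drop into power saving. This is a generic Python illustration of that idea, not Sonar's actual code. The names `ping_once` and `keepalive`, the 5-second interval, and the router address used below are hypothetical, and it assumes a Linux `ping` that accepts `-c` (count) and `-W` (timeout in seconds).

```python
import subprocess
import time
from typing import Callable, Optional


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo via the system `ping`; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def keepalive(
    host: str,
    interval_s: float = 5.0,
    rounds: Optional[int] = None,
    ping: Callable[[str], bool] = ping_once,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Ping `host` every `interval_s` seconds so the link never sits idle.

    Runs forever when `rounds` is None; otherwise returns the number of failed pings.
    """
    failures = 0
    sent = 0
    while rounds is None or sent < rounds:
        if not ping(host):
            failures += 1
        sent += 1
        sleep(interval_s)
    return failures
```

Calling `keepalive("192.168.1.1")` (a hypothetical router address) loops forever; passing `rounds=` makes it stop and report failures, and the injectable `ping` and `sleep` parameters let it be tested without a network.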

live long and prosper

Might want to send that suggestion to BigTreeTech

Thank you all

and…
[Screenshot from 2024-09-10 04-56-16]

I have been doing some more testing and the printer is freezing at random points during the print. The machine stops and the WebUI becomes unresponsive, but I can still SSH into the board. I believe it is something to do with the mainboard and not with the MCU, because when I check dmesg I get the following error:

[  842.863964] systemd-journald[510]: /run/log/journal/68b99407c7f54dde9a6a50e2f5700710/system.journal: Journal header limits reached or header out-of-date, rotating.
[ 5217.628930] Unable to handle kernel paging request at virtual address ffff8000832ef978
[ 5217.628966] Mem abort info:
[ 5217.628973]   ESR = 0x0000000096000007
[ 5217.628982]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 5217.628994]   SET = 0, FnV = 0
[ 5217.629003]   EA = 0, S1PTW = 0
[ 5217.629011]   FSC = 0x07: level 3 translation fault
[ 5217.629021] Data abort info:
[ 5217.629027]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[ 5217.629037]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 5217.629048]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 5217.629060] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000413e7000
[ 5217.629072] [ffff8000832ef978] pgd=100000007ffff003, p4d=100000007ffff003, pud=100000007fffe003, pmd=1000000044423003, pte=0000000000000000
[ 5217.629117] Internal error: Oops: 0000000096000007 [#1] SMP
[ 5217.634722] Modules linked in: ac200_phy lz4hc lz4 zram binfmt_misc nls_iso8859_1 snd_soc_hdmi_codec snd_usb_audio snd_hwdep uvcvideo snd_usbmidi_lib snd_rawmidi polyval_ce snd_seq_device polyval_generic videobuf2_vmalloc sunxi_cir uvc 8189fs sunxi_cedrus(C) rc_core cdc_acm dw_hdmi_i2s_audio dw_hdmi_cec v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc panfrost cfg80211 gpu_sched rfkill drm_shmem_helper dump_reg display_connector cpufreq_dt fuse dm_mod sunxi_gmac sunxi_ephy sunxi_ac200
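The "Mem abort info" lines above are the kernel's own decoding of the AArch64 exception syndrome register (ESR). As a worked example, this Python sketch re-derives those fields from `ESR = 0x0000000096000007` and pulls the faulting address out of the first line, assuming the standard data-abort bit layout (EC in bits 31:26, IL bit 25, ISV bit 24, WnR bit 6, fault status code in bits 5:0). It agrees with the log: EC 0x25 is a data abort taken at the current exception level (inside the kernel), WnR 0 means a read, and FSC 0x07 is a level 3 translation fault, i.e. the kernel read an address with no page mapped (the `pte=0000000000000000` line). That points at a bad pointer inside the kernel but, on its own, does not say which driver produced it. The names `decode_esr` and `fault_address` are illustrative.

```python
import re
from typing import Optional

# Data fault status codes for translation faults (AArch64 DFSC field).
FSC_NAMES = {
    0x04: "level 0 translation fault",
    0x05: "level 1 translation fault",
    0x06: "level 2 translation fault",
    0x07: "level 3 translation fault",
}


def decode_esr(esr: int) -> dict:
    """Split a data-abort ESR value into the fields the kernel prints."""
    fsc = esr & 0x3F
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class
        "IL": (esr >> 25) & 0x1,    # 1 = 32-bit instruction
        "ISV": (esr >> 24) & 0x1,   # instruction syndrome valid
        "WnR": (esr >> 6) & 0x1,    # 0 = read, 1 = write
        "FSC": fsc,
        "FSC_name": FSC_NAMES.get(fsc, "other (see the Arm ARM)"),
    }


_ADDR_RE = re.compile(r"Unable to handle kernel .* at virtual address ([0-9a-f]+)")


def fault_address(log_text: str) -> Optional[str]:
    """Return the faulting virtual address from an oops, or None if absent."""
    match = _ADDR_RE.search(log_text)
    return match.group(1) if match else None
```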

I am attaching the klippy log file in case I missed something there.

klippy.log (5.2 MB)

As mentioned before, this is a kernel panic. It is usually caused by either failing hardware or serious software bugs, e.g. in drivers or kernel modules. Another possibility is a corrupted initramfs, e.g. due to hard/unclean shutdowns or other unexpected events.

This is somewhere between hard and impossible to debug, and surely beyond the scope of this forum.

Personally, I would start with a clean Linux install (if possible on a new, fresh SD card).

Thank you very much. I have tried switching the SD card and the problem seems to persist. It might be the CB1; I have a CM4 coming, so hopefully that will fix it.
