Improved MCU connection error messages

Browsing this forum, I’ve come across many people asking for help with the classic mcu: Unable to connect error. The error is usually a quick fix, but new users often see it, don’t know what to do, and ask here (I acknowledge that there are a few cases where the user needed help flashing Klipper firmware). A quick look at the klippy.log can diagnose this, but all the new user sees is the error in Mainsail/Fluidd.

In an attempt to help new users, I added a bit more description to the mcu: Unable to connect error for USB/UART devices (I don’t have CAN so I can’t test that unfortunately). Instead of simply logging the serial error to the klippy.log, this modification parses the error, adds common easy fixes, and reports it to the user. This also skips the 90s timer, and in my experience, reports the error in as little as 8s.

Tested on an SKR Mini E3 V3 and an SKR Pico.

Github


The error seen in klippy.log:

mcu 'mcu': Unable to open serial port: [Errno 2] could not open port /dev/serial/by-id/dummy-serial-port: [Errno 2] No such file or directory: '/dev/serial/by-id/dummy-serial-port'

The new error shown to the user:


The error shown in klippy.log:

mcu 'duplicate': Unable to open serial port: [Errno 11] Could not exclusively lock port /dev/serial/by-id/usb-Klipper_rp2040_504450612096241C-if00: [Errno 11] Resource temporarily unavailable


I’d really appreciate any feedback on the concept.


Looks like it would be helpful and should be a useful pointer to people who can read - I’m always surprised at how low this number is.

Now, when I tried to click on your GitHub link I got a 404 Error. Doing a quick check, I believe the code is in `work-stm32h7init-20250421`?

edit

Fine here for the first link. Second link, error 404.


Nice work. Actually, the messages typically seen when the board is not correctly flashed could use some improvement as well. While they are technically correct, users often don’t know what direction they should take. See Mcu: Serial connection closed / Timeout on connect / Wait for identify_response

Same here. The branch is feat-improved-error-messages


Thank you all for your feedback!

Sorry. I updated the link in my original post.

Thanks for the link. I’ll have to take a look.


Based on that article, I added a more descriptive error message to the “Serial connection closed” error (it takes about 12s before the error shows). It recommends that the user verify Klipper was flashed and check their USB/CAN connection.


I also added a more descriptive message to the “Lost communication with MCU” error. This also recommends that the user check the USB/CAN connection.


I also played around with the idea of creating a clickable link to the relevant KB topic in the error message. I’m currently trying to modify Moonraker so that <a> tags don’t get HTML-serialized, but I’m unable to find where it’s serialized. I’m thinking putting the link in plaintext may suffice?

I guess the real challenge here is to find a proper solution for the individual error messages, as each can have multiple causes. The rather technical [Errno 2] No such file or directory: is always correct but not particularly helpful for the average user.

Perhaps something like this would be more useful:

The specified serial path for MCU ‘mcu’ does not exist. Ensure the MCU is correctly flashed and connected. Verify with “ls /dev/serial/by-id/*”.
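A rough sketch of how such a message could be assembled, in case it helps the discussion. This is not code from the branch: `describe_serial_path` and its `pattern` parameter are hypothetical names, and the glob simply does what `ls /dev/serial/by-id/*` does.

```python
import glob
import os


def describe_serial_path(path, pattern="/dev/serial/by-id/*"):
    """Hypothetical helper: explain whether a configured serial path exists."""
    if os.path.exists(path):
        return "Serial path '%s' exists." % (path,)
    # Same information as running: ls /dev/serial/by-id/*
    candidates = sorted(glob.glob(pattern))
    if candidates:
        found = "Devices currently present: " + ", ".join(candidates)
    else:
        found = "No devices currently match '%s'." % (pattern,)
    return ("The specified serial path '%s' does not exist. "
            "Ensure the MCU is correctly flashed and connected. %s"
            % (path, found))


print(describe_serial_path("/dev/serial/by-id/dummy-serial-port"))
```

Listing the devices that are actually present would catch the typo case directly, since the user can compare the two paths side by side.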


Shouldn’t the real challenge be finding the signature of each of the different causes of a lost MCU connection and simply creating a suggested corrective action for that particular issue?

Yes, I know that’s easier said than done, but it should be reasonably easy to set up a database of the different reported MCU connection errors, together with the error information available to Klipper and the solution. I’m not sure if the klippy.log of the problem system contains all the necessary information, but it would be somewhere to start.

The basic response to the issue is probably check/replace the cable to the MCU but I suspect that there will be some interesting information at the next level down (ie wires are okay).

Most of the error messages only describe the symptom and likely cannot provide the underlying cause, as Klipper has no way of knowing why it is happening since the issue often is outside its control.

Take the above example:

  • Typo in the serial path?
  • MCU not connected via USB?
  • Wrong tty on a USART connection?
  • Not correctly flashed?
  • Issue on the Linux USB level?
  • Other Linux issue, like the infamous Debian UDEV bug?

Same is absolutely true for Timeout with MCU / Lost communication with MCU.

  • Did the MCU crash?
  • Someone pulled the plug?
  • A wire broke?

If it were that easy, all the KB articles would be pretty much surplus (not that I would mind - on the contrary).

That does hold for most Klipper errors, but the “mcu: Unable to connect” error in particular does log the serial error reported by the system. The way I modified the code, it will look for either an [Errno 2] No such file or directory, or a [Errno 11] Resource temporarily unavailable message. That helps narrow it down for the unable to connect errors.
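To make the idea concrete, here is a minimal sketch of the errno matching described above. It is an illustration only, not the code in my branch: `ERRNO_HINTS` and `add_hint` are made-up names, and the hint wording is just an example.

```python
import re

# Hypothetical table: errno reported by the system -> hint for the user.
ERRNO_HINTS = {
    2: ("Serial port not found. Check that the serial path is correct "
        "and that the MCU is plugged in, powered, and flashed."),
    11: ("Serial port already in use. Check that no other MCU section "
         "or program is using this port."),
}

_ERRNO_RE = re.compile(r"\[Errno (\d+)\]")


def add_hint(message):
    """Append a hint when the serial error contains a known errno."""
    match = _ERRNO_RE.search(message)
    if match is None:
        return message
    hint = ERRNO_HINTS.get(int(match.group(1)))
    if hint is None:
        # Unknown errno: show the original error unchanged.
        return message
    return "%s\n%s" % (message, hint)
```

Any error without a recognized errno passes through untouched, so nothing is lost for the cases that can't be narrowed down.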

For other errors that are hard to diagnose even with the klippy.log, so far I’m unable to find a way to reliably get to the root cause of the issue, which lines up with what you said.

The modified “mcu: Unable to connect” error does (indirectly) use the klippy.log. Since the log is itself generated by Klippy, it’s fairly simple to identify the error before it gets logged, determine if there’s a common solution/recommendation, and present it to the user.


As I was typing this, my print errored :smile:


The error message “Lost communication with MCU” is one such example, and I’m not sure whether this description falls short:

  • The MCU could have crashed, which automatically leads to the question: why?
  • There are reports here where, for example, EMI effects have caused this error.
  • When the error occurs under high load, it could indicate issues with the power supply, i.e., it might be too weak.
  • It could be a wiring issue.
  • It could be that other USB devices are doing something strange on the bus.

All good points. Considering there are plenty of potential causes and solutions for each error, do you think putting a link to a KB article could be useful?

I’m thinking there could be a brief description of the error (human readable), and a link to the relevant KB article.

I did try inserting <a> tags into the error message from Klippy, but it looks like Moonraker serializes it before sending it to the frontends. The above screenshot was done with an HTML edit. I did try modifying Moonraker yesterday to let the Klipper error message (state_message) go through without being serialized, but ended up getting lost in the codebase (maybe @Arksine can help?)

Most of those examples have a unique failure signature.

There could be intermittent breakdowns in communications which are in the klippy.log. Problems could take place at specific toolhead locations. If the issue is caused by high power supply loads, then the operating heaters, steppers and anything else should be identified.

I’m sure that not all of the causes will have unique signatures, but I would think enough of them do to make identifying the corrective action easier.

Generally, all such considerations would need @koconnor’s guidance and approval. Although it is the “official” Klipper forum, I would not recommend linking to a forum from the main application. I think this would then require moving all the relevant articles into the main documentation.

There are also other considerations, such as:

  • Maintaining a permanent connection to the internet
  • Incorporating such functionality into Moonraker/Web interfaces instead, to intercept errors and provide useful guidance
  • And so on

Don’t get me wrong: I very much appreciate improving the error messages and providing more useful information to the users. I just think it’s not an easy task, and I’d personally prefer to provide just the error rather than oversimplified or potentially misleading information.

Thanks for working on this. Alas, due to time constraints, I’m not sure I can give detailed feedback right now.

At a high level, I agree it would be good if we can improve the error messages. I would also be leery about adding html links to the error messages - it could become a long-term maintenance burden. So, a change like that I think we would need to widely discuss with everyone that will be maintaining the documentation. We also have to be aware that many people run modified code, and I’m unsure any single html page would be appropriate.

Cheers,
-Kevin


Thank you for your input. I understand your concerns about inserting HTML links into error messages. For now, I’ll stick to my current approach of adding a bit more information to existing errors.

Do you think it would be best if I changed the errors to only show more details, and not to provide suggestions?

Examples:

Serial port not found: ‘/dev/serial/by-id/’. Are you sure this MCU has the correct serial port, is plugged in, and powered?

Serial port already in use: ‘/dev/serial/by-id/’. Are you sure this serial port is not in use by another MCU or program?

Serial connection closed. Ensure Klipper firmware is properly flashed to your MCU, and your USB/CAN connection is secure.


My vote is to provide a meaningful hint when the error is sufficiently clear, like [Errno 2] No such file or directory:

The message Serial port already in use also seems to be pretty unambiguous. To provide meaningful guidance, we should define the circumstances under which this error occurs. On a cursory look, it likely only appears when two Klipper instances are trying to access the same serial path, right?
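One way to explore that question is to reproduce the error without any hardware. The sketch below rests on an assumption I have not checked against the Klipper source: that the "Could not exclusively lock port" text comes from a non-blocking exclusive `flock` on the port. A temporary file stands in for the serial device, and `try_exclusive_lock` is a made-up helper name. Linux only.

```python
import errno
import fcntl
import tempfile


def try_exclusive_lock(fileobj):
    """Take a non-blocking exclusive flock; return None or the errno."""
    try:
        fcntl.flock(fileobj.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        return e.errno
    return None


with tempfile.NamedTemporaryFile() as first:
    # A second, independent open of the same path stands in for a
    # second program (or second MCU section) opening the same port.
    with open(first.name, "rb") as second:
        first_result = try_exclusive_lock(first)
        second_result = try_exclusive_lock(second)

print(first_result, second_result == errno.EAGAIN)
```

If that assumption holds, any second opener that requests the lock would trigger it, not only a second Klipper instance. That would include the same serial path configured for two MCU sections, as in the 'duplicate' log line earlier in the thread.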

In cases where the root cause is rather unclear or ambiguous, I’d stick to the error alone.


5 posts were merged into an existing topic: Lost communications with MCU