Your error messages are bad and you should feel bad

ValueError: Expecting property name: line 1 column 2 (char 1)

psql: FATAL: database "eliribble" does not exist

gaierror(8, 'nodename nor servname provided, or not known')

I've been developing software for a couple of decades now. That's long enough to have seen a bunch of stuff and put in my 10k hours, but not long enough to have given up hope or constructed a robot army of doom. I've been through an interesting ride with error messages and come to a place that makes me, for lack of a better word, rather evangelical.

x * sin(x). You'll find me in a trough

Most engineers are initially indifferent to error messages. Then they start to figure them out. Then they like them. Usually something breaks and they realize error messages are terrible. This is often about the time they get involved with compilers. In time they give in to a form of Stockholm Syndrome.

The average person dislikes error messages. This is, in general, because the average person is not insane. Conversely, as software engineers trudge the path of their careers over the decades they generally end up either going insane and freezing themselves in emotional carbonite or they become oil painting instructors on public television, channeling their rage into an endless stream of happy accidents.

Bob Ross is my spirit animal

Before pursuing a career in painting Bob Ross wrote software for the PDP-11. He entered his infamous 'black period' after being introduced to autoconf

Once upon a time error messages were printed in books. Storage was expensive, men tamed the mammoth, and an error code was literally a number used to look up in a printed table some kind of message about what went wrong. Those tables were created by a partnership between experienced software engineers and technical writers, sort of like how a car is a partnership between dinosaurs and fire.

Time passed and storage became cheaper and it became feasible to embed error messages as strings inside software. It's a great usability feature: rather than print out a cryptic number and expect users to keep a book on hand, you print out the value they are looking for and keep the book internal to the program. There was much rejoicing.

More time passed. Compiled languages rose and fell, dynamic languages came and went, Linux was literally just 5 years away from taking over the desktop space and programmers started to build up error messages in code. The net effect of this was that instead of this:

Error Description
0xFA You failed to toggle switch 11B before providing a LOAD instruction

You could then get this:

0xFA: You failed to toggle switch 11B before providing a LOAD instruction

And eventually you'd get this:

Cannot process LOAD instruction: 0xFA, user failed to toggle switch 11B

Underneath it all the code was likely pretty simple.

int process_load(int* ist) {
    if(switch_state(0x11b)) {
        return 0xFA;
    }
    // more cryptic logic goes here
    return 0;
}

More modern programming standards then would have transformed this into

#define SWITCH_ERROR 0xFA
const char* ERRORS[] = {
    ...,
    "You failed to toggle switch 11B before providing a LOAD instruction",
};
int process_load(int* ist) {
    if(switch_state(0x11b)) {
        return SWITCH_ERROR;
    }
    // more cryptic logic goes here
    return 0;
}

You'll notice the use of preprocessor directives so that other programmers get helpful information. That information is not, of course, provided to the end user, probably for DMCA reasons. We've also centralized our errors into a large table so it packs tightly into memory, since that's still a little bit limited.

The final form of error-zard's code would of course have been in C++ because, y'know, true exceptions demand blood sacrifice.

#include <iostream>

#define SWITCH_ERROR 0xFA
const char* ERRORS[] = {
    ...,
    ", user failed to toggle switch 11B",
};
void process_load(int* ist) {
    try {
        if(switch_state(0x11b)) {
            throw SWITCH_ERROR;
        }
    } catch(int error) {
        std::cout << "Cannot process LOAD instruction: " << error << ERRORS[error];
    }
}

This code achieves several essential updates. First, we've introduced C++ exceptions which will increase the number of CPU instructions to handle the error by 100x. Second, we've pushed the responsibility of reporting the error into our example function which makes it larger and therefore increases our engineering prestige level, almost up to 6. Finally, we've changed the structure of the error message to look more technical by turning an actual sentence into two cryptic clauses joined by a colon.

Now, the insane thing about this is that programmers, like me, go to college and we start to learn stuff. We want to sound like we know what we're doing. You know what makes you sound like a legit pro? Cryptic error messages. Why? Because everyone is doing it. And why are they doing it? Because it's the lowest-friction path during development.

Like watersliding with kiss

Sure, sometimes low-friction paths are fun. But sometimes there's bears at the bottom. And you're being pushed down by scary people that are old enough to know better.

Why is it low-friction? Because all you want to do is deal with the error as quickly as possible and move on to what you were doing. Errors are exceptional. You care about the main path, the golden road your users will tread in 99% of cases. That's what you are concerned about, that cool feeling when the database connects, the data is there, the probability wave collapses and your bot tweets its first sentient thought. Throw a try/catch around that sucker. Tack on what you know at your current stack frame. Move on. Fulfill your destiny.

Errors are an afterthought for most developers. That's why they suck.

Let's talk about a concrete example to drive my point home.

You're in the middle of developing. You're thinking about your data schema, your business requirements, how things will behave under load. You're listening to Infected Mushroom. You're in the groove. You're writing in Python, building some electric web mind-dynamite.

def blow_mind(request):
    user = request['user']
    detonate(user.mind)

Suddenly after an epic bass drop you realize: "What if the current request doesn't include a user?" You're in the groove, liquid epicness flows from you.

def blow_mind(request):
    user = request.get('user', None)
    detonate(user.mind)

What was the first error going to be? Well, let's assume that your web framework, like many, simply catches otherwise unhandled exceptions, turns them into a 500-level HTTP response and emits the message of the error as the body of the response. Your users will then see this:

KeyError: 'user'

Good thing you changed that code! Now you've transformed the implicit assumption, that a user would always be present, into an explicit option. You've confused your coworkers, since you still actually require user to be present, you just error in an even more useless way:

AttributeError: 'NoneType' object has no attribute 'mind'

"But! I can do better! This is not my first rodeo"

Impress me.

def blow_mind(request):
    try:
        user = request['user']
        detonate(user.mind)
    except KeyError as e:
        raise APIError("Failed to get user: %s: %s" % (type(e).__name__, e))

We're going to assume that your framework does something sane and translates an APIError into a 400-level error. Congratulations, now your server looks less down and your users know they did something wrong. What did they do wrong? Whelp, here's the message they get:

Failed to get user: KeyError: 'user'

If you are some kind of insane technology apologist your connection with the nanites in your blood will tell you that you should essentially see the colons as delimiters of stack frames that have unwound. Therefore the code was assuming that something should have been there - aw screw it this is stupid your error message sucks. Try again.

def blow_mind(request):
    try:
        user = request['user']
        detonate(user.mind)
    except KeyError:
        raise APIError("Failed to get user")

This gives you an error message that actually seems fit for humans. Almost:

Failed to get user

Oh, except that you wrote it from the perspective of the computer. It's like the machines are talking to you! Rad!

No, that's not rad, the computers are the servants, but they need to be of actual service. Did you learn nothing from the Butlerian Jihad? Let's flip the message around so that we are actually treating the human beings in this loop as our first-class citizens.

def blow_mind(request):
    try:
        user = request['user']
        detonate(user.mind)
    except KeyError:
        raise APIError("You must include a 'user' in your request")

The human failed to include a 'user'. This is infinitely better than what you had. You could stop here. That would make you approximately average. Average messages suck.

There are still several problems here! First off, I'm not even certain I'm catching the right error! What if my detonate function emits a KeyError for some reason? Maybe it doesn't explicitly but something it relies on does. KeyError is super common in Python. That would mean that a KeyError that originally meant a column somehow got deleted from your database now looks, to the user, like they failed to include a parameter. All because you weren't careful with the scope of your try block. Let's fix that:

def blow_mind(request):
    try:
        user = request['user']
    except KeyError:
        raise APIError("You must include a 'user' in your request")
    detonate(user.mind)

Now we're safer. A bit. We won't swallow messages from detonate and obscure their real meaning. Some of you may have picked up on the fact that I'm not even looking at the error's contents, just its type. That's a good point. But completely irrelevant to your user.

Wait, let's dwell on that. Our user. Let's think more about our user. Who are they? What are they doing? My error messages are actually part of the interface, part of the user experience.

Blessed is the maker. Blessed is his coming. Blessed are his errors. Blessed are his messages

The tablet of lightning is +3 to damage and extra effective against developers who only think of themselves when coding

In our example the user is someone much like you who is developing against your web API. Either that or someone using a browser. Since they screwed up supplying a required parameter they're a developer either way. They may be internal to your company, they may be external. What are some of the ways they might have gone wrong? Well, I'm going to assume that the user parameter is not any sort of session-based thing. It's just a regular parameter. And they either forgot it or they encoded it in a way our application doesn't understand. See how fun this is!?

Well, now that we've identified that we can start to think in more complex terms. A good application should tell the user when they've encoded something unreadable. That's a separate branching issue. Okay, so we'll assume their encoding is good but they just failed to supply the data. Let's stop treating this like an exception then and more like a feature.

def blow_mind(request):
    if 'user' not in request:
        raise APIError("You must include a 'user' in your request")
    user = request['user']
    detonate(user.mind)

The user sees the same error, but now we are making it clear to other developers who read our code that 1) this is not an exceptional case, this is a feature of our application and 2) we aren't ignoring useful exception information because we don't have an exception here at all. You might think at this point that I'm saying we shouldn't use Python's ethos of 'Beg forgiveness rather than ask permission' or, as I like to put it, 'Just use exceptions, it makes the code more exciting'. I'm not. That's the right tool in the right situation. This is not that situation. I have reasons, promise.
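The separate branch for unreadable encoding mentioned above might look something like the sketch below. It assumes the request body arrives as a JSON string. `APIError` here is a stand-in for whatever exception your framework maps to a 400-level response, and `parse_body` is a hypothetical helper name.

```python
import json


class APIError(Exception):
    """Stand-in for the framework's 400-level exception (an assumption)."""


def parse_body(raw_body):
    """Decode a JSON request body, or explain to the caller what is wrong with it."""
    try:
        return json.loads(raw_body)
    except json.JSONDecodeError as e:
        # Translate the parser's complaint into something addressed to the human.
        raise APIError(
            "Your request body is not valid JSON. We stopped reading at line {}, "
            "column {} ({}). Please check the body for problems such as unquoted "
            "property names or trailing commas and send it again.".format(
                e.lineno, e.colno, e.msg
            )
        )
```

The parser's own wording still appears, but only as a detail inside a sentence that tells the developer what to do next.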

Let's add another principle. We've thought of our user, we are validating their input. We can be more helpful. Let's help them achieve their goals by offering information they may not have known they want.

def blow_mind(request):
    if 'user' not in request:
        raise APIError("Your request did not include the 'user' parameter. This must be provided to this endpoint. It should be a UUID")
    user = request['user']
    detonate(user.mind)

The message is even better now. We're anticipating their next problem - I provided a user, but it was the wrong data type - and helping them to avoid it. This is the kind of thing that sets apart acceptable error messages from excellent ones.
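To go one step further and catch that next problem rather than just warn about it, the value itself can be checked. A minimal sketch, assuming `request` behaves like a dict and using the standard library's `uuid.UUID` parser. The `APIError` class is a stand-in and `get_user_id` is a hypothetical helper:

```python
import uuid


class APIError(Exception):
    """Stand-in for the framework's 400-level exception (an assumption)."""


def get_user_id(request):
    """Return the 'user' parameter as a UUID, or explain what was wrong with it."""
    if 'user' not in request:
        raise APIError(
            "Your request did not include the 'user' parameter. "
            "This must be provided to this endpoint. It should be a UUID"
        )
    value = request['user']
    try:
        return uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        # uuid.UUID raises ValueError for malformed strings; non-string input
        # fails with AttributeError or TypeError depending on its type.
        raise APIError(
            "The 'user' parameter you sent ({!r}) is not a valid UUID. It should "
            "look like 'a8098c1a-f86e-11da-bd1a-00112444be1e'".format(value)
        )
```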

What if we had another parameter? Like whether or not to include the slice of lemon around the gold brick. And what if we had to validate it too...let's do that.

def blow_mind(request):
    if 'user' not in request:
        raise APIError("Your request did not include the 'user' parameter. This must be provided to this endpoint. It should be a UUID")
    if 'lemon' not in request:
        raise APIError("Your request did not include the 'lemon' parameter. This must be provided to this endpoint. It should be a boolean value")
    user = request['user']
    detonate(user.mind)

Ugh, copy-pasting that gave me hives. Plus, I've added another problem. What if the user includes neither parameter? What do they see?

...blah blah 'user' parameter...

But that's the wrong error message. They failed on two points. They need to supply both the user and the lemon parameter. We only complained about one. Because of how we raise errors. And how we handle errors.

What would the user like to see? Well, a list of all the ways they went wrong:

{
    "errors": [
        "Your request did not include the 'user' parameter. This must be provided to this endpoint. It should be a UUID",
        "Your request did not include the 'lemon' parameter. This must be provided to this endpoint. It should be a boolean value"
    ]
}

We could do that, right? Sure

def blow_mind(request):
    errors = []
    for arg, _type in [('user', 'UUID'), ('lemon', 'boolean')]:
        if arg not in request:
            errors.append("Your request did not include the '{}' parameter. This must be provided to this endpoint. It should be a {}".format(arg, _type))
    if errors:
        raise APIError(errors)
    user = request['user']
    detonate(user.mind)

You'll need a translation from the errors that go into the APIError constructor into the JSON content, but you can handle that.
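One way that translation could look, as a sketch: `APIError` is never defined in this post, so the class below, its `status` attribute and the `to_response` helper are all assumptions about what a framework hook might need, not any particular framework's API.

```python
import json


class APIError(Exception):
    """Hypothetical 400-level error that carries one or more messages for the user."""
    status = 400

    def __init__(self, errors):
        # Accept a single message or a list so both calling styles work.
        if isinstance(errors, str):
            errors = [errors]
        self.errors = list(errors)
        super().__init__("; ".join(self.errors))


def to_response(error):
    """Turn an APIError into a (status, body) pair a framework could send back."""
    return error.status, json.dumps({"errors": error.errors}, indent=4)
```

Accepting either a string or a list means the earlier single-message calls keep working while the list form produces the JSON shape shown above.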

So, great, at this point we are gathering up our errors and sending them down as a package. Much better experience. But our code is ugly. Crazy ugly. We're doing 6 lines of validation for 2 lines of work. Let's clean it up a bit.

def validate_request(request, signature):
    errors = []
    for arg, _type in signature:
        if arg not in request:
            errors.append("Your request did not include the '{}' parameter. This must be provided to this endpoint. It should be a {}".format(arg, _type))
    if errors:
        raise APIError(errors)

def blow_mind(request):
    validate_request(request, [('user', 'UUID'), ('lemon', 'boolean')])
    user = request['user']
    detonate(user.mind)

Awwww, yeah. Much better. And we can reuse that function validate_request. In fact, at Authentise we use that pattern on steroids in order to do dynamic RAML generation when our API gets OPTIONS requests. You can get as elaborate as you like - validation of inputs and good error messages about them is a feature - and build structures according to the patterns in your language of choice.
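As a rough illustration of why keeping the signature as data pays off, the same list can drive both validation and a description of the endpoint. This is only a sketch: it does not produce RAML and is not Authentise's implementation, and `describe_endpoint`, `missing_parameters` and `BLOW_MIND_SIGNATURE` are hypothetical names.

```python
import json

# Hypothetical shared signature: (parameter name, human-readable type).
BLOW_MIND_SIGNATURE = [('user', 'UUID'), ('lemon', 'boolean')]


def describe_endpoint(signature):
    """Build a JSON description of the parameters an endpoint requires."""
    parameters = {
        arg: {"type": _type, "required": True}
        for arg, _type in signature
    }
    return json.dumps({"parameters": parameters}, indent=4, sort_keys=True)


def missing_parameters(request, signature):
    """Return the names from the signature that the request left out."""
    return [arg for arg, _type in signature if arg not in request]
```

An OPTIONS handler could return `describe_endpoint(BLOW_MIND_SIGNATURE)` while the main handler validates against the same list, so the documentation and the error messages can't drift apart.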

At this point you may realize that I've somewhat strayed from the original topic. Weren't we talking about exceptions? No, we weren't, actually, I was talking about error messages. The translation from exceptions to error messages is a technical decision and often a bad one. There are legitimate times you should do it, but by far the bulk of your validation logic and your error messages should be treated as application features, not as ways of dealing with linguistic constructs. What are some of those legitimate times?

  1. Infrastructure issues. Our database literally exploded. Circuits melted. The Internet gained sentience and transcended reality. My toddler ate the Arduino. etc.
  2. There is no 2

Running out of memory? Infrastructure issue. No connectivity? Infrastructure issue. Disk drive gone? Infrastructure.

All user input ever? Not an infrastructure issue. Plan for it, build your application around it, give them decent instructions on how to fix it in the error message. If that involves multiple layers of exception throwing and catching, that's fine, that's an implementation detail that I and your users don't care about. What they should see, however, should never be a weakly interpreted redigestion of an error your operating system spit out. If so, you failed. What's the difference between:

ConnectionError: Cannot store results: Connection to database failed

and:

Sorry, our application can't connect to the database right now. You may want to try again in a moment or two to see if our automated systems can bring it back. If not, please check our status page (status.authentise.com) or contact technical support (1-800-SUP-PORT)

About 5 parsecs of happy customers and 4.5 angstroms of code. Behold

import sys

def main():
    save_record(sys.stdin)

def save_record(data):
    try:
        prepare_data(data)
        store(data)
    except Exception as e:
        raise Exception("Cannot store results: {}".format(e))

VS

import logging
import sys

def main():
    try:
        save_record(sys.stdin)
    except ConnectionError:
        print(
            "Sorry, our application can't connect to the database right now. "
            "You may want to try again in a moment or two to see if our automated "
            "systems can bring it back. If not, please check our status page "
            "(status.authentise.com) or contact technical support (1-800-SUP-PORT)"
        )

def save_record(data):
    prepare_data(data)
    try:
        store(data)
    except ConnectionError:
        # exception() logs the message plus the active traceback
        logging.getLogger('database').exception("Cannot store results")
        raise

I'm still capturing all of the exception information (using Python's awesome logging system) so I haven't lost debuggability. I'm just giving my customers a modicum of consideration by providing them with information that may help them to make the correct next decision when it comes to interacting with me and my company. The actual form this takes will depend on your application. A web page may have a generic error page that gets a redirect any time a 500-level error occurs. A console application may write to stderr. A static library may raise a subclass of a library-specific exception class that contains the message as a property on the exception based on standard practice for a language. There's dozens of ways to provide the information, and thousands of ways to structure your code to fulfill that requirement.
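For instance, the library and console variants might be sketched like this. Everything here is illustrative: `MindLibraryError`, its `user_message` attribute, `DatabaseUnavailable` and the `run` wrapper are hypothetical names, not part of any real library.

```python
import sys


class MindLibraryError(Exception):
    """Hypothetical base class for a library's errors; carries text meant for the end user."""

    def __init__(self, user_message):
        super().__init__(user_message)
        self.user_message = user_message


class DatabaseUnavailable(MindLibraryError):
    """Raised by the (hypothetical) library when it cannot reach its database."""


def run(action, stream=sys.stderr):
    """Console wrapper: run the action, print any user-facing message, return an exit code."""
    try:
        action()
    except MindLibraryError as e:
        print(e.user_message, file=stream)
        return 1
    return 0
```

The library decides what the message says, and the application that embeds it decides where the message goes, whether that's stderr, an error page or a JSON body.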

My point is, good error messages, messages your users get real information out of, that tell them how they can succeed, are an implicit requirement in all software. That takes additional effort, additional consideration and a small obsession with understanding your user and what they need.

It comes down to working a bit harder to provide a bit more value.