More comments != Better Code

Comments provide a valuable opportunity to increase readability in your code, but instead often endup reducing readability. In this article I will explain why I believe blindly commenting code is an anti-pattern, and instead good organization along with properly named variables and functions is the key to readable code. I will be using python for coding examples, but this applies to any language.

Novice Code

When a novice programmer first starts really getting into programing longer stretches of code, their first instinct seems to be to comment all-the-things.

All-The-Things!!!

Perhaps something like this looks familiar:

#!/usr/bin/env python3

def main():
    s = input('Number Array: ')
    na = s.split(',')
    sm = 0;
    e = '';
    for n in na:
        try:
            sm += float(n)
        except (ValueError, OverflowError):
            e = "Error parsing number"
    if e != '':
        print(e)
    else:
        print("Sum: {}".format(sm))

if __name__ == '__main__':
    main()

It’s a simple program to add a few numbers together. After realizing it might be hard to follow, the novice decides to add some comments:

#!/usr/bin/env python3

def main():
    # Prompt the user for an input array of numbers
    s = input('Number Array: ')

    # Split the string
    na = s.split(',')

    # Initialize some values
    sm = 0;
    e = '';

    # Ok, so this is the main bit. It does the actual
    # Adding of the numbers.  It also parses the strings.
    for n in na:
        try:
            # parse and add
            sm += float(n)
        except (ValueError, OverflowError):
            # Something went wrong.
            # This will happen if they put in something that isn't a number
            e = "Error parsing number"
    if e != '':
        # We have an error, print it
        print(e)
    else:
        # Yay, we made it, print the sum
        print("Sum: {}".format(sm))

if __name__ == '__main__':
    main()

The exact same code, just with comments now. This may seem a bit over the top, but these style comments are what I often see junior developers proudly adding. The number of lines of code has almost doubled, and now there is a lot more dense text taking away from the actual content of the code. A person going through is almost forced to read the comments, instead of reading the code. It’s not necessarily worse, but it's not exactly better either.

Now let's say the requirements were changed. A manager requests to add functionality to the program which checks if the sum is 42, + or - 0.5, and if it is it needs to remove the 2nd number and recalculate the sum. It also should notify the user when this happens.

After adding in this requirement the novices code is:

#!/usr/bin/env python3

def main():
    # Prompt the user for an input array of numbers
    s = input('Number Array: ')

    # Split the string
    na = s.split(',')

    # Initialize some values
    sm = 0;
    e = '';

    # Ok, so this is the main bit. It does the actual
    # adding of the numbers.  It also parses the strings.
    for n in na:
        try:
            # parse and add
            sm += float(n)
        except (ValueError, OverflowError):
            # Something went wrong.
            # This will happen if they put in something that isn't a number
            e = "Error parsing number"
    if e != '':
        # We have an error, print it
        print(e)
    else:
        thrsh = 42
        if sm >= thrsh-0.5 and sm <= thrsh+0.5:
            # The sum is 42.  We need to fix it.
            print("Your sum is 42! Must re-compute!")

            # remove the 2nd number from the array
            na.pop(1)

            # re-initialize
            sm = 0

            for n in na:
                try:
                    # parse and add
                    sm += float(n)
                except (ValueError, OverflowError):
                    # Something went wrong.
                    # This will happen if they put in something that isn't a number
                    e = "Error parsing number"

        # Yay, we made it, print the sum
        print("Sum: {}".format(sm))

if __name__ == '__main__':
    main()

I hope you can see how bad this code is. There are large swaths of copy-pasta (largely copy-pasted code with slight changes), tersely named variables, and about half the "code" is actually comments. Now let's clean it up this mess.

Cleaning it up

First things first, let's refactor the summing into its own function:

#!/usr/bin/env python3

def nsum(na):
    # Initialize some values
    sm = 0;
    e = '';

    # Ok, so this is the main bit. It does the actual
    # Adding of the numbers.  It also parses the strings.
    for n in na:
        try:
            # parse and add
            sm += float(n)
        except (ValueError, OverflowError):
            # Something went wrong.
            # This will happen if they put in something that isn't a number
            e = "Error parsing number"
    return sm, e

def main():
    # Prompt the user for an input array of numbers
    s = input('Number Array: ')

    # Split the string
    na = s.split(',')

    # Sum the numbers
    sm, e = nsum(na)

    if e != '':
        # We have an error, print it
        print(e)
    else:
        thrsh = 42
        if sm >= thrsh-0.5 and sm <= thrsh+0.5:
            # The sum is 42. We need to fix it.
            print("Your sum is 42! Must re-compute!")

            # remove the 2nd number from the array
            na.pop(1)

            # Sum the numbers
            sm, e = nsum(na)

        # Yay, we made it, print the sum
        print("Sum: {}".format(sm))

if __name__ == '__main__':
    main()

This helps DRY-out the code a bit. Notice how it reduces the code reuse and actually improves the readability and flow of the main function.

Let’s keep going, this time with our new function’s error handling:

#!/usr/bin/env python3

class ParsingError(Exception):
    pass

def nsum(na):
    # Initialize some values
    sm = 0;

    # Ok, so this is the main bit. It does the actual
    # Adding of the numbers.  It also parses the strings.
    for n in na:
        try:
            # parse and add
            sm += float(n)
        except (ValueError, OverflowError):
            # Something went wrong.
            # This will happen if they put in something that isn't a number
            raise ParsingError("Error parsing value: {}".format(n))
    return sm

def main():
    # Prompt the user for an input array of numbers
    s = input('Number Array: ')

    # Split the string
    na = s.split(',')

    # Sum the numbers
    sm = nsum(na)

    thrsh = 42
    if sm >= thrsh-0.5 and sm <= thrsh+0.5:
        # The sum is 42. We need to fix it.
        print("Your sum is 42! Must re-compute!")

        # remove the 2nd number from the array
        na.pop(1)

        # Sum the numbers
        sm = nsum(na)

    # Yay, we made it, print the sum
    print("Sum: {}".format(sm))

if __name__ == '__main__':
    main()

This change again helps improve reability by moving the job of error handling entirely into the block of code which can cause the error. Notice how the output of the nsum function is now only the sum. We no longer have to worry about hanlding the possible error outside of the nsum function.
Note: The code will now raise an error when it can't parse the input. This will cause the program to end abruptly, but this isn't any different than what was happening before, as we were just printing out the error and exiting. All we have done is deferred that functionality to python's internal error handling.

Ok, now let's do a bit of refactoring:

#!/usr/bin/env python3

class ParsingError(Exception):
    pass

def nsum(na):
    # Initialize some values
    sm = 0;

    # Ok, so this is the main bit. It does the actual
    # Adding of the numbers.  It also parses the strings.
    for n in na:
        sm += n
    return sm

def parse_input_to_numbers(raw_input):
    # Split the string
    na = raw_input.split(',')

    numbers = []

    # Ok, so this is the main bit. It does the actual
    # Adding of the numbers.  It also parses the strings.
    for n in na:
        try:
            # parse and add
            numbers.append(float(n))
        except (ValueError, OverflowError):
            # Something went wrong.
            # This will happen if they put in something that isn't a number
            raise ParsingError("Error parsing value: {}".format(n))
    return numbers

def main():
    # Prompt the user for an input array of numbers
    s = input('Number Array: ')

    numbers = parse_input_to_numbers(s)

    # Sum the numbers
    sm = nsum(numbers)

    thrsh = 42
    if sm >= thrsh-0.5 and sm <= thrsh+0.5:
        # The sum is 42. We need to fix it.
        print("Your sum is 42! Must re-compute!")

        # remove the 2nd number from the array
        numbers.pop(1)

        # Sum the numbers
        sm = nsum(numbers)

    # Yay, we made it, print the sum
    print("Sum: {}".format(sm))

if __name__ == '__main__':
    main()

These changes separate out the responsibility of parsing the input from a combination of the main function and the nsum function into a function called parse_input_to_numbers. Notice how the nsum fuction is now only responsible for summing our list of numbers?

Speaking of nsum, now that we have separated it out sufficiently, we can see that it is just duplicating the functionality of the python built-in sum. Let's use that instead:

#!/usr/bin/env python3

class ParsingError(Exception):
    pass

def parse_input_to_numbers(raw_input):
    # Split the string
    na = raw_input.split(',')

    numbers = []

    # Ok, so this is the main bit. It does the actual
    # Adding of the numbers.  It also parses the strings.
    for n in na:
        try:
            # parse and add
            numbers.append(float(n))
        except (ValueError, OverflowError):
            # Something went wrong.
            # This will happen if they put in something that isn't a number
            raise ParsingError("Error parsing value: {}".format(n))
    return numbers

def main():
    # Prompt the user for an input array of numbers
    s = input('Number Array: ')

    numbers = parse_input_to_numbers(s)

    # Sum the numbers
    sm = sum(numbers)

    thrsh = 42
    if sm >= thrsh-0.5 and sm <= thrsh+0.5:
        # The sum is 42. We need to fix it.
        print("Your sum is 42! Must re-compute!")

        # remove the 2nd number from the array
        numbers.pop(1)

        # Sum the numbers
        sm = sum(numbers)

    # Yay, we made it, print the sum
    print("Sum: {}".format(sm))

if __name__ == '__main__':
    main()

Whenever possible, use built-in functions and librarys. They generally are extensively tested and stable, and there is no need to do the extra work of re-inventing the wheel. A simple google search can endup saving hours or days worth of work.

Refactoring out the logic for checking if we are near 42 brings us to:

#!/usr/bin/env python3

class ParsingError(Exception):
    pass

def is_sum_close_to_threshold(numbers, threshold=42, delta=0.5):
    numbers_sum = sum(numbers)
    return abs(numbers_sum - threshold) <= delta

def parse_input_to_numbers(raw_input):
    # Split the string
    na = raw_input.split(',')

    numbers = []

    # Ok, so this is the main bit. It does the actual
    # Adding of the numbers.  It also parses the strings.
    for n in na:
        try:
            # parse and add
            numbers.append(float(n))
        except (ValueError, OverflowError):
            # Something went wrong.
            # This will happen if they put in something that isn't a number
            raise ParsingError("Error parsing value: {}".format(n))
    return numbers

def main():
    # Prompt the user for an input array of numbers
    s = input('Number Array: ')

    numbers = parse_input_to_numbers(s)

    if is_sum_close_to_threshold(numbers):
        # The sum is invalid. We need to fix it.
        print("Your sum is invalid! Must re-compute!")

        # remove the 2nd number from the array
        numbers.pop(1)

    # Sum the numbers
    sm = sum(numbers)

    # Yay, we made it, print the sum
    print("Sum: {}".format(sm))

if __name__ == '__main__':
    main()

Notice that I simplified the logic for checking the sum by using the abs python built-in.

Now to finish up our cleanup by dealing with the comments and variable names:

#!/usr/bin/env python3

class ParsingError(Exception):
    pass

def is_sum_close_to_threshold(numbers, threshold=42, delta=0.5):
    numbers_sum = sum(numbers)
    return abs(numbers_sum - threshold) <= delta

def parse_input_to_numbers(raw_input):
    raw_numbers = raw_input.split(',')
    numbers = []
    for raw_number in raw_numbers:
        try:
            numbers.append(float(raw_number))
        except (ValueError, OverflowError):
            raise ParsingError("Error parsing value: {}".format(raw_number))
    return numbers

def main():
    raw_input = input('Number Array: ')
    numbers = parse_input_to_numbers(raw_input)

    if is_sum_close_to_threshold(numbers):
        print("Your sum is invalid! Must re-compute!")
        numbers.pop(1)

    numbers_sum = sum(numbers)
    print("Sum: {}".format(numbers_sum))

if __name__ == '__main__':
    main()

Now we have some decent code. Notice that the main function does not contain much logic, instead it delegates to other functions. Also notice the names I’m using for functions and variables. The result is easy to read and follow code, yet is lacking any comments whatsoever.

Naming Conventions

In that final code I very deliberately picked concise, descriptive names. Let's go over some of the naming conventions I use and why.

Variables

A variable name should accurately describe what the variable represents. For example: in the above code numbers contains a list of numbers; raw_input is the user input from the input function; numbers_sum is the sum of the numbers obtained from our summing function. The variable names have meaning which can be easily seen.

Note: I do not put the variable type as part of the variable name. That is an anti-pattern. numbers_list is not any clearer than numbers; raw_input_string is not better than raw_input.

Functions

Functions should do exactly what their name implies.

In my example, parse_inupt_to_numbers parses the program's raw input to a list of numbers, as would be expected by its name; is_sum_close_to_threshold returns a boolean, true if an input of parsed numbers have a sum close to the specified threshold, false if not. Most, if not all, of the functionality can now be inferred from the function names, without having to delve into their code.

Functions should be small enough that you can describe them easily in a few words. If you are struggling to come up with a name for your function, chances are you either don't really understand what the function is doing, or it is doing too much. The word "and" should be avoided; if you find yourself using "and" in your function names, then that function should probably be two separate functions. Using naming conventions like this helps to enforce the single responsibility principle, which is an entire topic by itself, but is generaly concidered a good thing.

A Note On Comments

Comments are not bad. When used properly they will clarify tricky to understand bits of code, but they should only be used as a last resort. They should only be used after refactoring and renaming of functions/variables has failed to clarify the code.

Conclusion

Every programmer should be striving for self-documenting code, or code which is able to easily convey its meaning using organization and language. It is an art form, in the same way writing a good novel is an art form. It is easy to come up with a good idea, it is conveying that good idea in an easy to understand way which is the hard part. This is one of the main traits which separates an average writer from a great one, regardless if you are writing in English, Spanish, Chinese, Python, C++, or JavaScript.