Pytest and Parametrization

Python, being a dynamic language, is very fast to program in, but it is also very easy to let syntax and other errors sneak into your code. Good test coverage is key for any Python code that prioritizes stability.

Luckily, thanks to pytest, testing in python is usually a pretty enjoyable experience.


A quick note: My example tests in this article will be presented as functions without any parent class, but the concepts talked about here can just as easily be assigned to class based tests.
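As a quick illustration of that point (the function and exception here are made-up stand-ins, not part of the code below), the same pytest.raises pattern works unchanged inside a test class:

```python
import pytest

def greet(name):
    # Hypothetical stand-in for a function under test.
    if name is None:
        raise ValueError('name must not be None')
    return 'hello, {}'.format(name)

class TestGreet:
    # pytest collects test_* methods on Test* classes automatically.
    def test_none_raises(self):
        with pytest.raises(ValueError):
            greet(None)

    def test_valid_name(self):
        assert greet('world') == 'hello, world'
```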


The Code

Here is a very simple function I created for us to test. It has one return code path and raises different errors depending on input validation. It is a trivial piece of code, but it should allow us to look into better testing practices.

#!/usr/bin/python3

class BaseError(Exception):
    pass

class StringError(BaseError):
    pass

class NoneError(BaseError):
    pass

class FooError(BaseError):
    pass

class BarError(BaseError):
    pass

def some_function_to_test(input0, input1):
    if input0 is None or input1 is None:
        raise NoneError

    if not isinstance(input0, str) or not isinstance(input1, str):
        raise StringError

    if input0 == 'foo':
        raise FooError

    if input1 == 'bar':
        raise BarError

    return 'input0 was: {}\n input1 was: {}'.format(input0, input1)

A Typical Set of Tests

Let's write a set of tests that exercise all the possible code paths in this function:

import pytest
import some_code

def test_some_function_to_test_input0_None():
    with pytest.raises(some_code.NoneError):
        some_code.some_function_to_test(None, 'test')

def test_some_function_to_test_input1_None():
    with pytest.raises(some_code.NoneError):
        some_code.some_function_to_test('test', None)

def test_some_function_to_test_input0_not_string():
    with pytest.raises(some_code.StringError):
        some_code.some_function_to_test(1, 'test')

def test_some_function_to_test_input1_not_string():
    with pytest.raises(some_code.StringError):
        some_code.some_function_to_test('test', 1)

def test_some_function_to_test_input0_is_foo():
    with pytest.raises(some_code.FooError):
        some_code.some_function_to_test('foo', 'test')

def test_some_function_to_test_input1_is_bar():
    with pytest.raises(some_code.BarError):
        some_code.some_function_to_test('test', 'bar')

def test_some_function_to_test_input0_and_input1_are_valid():
    result = some_code.some_function_to_test('bar', 'foo')
    assert result == 'input0 was: bar\n input1 was: foo'

Nothing too special about these tests, though it is worth noting that we use the pytest.raises context manager to exercise the error paths. We have 100% test coverage, and while we might not have covered every edge case, we can be fairly confident our function will work as expected.
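As an aside, pytest.raises also hands back the exception instance, so you can assert on the error message too. The function below is a made-up example (not part of our some_code module) just to show the `match` keyword and the captured `excinfo` object:

```python
import pytest

def parse_port(value):
    # Made-up example function for demonstration purposes.
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError('port out of range: {}'.format(port))
    return port

def test_parse_port_out_of_range():
    # `match` checks the exception message against a regular expression,
    # and `excinfo.value` is the raised exception instance.
    with pytest.raises(ValueError, match='out of range') as excinfo:
        parse_port('70000')
    assert '70000' in str(excinfo.value)
```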

A Better Way

While these tests aren't bad, there is a lot of duplication among them: all that really changes from test to test is the inputs and the expected error. Test parametrization can take care of this:

import pytest
import some_code

@pytest.mark.parametrize("input0, input1, error", [
    (None, 'test', some_code.NoneError),
    ('test', None, some_code.NoneError),
    (1, 'test', some_code.StringError),
    ('test', 1, some_code.StringError),
    ('foo', 'test', some_code.FooError),
    ('test', 'bar', some_code.BarError),
])
def test_some_function_to_test_invalid_inputs(input0, input1, error):
    with pytest.raises(error):
        some_code.some_function_to_test(input0, input1)

def test_some_function_to_test_input0_and_input1_are_valid():
    result = some_code.some_function_to_test('bar', 'foo')
    assert result == 'input0 was: bar\n input1 was: foo'

This simplifies the tests quite a bit. There is much less repetition, and adding another test case is as simple as adding another entry to the parametrization list. That said, we can do more.
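Each parametrized case runs and reports as its own test, with an id that pytest generates from the argument values. If those generated names are unclear, you can supply your own via the `ids` keyword. A small, self-contained sketch (the test itself is hypothetical):

```python
import pytest

@pytest.mark.parametrize('value, expected', [
    ('', False),
    ('x', True),
], ids=['empty-string', 'non-empty-string'])
def test_truthiness(value, expected):
    # Reported as test_truthiness[empty-string] and
    # test_truthiness[non-empty-string] in pytest's output.
    assert bool(value) == expected
```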

By using the pytest-raises plugin (disclaimer: I wrote pytest-raises), which provides a marker usable both on individual tests and on individual test cases within a parametrization, we can simplify these tests further:

import pytest
import some_code

@pytest.mark.parametrize("input0, input1, expected", [
    pytest.param(None, 'test', None, marks=pytest.mark.raises(exception=some_code.NoneError)),
    pytest.param('test', None, None, marks=pytest.mark.raises(exception=some_code.NoneError)),
    pytest.param(1, 'test', None, marks=pytest.mark.raises(exception=some_code.StringError)),
    pytest.param('test', 1, None, marks=pytest.mark.raises(exception=some_code.StringError)),
    pytest.param('foo', 'test', None, marks=pytest.mark.raises(exception=some_code.FooError)),
    pytest.param('test', 'bar', None, marks=pytest.mark.raises(exception=some_code.BarError)),
    ('bar', 'foo', 'input0 was: bar\n input1 was: foo'),
])
def test_some_function_to_test(input0, input1, expected):
    result = some_code.some_function_to_test(input0, input1)
    assert result == expected

Now we are down to one test, with just a bunch of test cases, and yet are still able to obtain 100% test coverage. This code is much easier to maintain than the original set of tests, and it is much easier to add additional test cases. We have lost some context in the test name, but have gained that context in the marking of the tests as ones which raise an exception.
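If losing the descriptive test names still bothers you, pytest.param also accepts an `id` keyword, so each case can carry its own human-readable label (the values below are hypothetical):

```python
import pytest

@pytest.mark.parametrize('value', [
    pytest.param(None, id='none-input'),
    pytest.param('', id='empty-string'),
    pytest.param(0, id='zero'),
])
def test_value_is_falsy(value):
    # Each case is reported as test_value_is_falsy[none-input],
    # test_value_is_falsy[empty-string], and so on.
    assert not value
```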

Why Not pytest.mark.xfail?

In the past, I have used (abused) pytest.mark.xfail in this situation. The tests look very similar:

import pytest
import some_code

@pytest.mark.parametrize("input0, input1, expected", [
    pytest.param(None, 'test', None, marks=pytest.mark.xfail(raises=some_code.NoneError)),
    pytest.param('test', None, None, marks=pytest.mark.xfail(raises=some_code.NoneError)),
    pytest.param(1, 'test', None, marks=pytest.mark.xfail(raises=some_code.StringError)),
    pytest.param('test', 1, None, marks=pytest.mark.xfail(raises=some_code.StringError)),
    pytest.param('foo', 'test', None, marks=pytest.mark.xfail(raises=some_code.FooError)),
    pytest.param('test', 'bar', None, marks=pytest.mark.xfail(raises=some_code.BarError)),
    ('bar', 'foo', 'input0 was: bar\n input1 was: foo'),
])
def test_some_function_to_test(input0, input1, expected):
    result = some_code.some_function_to_test(input0, input1)
    assert result == expected

Unfortunately pytest.mark.xfail was not designed for this, and has some major issues which forced me to write the pytest-raises plugin.

Consider the following tests:

@pytest.mark.parametrize("input0, input1, expected", [
    pytest.param('test', 'bar', None, marks=pytest.mark.xfail(raises=some_code.BarError)),
    pytest.param('bar', 'foo', 'input0 was: bar\n input1 was: foo', marks=pytest.mark.xfail(raises=some_code.BarError)),
    ('bar', 'foo', 'input0 was: bar\n input1 was: foo'),
])
def test_some_function_to_test_xfail(input0, input1, expected):
    result = some_code.some_function_to_test(input0, input1)
    assert result == expected

Given our function, we would expect the first test case to pass, as it raises the error the test expects; the second to fail, as it never raises the expected error; and the third to pass. So we would expect the result of our test run to be:

test_code_parametrize_xfail.py .F.

with a failing exit code.

Instead this is the actual result:

test_code_parametrize_xfail.py xX.

with a passing exit code.

The lowercase x (XFAIL) means the test failed and a failure was expected. The uppercase X (XPASS) means the test was expected to fail but passed.

There are two issues with this output:

1. The difference between x and X is very hard to see when scanning test output.
2. If a test that was expected to fail passes (the X case) and all other tests pass, the test run returns a passing exit code.

This is the problem with using pytest.mark.xfail in this case.
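For completeness: pytest can tighten the second issue, though not the first. An xfail mark can be made strict, which turns an unexpected pass (the X case) into a hard failure; this can be set per mark, as sketched below, or globally with `xfail_strict = true` in the configuration file. The test function itself is a made-up example:

```python
import pytest

# With strict=True, an XPASS is reported as a failure and the
# run gets a failing exit code, instead of silently passing.
@pytest.mark.xfail(raises=ValueError, strict=True)
def test_expected_to_raise():
    int('not a number')  # raises ValueError, so this xfails as expected
```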

It is important to note that neither of these issues is a bug in pytest.mark.xfail, and pytest.mark.xfail does not need to be changed. I was simply using pytest.mark.xfail in a way it was not intended to be used. Often in programming we are forced either to use a tool we have in a way it was not intended, to save time (abuse the tool), or to write the tool we actually need. I started off doing the former, but have now moved on to the latter (the pytest-raises plugin).

Conclusion

I don't like to be very picky about tests. Tests are a good thing, and any tests are better than no tests. But writing easy-to-read, maintainable tests will reduce cognitive load and maintenance burden, so those are what I strive to write, and encourage others to do as well. Hopefully this article has helped others to think and program that way as well.