Testing and validating
Notes on testing and data validation.
What I currently do:
- I use `assert` to test small ad-hoc pieces of code, `pytest` to test crucial pieces of code, and a series of validating functions that check certain assumptions about the data.
What I need:
A process to ensure that my data preprocessing pipeline produces the intended data.
I still don’t fully understand how good data scientists test their code. Unit testing seems incomplete because while it ensures that a piece of code works as expected, the main challenge when working with large datasets is often that there are special cases in the data that I don’t know in advance and can’t think of. To catch these, I need to perform tests on the full data. In addition to that, manually creating a dataframe for testing is a huge pain when I need to test functions that create more complex data patterns (e.g. checking whether, in a financial transactions dataset, certain individuals in the data have at least a certain number of monthly transactions for a certain type of bank account).
I currently mainly use validating functions that operate on the entire dataset at the end of the pipeline (e.g. transaction data production in the entropy project), which has already proven invaluable in catching errors. Using a decorator to add each validator function to a list of validators, and then running the data through each validator in that list, works well and is fairly convenient.
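The decorator-registry pattern described above can be sketched roughly as follows; the validator names and the dict-based rows are hypothetical stand-ins for the real dataframe checks:

```python
VALIDATORS = []

def validator(func):
    """Register a validation function in the module-level list."""
    VALIDATORS.append(func)
    return func

@validator
def amounts_present(rows):
    # hypothetical check: every transaction has an amount
    assert all(row.get("amount") is not None for row in rows), "missing amounts"

@validator
def amounts_positive(rows):
    # hypothetical check: all amounts are strictly positive
    assert all(row["amount"] > 0 for row in rows), "non-positive amounts"

def validate(rows):
    """Run the full dataset through every registered validator."""
    for check in VALIDATORS:
        check(rows)
    return rows
```

Returning the data from `validate` makes it easy to drop the call at the end of a pipeline.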
`pandera` seems potentially useful in that the defined schema can be used both to validate data and – excitingly – also to generate sample datasets for `pytest`, which could go a long way towards solving the above problem. But specifying a data schema for a non-trivial dataset is not easy, and I can’t see how to write one for a dataset like MDB, where I need constraints such as a certain number of financial accounts of a certain type per user. So, for now, I just use my own validation functions. The testing branch in the entropy project has a `schema.py` file that experiments with the library.
This article has been very useful, suggesting the following approach to testing: `assert` statements for ad-hoc pieces of code in Jupyter Lab, `pytest` for pieces of code others use, `hypothesis` for code that operates on the data, and `pandera` or other validator libraries for overall data validation. I basically do the first and last of these, and am still looking for ways to do the middle two.
Basic test for return value
Very basic test
More transparent test with a message, which will be shown when the AssertionError is raised
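For illustration, a sketch with a hypothetical `days_to_years` function (the name and logic are made up for the example):

```python
def days_to_years(days):
    # hypothetical function under test
    return days / 365

# very basic test
assert days_to_years(730) == 2

# more transparent test with a message that shows when the AssertionError is raised
actual = days_to_years(365)
assert actual == 1, f"expected 1, got {actual}"
```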
Careful with floats: exact equality comparisons can fail because of binary floating-point representation:
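A minimal standard-library illustration using `math.isclose` (pytest’s `pytest.approx` serves the same purpose inside tests):

```python
import math

# direct equality fails: 0.1 + 0.2 is 0.30000000000000004 in binary floating point
assert 0.1 + 0.2 != 0.3

# compare with a tolerance instead
assert math.isclose(0.1 + 0.2, 0.3)
```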
Testing for exceptions
Use a context manager that silences the expected error if it is raised within the context, and raises an assertion error if the expected error isn’t raised (pytest then reports `Failed: DID NOT RAISE <class 'ValueError'>`).
Test for correct error message
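A sketch of both checks, using a hypothetical `mean` function as the code under test:

```python
import pytest

def mean(values):
    # hypothetical function under test
    if not values:
        raise ValueError("values must be non-empty")
    return sum(values) / len(values)

# the context manager silences the expected error if it is raised inside the block
with pytest.raises(ValueError):
    mean([])

# `match` checks the error message against a regular expression
with pytest.raises(ValueError, match="non-empty"):
    mean([])
```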
What’s a “well-tested” function?
- Bad arguments
- Examples: incomplete args, wrong dimensions, wrong type, etc.
- Return value: exception
- Special arguments
- Examples: values triggering special logic, boundary values (values at the edge between bad and good arguments, or just before and after values that trigger special logic)
- Return value: expected value
- Normal arguments
- Examples: all other values; test 2 or 3 of them
- Return value: expected value
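Putting the three categories together, a sketch of such a test class for a hypothetical `clip` function might look like this:

```python
import pytest

def clip(x, low, high):
    """Clamp x into the interval [low, high] (hypothetical function under test)."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(x, high))

class TestClip:
    # bad arguments -> exception
    def test_invalid_bounds_raise(self):
        with pytest.raises(ValueError):
            clip(1, 5, 0)

    # special arguments: values exactly on the boundary
    def test_value_at_boundary(self):
        assert clip(0, 0, 10) == 0
        assert clip(10, 0, 10) == 10

    # normal arguments: two or three ordinary cases
    def test_normal_values(self):
        assert clip(5, 0, 10) == 5
        assert clip(-3, 0, 10) == 0
        assert clip(42, 0, 10) == 10
```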
Keeping tests organised
Principles to follow:
- Mirror the structure of the source code directory
- Name test modules `test_<name of src module>`
- Within a test module, collect all tests for a single function in a class named `TestNameOfFunction` (from DataCamp: ‘Test classes are containers inside test modules. They help separate tests for different functions within the test module, and serve as a structuring tool in the pytest framework.’)
Marking tests as expected to fail
Sometimes we might want to differentiate between failing code and tests that we know won’t pass yet or that fail only under certain conditions (e.g. we might follow TDD and haven’t implemented the function yet, or we know a function only runs in Python 3). In this case, we can apply decorators to either functions or classes:
Expect to fail always (e.g. because not implemented yet)
Expect to fail under certain conditions (e.g. certain Python versions, operating systems, etc.).
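A sketch of both cases (the test names and reasons are invented for illustration):

```python
import sys
import pytest

# expect this test to fail unconditionally (e.g. function not implemented yet)
@pytest.mark.xfail(reason="feature not implemented yet")
def test_future_feature():
    raise NotImplementedError

# expect failure only under a condition, here an old Python version
@pytest.mark.xfail(sys.version_info < (3, 0), reason="requires Python 3")
def test_python3_only_behaviour():
    assert "Text".casefold() == "text"

# marks can also skip rather than xfail under a condition
@pytest.mark.skipif(sys.platform == "win32", reason="POSIX-only test")
def test_posix_separator():
    assert "/" == "/"
```

The same marks can be applied to a whole test class to cover all its tests at once.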
- `pytest` runs all tests
- `pytest -x` stops after the first failure
- `pytest <path to test module>` runs all tests in the test module
- `pytest <path to test module>::<test class name>` runs all tests in the test class with the specified node ID
- `pytest <path to test module>::<test class name>::<test name>` runs the test with the specified node ID
- `pytest -k <pattern>` runs tests that match the pattern
- `pytest -k <TestNameOfFunction>` runs all tests in the specified class
- `pytest -k <NameOf and not second thing>` runs all tests in the specified class except for those matched by the excluded pattern
- `pytest -r` shows reasons in the short test summary
- `pytest -rs` shows reasons for skipped tests
- `pytest -rx` shows reasons for xfailed tests
- `pytest -rsx` shows reasons for skipped and xfailed tests
A useful alternative uses the built-in `tmpdir` fixture and fixture chaining:
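A sketch of what such a chained fixture might look like, using the built-in `tmp_path` fixture (the pathlib-based successor to `tmpdir`); the file contents and the `count_rows` helper are hypothetical:

```python
import pytest

def count_rows(path):
    # hypothetical preprocessing helper under test: data rows minus the header
    return len(path.read_text().strip().splitlines()) - 1

@pytest.fixture
def raw_file(tmp_path):
    # chain onto the built-in tmp_path fixture: write a small raw file
    path = tmp_path / "raw.csv"
    path.write_text("id,amount\n1,10\n2,20\n")
    return path

def test_count_rows(raw_file):
    assert count_rows(raw_file) == 2
```

pytest tears the temporary directory down automatically, so the test leaves no files behind.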
Testing functions independently of dependencies
To test models, use toy datasets for which I know the correct results, and perform sanity checks using assertions whose expected outcomes I know.
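A minimal sketch of the idea, with a hypothetical `fit_slope` trainer and a toy dataset whose slope is known exactly:

```python
def fit_slope(xs, ys):
    """Least-squares slope of y on x (hypothetical training function)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def test_recovers_known_slope():
    # toy dataset: y = 3x exactly, so the fitted slope must be 3
    xs = [1, 2, 3, 4]
    ys = [3, 6, 9, 12]
    assert abs(fit_slope(xs, ys) - 3) < 1e-9
```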
Tests for training function
Tests for final model
- Create baseline plot using plotting function and store as PNG image
- Test plotting function and compare to baseline
The baseline image needs to be stored in a `baseline` subfolder of the plot module’s testing directory.
To create the baseline image, run the following:
> pytest -k 'test_plot_for_linear_data' --mpl-generate-path <path-to-baseline-folder>
To compare future tests with baseline image run:
> pytest -k 'test_plot_for_linear_data' --mpl