Testing and validating
Notes on testing and data validation.
What I currently do:
- I use `assert` to test small ad-hoc pieces of code, `pytest` to test crucial pieces of code, and a series of validating functions that check certain assumptions about the data.
What I need:
A process to ensure that my data preprocessing pipeline produces the intended data.
I still don’t fully understand how good data scientists test their code. Unit testing seems incomplete because while it ensures that a piece of code works as expected, the main challenge when working with large datasets is often that there are special cases in the data that I don’t know in advance and can’t think of. To catch these, I need to perform tests on the full data. In addition to that, manually creating a dataframe for testing is a huge pain when I need to test functions that create more complex data patterns (e.g. checking whether, in a financial transactions dataset, certain individuals in the data have at least a certain number of monthly transactions for a certain type of bank account).
I currently mainly use validating functions that operate on the entire dataset at the end of the pipeline (e.g. transaction data production in the entropy project), which has already proven invaluable in catching errors. Using a decorator to add each validator function to a list of validators, and then running the data through each validator in that list, works well and is fairly convenient.
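A minimal sketch of this pattern (the names `VALIDATORS`, `validator`, and the example checks are illustrative, not the actual entropy code):

```python
import pandas as pd

VALIDATORS = []


def validator(func):
    """Register a validation function to be run on the final dataset."""
    VALIDATORS.append(func)
    return func


@validator
def no_missing_user_ids(df: pd.DataFrame) -> pd.DataFrame:
    assert df.user_id.notna().all(), "Found missing user IDs."
    return df


@validator
def amounts_are_numeric(df: pd.DataFrame) -> pd.DataFrame:
    assert pd.api.types.is_numeric_dtype(df.amount), "Amount column is not numeric."
    return df


def validate_data(df: pd.DataFrame) -> pd.DataFrame:
    """Run the dataframe through every registered validator."""
    for func in VALIDATORS:
        df = func(df)
    return df
```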
`pandera` seems potentially useful in that the defined schema can be used both to validate data and – excitingly – can also be used to generate sample datasets for `hypothesis` and `pytest`, which could go a long way towards solving the above problem. But specifying a data schema for a non-trivial dataset is not easy, and I can’t see how to write one for a dataset like MDB, where I need constraints such as a certain number of financial accounts of a certain type per user. So, for now, I just use my own validation functions. The testing branch in the entropy project has a `schema.py` file that experiments with the library.
This article has been very useful, suggesting the following approach to testing: `assert` statements for ad-hoc pieces of code in Jupyter Lab, `pytest` for pieces of code others use, `hypothesis` for code that operates on the data, and `pandera` or other validator libraries for overall data validation. I basically do the first and last of these, and am still looking for ways to incorporate the other two.
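For reference, a rough sketch of what a pandera schema for a simple transactions table might look like (column names and checks are made up, and this does not capture cross-row constraints like a minimum number of accounts per user):

```python
import pandera as pa

transactions_schema = pa.DataFrameSchema(
    {
        "user_id": pa.Column(int, pa.Check.ge(0)),
        "date": pa.Column("datetime64[ns]"),
        "amount": pa.Column(float),
        "account_type": pa.Column(str, pa.Check.isin(["current", "savings"])),
    }
)

# Validate a dataframe (raises a SchemaError if any check fails):
# validated = transactions_schema.validate(df)

# With the pandera[strategies] extra installed, the same schema can also
# generate sample data for hypothesis/pytest, e.g. transactions_schema.example(size=5)
```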
pytest notes
Basic test for return value
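Suppose we have a small helper like this (a made-up example, not from an actual project):

```python
def convert_to_int(string_with_commas):
    """Convert a string like '2,081' to the integer 2081."""
    return int(string_with_commas.replace(",", ""))
```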
Very basic test
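For the hypothetical `convert_to_int` above, a bare-bones test is a function whose name starts with `test_` and that contains an assert:

```python
def test_on_string_with_one_comma():
    assert convert_to_int("2,081") == 2081
```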
More transparent test with message, which will show when AssertionError is raised
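The same test with a failure message that pytest prints when the AssertionError is raised:

```python
def test_on_string_with_one_comma():
    actual = convert_to_int("2,081")
    expected = 2081
    message = f"convert_to_int('2,081') returned {actual} instead of {expected}"
    assert actual == expected, message
```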
Careful with floats, because of this:
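Plain equality comparison can fail because of floating point representation:

```python
>>> 0.1 + 0.1 + 0.1 == 0.3
False
```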
Use this:
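`pytest.approx` compares with a tolerance (the standard library's `math.isclose` works too):

```python
>>> import pytest
>>> 0.1 + 0.1 + 0.1 == pytest.approx(0.3)
True
```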
Testing for exceptions
Use the `pytest.raises` context manager, which silences the expected error if it is raised within the context and fails the test if the expected error isn’t raised.
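A minimal illustration of both behaviours:

```python
import pytest

# The expected error is raised inside the context, so it is silenced:
with pytest.raises(ValueError):
    raise ValueError("bad value")

# Nothing is raised inside the context, so pytest fails the test:
with pytest.raises(ValueError):
    pass
# -> Failed: DID NOT RAISE <class 'ValueError'>
```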
Basic example:
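Assuming the hypothetical `convert_to_int` from above, which raises a ValueError for strings it cannot parse:

```python
import pytest


def test_raises_value_error_on_bad_input():
    with pytest.raises(ValueError):
        convert_to_int("not a number")
```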
Test for correct error message
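To also check the error message, capture the exception info or pass a `match` pattern (the message text assumes the `convert_to_int` sketch above):

```python
def test_raises_with_correct_message():
    with pytest.raises(ValueError) as exc_info:
        convert_to_int("not a number")
    assert "invalid literal" in str(exc_info.value)


# Alternatively, match against a regex directly:
# with pytest.raises(ValueError, match="invalid literal"):
#     convert_to_int("not a number")
```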
What’s a “well-tested” function?
Argument types (see the sketch after this list):
- Bad arguments
  - Examples: incomplete arguments, wrong dimensions, wrong type, etc.
  - Return value: exception
- Special arguments
  - Examples: values triggering special logic, boundary values (values at the border between bad and good arguments, or just before/after values that trigger special logic)
  - Return value: expected value
- Normal arguments
  - Examples: all other values; test two or three of them
  - Return value: expected value
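A sketch of what this looks like for the hypothetical `convert_to_int` from above:

```python
import pytest


class TestConvertToInt:
    # Bad argument: should raise an exception
    def test_on_non_numeric_string(self):
        with pytest.raises(ValueError):
            convert_to_int("not a number")

    # Special argument: boundary value without any comma
    def test_on_string_without_comma(self):
        assert convert_to_int("756") == 756

    # Normal arguments: test two or three representative values
    def test_on_string_with_one_comma(self):
        assert convert_to_int("2,081") == 2081

    def test_on_string_with_two_commas(self):
        assert convert_to_int("1,034,891") == 1034891
```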
Keeping tests organised
Principles to follow:
- Mirror the structure of the `src` directory in the `tests` directory
- Name test modules `test_<name of src module>`
- Within a test module, collect all tests for a single function in a class named `TestNameOfFunction` (from DataCamp: ‘Test classes are containers inside test modules. They help separate tests for different functions within the test module, and serve as a structuring tool in the pytest framework.’)
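For example, for a hypothetical `src/data/preprocessing.py` containing `convert_to_int`, the layout and test module might look like this:

```python
# Directory layout (illustrative):
#   src/data/preprocessing.py        <- contains convert_to_int()
#   tests/data/test_preprocessing.py <- mirrors the src structure

# tests/data/test_preprocessing.py
from src.data.preprocessing import convert_to_int


class TestConvertToInt:
    """All tests for convert_to_int live in this class."""

    def test_on_string_with_one_comma(self):
        assert convert_to_int("2,081") == 2081
```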
Marking tests as expected to fail
Sometimes we might want to differentiate between failing code and tests that we know won’t pass yet or won’t run under certain conditions (e.g. we might follow TDD and haven’t implemented the function yet, or we know a function only runs in Python 3). In this case, we can apply decorators to either functions or classes.
Expect to fail always (e.g. because not implemented yet)
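A sketch (the function `new_helper` is hypothetical and not yet written):

```python
import pytest


@pytest.mark.xfail(reason="Following TDD, new_helper() is not implemented yet")
def test_new_helper_returns_expected_value():
    assert new_helper() == 42
```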
Expect to fail under certain conditions (e.g. certain Python versions, operating systems, etc.).
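A sketch with made-up test bodies; `xfail` takes an optional condition, while `skipif` skips the test rather than expecting it to fail:

```python
import sys

import pytest


@pytest.mark.xfail(sys.version_info < (3, 6), reason="Requires f-strings")
def test_label_formatting():
    assert format_label("x") == "label: x"  # format_label is hypothetical


@pytest.mark.skipif(sys.platform == "win32", reason="POSIX-only path handling")
def test_posix_paths():
    assert build_path("a", "b") == "a/b"  # build_path is hypothetical
```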
Running pytest
- `pytest` runs all tests
- `pytest -x` stops after the first failure
- `pytest <path to test module>` runs all tests in the test module
- `pytest <path to test module>::<test class name>` runs all tests in the test module with the specified node ID
- `pytest <path to test module>::<test class name>::<test name>` runs the test with the specified node ID
- `pytest -k <pattern>` runs tests that fit the pattern
- `pytest -k <TestNameOfFunction>` runs all tests in the specified class
- `pytest -k "<TestNameOfFunction> and not second_thing"` runs all tests in the specified class except for `test_second_thing`
- `pytest -r` shows reasons
- `pytest -rs` shows reasons for skipped tests
- `pytest -rx` shows reasons for xfailed tests
- `pytest -rsx` shows reasons for skipped and xfailed tests
Fixtures
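A minimal sketch of a fixture that sets up a resource for a test and tears it down afterwards via `yield` (file name and contents are made up):

```python
import os

import pytest


@pytest.fixture
def raw_data_file():
    # Setup: create a small data file for the test
    path = "raw_data.csv"
    with open(path, "w") as f:
        f.write("1,2\n3,4\n")
    yield path
    # Teardown: remove the file after the test has finished
    os.remove(path)


def test_preprocess(raw_data_file):
    with open(raw_data_file) as f:
        assert len(f.readlines()) == 2
```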
Useful alternative using the built-in `tmpdir` fixture and fixture chaining:
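The built-in `tmpdir` fixture provides a temporary directory that pytest removes automatically, so no teardown code is needed; chaining it into our own fixture looks like this (a sketch):

```python
import pytest


@pytest.fixture
def raw_data_file(tmpdir):
    # tmpdir is pytest's built-in temporary-directory fixture
    path = tmpdir.join("raw_data.csv")
    path.write("1,2\n3,4\n")
    return str(path)


def test_preprocess(raw_data_file):
    with open(raw_data_file) as f:
        assert len(f.readlines()) == 2
```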
Mocking
Mocking allows testing a function independently of its dependencies, by replacing those dependencies with mock objects whose behaviour we control.
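A sketch using the standard library's `unittest.mock.patch` (the module `src.data.preprocessing` and the functions `preprocess` and `convert_to_int` are made up); `pytest-mock` offers the same functionality via its `mocker` fixture:

```python
from unittest.mock import patch

from src.data.preprocessing import preprocess  # hypothetical module and function


def test_preprocess_independently_of_convert_to_int():
    # Replace the dependency inside the module under test with a mock
    with patch("src.data.preprocessing.convert_to_int") as convert_mock:
        convert_mock.side_effect = lambda s: int(s.replace(",", ""))
        result = preprocess(["2,081", "314,942"])
    convert_mock.assert_called()      # the dependency was actually used
    assert result == [2081, 314942]   # preprocess handled the (mocked) results correctly
```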
Testing models
To test models, use toy datasets for which I know the correct results, and perform sanity checks using assertions about properties I know must hold.
Tests for training function
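For example, a hypothetical `train_model` that fits a linear regression should recover a slope and intercept I know in advance when trained on perfectly linear toy data:

```python
import numpy as np
import pytest


def test_on_perfectly_linear_data():
    # Toy dataset where y = 2 * x exactly, so the true parameters are known
    x = np.arange(10, dtype=float)
    y = 2.0 * x
    slope, intercept = train_model(x, y)  # hypothetical training function
    assert slope == pytest.approx(2.0)
    assert intercept == pytest.approx(0.0, abs=1e-6)
```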
Tests for final model
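For the final model, exact outputs aren't known in advance, so the assertions are sanity checks on quantities whose plausible range I do know (a sketch; `model`, `test_x`, and `test_y` are assumed to exist):

```python
def test_final_model_sanity_checks():
    r_squared = model.score(test_x, test_y)  # hypothetical fitted model and test set
    # R^2 should lie in a sensible range and beat a minimal baseline
    assert 0 <= r_squared <= 1
    assert r_squared > 0.5
```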
Testing plots
Overall approach:
- Create a baseline plot using the plotting function and store it as a PNG image
- Test the plotting function and compare the result to the baseline (see the sketch below)
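With pytest-mpl, a plot test is a function decorated with `@pytest.mark.mpl_image_compare` that returns the matplotlib figure to compare against the stored baseline (the plotting function `plot_best_fit_line` is made up):

```python
import pytest


@pytest.mark.mpl_image_compare
def test_plot_for_linear_data():
    x = [1.0, 2.0, 3.0]
    y = [2.0, 4.0, 6.0]
    # plot_best_fit_line is the (hypothetical) plotting function under test;
    # it must return a matplotlib Figure object.
    return plot_best_fit_line(x, y, title="Linear data")
```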
Install pytest-mpl
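The package is on PyPI (also available via conda-forge):
> pip install pytest-mpl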
The baseline image needs to be stored in a `baseline` subfolder of the plot module's testing directory.
To create the baseline image, do the following:
> pytest -k 'test_plot_for_linear_data' --mpl-generate-path <path-to-baseline-folder>
To compare future test runs with the baseline image, run:
> pytest -k 'test_plot_for_linear_data' --mpl