Fuzzy matching in Python
- Docs: see the difflib module (home of SequenceMatcher) in the Python standard library documentation
Simplest use case
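A minimal sketch of what this looks like with the standard library's `difflib.SequenceMatcher`. The two input strings are picked purely for illustration.

```python
from difflib import SequenceMatcher

# The first argument is the optional "junk" filter; None means nothing is ignored.
matcher = SequenceMatcher(None, "abcd", "abc")

# ratio() returns 2 * (number of matching elements) / (total length of both sequences).
score = matcher.ratio()
print(f"{score:.3f}")  # 0.857
```

Here three characters match and the combined length is seven, so the score is 6/7.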
Create a helper function so we don’t need to pass None (the junk-filter argument) each time.
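One way such a helper could look. The name `similarity` is my own choice for this sketch, not something from difflib.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return the SequenceMatcher ratio of a and b, with no junk filter."""
    return SequenceMatcher(None, a, b).ratio()


print(f"{similarity('ab', 'abc'):.3f}")  # 0.800
```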
Compare one sequence to multiple other sequences (SequenceMatcher caches information about the second sequence, so set it once and vary the first)
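A sketch of the one-to-many comparison, using `set_seq2` for the fixed reference and `set_seq1` for each candidate. The function name `compare_to_many` is made up for this example.

```python
from difflib import SequenceMatcher


def compare_to_many(candidates, reference):
    """Score every candidate against a single reference sequence."""
    matcher = SequenceMatcher(None)
    # The second sequence is analysed once and reused for every candidate.
    matcher.set_seq2(reference)
    scores = {}
    for candidate in candidates:
        matcher.set_seq1(candidate)
        scores[candidate] = matcher.ratio()
    return scores


scores = compare_to_many(["abc", "ab", "abcd", "cde", "def"], "abc")
for candidate, score in scores.items():
    print(f"{candidate}, abc -> {score:.3f}")
```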
abc, abc -> 1.000
ab, abc -> 0.800
abcd, abc -> 0.857
cde, abc -> 0.333
def, abc -> 0.000
Based on this tutorial.
Finding perfect or imperfect substrings
One limitation of SequenceMatcher is that two sequences that clearly refer to the same thing might get a lower score than two sequences that refer to something different.
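To make the limitation concrete, here is a small sketch with made-up transaction descriptions. The strings and the `similarity` helper name are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return the SequenceMatcher ratio of a and b, with no junk filter."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical descriptions, invented for this example.
short = "netflix"
same_merchant = "netflix.com subscription 0423"  # contains `short` verbatim
other_merchant = "netfirms"                      # a different name

print(f"{similarity(short, same_merchant):.3f}")   # 0.389
print(f"{similarity(short, other_merchant):.3f}")  # 0.667
```

The ratio divides by the combined length of both strings, so the long description is penalised even though it contains the short one in full, while the unrelated name scores higher simply because it is short and shares a prefix.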
fuzzywuzzy has a useful function for this, partial_ratio, based on what they call the “best-partial” heuristic, which returns the similarity score for the best substring of the longer string that has the same length as the shorter string.
For one of my projects, I want to filter out financial transactions for which the description is a perfect or near-perfect substring of another transaction. So this is exactly what I need.
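A rough sketch of how that filtering could work using only difflib. This is a simplified sliding-window take on the best-partial idea, not fuzzywuzzy's actual implementation. The function names, the 0.9 threshold, and the sample descriptions are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_partial_ratio(a, b):
    """Best ratio between the shorter string and any equal-length window of the longer one."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0.0  # design choice: an empty string matches nothing
    window = len(shorter)
    matcher = SequenceMatcher(None)
    matcher.set_seq2(shorter)  # analysed once, reused for every window
    best = 0.0
    for start in range(len(longer) - window + 1):
        matcher.set_seq1(longer[start:start + window])
        best = max(best, matcher.ratio())
    return best


def drop_near_substrings(descriptions, threshold=0.9):
    """Keep only descriptions that are not a (near-)substring of a longer one."""
    kept = []
    for i, desc in enumerate(descriptions):
        is_partial = any(
            i != j
            and len(desc) < len(other)
            and best_partial_ratio(desc, other) >= threshold
            for j, other in enumerate(descriptions)
        )
        if not is_partial:
            kept.append(desc)
    return kept


# Hypothetical descriptions, invented for this example.
transactions = ["netflix", "netflix.com subscription 0423", "grocery store"]
print(drop_near_substrings(transactions))
```

The sliding window tries every position, which is quadratic in the worst case. That is fine for short descriptions, but for long strings or large datasets a dedicated library is probably the better choice.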