Fuzzy matching in Python
difflib
- Docs: the difflib page of the Python standard library documentation
 
Simplest use case
0.9629629629629629
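A minimal sketch of that call. The two strings are illustrative assumptions (the original inputs are not shown here). I picked them so that the ratio, which the difflib docs define as 2*M/T for M matched characters out of T characters in total, comes out to 26/27, the value printed above.

```python
from difflib import SequenceMatcher

# Illustrative inputs (assumed, not the original ones).
a = "fuzzy matching"  # 14 characters
b = "fuzzy matchin"   # 13 characters, all of them matched in a

# The first argument is isjunk; None means no element is treated as junk.
score = SequenceMatcher(None, a, b).ratio()
print(score)  # 2 * 13 / 27
```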
Create a helper function so we don’t need to pass None (the isjunk argument) each time.
0.9629629629629629
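One way such a helper could look. The name similarity is hypothetical and the strings are the same illustrative pair as before, so the result matches the value above.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return the SequenceMatcher ratio of a and b, with no junk filter."""
    return SequenceMatcher(None, a, b).ratio()


# Illustrative inputs (assumed, not the original ones).
print(similarity("fuzzy matching", "fuzzy matchin"))
```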
Compare one sequence to multiple other sequences (SequenceMatcher caches information about the second sequence, so set it once and vary the first)
abc, abc  -> 1.000
ab, abc   -> 0.800
abcd, abc -> 0.857
cde, abc  -> 0.333
def, abc  -> 0.000
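The output above is consistent with comparing each of abc, ab, abcd, cde and def against abc. A sketch that reproduces it, using set_seq2 once for the fixed sequence and set_seq1 for each candidate (the variable names are my own):

```python
from difflib import SequenceMatcher

target = "abc"
candidates = ["abc", "ab", "abcd", "cde", "def"]

matcher = SequenceMatcher(None)
matcher.set_seq2(target)  # analysed once, then reused for every candidate

lines = []
for candidate in candidates:
    matcher.set_seq1(candidate)
    pair = f"{candidate}, {target}"
    lines.append(f"{pair:<9} -> {matcher.ratio():.3f}")

print("\n".join(lines))
```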
fuzzywuzzy
Based on this tutorial.
Finding perfect or imperfect substrings
One limitation of SequenceMatcher is that two sequences that clearly refer to the same thing might get a lower score than two sequences that refer to something different.
0.6086956521739131
0.7586206896551724
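To make the limitation concrete, here is a small sketch with made-up strings (not the ones behind the scores above). A short description that is fully contained in a longer one scores lower than two unrelated words that merely look alike.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Hypothetical helper: plain SequenceMatcher ratio."""
    return SequenceMatcher(None, a, b).ratio()


# Same thing: the short string is a perfect substring of the long one.
same_thing = similarity("amazon", "amazon marketplace payment")
# Different things that happen to share most of their letters.
different = similarity("amazon", "amazing")

print(same_thing)  # 2 * 6 / 32
print(different)   # 2 * 5 / 13
```

The long tail of the second string inflates T and drags the first score down, even though every character of the short string matches.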
fuzzywuzzy has a useful function for this, partial_ratio, based on what they call the “best-partial” heuristic: it returns the similarity score (0 to 100) for the best-matching substring of the longer sequence, where that substring has length min(len(seq1), len(seq2)).
100
69
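To show the idea behind the heuristic without the dependency, here is a brute-force sketch using only difflib. It is not fuzzywuzzy's implementation (in fuzzywuzzy the call is fuzz.partial_ratio), and its scores may differ from the library's. The name best_partial and the example strings are my own assumptions.

```python
from difflib import SequenceMatcher


def best_partial(a, b):
    """Score 0-100: best ratio of the shorter string against any
    equally long window of the longer one (brute-force sketch)."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0  # assumption: an empty string matches nothing
    matcher = SequenceMatcher(None)
    matcher.set_seq2(shorter)  # cached across all windows
    n = len(shorter)
    best = 0.0
    for start in range(len(longer) - n + 1):
        matcher.set_seq1(longer[start:start + n])
        best = max(best, matcher.ratio())
    return round(100 * best)


print(best_partial("amazon", "amazon marketplace payment"))  # 100
print(best_partial("amazon", "amazing"))                     # 83
```

With this scoring the perfect substring gets the top score and the look-alike word falls below it, which is the reverse of the plain ratio.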
For one of my projects, I want to filter out financial transactions for which the description is a perfect or near-perfect substring of another transaction. So this is exactly what I need.
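A rough sketch of what such a filter could look like, using only difflib. Everything here is an assumption for illustration: the helper names, the threshold of 90, the sample descriptions, and the choice to drop only the shorter description of a matching pair.

```python
from difflib import SequenceMatcher


def best_partial(a, b):
    """Brute-force best-partial score (0-100), stdlib only."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0
    n = len(shorter)
    return round(100 * max(
        SequenceMatcher(None, longer[i:i + n], shorter).ratio()
        for i in range(len(longer) - n + 1)
    ))


def drop_near_substrings(descriptions, threshold=90):
    """Keep descriptions that are not a (near-)substring of a longer one."""
    kept = []
    for i, desc in enumerate(descriptions):
        covered = any(
            j != i
            and len(other) > len(desc)
            and best_partial(desc, other) >= threshold
            for j, other in enumerate(descriptions)
        )
        if not covered:
            kept.append(desc)
    return kept


transactions = ["amazon", "amazon marketplace payment", "grocery store"]
print(drop_near_substrings(transactions))
# ['amazon marketplace payment', 'grocery store']
```

This compares every pair, so it is quadratic in the number of descriptions. That is fine for a sketch but worth revisiting for large transaction sets.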