When encoding categorical variables, you might want to capture the similarity between related categories, such as ‘Master Police Officer’ and ‘Police Officer III’. If so, use dirty-cat.
In the code above, I use dirty-cat’s SimilarityEncoder to encode the job titles so that similar titles receive similar encoded values.
The correlation matrix, computed from the encoded values, shows how similar each pair of labels is. For example, the similarity between ‘Master Police Officer’ and ‘Police Officer III’ is 0.86.
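
To build intuition for why two such titles score high, here is a minimal pure-Python sketch of character n-gram similarity, which, as far as I understand, is the kind of string similarity SimilarityEncoder builds on by default. This is not dirty-cat's implementation: the helper names `char_ngrams` and `ngram_similarity`, the choice of 3-grams, the set-based Jaccard formula, and the extra ‘Firefighter’ title are all assumptions made for illustration, so the numbers will not match the 0.86 reported above.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


# Hypothetical sample titles; 'Firefighter' is added only as a contrast case.
titles = ["Master Police Officer", "Police Officer III", "Firefighter"]

# Pairwise similarity matrix: each row compares one title against all titles.
similarity_matrix = [[ngram_similarity(a, b) for b in titles] for a in titles]

for title, row in zip(titles, similarity_matrix):
    print(f"{title:<22}", [round(value, 2) for value in row])
```

The two police titles share most of their n-grams (‘pol’, ‘off’, ‘cer’, and so on), so their score is much higher than either title's score against ‘Firefighter’. A one-hot encoding would treat all three titles as equally different.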