Introduction
This project covers Minhashing and Locality-Sensitive Hashing (LSH). Both algorithms are approximate: they find only candidate pairs that are likely to be similar. There are therefore two types of error. A false positive (FP) is a candidate pair that is not actually similar according to the given similarity threshold. A false negative (FN) is a similar pair that is not a candidate pair. In this project, we will implement the Minhashing and LSH algorithms, apply them to data sets, and draw observations about when these algorithms perform well.
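To make the two error types concrete, here is a minimal Python sketch of the pipeline described above. It is an illustration under stated assumptions, not the project's actual code: the function names (`minhash_signatures`, `lsh_candidates`, `count_errors`), the banding parameters `bands` and `rows`, and the linear hash family `(a*x + b) mod p` are all hypothetical choices. It also assumes every set is non-empty and holds non-negative integers smaller than the prime.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signatures(sets, num_hashes, seed=0):
    """Return one Minhash signature (a list of num_hashes minima) per set.

    Assumption: elements are non-negative integers below the prime and
    every set is non-empty (min() of an empty set would raise).
    """
    rng = random.Random(seed)
    prime = 2_147_483_647  # 2^31 - 1
    # Hypothetical hash family: h(x) = (a*x + b) mod prime
    params = [(rng.randrange(1, prime), rng.randrange(prime))
              for _ in range(num_hashes)]
    return [[min((a * x + b) % prime for x in s) for a, b in params]
            for s in sets]


def lsh_candidates(signatures, bands, rows):
    """Return index pairs (i, j), i < j, that share a bucket in any band.

    Each signature is split into `bands` slices of `rows` values; two sets
    become a candidate pair if any slice matches exactly.
    """
    candidates = set()
    for band in range(bands):
        buckets = {}
        for idx, sig in enumerate(signatures):
            key = tuple(sig[band * rows:(band + 1) * rows])
            buckets.setdefault(key, []).append(idx)
        for members in buckets.values():
            candidates.update(combinations(members, 2))
    return candidates


def count_errors(sets, candidates, threshold):
    """Compare candidates against exact Jaccard similarity.

    Returns (false_positives, false_negatives) as sets of index pairs.
    """
    truly_similar = {
        (i, j)
        for i, j in combinations(range(len(sets)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    }
    false_positives = candidates - truly_similar
    false_negatives = truly_similar - candidates
    return false_positives, false_negatives


if __name__ == "__main__":
    data = [{1, 2, 3, 4}, {1, 2, 3, 5}, {100, 200, 300}]
    sigs = minhash_signatures(data, num_hashes=20, seed=42)
    cands = lsh_candidates(sigs, bands=5, rows=4)
    fp, fn = count_errors(data, cands, threshold=0.5)
    print("candidates:", sorted(cands))
    print("false positives:", sorted(fp))
    print("false negatives:", sorted(fn))
```

In this sketch, more bands with fewer rows each makes it easier for a pair to become a candidate, which tends to lower false negatives at the cost of more false positives. Fewer bands with more rows has the opposite effect.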