Introduction
Weighted minwise hashing (WMH) is one of the fundamental subroutines
required by many celebrated approximation algorithms commonly
adopted in industrial practice for large-scale search and learning. The
resource bottleneck in WMH is the computation of multiple (typically a
few hundred to a few thousand) independent hashes of the data. We propose
a simple rejection-type sampling scheme based on a carefully designed
red-green map, where we show that the number of rejected samples has
exactly the same distribution as weighted minwise sampling.
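
The text above does not spell out how the red-green map is built, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes each component i has a known upper bound upper[i], that component i owns one interval of length upper[i] on a line, and that the first x[i] units of that interval are "green" for vector x while the rest are "red". The names rejection_hash, estimate_similarity, upper, and max_draws are hypothetical. The hash is the number of uniform draws rejected (landing in red) before the first green hit, with the draw sequence shared across vectors through the seed.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def rejection_hash(x, upper, seed, max_draws=100_000):
    """Return how many uniform draws fall in red before the first green hit.

    Assumes 0 <= x[i] <= upper[i] for every i (an assumption of this sketch).
    """
    # Component i owns the interval [offsets[i], offsets[i + 1]).
    offsets = [0.0] + list(accumulate(upper))
    total = offsets[-1]
    # The same seed yields the same draw sequence for every vector.
    rng = random.Random(seed)
    for rejected in range(max_draws):
        r = rng.uniform(0.0, total)
        # Locate the component whose interval contains r.
        i = min(bisect_right(offsets, r) - 1, len(x) - 1)
        # Green means r lies in the part of the interval covered by x[i].
        if r < offsets[i] + x[i]:
            return rejected
    raise ValueError("no green hit within max_draws; is x all zeros?")


def estimate_similarity(x, y, upper, num_hashes=2000):
    """Fraction of seeds on which the two vectors receive the same hash."""
    matches = sum(
        rejection_hash(x, upper, s) == rejection_hash(y, upper, s)
        for s in range(num_hashes)
    )
    return matches / num_hashes


if __name__ == "__main__":
    x = [1.0, 2.0, 0.0]
    y = [1.0, 1.0, 1.0]
    upper = [2.0, 2.0, 2.0]  # assumed per-component upper bounds
    print(estimate_similarity(x, y, upper))
```

Under these assumptions, two vectors collide exactly when the first draw that is green for either of them is green for both, so the collision rate should approach sum(min(x[i], y[i])) / sum(max(x[i], y[i])), which is 0.5 for the vectors in the usage example. Each hash costs one pass over random draws rather than a pass over all components, which is where a scheme of this kind would save work when the green fraction is not too small.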