[Indyk-Motwani’98] Many distance related questions (nearest neighbor, closest x, ..) can be answered more efficiently by using locality sensitive hashing, where the main idea is that similar objects hash to the same bucket. LSH function: Probability of collision higher for similar objects Hash data using several LSH functions At query time, get all objects…

# Month: June 2008

## Set Similarity and Min Hash

Given two sets S1, S2, find similarity(S1, S2) – based not hamming distance (not Euclidean). Jaccard Measure View sets at a bit-array. Indexes representing each possible element, and 1/0 representing presence/absence of the element in the set. Then Jaccard measure = What happens when: n element in each set from a possible universe u, s.t….