Random Sampling over Joins

Source: On Random Sampling over Joins. Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, Sigmod 1999. What? Random sampling as a primitive relational operator: SAMPLE(R, f) where R is the relation and f the sample fraction. SAMPLE(Q, f) is a tougher problem, where Q is a relation produced by a query In particular, focus on sampling over…

1

Converting Between Random Sampling Methods

  Sampling f fraction out of n records: Sampling with replacement Sample is a multi-set of fn records. Any record could be samples multiple times. Sampling without replacement Each successive sample is uniformly at random from the remaining records Independent Coin flips: choose a record with probability f. The sample has s distinct records where…

0

Reservoir Sampling

A simple random sampling strategy to produce a sample without replacement from a stream of data – that is, in one pass: O(N) Want to sample s instances – uniformly at random without replacement – from a population size of n records, where n is not known. Figuring out n would require 2 passes. Reservoir…

3