Converting Between Random Sampling Methods

Article
02/05/2008

Sampling f fraction out of n records:

Sampling with replacement
1. Sample is a multi-set of fn records. Any record could be samples multiple times.
Sampling without replacement
1. Each successive sample is uniformly at random from the remaining records
Independent Coin flips: choose a record with probability f. The sample has s distinct records where s is a random variable with Binomial distribution B(n, f), and has expectation of fn elements in the set.

Conversions:

Sample with replacement --> Sample without replacement
- Check upon each new selection and reject duplicates
Coin flips --> Sample without replacement
- Sample a larger fraction so that at least f fractoin records are present in the sample (Chernoff bounds). Reject required number of samples to ensure we get exactly f samples.
Sample without replacement --> Sample with replacement
- Sample with replacement from "without replacement" sample by using corresponding probabilities for duplication
Sample with replacement --> Coin flips sample
- Impossible
Sample without replacement --> Coin flips sample
- Impossible

Last two can be seen by considering an adversarial case. Under the binomial distribution, there is a non-zero probability of sampling the entire population. Since any sampled subset can never recreate the entire population, S with R and S without R can never be converted to the coin flips semantics.

Source: S. Chaudhuri, R. Motwani, V. Narasayya; On random Sampling over Joins.

Converting Between Random Sampling Methods

Conversions:

Additional resources