Information Theory (1) – The Science of Communication

IT is a beautiful sub-field of CS with applications across the gamut of scientific fields: coding theory and communications (under unreliable channels), cryptography, physics, biomedical engineering, computer graphics, machine learning, statistics, and even gambling and stock trading. It is truly a marvelous through process. Framework: "Bit" as the unit of communication. Arbitrary. Ok as long…

1

Random Sampling over Joins

Source: On Random Sampling over Joins. Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, Sigmod 1999. What? Random sampling as a primitive relational operator: SAMPLE(R, f) where R is the relation and f the sample fraction. SAMPLE(Q, f) is a tougher problem, where Q is a relation produced by a query In particular, focus on sampling over…

1

Converting Between Random Sampling Methods

  Sampling f fraction out of n records: Sampling with replacement Sample is a multi-set of fn records. Any record could be samples multiple times. Sampling without replacement Each successive sample is uniformly at random from the remaining records Independent Coin flips: choose a record with probability f. The sample has s distinct records where…

0

Reservoir Sampling

A simple random sampling strategy to produce a sample without replacement from a stream of data – that is, in one pass: O(N) Want to sample s instances – uniformly at random without replacement – from a population size of n records, where n is not known. Figuring out n would require 2 passes. Reservoir…

3