Hey all – I apologize for the (extremely) long period of no updates, I’ve been prioritizing other things. I’ve been accepted this Fall to begin my Ph.D. at University of California, Berkeley. Since I won’t be at Microsoft any longer, I’ve started a new blog with a new focus and more regular updates called Papers…

## P-complete and the limits of parallelization

We’re entering an era where CPU clock speeds will soon cease to scale upwards and instead CPU manufacturers are planning to put more and more independent cores on a chip. Intel plans to release an 80-core chip within 5 years. Consequently the research community is setting their eye on the manifold barriers to writing correct…

## Robin’s theorem

Most computer scientists are familiar with the P = NP problem, which asks essentially whether we can verify more problems in polynomial time than we can solve. So fundamentally does complexity theory hinge on this result that the Clay Mathematics Institute has labelled it one of their seven Millennium Prize Problems, and will award $1,000,000…

## Cache-oblivious data structures

In most data structure and algorithms classes, the model used for basic analysis is the traditional RAM model: we assume that we have a large, random-access array of memory, and count the number of simple reads/writes needed to perform the algorithm. For example, selection sort takes about n(n-1)/2 reads and 2n writes to sort an…

## K-nearest neighbor spatial search

I apologize to everyone for the hiatus – I realise a post is more than a little overdue and will try to be more regular in the future. I appreciate the support that I’ve received for the blog and continuing it. Consider the following problem: you’re starting a website for a chain of restaurants, and you…

## Functional list processing in C# 2.0 with anonymous delegates

One of the benefits of functional languages is their great flexibility in list manipulation, which enables them to express certain computations concisely that would require one or more verbose loops in procedural languages. Many of the features that functional programmers take for granted such as first class functions, persistent data structures, and garbage collection were included in…

## Factoring large numbers with quadratic sieve

Today I’m going to talk about how the quadratic sieve factoring algorithm works, giving a comprehensive description assuming knowledge of only basic university-level mathematics. The foundation of the most popular public-key cryptography algorithm in use today, RSA, rests on the difficulty of factoring large integers. When keys are generated, efficient algorithms are used to generate…

## Color quantization

If you’ve ever done work with Web graphics, I’m sure that at some point you reduced an image with thousands of colors, such as a screenshot or photograph, to an image with only 256 colors, for example to store the image in the GIF format. Remarkably, the reduced image is usually very visually similar to…

## How does JPEG actually work?

JPEG is an image encoding designed to compress photographs and similar images effectively, often 5 to 15 times over a raw bitmap format. It’s a lossy format that exploits properties of human vision to eliminate information that is difficult to distinguish. You see JPEGs all the time, and if you’ve seen a JPEG compressed with a low…

## Encoding correctness in types

This article will discuss effective use of types to catch some common problems at compile time not normally found by the typechecker. The typechecker is the most important tool in the programmer’s arsenal. Types enable you to structure code in a more maintainable way and prevent entire classes of errors from ever occurring at runtime,…