Selecting median of two sorted arrays

In this post, I’d like to discuss one interesting algorithm problem which took me quite a while to find an ideal solution. The problem is as followings:

Array A and B are sorted with length of m and n respectively. Try to select median of the two arrays within O(log(m+n)) time. For example,

A = { 1, 4, 6, 10, 18 }

B = { 1, 7, 13, 45, 58, 69, 100, 180, 300 }

Answer: 13

Step1: The lengths are equal. Let us start with a special scenario of the problem. The solution is as following.

Examine the middle element of each array, and throw out the lower half of the array with the smaller element (since all those must be less than ½ the numbers) and throw out the upper half of the array with the larger element (since all those must be greater than ½ the numbers). Now both arrays are still the same size. Repeat until you have two elements left. This is your median. Each step, you eliminate half of the numbers, so it should have a runtime of O(logn).

Step2: What if the lengths are not equal?

The smartest solution I’ve found is padding. It is best to think of median as a special case of the select problem, where given a set and an index k, you need to find the kth smallest element. Median is simply Select(n/2) when n is even, and fully defined by Select(\floor(n/2)) and Select (\Ceiling(n/2)) when n is odd. Now there is a simple O(log n) algorithm for the Select problem on the union of two equal sorted arrays, that throws out the appropriate halves of the two lists recursively.

If you want a simpler reduction to equal sized lists, pad the shorter list with an equal number of positive and negative infinities. If you need an odd number of pads (ie: when the source lists have an odd total), do it with positive infinity and (since the set size was odd) return the lesser of the two retuned values.