At long last, a post: Medians in T-SQL

Article
03/29/2006

Well, shoot. Start a blog and then never post to it---not the best use of bits. I can't promise that this will be the beginning of a post-filled revolution, but it will hopefully be of some use.

Assume you have a schema like:

sales(city varchar(30), number float)

For the purposes of this example, we don't care what number is---all we know is that we want to find the median value of number for each city. We also want to do it using only T-SQL. We also want to do it using only declarative T-SQL.

Here's how:

select middles.city, avg(convert(float, number)) as median
from (
select city, floor((count(*) + 1) / 2.0) as lowmiddle, ceiling((count(*) + 1) / 2.0) as highmiddle
from sales
group by city
) as middles
join (
select city, number, row_number() over (partition by city order by number) as rank
from sales
) as ranks
on (middles.city = ranks.city) and
((middles.lowmiddle = ranks.rank) or (middles.highmiddle = ranks.rank))
group by middles.city

Note that this version of median will return the average of the two middle values if the number of values for a particular city is even. It's simpler if you want a right- or left-biased median.

How does it work? The first subquery finds the rank for each of the two medians for each city---they'll be the same if the number of entries for the city is odd. The second subquery finds the rank of each entry within each city. All that is left to do is join these two to find the values associated with the median ranks and average them together.

[Note: Edited to correct the problem described by Alex in the comments.]

At long last, a post: Medians in T-SQL

Additional resources