Well, shoot. Start a blog and then never post to it—not the best use of bits. I can’t promise that this will be the beginning of a post-filled revolution, but it will hopefully be of some use.

Assume you have a schema like:

sales(city varchar(30), number float)

For the purposes of this example, we don’t care what number is—all we know is that we want to find the median value of number for each city. We also want to do it using only T-SQL. We also want to do it using only *declarative *T-SQL.

Here’s how:

select middles.city, avg(convert(float, number)) as median

from (

select city, floor((count(*) + 1) / 2.0) as lowmiddle, ceiling((count(*) + 1) / 2.0) as highmiddle

from sales

group by city

) as middles

join (

select city, number, row_number() over (partition by city order by number) as rank

from sales

) as ranks

on (middles.city = ranks.city) and

((middles.lowmiddle = ranks.rank) or (middles.highmiddle = ranks.rank))

group by middles.city

Note that this version of median will return the average of the two middle values if the number of values for a particular city is even. It’s simpler if you want a right- or left-biased median.

How does it work? The first subquery finds the rank for each of the two medians for each city—they’ll be the same if the number of entries for the city is odd. The second subquery finds the rank of each entry within each city. All that is left to do is join these two to find the values associated with the median ranks and average them together.

[Note: Edited to correct the problem described by Alex in the comments.]

A couple posts—and a long time—ago, I wrote about computing medians in T-SQL . Adam Mechanic has

I was looking for something similar. Tried various posts but didnt help. Finally found this and has helped me out. Thanks a ton!!!

Great post! Here’s a tweak, replace rank() with row_number(). The will ensure that all goes well with sets containing non-unique values.

Alex,

Ah! I see what you mean—very nice catch.

-Isaac