Monday, February 20, 2012

Percentiles in SQL Server 2005

Gurus,

I need to find the 50th, 75th and 90th percentiles for a set of values. For eg: if I have values 1, 2, 3, and 4 and need to get 2.5, 3.25 and 3.7 as the 50th, 75th and 90th percentiles. Since there was no aggregare function in SQL Server 2005 to deal with it, I think user defined aggregate functions was the answer. Can you direct me and give helpful tips.

Regards

Zacharia

You can use the new analytic functions like ROW_NUMBER, RANK, NTILE etc to calculate other functions. For example, you can simulate PERCENT_RANK function available in Excel like:

WITH test_ranked
AS
(
SELECT t.id, t.score

, RANK() OVER(ORDER BY t.score) as rank

, COUNT(*) OVER() as cnt
FROM test as t
)
SELECT p.id, p.score

, CASE p.cnt

-- for single row, the PERCENT_RANK is 0
WHEN 1 THEN CAST(0 AS FLOAT)
ELSE (p.rank-1.0)/(p.cnt -1.0)
END as percent_rank
FROM test_ranked as p
ORDER BY p.id;

|||

Thanks Umachandar for your reply. But I did not understand how the percentrank function can be used to solve this problem. In the above example of values 1, 2, 3 and 4 percentrank gives me a column of 0, 0.33, 0.66 and 1. The NTILE(100) returns a column of 1, 2, 3 and 4. How does that help when I need the values 2.5, 3.25 and 3.7 as the 50th, 75th and 90th percentiles. I do not really understand how the 75th and 90th percentiles are calculated. But for the 50th percentile it is the (n/2 +1)th value when n is odd and and avg(n/2, n/2 + 1) when n is even.

Sorry I do require more help

Zacharia

|||

Umachandar,

Thanks for your suggestion. I used the approach given here http://sqlteam.com/item.asp?ItemID=16480. Since I had Visual Studio 2005 Standard Edition, I could not create a User Defined Aggregate:-((. I relied on the stored procedure itself.

Zac

No comments:

Post a Comment