Showing posts with label percentiles. Show all posts
Showing posts with label percentiles. Show all posts

Monday, February 20, 2012

Percentiles in SQL Server 2005

Gurus,

I need to find the 50th, 75th and 90th percentiles for a set of values. For eg: if I have values 1, 2, 3, and 4 and need to get 2.5, 3.25 and 3.7 as the 50th, 75th and 90th percentiles. Since there was no aggregare function in SQL Server 2005 to deal with it, I think user defined aggregate functions was the answer. Can you direct me and give helpful tips.

Regards

Zacharia

You can use the new analytic functions like ROW_NUMBER, RANK, NTILE etc to calculate other functions. For example, you can simulate PERCENT_RANK function available in Excel like:

WITH test_ranked
AS
(
SELECT t.id, t.score

, RANK() OVER(ORDER BY t.score) as rank

, COUNT(*) OVER() as cnt
FROM test as t
)
SELECT p.id, p.score

, CASE p.cnt

-- for single row, the PERCENT_RANK is 0
WHEN 1 THEN CAST(0 AS FLOAT)
ELSE (p.rank-1.0)/(p.cnt -1.0)
END as percent_rank
FROM test_ranked as p
ORDER BY p.id;

|||

Thanks Umachandar for your reply. But I did not understand how the percentrank function can be used to solve this problem. In the above example of values 1, 2, 3 and 4 percentrank gives me a column of 0, 0.33, 0.66 and 1. The NTILE(100) returns a column of 1, 2, 3 and 4. How does that help when I need the values 2.5, 3.25 and 3.7 as the 50th, 75th and 90th percentiles. I do not really understand how the 75th and 90th percentiles are calculated. But for the 50th percentile it is the (n/2 +1)th value when n is odd and and avg(n/2, n/2 + 1) when n is even.

Sorry I do require more help

Zacharia

|||

Umachandar,

Thanks for your suggestion. I used the approach given here http://sqlteam.com/item.asp?ItemID=16480. Since I had Visual Studio 2005 Standard Edition, I could not create a User Defined Aggregate:-((. I relied on the stored procedure itself.

Zac

percentile and PercentRank

Hello All,

Does anyone know how to do an expression that calculate percentrank and percentiles? I have a table with a list of transactions and their response time. I need to find the 75th, 90th, 95th, and 98th percentile as well as the percentrank of <=5 seconds, 10 seconds, 30 seconds and 60 seconds.

Any help is appreciated!

Those are all SQL functions, not SSRS ones

NTILE(N) can give you the percentiles

so NTILE(100) will give you 1-100, while NTILE(10) is 1-10

SQL Ranking Functions - RANK, NTILE, DENSE_RANK, ROW_NUMBER

http://msdn2.microsoft.com/en-gb/library/ms189798(SQL.90).aspx

|||

Hmmmm. So how do I do it in Reporting Services? Do I need to put the NTILE function in my base SQL statement? Or are you saying the Microsoft does not have the ability to percent rank and percentiles in SSRS? I am starting to find it odd that Excel creates better reports than SSRS.

Thanks for your help

|||

yes, you use it in the base SQL statement in the "Dataset" tab (or Stored Proc)

should be much faster too

I haven't explored whether ranking is available in SSRS, but knowing Excel doesn't have ranking function either, I will assume minimum from SSRS

Note: if you have filter or parameterized report, ranking is RELATIVE in the "selected" dataset, not the "whole" dataset

e.g. if you pick only 1 person, his/her rank will always be #1; when you pick 10 person, his/her rank may be #1 to #10

|||

A query that uses the ROW_NUMBER() function might look something like:

Code Snippet

declare @.tranTable table
( tranID varchar(20),
elapsedTime integer
)
insert into @.tranTable
select 'Transaction # 1', 37.1403174930829 union all
select 'Transaction # 2', 26.0536172199012 union all
select 'Transaction # 3', 36.928245609864 union all
select 'Transaction # 4', 25.9599151436726 union all
select 'Transaction # 5', 57.9377841191627 union all
select 'Transaction # 6', 62.0332552986104 union all
select 'Transaction # 7', 42.2948297227879
--select * from @.tranTable

;with maxRank as
( select count(*) as maxRank
from @.tranTable
), ranker as
( select TranId,
elapsedTime,
100.0 * row_number() over
( order by elapsedTime, tranId ) / maxRank
as percentile
from maxRank
cross join @.tranTable
)
select * from ranker

With results that look something like:

Code Snippet

TranId elapsedTime percentile
-- --
Transaction # 4 25 14.285714285714
Transaction # 2 26 28.571428571428
Transaction # 3 36 42.857142857142
Transaction # 1 37 57.142857142857
Transaction # 7 42 71.428571428571
Transaction # 5 57 85.714285714285
Transaction # 6 62 100.000000000000

|||Excel does have both a PERCENTRANK and PERCENTILE functions which are very valuable. For example if you want to to find the 90th Percentile you would have ...

Code Snippet

=PERCENTILE(AF26:AF34,0.9)

So I would think SQL would have something like

Code Snippet

=PERCENTILE (fieldnamehere,0.9)

But it doesnt. Likewise with percentrenk, in excel you would have

Code Snippet

=PERCENTRANK(AF26:AF34,25.5,4)

But with either, SSRS or a SQL query does not appear to have a function. Still baffling to me. They do provide a standard deviation, SUM, AVG, and other basic reporting functions.

|||but no PRODUCT() aggregate