Monday, February 20, 2012

PERCENTILE_CONT equivalent in MS SQL

Hi All,

I have a table with a count of 30K rows that describe values over a period of time. I would like to be able to replicate the PERCENTILE_CONT function that is found in Oracle with MS SQL Server 2005 but have not had much success.

Would anybody know of a way to use current MS SQL syntax to retrieve the value of the 5th,10th,15th,20th etc... percentile of my table values? Assume that all of the values are associated with the same key. Oracle gives you the Percentile_Cont function for inverse percentile calculation but what does SQL2005 give you?

Sample Table Structure:

ID int, Timestamp datetime, Value int

There is no equivalent aggregate function in TSQL. You can however use the query described below which is same as the PERCENTILE_CONT implementation in Oracle. The ORDER BY clause in the PERCENTILE_CONT goes in the ROW_NUMBER OVER clause in TSQL query and the GROUP BY clause becomes the partition by clause in the TSQL query. I have kept the parameter to percentile_cont as variable. You can use the TSQL query and rewrite using Oracle syntax to check also.

-- Given the Oracle example query using sample EMP table:

SELECT deptno,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sal DESC) as percent_cont
FROM emp
GROUP BY deptno;

-- Oracle equivalent using analytic functions present in SQL Server

select deptno, sum(case grp
when 1 then (case PRN when RN then sal end)
else (case PRN when FRN then (CRN - RN) * sal when CRN then (RN - FRN) * sal end)
end) as percent_cont
from (
select deptno, sal, N, PRN, RN, CRN, FRN, case when e3.CRN = e3.FRN and e3.CRN = e3.RN then 1 else 0 end as grp
from (
select deptno, sal, N, PRN, RN, CEIL(RN) as CRN, FLOOR(RN) as FRN
from (
select deptno, sal, PRN, N, 1 + (.5* (N-1)) as RN
from (
select deptno, sal, row_number() over(partition by deptno order by sal desc) as PRN, count(*) over(partition by deptno) as N
from emp
) e1
) e2
) e3
) er
group by deptno;
-- TSQL equivalent:

-- Optimized version that requires only one scan of the data

-- Same as above except that this uses CTE instead of derived table
declare @.p float;
set @.p = .5;
with e1
as
(
select deptno, sal, row_number() over(partition by deptno order by sal desc) as PRN, count(*) over(partition by deptno) as N
from emp
),
e2
as
(
select deptno, sal, PRN, N, 1 + (@.p * (N-1)) as RN
from e1
),
e3
as
(
select deptno, sal, N, PRN, RN, CEILING(RN) as CRN, FLOOR(RN) as FRN
from e2
),
er
as
(
select deptno, sal, N, PRN, RN, CRN, FRN, case when e3.CRN = e3.FRN and e3.CRN = e3.RN then 1 else 0 end as grp
from e3
)
select deptno, sum(case grp
when 1 then (case PRN when RN then sal end)
else (case PRN when FRN then (CRN - RN) * sal when CRN then (RN - FRN) * sal end)
end) as percent_cont
from er
group by deptno;
go

For those who are interested, I blogged about this at link below. You can download a script that creates the Oracle demo tables in SQL Server and run the queries.

http://blogs.msdn.com/sqltips/archive/2006/05/15/598372.aspx

|||Thank you very much. That is exactly what I was looking for.

No comments:

Post a Comment