Monday, February 20, 2012

Percentile Calculation

Hi All,

I have around 1000 employee records containing employee number and their salary. How do i calculate percentile on these records?

For example
How to calculate 25th Percentile on Salary of these 1000 records?

Hope I am clear with my question.

Thanks in advance.

Regards,
qALook up the "TOP n [PERCENT]" clause in Books Online.

select top 25 percent * from [YourTable]|||Hi,
Thanks for your immediate response. But you got it wrong. Its not Percentage, its PERCENTILE. If you have used MS-Excel there is a function called
PERCENTILE(Array, k)

For example if I have 10,20,30,40,50 as values
then 30th PERCENTILE of this comes as 22

Thanks
qA|||No, I don't get how the calculation is done. And Excel's Books Online (which is normally very good at describing statistical functions) gives no explanation of the algorithm at all.|||Wikipedia, while citing Excel's function as an example, appears to me to be describing behavior similiar to TOP %:

http://en.wikipedia.org/wiki/Percentile|||Hi,
The top clause in a query will return one ore more than one records. But Percentile will always return one and only one value.|||Yes, but how to calculate that value? I found this lovely quote:

"There is no universally accepted definition of a percentile."

on this website: http://cnx.rice.edu/content/m10805/latest/

Very curious. I guess what intrigues me is that I have been using Excel and doing applied business statistics and data mining for more than a decade and have not run into this, or ever used that Excel function before.

Try describing what you want to do, without referring to the Excel function.|||OK, the more I research PERCENTILE calculation, the more I understand why I've never used it. It is about the stupidest statistical measure I have ever seen. Why? Because makes a strict linear interpolation between just two data points out of the set, without any regard to the distribution of the data.

As an example, the 30th percentile of your values (10, 20, 30, 40, 50) is 22, as Excel calculates it. But the 30th percentile of the following values (0, 10, 20, 1000, 1000000) is...guess what...also 22.

I see no practical purpose for such a measure unless your goal is to mislead your audience.

No comments:

Post a Comment