Monday, March 12, 2012

performance (distinct and group )

hi!
I have a doubt when the performance of some commands distinct and group by,
which is faster
some times group by is faster than distinct, somebody knows of something?
tanks
joaojoao
Some comments I have cut and pasted from a programming
good practices document
Carefully evaluate whether your query needs the DISTINCT
clause or not. The DISTINCT clause slows down virtually
every query it is in. Some developers automatically add
this clause to every one of their SELECT statements, even
when it is not necessary. This is a bad habit that should
be stopped. In addition, keep in mind that in some cases,
duplicate results in a query are not a problem. If this is
the case, then don't use a DISTINCT clause.
Don't use DISTINCT or ORDER BY in your SELECT statements
unless you really need them. Both options can add a lot of
additional overhead to your query, and they aren't always
needed for your application.
If your SELECT statement contains a HAVING clause, write
your query so that the WHERE clause does most of the work
(removing undesired rows) instead of the HAVING clause do
the work of removing undesired rows. Using the WHERE
clause appropriately can eliminate unnecessary rows before
they get to the GROUP BY and HAVING clause, saving some
unnecessary work, and boosting performance.
For example, in a SELECT statement with WHERE, GROUP BY,
and HAVING clauses, here's what happens. First, the WHERE
clause is used to select the appropriate rows that need to
be grouped. Next, the GROUP BY clause divides the rows
into sets of grouped rows, and then aggregates their
values. And last, the HAVING clause then eliminates
undesired aggregated groups. If the WHERE clause is used
to eliminate as many of the undesired rows as possible,
this means the GROUP BY and the HAVING clauses will have
less work to do, boosting the overall performance of the
query.
The GROUP BY clause can be used with or without an
aggregate function. But if you want optimum performance,
don't use the GROUP BY clause without an aggregate
function. This is because you can accomplish the same end
result by using the DISTINCT option instead, and it is
faster.
For example, you could write your query two different
ways:
USE Northwind
SELECT OrderID
FROM [Order Details]
WHERE UnitPrice > 10
GROUP BY OrderID
or
USE Northwind
SELECT DISTINCT OrderID
FROM [Order Details]
WHERE UnitPrice > 10
Both of the above queries produce the same results, but
the second one will use less resources and perform faster.
Hope this helps
John|||John Bandettini wrote:
<snip>
> SELECT OrderID
> FROM [Order Details]
> WHERE UnitPrice > 10
> GROUP BY OrderID
> or
> SELECT DISTINCT OrderID
> FROM [Order Details]
> WHERE UnitPrice > 10
> Both of the above queries produce the same results, but
> the second one will use less resources and perform faster.
John, when I run these queries, I get exactly the same query plan. This
is what I expected. I think nowadays SQL-Server is smart enough to
determine that these two queries produce the same result, and therefore
can be handled with the same (optimum) query plan.
Gert-Jan|||In addition...
If you get a significant difference between the two, you might want to communicate this to MS. If
you have a repro, you can post it here. These two operations are the same, semantically, and if the
optimizer produces a worse plan for distinct then we have found a weakness in the optimizer.
--
Tibor Karaszi, SQL Server MVP
Archive at: http://groups.google.com/groups?oi=djq&as ugroup=microsoft.public.sqlserver
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:3F9440F5.9E25C9C2@.toomuchspamalready.nl...
> John Bandettini wrote:
> <snip>
> > SELECT OrderID
> > FROM [Order Details]
> > WHERE UnitPrice > 10
> > GROUP BY OrderID
> > or
> >
> > SELECT DISTINCT OrderID
> > FROM [Order Details]
> > WHERE UnitPrice > 10
> >
> > Both of the above queries produce the same results, but
> > the second one will use less resources and perform faster.
> John, when I run these queries, I get exactly the same query plan. This
> is what I expected. I think nowadays SQL-Server is smart enough to
> determine that these two queries produce the same result, and therefore
> can be handled with the same (optimum) query plan.
> Gert-Jan

No comments:

Post a Comment