Wednesday, March 21, 2012

Performance Complex SQL Issue

Hi,
I have a table ORDER_DETAIL with 22 million rows which has an index
of
(person_id, code_id, created_dtt)

I have another ORDER table with 5 million rows which has an index
of
(order_dtt, person_id)

I have a small CODES table with 1000 rows which allows me to get
the 50 or so codes I need. My query needs to be something like this:

select od.person_id, od.code_id
from order_detail od, order o, codes c
where o.order_dtt between sysdate-365 and sysdate
and o.person_id = od.person_id
and od.code_id in (select code_id from codes where code_type =
'MYCODE')
and od.create_dtt between sysdate-365 and sysdate

But is this using the full index on the ORDER_DETAIL table? Should I
be using EXISTS in some fashion instead?

Accessing the ORDER_DETAIL table is a pain because it is so large, the
code_id's I need are a relatively small number but the date range is
about 25% of the table - same with the ORDER table. However it is the
only way I can get to filtering down to the code_id.

This is a simplification of the problem - but accurate - adding
additional indexes is not an option.

thanks!

Tim[posted and mailed, please reply in news]

Tim Smith (timasmith@.hotmail.com) writes:
> I have a table ORDER_DETAIL with 22 million rows which has an index of
> (person_id, code_id, created_dtt)
> I have another ORDER table with 5 million rows which has an index of
> (order_dtt, person_id)
> I have a small CODES table with 1000 rows which allows me to get
> the 50 or so codes I need. My query needs to be something like this:
> select od.person_id, od.code_id
> from order_detail od, order o, codes c
> where o.order_dtt between sysdate-365 and sysdate
> and o.person_id = od.person_id
> and od.code_id in (select code_id from codes where code_type =
> 'MYCODE')
> and od.create_dtt between sysdate-365 and sysdate
> But is this using the full index on the ORDER_DETAIL table? Should I
> be using EXISTS in some fashion instead?

First, what is sysdate supposed to be? I ask because there is nothing
called sysdate in MS SQL Server. While this forum is for MS SQL Server,
it has happened before, that people have asked questions that have
applied to other database engines. While some SQL questions are fairly
generic, performance questions are often engine specific, because
different DB engines uses difference strategies.

The query as written contains a superfluous occurance of codes in
the FROM clause. This could lead to a cartesian join between codes
and the rest of the result set.

The simplest way to involve codes in the query would be:

select od.person_id, od.code_id
from order_detail od, order o, codes c
where o.order_dtt between sysdate-365 and sysdate
and o.person_id = od.person_id
and od.code_id = c.code_id
and c.code_type = 'MYCODE'
and od.create_dtt between sysdate-365 and sysdate

Now, assuming that you are using MS SQL Server 2000, I think you would get
better performance without involving the orders table at all. Had you been
able to narrow down the range to say 1% of the table with the condition on
orders.order_dtt, it would be another issue.

To join orders and order_details, there are three different join
strategies that SQL Server can use Nested Loops, Merge Join and
Hash Join. Nested Loops, means that for each matching rows in
Orders, look up a row in Order Details. When you look up 25% of
the rows, you will access several data pages more than once, and
you will get more reads than for a plain table scan on Order Details.

Merge Join and Hash Join both involves scanning the tables, but only
doing it once. I don't know if a merge join is possible in this case.

Of course, here I am discussing the example query you posted. The
actual query you have may be different.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.aspsql

No comments:

Post a Comment