Wednesday, March 28, 2012

Performance for nightly jobs

I have a job that takes 9 hours that I need to optimize. The job is a stored
procedure that populates a flat table. Here are the rules of the job.
Members own contracts. Contracts are made up of different products that are
grouped into product categories. A certain product may be in several different
contracts. My job needs to display the member, the product category, and a
distinct count of products in that category for the member.
I have optimized the query so far by taking it out of a cursor, but I have
one last query that is still hurting my performance.
The first pass (which has a slight flaw) looked like so...
SELECT a.member_id,
b.product_category,
COUNT(b.product_id)
FROM member_contract a (NOLOCK)
JOIN product b (NOLOCK)
ON a.contract_id = b.contract_id
GROUP BY a.member_id,
b.product_category
The result time was beautiful: the whole query took less than 2 minutes and
returned the correct number of rows in the result set. The problem is that
the products are duplicated, so I do not get an accurate count of products
in a product category for each member (I get a number larger than what is
actually designated for that particular member). So I put in the keyword
DISTINCT, which really dropped performance. The new query looks like so...
SELECT a.member_id,
b.product_category,
COUNT(DISTINCT b.product_id)
FROM member_contract a (NOLOCK)
JOIN product b (NOLOCK)
ON a.contract_id = b.contract_id
GROUP BY a.member_id,
b.product_category
I have introduced the following indexes on the product table in an attempt
to optimize this query.
1st
contract_id,
product_category,
product_id
2nd
product_category,
contract_id,
product_id
There is a PK & clustered index on the product table that is not on
product_id (it is simply an identity column that cannot be changed, because
I have a full-text index on another unrelated column).
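For reference, the two indexes described would be created like so (the index names are mine; the key columns are exactly as listed above):

```sql
-- 1st: leads on contract_id, so it matches the join predicate first
CREATE INDEX ix_product_contract_category_product
    ON product (contract_id, product_category, product_id)

-- 2nd: leads on product_category instead
CREATE INDEX ix_product_category_contract_product
    ON product (product_category, contract_id, product_id)
```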
Table product has 3 million rows
Table member_contract has 33,000 rows
For those that have taken the time to read through my pain...Thank you I
really appreciate it.
Jared
Hi,
Please post DDL (CREATE TABLE statements) and sample data (INSERT
statements, just enough to duplicate your incorrect count issue); there may
be another way to re-write the query without the DISTINCT keyword that's
slowing you down.
"kirk1880" <kirk1880@.discussions.microsoft.com> wrote in message
news:C70994A5-DAB1-4501-BF48-8966FA33C73F@.microsoft.com...
CREATE TABLE member_contract
(
member_id int,
contract_id int
)
CREATE TABLE product
(
contract_id int,
product_category_id int,
product_id int
)
INSERT member_contract VALUES (1, 1)
INSERT member_contract VALUES (1, 2)
INSERT member_contract VALUES (2, 1)
INSERT member_contract VALUES (2, 3)
INSERT product VALUES (1, 1, 1)
INSERT product VALUES (1, 1, 1)
INSERT product VALUES (1, 1, 1)
INSERT product VALUES (1, 1, 2)
INSERT product VALUES (1, 2, 1)
INSERT product VALUES (1, 2, 1)
INSERT product VALUES (2, 1, 4)
INSERT product VALUES (2, 2, 2)
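For what it's worth, running both aggregates against this sample data shows the divergence described above. The expected results below are worked out by hand from the sample rows (note the DDL here names the column product_category_id, where the earlier queries say product_category):

```sql
-- Expected results against the sample rows:
--   member_id  product_category_id  raw_count  distinct_count
--   1          1                    5          3   -- distinct products {1, 2, 4}
--   1          2                    3          2   -- distinct products {1, 2}
--   2          1                    4          2   -- distinct products {1, 2}
--   2          2                    2          1   -- distinct product  {1}
SELECT a.member_id,
       b.product_category_id,
       COUNT(b.product_id)          AS raw_count,
       COUNT(DISTINCT b.product_id) AS distinct_count
FROM member_contract a
JOIN product b
  ON a.contract_id = b.contract_id
GROUP BY a.member_id,
         b.product_category_id
```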
"Adam Machanic" wrote:

Kirk,
How can a product change categories?
insert product
values
(1, 1, 2)
insert product
values
(1, 2, 1)
"kirk1880" <kirk1880@.discussions.microsoft.com> wrote in message
news:CE4E31B2-A028-42FB-8B3C-4E74DD983881@.microsoft.com...
The short answer is bad database design. The long answer is that "category"
is a loose term (and when I say loose I mean very loose): a product can be
part of a package, like needles, or it can be cross-referenced in the
prescription drug category.
Right now I am playing with grouping all the items I need to remove the dups,
and then aggregating on the result of the correlated query. So far it is not
helping much.
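One way to write that grouping-then-aggregating idea as a single statement (a sketch only, using the column names from the sample DDL, and not tested against the real 3-million-row table) is to collapse the duplicates in a derived table, so the outer aggregate becomes a plain COUNT(*) instead of COUNT(DISTINCT) over the whole join:

```sql
-- Inner query: one row per (member_id, product_category_id, product_id),
-- which removes the duplicate product rows up front.
-- Outer query: COUNT(*) over the de-duplicated rows equals
-- COUNT(DISTINCT product_id) in the original query.
SELECT d.member_id,
       d.product_category_id,
       COUNT(*) AS product_count
FROM (SELECT DISTINCT a.member_id,
             b.product_category_id,
             b.product_id
      FROM member_contract a
      JOIN product b
        ON a.contract_id = b.contract_id) AS d
GROUP BY d.member_id,
         d.product_category_id
```

Whether this actually beats COUNT(DISTINCT) depends on the plan the optimizer picks; an index on (contract_id, product_category, product_id) covers the inner query, so comparing the two execution plans is the way to tell.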
"Adam Machanic" wrote:

> Kirk,
> How can a product change categories?
> insert product
> values
> (1, 1, 2)
> insert product
> values
> (1, 2, 1)
>
> "kirk1880" <kirk1880@.discussions.microsoft.com> wrote in message
> news:CE4E31B2-A028-42FB-8B3C-4E74DD983881@.microsoft.com...
>
>
Kirk,
I also notice that you don't have a primary key on the product table, and
you have duplicate data. Is that the case in the real database? I don't
think there's going to be a way to speed up this process until you deal with
your data integrity issues.
"kirk1880" <kirk1880@.discussions.microsoft.com> wrote in message
news:3B37AB13-099F-4FAD-B874-3F192A2B7C13@.microsoft.com...