Friday, March 30, 2012

Performance impact of different approaches to querying for 20,000 rows of data

I have two servers - a web server and a data server. I have a web service on
my web server that receives a request for information on anywhere from 1 to
60,000 products. I'm looking at a couple of different approaches for
querying the database for the product information on the 1 to 60,000
products.
1) Pass the product list into a SQL stored procedure as a single delimited
string and have that SQL stored procedure do a SELECT * FROM Products WHERE
Sku IN ('sku1', 'sku2', 'sku3',..., 'sku60000'). I'm not sure of all of the
details of how I'll get from my single delimited string to that SQL query,
but I assume I can do it.
2) Loop through the list of 1 to 60,000 products and issue 1 to 60,000 calls
to a SQL stored procedure that returns data for a single product. My client
that's issuing the requests is multi-threaded and I'm comfortable that I can
set this up so that I'm issuing the 60,000 calls 5-10 or more calls at a
time.
The client that's requesting the 1 to 60,000 products is not very time
sensitive. It's a batch product feed process that runs 1 to 4 times a day.
The database I'm querying also supports a web site that does have real-time
requirements, so I don't want to structure my query in such a way that my
web service is negatively impacting the performance of my web site.
I'm trying to understand the pros and cons of the two approaches and would
appreciate any inputs. Thoughts I have so far:
Option 1 may perform better for the product feed client because it has one
large network transaction instead of 1 to 60,000 small network transactions.
But Option 1 may put a more intense load on the SQL server for a period of
time, potentially negatively impacting the real-time performance of the web
site.
With either solution I probably want to look at ways to ensure that the
request for data for 1 to 60,000 products is done at a lower priority than
real-time requests from the web site.
Any thoughts or suggestions?
Thanks,
Zoe
Zoe,
Have a look at the link below to see how to pass in a delimited list and process
it appropriately. But I would probably create two or more stored procedures to
handle the different ranges of product requests so you can get a proper
query plan for each. For instance, if you only have one product specified, SQL
Server can easily do an index seek and get a good plan. But to retrieve 60K
products it may need to do a scan, a Merge Join, or even a Hash Join. Due to
parameter sniffing, if the proc had 50K products the first time it was run,
you would get a plan for that many rows. But when you call it the next time,
even with 1 product, you will still get the same plan as before. So I would
have your app decide how many products there will be and call one of two or
more stored procedures (depending on how many different query plans you may
encounter) so they each get their own plan. For the case where you only look up
a single product, you can use a straightforward query with an =. For more than
one, you need to use dynamic SQL or parse the list into a table with a UDF.
http://www.sommarskog.se/arrays-in-sql.html
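A minimal sketch of the parse-into-a-table UDF approach mentioned above (all names are illustrative; on SQL Server 2016 and later the built-in STRING_SPLIT function does the same job):

```sql
-- Hypothetical UDF: turns a comma-delimited SKU list into a table variable.
CREATE FUNCTION dbo.SplitSkuList (@List varchar(max))
RETURNS @Skus TABLE (Sku varchar(50) NOT NULL)
AS
BEGIN
    DECLARE @Pos int;
    SET @Pos = CHARINDEX(',', @List);
    WHILE @Pos > 0
    BEGIN
        -- Take the token before the next comma, trimmed of whitespace.
        INSERT INTO @Skus (Sku)
        VALUES (LTRIM(RTRIM(LEFT(@List, @Pos - 1))));
        SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List));
        SET @Pos = CHARINDEX(',', @List);
    END;
    -- Don't forget the final token after the last comma.
    IF LEN(@List) > 0
        INSERT INTO @Skus (Sku) VALUES (LTRIM(RTRIM(@List)));
    RETURN;
END;
```

The payoff is that the lookup becomes a join rather than a giant IN list:

```sql
SELECT p.*
FROM Products AS p
JOIN dbo.SplitSkuList('sku1,sku2,sku3') AS s ON s.Sku = p.Sku;
```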
Andrew J. Kelly SQL MVP
Solid Quality Mentors
Zoe Hart (zoe.hart@nospam.competitive.com) writes:
> 1) Pass the product list into a SQL stored procedure as a single
> delimited string and have that SQL stored procedure do a SELECT * FROM
> Products WHERE Sku IN ('sku1', 'sku2', 'sku3',..., 'sku60000'). I'm not
> sure of all of the details of how I'll get from my single delimited
> string to that SQL query, but I assume I can do it.
Whatever you do, don't do exactly this. The time it would take to compile
that statement is enormous, particularly if you are on SQL 2000.
But there are other alternatives, as I discuss in my article
http://www.sommarskog.se/arrays-in-sql.html.
Andrew made an important point about the need for different plans due
to the number of elements in the list. I think the best approach is to
unpack the list into a temp table, as this will cause a recompile
and the actual join is likely to use the best plan.
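A minimal sketch of that unpack-then-join pattern (illustrative names; dbo.UnpackList is a placeholder for whichever list-unpacking technique from the linked article you adopt):

```sql
-- Unpack the incoming delimited list into a temp table first.
CREATE TABLE #Skus (Sku varchar(50) NOT NULL PRIMARY KEY);

INSERT INTO #Skus (Sku)
SELECT Sku FROM dbo.UnpackList(@SkuList);

-- Populating #Skus triggers a recompile of the statement below,
-- so the join gets a plan optimized for the actual number of rows,
-- whether that is 1 or 60,000.
SELECT p.*
FROM Products AS p
JOIN #Skus AS s ON s.Sku = p.Sku;
```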
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Thanks for all the good input. Our rough plan at this point is to have a
stored procedure that accepts an XML input containing the 1-n SKUs. That
stored procedure will create a temporary table with columns for the SKU
and the other data we intend to look up. The stored procedure will write the
1-n SKUs to the temporary table and then use UPDATE ... FROM to join the
temporary table to one or more tables that contain the data we need and
update the columns in the temporary table. We'll then SELECT * from the
temporary table FOR XML to get our results. We can either go with that
result as is or map it to a format we like better in the code that calls the
stored proc.
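The plan above might be sketched roughly as follows (procedure, column, and XML element names are all illustrative, and assume an input shaped like `<Skus><Sku>sku1</Sku>...</Skus>`):

```sql
CREATE PROCEDURE dbo.GetProductFeed @SkuXml xml
AS
BEGIN
    -- Temp table with the SKU plus the columns we intend to look up.
    CREATE TABLE #Feed (
        Sku   varchar(50)  NOT NULL PRIMARY KEY,
        Name  varchar(100) NULL,
        Price money        NULL
    );

    -- Shred the XML input into the temp table.
    INSERT INTO #Feed (Sku)
    SELECT s.n.value('.', 'varchar(50)')
    FROM @SkuXml.nodes('/Skus/Sku') AS s(n);

    -- UPDATE ... FROM fills in the remaining columns from the base table.
    UPDATE f
    SET f.Name = p.Name,
        f.Price = p.Price
    FROM #Feed AS f
    JOIN Products AS p ON p.Sku = f.Sku;

    -- Return the results as XML.
    SELECT Sku, Name, Price
    FROM #Feed
    FOR XML PATH('Product'), ROOT('Products');
END;
```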
Thanks again.
Zoe
Sounds like about the best available approach.
Otherwise, if the query list is really that long, you might upload a
flat file to the server and then import it with an SSIS package.
Note that 20K or 60K rows is a very small number in terms of modern SQL
Server performance; it only becomes a very large number if you try to list
every value as part of a SQL command!
Josh
On Tue, 15 Jan 2008 09:52:25 -0500, "Zoe Hart"
<zoe.hart@nospam.competitive.com> wrote:
