Monday, March 12, 2012

Performance & Threading

Hi,

I am a web developer using c# and we use threading extensively across our website. I noticed that we appeared to be geting major speed degradation when seperate threads would make calls to the same SPC (with different params). After creating a few diferent test scenarios I ended up using a console application that allows me to spawn n threads to a SQL Server and then tells me the results.

We first saw this issue on SQL Server 2000 but we recently upgraded our stage server to SQL Server 2005 so I ran the tests there,

The Stage server is a 4CPU dual core opteron box, so there are 8 cores. There was very little activity on the server, no more than 1 or 2% cpu utilization.

The call was executed using: 'EXEC DBName.dbo.TestSPC' and dbo was the owner of the SPC. The SPC contained a simple select on the primary key, the table has about 19000 rows:

SELECT * FROM People WHERE id between 10000 and 20000

When the SPC is called serially, it takes (in ms):

188, 203, 188, 234, 219, 250, 172, 203 - total time including connections = 1766ms

If I send all 8 threads at the same time, each with thier own connection the results are:

922, 906, 922, 1016, 1000, 1203, 1172, 1047 - total time including connections = 1313ms

My questions are:

    Is this expected behaviour on an 8 core 2005 server? Why would the first thread take nearly 5 times longer to complete? I could understand some extra overhead, but 500% seems excessive. Running serially, is it usual for an SPC to vary in execution time as seen above. I ask this because if I execute a diferent SPC which is much more complex multiple times in a row in Query Analyzer, then I see variations from 800ms up to 5000ms on our stage server.

Thanks for looking.

Jim

What are the specs on the client box you are submitting the requests from? If you are spawning 8 threads on a single core box, obviously each thread's work is going to be serialized to some degree on the client box, so that may be part of it.

Also, are the times listed based on times reported from your client application or the server? To see how the server is actually handling each request, I'd recommend running a server-side SQL trace to capture the duration, reads, writes, and resource usage on the server, not on the client...you should notice that regardless of the times reported on the client, the times to execute the procedures on the server are relatively static after the initial compilation, optimization, etc. of the procedure. Times reported on the client could include network latencies, client latencies, etc.

Also, in your scenario above, when using multiple threads you mention that each thread uses it's own connection...do the times include creation of the connection in addition to execution and response to the query itself? If so, in the single-threaded attempt are you doing the same (i.e. creating/destroying connection on each execution attempt)?

Finally, an obvious possible issue in the multi-threaded scenario is blocking on the server...each simultaneous request to access the same records on the server will be blocked until the previous request(s) have been processed...you'd want to monitor the server to see if spids are getting blocked by others during the execution...note that blocked time is included in trace duration data, so that could be misleading if it's an issue...

|||

What are the specs on the client box you are submitting the requests from? If you are spawning 8 threads on a single core box, obviously each thread's work is going to be serialized to some degree on the client box, so that may be part of it.

Those times were done using terminal server directly on the SQL box. It was quicker from our Stage IIS server which is a 2 cpu HT machine (3.0ghz I believe).

Also, are the times listed based on times reported from your client application or the server? To see how the server is actually handling each request, I'd recommend running a server-side SQL trace to capture the duration, reads, writes, and resource usage on the server, not on the client...you should notice that regardless of the times reported on the client, the times to execute the procedures on the server are relatively static after the initial compilation, optimization, etc. of the procedure. Times reported on the client could include network latencies, client latencies, etc.

I did this when I was testing from other boxes and the time diferences were negligable. However this was on the box itself, so there were no other factors such as client speed, or network latency that I can think of.

Also, in your scenario above, when using multiple threads you mention that each thread uses it's own connection...do the times include creation of the connection in addition to execution and response to the query itself? If so, in the single-threaded attempt are you doing the same (i.e. creating/destroying connection on each execution attempt)?

Connection creation times have already been removed. In single threaded mode, and in multi-threaded mode, each operation has it's own connection, although in single threaded mode, it will reuse the same connection from the conneciton pool. However these are purely times using a SQLAdapter on an open connection, as I also have a time reading on how long each connection takes to open. The code for the fill is.

DateTime fillStart = DateTime.Now;
SqlDataAdapter adp = new SqlDataAdapter(sql, conn);
DataSet ds = new DataSet();
adp.Fill(ds);
DateTime fillEnd = DateTime.Now;

Finally, an obvious possible issue in the multi-threaded scenario is blocking on the server...each simultaneous request to access the same records on the server will be blocked until the previous request(s) have been processed...you'd want to monitor the server to see if spids are getting blocked by others during the execution...note that blocked time is included in trace duration data, so that could be misleading if it's an issue...

What you mentioned here is what I thought would happen, the first thread would return in the same time and then subsequent threads would return faster than in single threaded mode, but not in the same time as 1 thread + connection + thread creation time. I dont understand why the first thread the box receives is held up by 400% of its execution time.

I will run another trace on the server whilst running the console directly from the server, I will add the options you mentioned above: duration, reads, writes, and resource usage on the server, plus I will add locks. I will also run perfmon at the same time to see what is going on on the box.

Thanks for your help, I'll post more results in a few hours.

|||

You've also got to look at caching, IO capabilities, memory and network. You are selecting 10,000 records that is a huge number, and will undoubtedly lead to a level of blocking. What transaction isolation level is being used to access the data, this can change the amount of blocking that will occur.

In a good design you should not see that level of disparity, which suggests you are hitting some limit, which I would suggest is due to the amount of data.

No comments:

Post a Comment