I apologize if this seems to be a fundamental question, but I did try to search and something seems to be wrong - after clicking the "search" button, the page just will not update... I even tried "advanced search", but no luck there either.
Anyway, my question is about writing multiple data rows to tables in a SQL Server database. Currently I'm using embedded SQL from a VC++ 2005 .Net program to write data to a SQL 2005 Express server (either on the same PC or on another, via TCP). As data becomes available, I issue INSERT statements, one by one. One row is about a dozen columns, typically 8 bytes each, for each of the few tables in the database. In most cases, I have several rows of data available at the same time. Those rows come at the rate of around 200-1000 per second for the different tables. With that setup, my SQL server is not able to keep up - the data ends up getting buffered in the program while it waits for the server, the server process uses just about all the CPU cycles it can get, and so on.
I'm reading the "SQL in a Nutshell" book from O'Reilly, but it's mostly language-oriented; it doesn't say much about performance. It seems natural that the performance could be improved. For example, when I run a SELECT query, the data comes back very quickly, which makes me think the bottleneck is the overhead of making all those individual calls, each with a small amount of data. I would think I should be able to request that multiple rows be written with a single SQL statement, and that this would improve the performance. However, I have no idea how to do that.
I did try one thing - enlisting the sequence of INSERTs into a single transaction, thinking it would all get buffered and only get written to the server after the Commit command was issued. I did that, but it doesn't seem to help. What should I try to alleviate this problem? One thing to consider: although I use SQL Server for testing, I am trying to keep compatibility with other databases, e.g., ODBC to any available data source, including MS Access over Jet. I would love it if the solution were portable (i.e., did not involve Transact-SQL or other vendor-specific tricks).
I thank you in advance for your assistance!
Kamen|||With that setup, my SQL server is not able to keep up
That's an incorrect assumption...it's your code
You should have a stored procedure instead - set parameters and call the sproc rather than sending raw INSERT statements.
If you want, you could perhaps "bundle" the inserts and have the sproc spin through them and do the inserts.
But I would try a simple insert sproc first|||You mean - still issue one SQL command for each row of data, but instead of using INSERT, I should call a stored procedure with the data from that record as parameters? Why would this be much faster? I'm not sure I understand what you mean. I was hoping to be able to stay away from procedures since I'm using ADO.Net and its handling of parameters is a little scary.
Kamen|||Brett knows way more than me about this but I admit I don't know how that is likely to improve performance (perhaps merely my ignorance :)). It is however MS-recommended good practice (and ADO.NET in conjunction with SQL Server sprocs & parameters is easy peasy - you shouldn't have any problems).
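Just to give you a flavour of what that looks like - the procedure, table and column names below are invented, so treat it as a sketch rather than something to paste in:
// Sketch only: assumes a stored procedure along the lines of
//   CREATE PROCEDURE InsertSample @Stamp DATETIME, @Value REAL
//   AS INSERT INTO SampleTable (Stamp, Value) VALUES (@Stamp, @Value)
// Because the command text never changes, the plan is compiled once and reused.
using namespace System;
using namespace System::Data;
using namespace System::Data::SqlClient;

void InsertRow(SqlConnection ^conn, DateTime stamp, float value)
{
    SqlCommand ^cmd = gcnew SqlCommand("InsertSample", conn);
    cmd->CommandType = CommandType::StoredProcedure;
    cmd->Parameters->Add("@Stamp", SqlDbType::DateTime)->Value = stamp;
    cmd->Parameters->Add("@Value", SqlDbType::Real)->Value = value;
    cmd->ExecuteNonQuery();
}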
Also - the transaction thing is really about integrity and keeping your changes atomic rather than about performance.
SQL Server can insert rows very fast - you should be able to insert rows much faster than 200 per second - so there might be another issue. Out of curiosity - does the SQL do anything other than simply insert data into a single table or is it more complex? If the former - is it a heap table or does it have a clustered index? If the latter (:D) is the index monotonically increasing (i.e. the next value is always greater than the preceding value, such as an identity column or "date run" column)?
... and finally - ODBC connections are not as fast as OLE DB. The connection you use wouldn't really make your app specific to one db engine or another (for example you could write a class to handle making the connections and then make one single change to deal with BE changes from SQL Server to Oracle to Access).
HTH|||Are you using newrow (http://msdn2.microsoft.com/en-us/library/z16c79x4.aspx)? You can call it repeatedly to add all of the rows from a single group to the object before you send them to SQL, and it is a lot more efficient than doing things one at a time.
-PatP|||Are you using newrow (http://msdn2.microsoft.com/en-us/library/z16c79x4.aspx)? You can call it repeatedly to add all of the rows from a single group to the object before you send them to SQL, and it is a lot more efficient than doing things one at a time.
Is that not a behind the scenes violation of 1NF Pat? (http://www.dbforums.com/showthread.php?t=1604936) :angel: (http://www.dbforums.com/showthread.php?t=1604936)|||insert
into mytable
( col1, col2, col3, col4 )
values
( 23, 57, 'A', 'Tom' )
, ( 14, 42, 'B', 'Dick' )
( 9, 37, 'C', 'Harry' )
row constructors like this are part of standard sql
one INSERT statement, multiple sets of values
has been supported by MySQL for ages, mebbe one o' these days SQL Server will get around to supporting it too
:)|||Is that not a behind the scenes violation of 1NF Pat? (http://www.dbforums.com/showthread.php?t=1604936) :angel: (http://www.dbforums.com/showthread.php?t=1604936)
No, changing the schema (either physically or logically) would be a violation of 1NF, but using a different transport methodology is not a violation at all.
The thread you referenced is changing the logical schema (actually abandoning the physical schema) by allowing a typeless list of arbitrary length to be passed to a stored procedure as a parameter. That is a no-no in the world of referential integrity, because it basically blows the schema completely out of the water... It allows the calling code to completely disregard any schema.
Using a record set on the client, populating it with some arbitrary number of rows (which the recordset will force to conform to all of the constraints it can check on the client side, and all constraints without exception on the server side), then using an alternative transport mechanism (BCP instead of parsed Transact-SQL) isn't a problem from the perspective of Referential Integrity. True, you can hit some "gotchas" if you have schema bindings that the client can't confirm when the row is added to the recordset, but those will still be stopped cold before the rows are committed to disk.
I'm a rock-solid believer in referential integrity. I can't imagine conditions in which I'd suggest something less than 3NF, and I'm usually pickier than that. While this might permit short term violations (while the data is in memory, but not yet sent to the server), I don't know of any way to prevent that... Until the data hits the database, the user can speculate about anything that suits them, whether it meets the constraints or not. The data isn't "real" to me until it is accepted by the server.
-PatP|||insert
into mytable
( col1, col2, col3, col4 )
values
( 23, 57, 'A', 'Tom' )
, ( 14, 42, 'B', 'Dick' )
( 9, 37, 'C', 'Harry' )
row constructors like this are part of standard sql
one INSERT statement, multiple sets of values
has been supported by MySQL for ages, mebbe one o' these days SQL Server will get around to supporting it too
:)
Even though it has nothing to do with the point I was making, I hardly think row constructors are a part of the SQL standard...
The ratified standard is currently SQL-92, and I can't find any reference to row constructors in it. The proposed standard (sometimes called SQL-99, but I think that's become something of a misnomer by this point) does include a definition, but until X3H2 or a successor meets and ratifies it, SQL-99 is no more a standard than any vendor specific extensions are (since the vendor could just as validly claim that the committee will ratify their dialect as the new standard over any other proposed standard).
I agree that MySQL supports the row constructor, and that several other SQL dialects support it too. I think that Microsoft SQL ought to support that syntax. I haven't seen anything that I recognize as a standard that includes any formal definition of a row constructor yet.
What I was talking about is using relatively standard .NET constructs to work around several of the "Achilles tendon" points in the process of moving data from client to server. You lose a lot of CPU cycles converting data into SQL syntax to send to the server, and often lose even more cycles on the server side converting the syntax back into binary for storage. You also usually have more bytes to send down the wire, which takes more time too. By sidestepping all of those conversions and overhead, you can speed up the process significantly. That is what using a DataTable object permits your code to do.
-PatP|||insert
into mytable
( col1, col2, col3, col4 )
values
( 23, 57, 'A', 'Tom' )
, ( 14, 42, 'B', 'Dick' )
( 9, 37, 'C', 'Harry' )
row constructors like this are part of standard sql
one INSERT statement, multiple sets of values
has been supported by MySQL for ages, mebbe one o' these days SQL Server will get around to supporting it too
:)
You can use this though ya?
insert
into mytable
( col1, col2, col3, col4 )
SELECT
23, 57, 'A', 'Tom'
,UNION SELECT 14, 42, 'B', 'Dick'
,UNION SELECT 9, 37, 'C', 'Harry'|||yes, except for the dangling commas ;)
and i would use UNION ALL instead of UNION|||yes, except for the dangling commas ;)
and i would use UNION ALL instead of UNION
Oops - yup and yup. Can't have inappropriate dangling.|||Thank you for the input. Let me give you a little more detail about what I'm doing.
I have a variable number of tables to which I write data. On each session, a new set of tables may be defined. With the current test, I have three tables with a similar schema: a DATETIME field and a FLOAT field, which together form the primary key of the table. The former is also a foreign key, which is different for each session but remains the same within a session. (I know - not very optimal, but it helps find data afterwards.) The FLOAT field is time, so - yes, monotonically increasing, albeit not an integer, and thus - not consecutive. Then, after that, each table will have a variable number of REAL columns (in this case - 1 to 8). One of the tables gets 1000 INSERTs per second, the other two - 200.
The way I use ADO.Net is different from the usual - I do not use DataSets at all - I form SQL statements and use ExecuteNonQuery or the like. The reason is that I found some immaturities in ADO.Net and decided to keep it simple. The reason why I say that the SQL server cannot keep up with the amount of INSERTs I send is that the Task Manager shows nearly 100% CPU utilization of sqlservr.exe as I do that. That's on SQL Server Express on a couple of older machines (P4 1.8GHz). I tried our company server (dual Xeon @ 2GHz, SQL 2003 SBS) and the utilization was around 30%, but I think it is configured to not use more CPU time; it still was lagging behind, though.
The reason I thought using transactions would help is a side effect I read about - that all the SQL commands are buffered in memory and then the whole thing is written at once. I thought that might help performance.
Anyway, I am willing to try various solutions, I'd just rather avoid trying things that aren't likely to help, which is why I appreciate the wide discussion here.
Kamen|||Actually, as of now, I'm mostly intrigued by the solution:
INSERT INTO NumData1_5_0 ( StartTime, Timestamp, Ch0, Ch1)
SELECT 8/14/2006 10:03:53 AM, 1.38956571296073, -0.1896683, -0.3793676
UNION ALL SELECT 8/14/2006 10:03:53 AM, 1.39457221518377, -0.2204517, -0.4409377
UNION ALL SELECT 8/14/2006 10:03:53 AM, 1.39957340946964, -0.2509933, -0.5020171
.......
.......
As long as it is very likely that it will improve performance. It would actually be quite easy to try. One of the attractive features of this approach is that the number of rows can vary each time.
Kamen|||don't forget to quote your datetime values, and what kind of a timestamp value is 1.399?|||Going back to Brett's suggestion, you are very likely having all of the insert statements individually compiled. In perfmon, check to see how many Batch Requests/sec vs how many SQL Compilations/sec you are getting (both under SQL Server:SQL Statistics). I would wager you are getting a nearly 1-1 ratio, which is bad. Very bad. Using the stored procedure, you should see the compilations reduced to 0.|||It is likely to improve performance.
However - I would also check out your clustered index on your NumData1_5_0 table (and any other affected tables). BoL is a good place to start.
You should be particularly interested in the general implications and principles of Clustered Indexes and specifically Page Splits and Fill Factor. Remember - your primary key might not necessarily be the clustered index (though it often is).
:)|||don't forget to quote your datetime values, and what kind of a timestamp value is 1.399?
Yes, I do quote the DATETIME value - I just forgot to put the quote marks in this example (it would give me a SQL error otherwise). The timestamp is in seconds (and fractions thereof), very convenient. ;) But possibly slower. :confused:
Kamen|||Going back to Brett's suggestion, you are very likely having all of the insert statements individually compiled. In perfmon, check to see how many Batche Requests/sec vs how many SQL Compilations/sec you are getting (both under SQL Server:SQL Statistics). I would wager you are getting a nearly 1-1 ratio, which is bad. Very bad. Using the stored procedure, you should see the compilations reduced to 0.
I will check on that and report back. One would think that a simple INSERT (one with constant data) would work as fast as an equivalent stored procedure but what do I know - that's why I'm asking the experts. :cool:
Kamen|||It is likely to improve performance.
However - I would also check out your clustered index on your NumData1_5_0 table (and any other affected tables). BoL is a good place to start.
You should be particularly interested in the general implications and principles of Clustered Indexes and specifically Page Splits and Fill Factor. Remember - your primary key might not necessarily be the clustered index (though it often is).
:)
I'm afraid I have no idea what you are talking about. Before I do some reading, I won't even know what a "clustered index" would be. Not to mention "BoL", "Page Splits" and "Fill Factor".
Kamen|||BOL = Books Online
your very best new friend ;)|||Going back to Brett's suggestion, you are very likely having all of the insert statements individually compiled. In perfmon, check to see how many Batch Requests/sec vs how many SQL Compilations/sec you are getting (both under SQL Server:SQL Statistics). I would wager you are getting a nearly 1-1 ratio, which is bad. Very bad. Using the stored procedure, you should see the compilations reduced to 0.
Yep, just checked - your wager would have paid off: the compilations are about the same number as the batch requests. So, is it 100% certain that the call to a stored procedure with a bunch of parameters is a lot less expensive than the compilation of a simple INSERT ?
Kamen|||Eliminating the compilations will reduce the run time by about 40 - 50% by my tests (your mileage may vary). The CPU usage may be about the same, but the throughput will be better.|||Great. How about multiple data rows with a single procedure (as in the above example) - would that help, too?
Kamen|||At least when I see this problem, the conversion to/from syntax is 85% or more of the total delay. That's why I suggested switching to a Table object, because it allows you to completely bypass both conversions.
Keep in mind that your mileage may vary, but typically you'll stall out at around 700 batches per second no matter how brawny your server gets. The only solution that I've found is to bypass the parsing altogether. It gets your compiles down to zero, gets close to the theoretical max speed on the wire, and allows you to transfer batches of rows at once (allowing you to minimize the setup/teardown cost for a connection). On the boxes that I maxed out at 758 SQL batches per second, I was able to transfer 6800 rows per second for over an hour.
-PatP|||All right, I'm back on the "database side"; that is, I had a thread synchronization issue to resolve in the application in the meantime, but this issue seems to be connected so I need to work on it now.
So, Pat, you are saying that when using the ADO.Net methods, such as creating a DataAdapter, then setting its Connection and default query, etc., then creating a DataSet, putting the data in there, and finally calling Adapter->Update(), SQL per se never gets involved? No SQL generation, no SQL compilation? If that is the case, it would be a compelling argument for me to try and use those methods.
On that assumption, have you used ADO.Net with C++ or with other languages, such as C# and VB.Net? Either way, I'd like to abuse your patience with some questions. There are so many ways to go about this. For example, in the creation of the Adapter, do you go by fill or by manually building the query? I'd like to keep it generic and not use SqlDataAdapter but DbDataAdapter, which has limited capabilities but that way I can stay compatible with ODBC sources, such as MS Access/Jet without writing separate code for each provider. Then, assuming you do not retrieve the whole contents of the table when creating the adapter, do you just create a DataSet with the changes and update with that?
Kamen|||Eliminating the compilations will reduce the run time by about 40 - 50% by my tests (your mileage may vary). The CPU usage may be about the same, but the throughput will be better.
Curious update, MCrowley: with the full load that I had, both batches/sec and compilations/sec were around 700/sec, just as Pat predicted. Which must be a SQL Server limitation. However, when I had only selected the base-speed channels, that is, only the ones that update 200/sec, my batch requests went to around 1600/s, while the compilations were around 240/sec. How is that possible - my SQL execute commands were sent around 200/sec, so where did that 1600/sec come from? :confused:
Kamen|||If you run Profiler, you will very likely see a lot of "cursor* overhead", as I call it. These are things that SQL Server does as part of its natural routine, but that we have little control over. In Profiler, these show up as sp_prepare, sp_opencursor, etc. These are generated by the ADO.NET methods.
* Not a T-SQL Cursor.|||Back to using ADO.Net for the above purpose. Here is the biggest question: how do you keep the data transfer between the application and the server minimal? Reading the ADO.Net documentation, the standard procedure is to fill a DataSet with data from one or more tables, and then any change made to the DataSet in the application can be mirrored to the database by using Adapter->Update(). However, I could have millions of records in each table, and I don't want to retrieve and keep in memory all of those! What is the best way to go about this?
Kamen|||At least when I see this problem, the conversion to/from syntax is 85% or more of the total delay. That's why I suggested switching to a Table object, because it allows you to completely bypass both conversions.
Keep in mind that your mileage may vary, but typically you'll stall out at around 700 batches per second no matter how brawny your server gets. The only solution that I've found is to bypass the parsing altogether. It gets your compiles down to zero, gets close to the theoretical max speed on the wire, and allows you to transfer batches of rows at once (allowing you to minimize the setup/teardown cost for a connection). On the boxes that I maxed out at 758 SQL batches per second, I was able to transfer 6800 rows per second for over an hour.
-PatP
Pat,
I did experiment with the ADO.Net methods. However, I couldn't get anywhere near the numbers you are referring to. The best I could do was 600-650 rows per second, which matched the number of Batches/sec from the SQL Statistics in the Performance monitor. The reported number of SQL Compilations/sec was zero. I expected that the number of batches would be reduced. For each call to Adapter->Update() I had 50 rows to be sent, so I expected that there would be 50 times fewer batches. It appears that the Update function issued as many batches as there were changed rows in the DataSet. I came across a headline for an article in MSDN about batch processing in ADO.Net that I must read; there must be a trick. Here is the test code that I used:
String ^DTblName = gcnew String("DlNumericData0_1_0Table");
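// Provider factory - hard-coded to SqlClient for this test, but any registered ADO.Net provider name could be used here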
m_ProvFactory = DbProviderFactories::GetFactory("System.Data.SqlClient");
m_Connection = m_ProvFactory->CreateConnection();
m_Connection->ConnectionString = "Data Source=tcp:SAKORSERVER;Initial Catalog=DynoLAB;Integrated Security=True";
m_Connection->Open();
m_Adapter = m_ProvFactory->CreateDataAdapter();
m_Adapter->SelectCommand = m_Connection->CreateCommand();
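// A SELECT that should match no rows, so the Fill() call below only brings back the table's column layout rather than existing data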
m_Adapter->SelectCommand->CommandText = "SELECT * FROM " + DTblName + " WHERE ChTimestamp=0";
m_Adapter->InsertCommand = m_Connection->CreateCommand();
m_Adapter->InsertCommand->CommandText = "INSERT INTO " + DTblName + "(SessionStartTime, ChTimestamp, Channel001data, " +
"Channel002data, Channel003data, Channel004data, Channel005data, Channel006data, Channel007data, Channel008data) " +
"VALUES (@.SessionStartTime, @.ChTimestamp, @.Channel001data, @.Channel002data, @.Channel003data, @.Channel004data, " +
"@.Channel005data, @.Channel006data, @.Channel007data, @.Channel008data)";
m_Adapter->InsertCommand->Parameters->Add(gcnew SqlParameter("@.SessionStartTime", SqlDbType::DateTime, 0, "SessionStartTime"));
m_Adapter->InsertCommand->Parameters->Add(gcnew SqlParameter("@.ChTimestamp", SqlDbType::Float, 0, "ChTimestamp"));
m_Adapter->InsertCommand->Parameters->Add(gcnew SqlParameter("@.Channel001data", SqlDbType::Real, 0, "Channel001data"));
m_Adapter->InsertCommand->Parameters->Add(gcnew SqlParameter("@.Channel002data", SqlDbType::Real, 0, "Channel002data"));
m_Adapter->InsertCommand->Parameters->Add(gcnew SqlParameter("@.Channel003data", SqlDbType::Real, 0, "Channel003data"));
m_Adapter->InsertCommand->Parameters->Add(gcnew SqlParameter("@.Channel004data", SqlDbType::Real, 0, "Channel004data"));
m_Adapter->InsertCommand->Parameters->Add(gcnew SqlParameter("@.Channel005data", SqlDbType::Real, 0, "Channel005data"));
m_Adapter->InsertCommand->Parameters->Add(gcnew SqlParameter("@.Channel006data", SqlDbType::Real, 0, "Channel006data"));
m_Adapter->InsertCommand->Parameters->Add(gcnew SqlParameter("@.Channel007data", SqlDbType::Real, 0, "Channel007data"));
m_Adapter->InsertCommand->Parameters->Add(gcnew SqlParameter("@.Channel008data", SqlDbType::Real, 0, "Channel008data"));
DataSet ^myDataSet = gcnew DataSet;
m_Adapter->Fill(myDataSet, DTblName);
while(m_bRun)
{
myDataSet->Clear(); // Reset DataSet every time so it doesn't grow forever
for(int i=0; i<50; ++i)
{
DataRow ^NRow = myDataSet->Tables[DTblName]->NewRow();
NRow["SessionStartTime"] = "8/18/2006 11:44:08 AM";
NRow["ChTimestamp"] = (double)Stopwatch::GetTimestamp()/Stopwatch::Frequency;
NRow["Channel001data"] = 1.0 + double(i)/10;
NRow["Channel002data"] = 2.0 + double(i)/10;
NRow["Channel003data"] = 3.0 + double(i)/10;
NRow["Channel004data"] = 4.0 + double(i)/10;
NRow["Channel005data"] = 5.0 + double(i)/10;
NRow["Channel006data"] = 6.0 + double(i)/10;
NRow["Channel007data"] = 7.0 + double(i)/10;
NRow["Channel008data"] = 8.0 + double(i)/10;
myDataSet->Tables[DTblName]->Rows->Add(NRow);
}
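// Update() sends the pending inserts to the server; by default it issues them one row (one statement) at a time unless UpdateBatchSize is set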
m_Adapter->Update(myDataSet, DTblName);
}
I hope to get to the bottom of this.
Kamen|||Well, I'm on to something. Just setting:
m_Adapter->UpdateBatchSize = 50;
improved it to 1400 rows/s (batches/sec) but still - I'm seeing every row sent as a batch request on the server, and I'm nowhere near Pat's 6800 rows per second!
Kamen|||Are the tables indexed? (I think Pootle Flump touched on this.) It depends on the frequency of your batch hits, of course, but would it be feasible to delete the indexes, do the updates and then recreate the indexes after the update? This might speed things up a bit.
ummmm, no.
if your table is of any considerable size, removing and adding an index can take considerable time. you would be trading the overhead of adding some records to your index pages for completely rebuilding your indexes. If your database is not large now, one day it will be and your indexes will grow with the tables. this solution is not practical, unlikely to solve your problem now and will definitely not scale as your database does.|||FYI - Pat is offline at the mo but we are expecting him back any time and I'm sure he will contribute. I logged off this thread as I had got it wrong re Brett's suggestion - I have been self-flagellating ever since.
Anyway there are many other things that would need to be equal to match Pat's figures and indexes are one of them (hardware setup being another for example).
Just to cover the terms you weren't familiar with:
The clustered index determines the order in which data is physically stored. It is stored in pages so... if your clustered index monotonically increases then each new insert goes at the end. If you insert a value into the clustered index that is between other values then SQL Server has to squeeze it in. Sometimes there is a gap and sometimes there isn't. When there isn't, SQL Server has to make space. It does this by splitting a page and popping the data from this one page onto two. This has an overhead, and it was this overhead I was thinking of. The fill factor is how full SQL Server fills each page when the index is created / rebuilt. The lower the fill factor, the less data per page and so the more room there is to cram in more data before SQL Server needs to start splitting pages.
This is a nice discussion I read recently though you could read about indexes all day and still have more to learn:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70451
Also (as intimated above) if you are updating the values of columns that are contained in other indexes then you have the overhead of those index updates too.
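To make that concrete - the index and table names below are just placeholders for whatever your clustered (primary key) index is actually called, and the WITH (FILLFACTOR = ...) clause is SQL Server 2005 syntax, so treat this as a sketch only:
// Sketch only - placeholder index/table names; a fill factor of 80 leaves 20% free space per page when the index is rebuilt
using namespace System::Data::Common;

void RebuildWithFillFactor(DbConnection ^conn)
{
    DbCommand ^cmd = conn->CreateCommand();
    cmd->CommandText = "ALTER INDEX NumPriKey1_5_0 ON NumData1_5_0 "
                       "REBUILD WITH (FILLFACTOR = 80)";
    cmd->ExecuteNonQuery();
}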
Anyway - I think this is secondary (tertiary even) to the points made by the other learned posters.|||Thanks a lot for the input, guys! I may still be off the target, but with your help I have come a long way towards getting this done. I don't mean to abuse your patience but in case you want to look at this, here's the database schema:
Fixed tables (created in the beginning):
CREATE TABLE DlIndexTable (SessionStartTime DATETIME NOT NULL PRIMARY KEY, SchemaID INTEGER NOT NULL)
CREATE TABLE DlChannelInfoTable (
ChannelID INTEGER NOT NULL PRIMARY KEY,
ChannelName VARCHAR (255) NOT NULL,
ChannelUnits VARCHAR (32),
ChannelOrder INTEGER NOT NULL,
ChannelSampleRate REAL NOT NULL)
CREATE TABLE DlClusterTable (
ClusterID INTEGER NOT NULL,
ChannelID INTEGER NOT NULL REFERENCES DlChannelInfoTable,
CONSTRAINT ClstPriKey PRIMARY KEY (ClusterID, ChannelID))
CREATE TABLE DlSchemaTable (
SchemaID INTEGER NOT NULL,
ClusterID INTEGER NOT NULL REFERENCES DlChannelInfoTable,
CONSTRAINT SchPriKey PRIMARY KEY (SchemaID, ClusterID))
CREATE TABLE DlTextDataTable (
SessionStartTime DATETIME NOT NULL REFERENCES DlIndexTable,
ChTimestamp " + SysDblType + " NOT NULL,
Channel01data VARCHAR (255), Channel02data VARCHAR (255), Channel03data VARCHAR (255), Channel04data VARCHAR (255),
Channel05data VARCHAR (255), Channel06data VARCHAR (255), Channel07data VARCHAR (255), Channel08data VARCHAR (255),
Channel09data VARCHAR (255), Channel10data VARCHAR (255), Channel11data VARCHAR (255), Channel12data VARCHAR (255),
Channel13data VARCHAR (255), Channel14data VARCHAR (255), Channel15data VARCHAR (255), Channel16data VARCHAR (255),
CONSTRAINT TxtPriKey PRIMARY KEY (SessionStartTime, ChTimestamp))
There are also dynamically-created tables of the form:
CREATE TABLE DlNumericData0_1_0Table (
SessionStartTime DATETIME NOT NULL REFERENCES DlIndexTable,
ChTimestamp float NOT NULL,
Channel001data REAL NOT NULL, Channel002data REAL NOT NULL, Channel003data REAL NOT NULL, Channel004data REAL NOT NULL, Channel005data REAL NOT NULL, Channel006data REAL NOT NULL, Channel007data REAL NOT NULL, Channel008data REAL NOT NULL,
CONSTRAINT NumPriKey0_1_0 PRIMARY KEY (SessionStartTime, ChTimestamp))
During the performance cycle (i.e., when performance matters), the only operations are INSERTs into one or more (I expect usually three to four) tables of the latter kind. Since I control 100% of this process, I could technically not have any constraints and not check referential integrity at all. But that is not a very "elegant" solution and I'm sure you guys would be the last to recommend such a thing. I could technically not have the indexes at run time and then, when performance does not matter ("offline"), I could rebuild them, but that also is not very elegant. And yes - those tables will eventually get millions of rows (over many sessions).
I could live with ~1500 rows per second. After all, the current solution stores data in a flat ASCII file and tops out at 200 lines / sec. But I thought I could do much better than that and kinda committed to very high throughput... :o Anyway, the testing continues.
Kamen|||One thing is for sure - performance depends on factors external to the code of the front and back ends. My original test was from my old P4 1.8GHz PC to the company server (dual Xeon 2.4 GHz, 1GB RAM, SQL 2000). Now I made a test from and to my local PC (SQL Express 2005) and the throughput went to 1700 /sec. I will implement this approach in the production code and see what kind of performance I ultimately get.
Kamen|||I've replied to this thread a couple of times, but my messages don't seem to stick very well.
The code that I used to test with was an early implementation of the Table object in the .NET libraries. The real advantage there is that large copies (I was doing a few hundred rows at a shot) were done using the BCP library instead of the more standard ADO library. This got us past the need to convert binary data to text to transmit as SQL, which the server then needed to parse to store as binary, which was quite wasteful.
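If you are on ADO.NET 2.0, the nearest built-in equivalent I know of is SqlClient's SqlBulkCopy class, which also pushes a DataTable through the bulk-load interface rather than sending row-by-row SQL (note it is SQL Server specific, so it would sit outside your generic DbDataAdapter code). A rough sketch, reusing the table name and connection string from your test code:
// Sketch only - SqlBulkCopy (new in ADO.NET 2.0) bulk-loads a DataTable, so no per-row INSERT text is parsed on the server
using namespace System::Data;
using namespace System::Data::SqlClient;

void BulkWrite(DataTable ^pending)   // pending: rows built up with NewRow()/Rows->Add()
{
    SqlConnection ^conn = gcnew SqlConnection(
        "Data Source=tcp:SAKORSERVER;Initial Catalog=DynoLAB;Integrated Security=True");
    conn->Open();
    SqlBulkCopy ^bulk = gcnew SqlBulkCopy(conn);
    bulk->DestinationTableName = "DlNumericData0_1_0Table";
    bulk->BatchSize = 0;            // 0 = send all accumulated rows as one batch
    bulk->WriteToServer(pending);
    conn->Close();
}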
I'm pretty sure that I've still got the test code around here, but I don't have the time or the resources to retrieve it right now... I'll see if I can get it later in the week for you.
-PatP|||Thanks, Pat. Don't worry about it - I'm using ADO.Net 2.0, which is not even compatible with the previous ADO.Net, much less with older stuff. And your situation does seem quite unique: your 6800 rows/s might be unreachable in my case but even the 1700 I got could be enough.
Kamen