Wednesday, March 28, 2012

performance differences between queries...

I am updating a db with data from a file, in this data we have new info, info that has been updated and info that is to be removed from the db.
Now I was wondering which approach results in better performance/shorter executin time:

1. first update excisting values, then insert new ones, and last delete cancelled data

or

2. delete cancelled data and data that will be updated, then insert new and updated info

I get all this data from a file, in that file all rows are similar and there is one column that defines if the data is new, updated or to be deleted (thus all the updates also include the information for the enty that has not been altered).

// PatiPati, I think there are too many unknown variables to properly answer the question. For instance, how many columns are you updating compared to how many columns are not being updated? If you're updating one small column, and there are ten large columns, say Char(255), that are not being updated, then I believe an update would be faster than a "delete and insert all new rows". However, you would have to have the index to support finding the rows that need updating.

Personally, I always choose the way that makes the most sense logically. I would delete cancelled data, then update existing data and then insert the new rows.

Roby2222|||Thanks for you answer Roby2222, I hope this additional info will clarify things.

The amount of columns that are being updated differs from time to time, sometimes there is just one column that needs updating and sometimes almost all columns have to be updated...

Most columns are about the same size 'cause their max size is predefined in the system where it originates from.

Basically I was just wondering if it is worth doing 3 steps (delete+update+insert) when you get the same result with 2 steps (delet+insert)?

Pati|||Pati,

If you're doing a bulk insert, then the two step process will usually be quicker. If you're inserting row by row, then it's probably a toss up. If you really want to know the exact answer for your particular situation, I would run a test and benchmark both methods. You can use one scripts with all of the necessary statements, and then not how long it took to run in query analyzer.

Hope this helps,
Roby2222|||Thanks for your answers!

I think I'll go for the 2 step solution, 'cause it will probably result in simpler code (and therefore easier for someone else to read later if neccessary).

Perhaps I'll also test the performance manually as you suggested sometime later on when I'll have some time to spare! :)

Pati

No comments:

Post a Comment