Hi all,
Ok - I'm building a database around the content of an XML file I've been
supplied with - the process is straight forward enough...
The supplier has provided in the XML file far more data/information than I
actually need at this point in time, however, there's nothing to say that
"some" of this extra information might not be needed in the future...
I have therefore been adding columns like a mad man to incorporate the data
from the beginning, so if requested later on the impact on the system, and
the turn-around will be quicker.
I then wondered "wonder if this will have any side effects?"
Lets say, to make things simple I have a table with 5 columns that I do
need, I have a file with 40 items that could potentially be columns in this
table, I decide to add these now. I run my import process and populate my
table. Lets say the database size would have been 10Mb with the 5 columns,
but now, because of the types of data, and the sizes, its not 350Mb - would
this have any impact on SQL Server's performance when queries are run
against it? Obviously the sizes above are just "figures plucked from the
air" and are not actual sizes of my database, I was just trying to show a
big markup.
I would "guess" that if I wasn't using the other columns in the queries then
it shouldn't make any difference...
Any information would be appreciated - thanks in advance,
Regards
RobRob Meade wrote:
> Hi all,
> Ok - I'm building a database around the content of an XML file I've been
> supplied with - the process is straight forward enough...
> The supplier has provided in the XML file far more data/information than I
> actually need at this point in time, however, there's nothing to say that
> "some" of this extra information might not be needed in the future...
> I have therefore been adding columns like a mad man to incorporate the dat
a
> from the beginning, so if requested later on the impact on the system, and
> the turn-around will be quicker.
> I then wondered "wonder if this will have any side effects?"
> Lets say, to make things simple I have a table with 5 columns that I do
> need, I have a file with 40 items that could potentially be columns in thi
s
> table, I decide to add these now. I run my import process and populate my
> table. Lets say the database size would have been 10Mb with the 5 columns
,
> but now, because of the types of data, and the sizes, its not 350Mb - woul
d
> this have any impact on SQL Server's performance when queries are run
> against it? Obviously the sizes above are just "figures plucked from the
> air" and are not actual sizes of my database, I was just trying to show a
> big markup.
> I would "guess" that if I wasn't using the other columns in the queries th
en
> it shouldn't make any difference...
> Any information would be appreciated - thanks in advance,
> Regards
> Rob
>
Proper indexing and normalization will affect your table performance
more than the number of columns will. I hope you're not just building a
table that mimics the structure of the XML file. For instance, consider
an XML file that contains a list of customers, a billing address for
each customer, and for some customers, an optional shipping address.
Resist the temptation to build a single table that contains two sets of
address fields - one for billing, one for shipping. Instead, build a
"Customers" table that contains identifying information for the
CUSTOMER, such as customer ID, name, account number, etc.. Build an
"Addresses" table that contains information for each ADDRESS, such as an
address ID, street address, city, state, country, etc.. Build a third
table, "CustomerAddresses", that contains maybe four columns; customer
address ID, customer ID, address ID, and address type. This table
creates a 1-to-many link between a customer and its address(es). Even
the address type could be split off into an AddressTypes table,
containing an AddressType ID and a text label describing the AddressType
(billing, shipping, etc).|||"Tracy McKibben" wrote...
> Proper indexing and normalization will affect your table performance more
> than the number of columns will.
okey dokey
> I hope you're not just building a table that mimics the structure of the
> XML file.
Nope...one element "sku" has a number of attributes, these are all in the
same table, however, each sku can also have a number of "objectives",
"topics", "members" and so on...these are currently in separate tables.
I have used the SKUID to link between these tables and the sku table.
It was really more of a question about whether or not the addition (possibly
excessive - as some of it might not be used) data would have an adverse
effect.
Thanks for your reply Tracy.
Regards
Rob|||Rob Meade (ten.bewdoowsgnikNO-SPAM@.edaem.bbor) writes:
> Ok - I'm building a database around the content of an XML file I've been
> supplied with - the process is straight forward enough...
> The supplier has provided in the XML file far more data/information than I
> actually need at this point in time, however, there's nothing to say that
> "some" of this extra information might not be needed in the future...
> I have therefore been adding columns like a mad man to incorporate the
> data from the beginning, so if requested later on the impact on the
> system, and the turn-around will be quicker.
> I then wondered "wonder if this will have any side effects?"
> Lets say, to make things simple I have a table with 5 columns that I do
> need, I have a file with 40 items that could potentially be columns in
> this table, I decide to add these now. I run my import process and
> populate my table. Lets say the database size would have been 10Mb with
> the 5 columns, but now, because of the types of data, and the sizes, its
> not 350Mb - would this have any impact on SQL Server's performance when
> queries are run against it? Obviously the sizes above are just "figures
> plucked from the air" and are not actual sizes of my database, I was
> just trying to show a big markup.
The answer is, as it often does when it comes to performance: it depends.
However, here the "it depends" has a fairly straight-forward qualification.
If you have good indexing, the cost of the extra column is moderate.
The database will grow faster, and you must pay more attention to running
backup of the database and the transaction. It will also be more important
to keep fragmentation in check. But the overall cost for performance
does not have to be more than, say, 10%.
On the other hand, if you are lax about indexing, the cost for the
extra columns can be significant (assuming that they are in the same
tables as the rest). Since the rows are wider, there are fewer rows
per page, and more pages have to be read to scan tables.
Personally, I would consider the strategy of only adding the columns
you need, but add a table to save the XML documents in, so if need
occurs later, you can recover more fields from the XML document. This
is particularly attractive, if you are on SQL 2005 where you can store
the data in column of the xml data type. The performance cost for this
extra table is nil, but of course the increased database size takes
some administrative toll as discussed above.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Rob Meade wrote:
> Nope...one element "sku" has a number of attributes, these are all in the
> same table, however, each sku can also have a number of "objectives",
> "topics", "members" and so on...these are currently in separate tables.
> I have used the SKUID to link between these tables and the sku table.
> It was really more of a question about whether or not the addition (possib
ly
> excessive - as some of it might not be used) data would have an adverse
> effect.
Sounds like you're in good shape then. Make sure you have good indexes,
including covering indexes if appropriate, and you should be fine
performance-wise.
No comments:
Post a Comment