Hello:
We are testing SQL Server performance on XML columns and we are seeing poor
performance compared with another database (DB2). My questions are: a) is
there something we can do to improve performance? b) is there anything we can
do to decrease index size?
We used big XML documents (40 KB each), a small-to-medium number of records
(5000), and JDBC.
Results are the following:
Insert 5000 records: 1,162,859 sec. (SQL Server) vs. 383,953 sec. (DB2)
Counting 5000 records: 21,188 sec. (SQL Server) vs. 1,265 sec. (DB2)
Index size: 1.66 GB (SQL Server) vs. 700 MB (DB2)
It's worth noting that, given the size of the XML documents, the fully indexed
scenario (primary index and all secondary indexes defined) could not be
tested: it took more than 45 minutes to insert 5000 records and ended
with a disk-full error, consuming more than 2.5 GB.
We tried two operations: insert and counting. For the insert we used the
following statement (first parameter an int, second a string with the XML
content):
INSERT INTO TEST VALUES (?, ?)
To count we used the following query:
SELECT COUNT(*) FROM (
    SELECT info.value('
        declare default element namespace "urn:hl7-org:v3";
        (/ClinicalDocument/component/structuredBody/component[7]/section/entry/observation[code/@code="550.90"]/effectiveTime/@value)[1]
    ', 'nvarchar(1000)') YR FROM test) T
GROUP BY YR
ORDER BY YR DESC;
Indexes were created with the following statements:
CREATE PRIMARY XML INDEX PIdx_TEST_info ON test(info)
CREATE XML INDEX PIdx_TEST_info_PATH ON test(info)
USING XML INDEX PIdx_TEST_info
FOR PATH
Thanks in advance for your help,
Javier
Hello Javier,
a.) Yes, defer creating the index until after all of your data has been inserted.
b.) Yes, only define the indexes that you know you are going to need.
Okay, so I know that (a) isn't a great answer, but honestly it might be the
best choice you have here. Remember that the primary XML index persists the
node table to index space, so every insert has to do the expensive parsing
and then maintain the index. You might be better off dropping and recreating
it after each bulk insert.
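A minimal sketch of that drop/reload/recreate pattern, using the table and index names from your post (note that the secondary index has to go before the primary one can be dropped):

```sql
-- Drop the XML indexes before the bulk load (secondary first, then primary)
DROP INDEX PIdx_TEST_info_PATH ON test;
DROP INDEX PIdx_TEST_info ON test;

-- ... perform the 5000 inserts here ...

-- Rebuild the indexes once, after all the data is in place
CREATE PRIMARY XML INDEX PIdx_TEST_info ON test(info);
CREATE XML INDEX PIdx_TEST_info_PATH ON test(info)
    USING XML INDEX PIdx_TEST_info
    FOR PATH;
```

This way the parse-and-shred cost is paid once per load rather than once per row.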
Schema binding may help the index be more space efficient due to typing,
but it looks like you're storing HL7 data. I can't imagine that SQL Server
is going to enjoy that schema very much.
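For reference, typing the column means registering an XML schema collection and binding the column to it. A sketch of the general shape only; the collection name here is a placeholder and the real HL7 CDA schemas are far larger than this stub:

```sql
-- Register a schema collection (stub: real HL7 schemas define many types)
CREATE XML SCHEMA COLLECTION HL7Schemas AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
              targetNamespace="urn:hl7-org:v3">
    <!-- ... element and type definitions would go here ... -->
</xsd:schema>';

-- Bind the column so node values are stored typed rather than as strings
ALTER TABLE test ALTER COLUMN info XML(HL7Schemas);
```

With typed XML, values in the node table can be stored in their native types instead of as untyped strings, which is where the space savings would come from.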
Once you have the primary XML index beaten into submission, then you really
should decide and test whether you need all of the other XML indexes or not.
It's very wasteful to define an index you don't need or use.
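As a sketch of that decision: your counting query pulls one known path with .value(), which is the shape the PATH secondary index targets; the PROPERTY and VALUE variants only earn their space if you actually run those query shapes (the _PROP and _VAL index names below are hypothetical, not from your setup):

```sql
-- PATH: helps path-based lookups like the .value() query above
CREATE XML INDEX PIdx_TEST_info_PATH ON test(info)
    USING XML INDEX PIdx_TEST_info
    FOR PATH;

-- PROPERTY: helps retrieving several properties per row; skip if unused
CREATE XML INDEX PIdx_TEST_info_PROP ON test(info)
    USING XML INDEX PIdx_TEST_info
    FOR PROPERTY;

-- VALUE: helps value-first searches ("which rows contain 550.90 anywhere?")
CREATE XML INDEX PIdx_TEST_info_VAL ON test(info)
    USING XML INDEX PIdx_TEST_info
    FOR VALUE;
```

Each secondary index is another copy of (part of) the node table, so every one you define multiplies both the disk footprint and the per-insert maintenance cost.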
It's really not the size that's the issue here so much as it is the number
of elements and attributes within each instance and the relative amount of
useful data per node. Again, if I'm guessing right, in your 40 KB of data you
probably have hundreds of elements and attributes. Since each of these becomes
a row in the node table, you're probably generating lots of rows with little
data in them.
I'd also have to wonder how much or how little your JDBC driver is helping you
here, but that's a topic for another NG.
Thanks!
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/
Hi Javier
I would be interested in your test setup. Could you please contact me at my
email (remove the online from the address)?
Thanks
Michael
"Javier Diaz" <JavierDiaz@.discussions.microsoft.com> wrote in message
news:DF70B1FD-B8A5-4F6F-8CAA-865104B6572E@.microsoft.com...