The main purpose is to get high performance from the system. The system is supposed to produce reports containing information from about 100,000 records spread across 5 or 6 related tables.
Which of these approaches is better from a performance point of view? What is the performance cost of using joins, compared to getting the data from one table?

Generally, one should not blindly assume that it's always 'normalized and slow' vs. 'denormalized and fast'. Normalized databases typically perform well on reads too, provided the optimizer is able to benefit from useful indexes on the frequent joins. Having said that, there are obviously corner cases where denormalization does improve things for certain data access patterns, especially when denormalization allows you to get rid of some indexes and hence reduce the update costs. But, again, it all depends on your particular logical database schema and the prevalent data access/update patterns.
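As a minimal sketch of the "indexed joins" point (all table and column names here are hypothetical, not from the question), a report over normalized tables stays cheap when the join and filter columns are indexed:

-- Hypothetical normalized schema: Orders joined to Customers for a report.
-- Indexes on the join/filter columns let the optimizer seek instead of scan.
CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID) INCLUDE (OrderDate, Total);
CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate);

SELECT c.CustomerName, o.OrderDate, o.Total
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '20230101';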
It is generally better to normalize your tables for better data integrity and to reduce duplicate data. Once you have a good normalized foundation, you can think about denormalizing with rollup tables for exceptional cases. For an OLTP workload, you may see performance issues if you have frequently executed queries that join more than four or five tables, since the query optimizer has to fall back on heuristics to come up with a plan.
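For instance, here is a minimal sketch of such a rollup table (names are hypothetical); it pre-aggregates detail rows so reports can read one table instead of joining several:

-- Hypothetical rollup table: per-customer daily totals for reporting.
CREATE TABLE dbo.DailySalesRollup (
    SalesDate   date NOT NULL,
    CustomerID  int NOT NULL,
    OrderCount  int NOT NULL,
    TotalAmount decimal(18, 2) NOT NULL,
    PRIMARY KEY (SalesDate, CustomerID)
);

-- Rebuilt periodically (e.g. nightly) from the normalized tables.
INSERT INTO dbo.DailySalesRollup (SalesDate, CustomerID, OrderCount, TotalAmount)
SELECT CAST(o.OrderDate AS date), o.CustomerID, COUNT(*), SUM(o.Total)
FROM dbo.Orders AS o
GROUP BY CAST(o.OrderDate AS date), o.CustomerID;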
Your indexing strategy will have more effect on performance than anything else. You have to analyze how volatile your tables are and what kind of workload you have in order to determine what indexes to create. There are good DMV queries that will let you easily see which queries are being executed the most and which indexes are being used. Here is an example:
-- Get top 200 executed SPs, ordered by calls per minute
SELECT TOP 200
    qt.text AS 'SP Name',
    qs.execution_count AS 'Execution Count',
    qs.total_worker_time/ISNULL(qs.execution_count, 1) AS 'AvgWorkerTime',
    qs.total_worker_time AS 'TotalWorkerTime',
    qs.total_elapsed_time/ISNULL(qs.execution_count, 1) AS 'AvgElapsedTime',
    qs.max_logical_reads, qs.max_logical_writes, qs.creation_time,
    DATEDIFF(MINUTE, qs.creation_time, GETDATE()) AS 'Age in Cache',
    -- NULLIF guards against divide-by-zero for plans cached less than a minute/second ago
    qs.execution_count/NULLIF(DATEDIFF(MINUTE, qs.creation_time, GETDATE()), 0) AS 'Calls/Minute',
    qs.execution_count/NULLIF(DATEDIFF(SECOND, qs.creation_time, GETDATE()), 0) AS 'Calls/Second'
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
WHERE qt.dbid = 5 -- Change this to the database you are interested in
ORDER BY qs.execution_count/NULLIF(DATEDIFF(SECOND, qs.creation_time, GETDATE()), 0) DESC;
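Along the same lines, a query against sys.dm_db_index_usage_stats shows which indexes are actually being read versus merely maintained on writes; a minimal sketch:

-- Index usage in the current database: read activity vs. write maintenance cost
SELECT
    OBJECT_NAME(ius.object_id) AS 'Table Name',
    i.name AS 'Index Name',
    ius.user_seeks + ius.user_scans + ius.user_lookups AS 'Total Reads',
    ius.user_updates AS 'Total Writes'
FROM sys.dm_db_index_usage_stats AS ius
JOIN sys.indexes AS i
    ON i.object_id = ius.object_id AND i.index_id = ius.index_id
WHERE ius.database_id = DB_ID()
ORDER BY ius.user_seeks + ius.user_scans + ius.user_lookups DESC;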
It's a question of trade-offs: for faster DML you need normalized structures; for faster reports, denormalized is generally better. So you have to decide where you actually want to optimize performance.

For me, a normalized database does the trick, as it proves more helpful in the long run: you always have greater data integrity.

For reports you may use views to avoid repeating the same joins everywhere (see the sketch below).

So, optimal for me = normalized + views.
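As a minimal sketch of that approach (all names hypothetical, reusing the schema from the earlier example), the joins are written once in a view and every report selects from it:

-- Hypothetical reporting view: joins defined once, reused by all reports.
CREATE VIEW dbo.vw_OrderReport
AS
SELECT c.CustomerName, o.OrderID, o.OrderDate, o.Total
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID;

-- Report queries then avoid repeating the join logic:
-- SELECT * FROM dbo.vw_OrderReport WHERE OrderDate >= '20230101';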