I have a setup where a single 1.5 million row table is published to 6
separate databases using merge replication. On 4 out of the 6 I have no
problem, but on 2 there seems to be an excessive performance hit on the
subscribing server's CPU when the distributor attempts to apply updates, and
the merge agents never complete successfully.
Here are some more details:
The Publisher and Distributor are on the same server (2x800MHz, very little
load on the server).
All 6 subscriptions are push subscriptions, with the merge agent running once
an hour; there are no filters on the article.
When there are changes to synchronize they are almost always updates, and
thousands of rows will be updated at each subscriber within a few hours on a
Saturday night. There is then no activity all week, so the 7 copies of the
table can synchronize before the next Saturday.
Subscriber #1: on same LAN as publisher. Very fast to synchronize with
Publisher
Subscribers #2, 3 & 4: each on a different LAN from the Publisher, with 256K
DSL connections to the Publisher. They synchronize with the Publisher within a
few hours.
Subscribers #5 & 6: each on a different LAN from the Publisher, with < 128K
DSL connections to the Publisher. They never seem to synchronize with the
Publisher. The merge agents' histories both show a long list of messages
saying "100 updates downloaded", then "The process could not deliver update(s)
at the 'Subscriber'. ... The merge process timed out while executing a query.
Reconfigure the QueryTimeout parameter and retry the operation."
Subscribers #5 & 6 are not very fast boxes (500 MHz Pentium III). They serve
another DB application, and its users complain that the application has become
much slower since replication was set up. I assume this is caused by high CPU,
because the subscribing database and this application's databases are separate
(i.e. the slowdown is not caused by table locks being held).
How can I minimize the hit on Subscribers #5 & 6 and ensure the updates are
delivered to the subscribing database?
|||I have not figured this problem out yet. To work around the performance hit,
I want to find a way to have the merge agent run only during the night. I know
how to schedule the agent to run between certain hours, but is there a way to
ensure the agent stops at 7AM if it is still running at that time?
"Laurence Neville" <laurenceneville@.hotmail.com> wrote in message
news:%23Lvtc$DeFHA.2288@.TK2MSFTNGP14.phx.gbl...
>I have a setup where a single 1.5 million row table is published to 6
>separate databases using merge replication. On 4 out of the 6 I have no
>problem, but on 2 there seems to be an excessive performance hit on the
>subscribing server's CPU when the distributor attempts to apply updates AND
>the merge agents never complete running successfully.
> Here are some more details:
> The Publisher and Distributor are on same server (2x800MHz, very little
> load on server)
> All 6 subscriptions are Push subscription, with merge agent running once
> an hour, there are no filters on the article
> When there are changes to synchronize they are almost always Updates, and
> thousands of rows will be updated at each subscriber within a few hours on
> a Saturday night. Then there will be no activity all week so the 7 tables
> can synchronize before the next Saturday.
> Subscriber #1: on same LAN as publisher. Very fast to synchronize with
> Publisher
> Subscribers #2, 3 & 4: each on a different LANs from the Publisher with
> 256K DSL connections to Publisher. Synchronize with Publisher within a few
> hours.
> Subscribers #5 & 6: each on a different LANs from the Publisher with <
> 128K DSL connections to Publisher. Never seem to synchronize with
> Publisher. The merge agents' histories both show a long list of messages
> saying "100 updates downloaded" then "The process could not deliver
> update(s) at the 'Subscriber'. ... The merge process timed out while
> executing a query. Reconfigure the QueryTimeout parameter and retry the
> operation."
> Subscribers #5 & 6 are not very fast boxes (500 MHz Pentium III). They
> serve another DB application and the users complain that this application
> has become much slower since the replication was set up. I assume this is
> caused by high CPU because the subscribing database and this application's
> databases are separate (i.e. the slowdown is not caused by table locks
> being held)
> How can I minimize the hit on Subscribers #5 & 6 and ensure the updates
> are delivered to the subscribing database?
>
|||Laurence,
you could have another job that calls sp_stop_job and runs at 7am. I have
something similar for a data load job which couldn't conflict with business
hours. The problem is that if the main job isn't running, sp_stop_job will
fail and show the failure icon in Enterprise Manager (EM). So I check the job
status first before deciding whether to stop it. The proc I've created is
below.
Rgds,
Paul Ibison SQL Server MVP, www.replicationanswers.com
(recommended sql server 2000 replication book:
http://www.nwsu.com/0974973602p.html)
CREATE PROCEDURE spReturnJobState (@JobID uniqueidentifier, @JobState int OUTPUT)
/*****************************************************************************
 Description:  Returns the job state of the data load job so that,
               if it is running, we can stop it.
 Returns:      (None)
 Author:       Paul Ibison (xt 26163)
 Date Created: 22/12/2004
 Revisions:
*****************************************************************************/
AS
-- Capture the result set of xp_sqlagent_enum_jobs (one row per job).
CREATE TABLE #xp_results (job_id                UNIQUEIDENTIFIER NOT NULL,
                          last_run_date         INT NOT NULL,
                          last_run_time         INT NOT NULL,
                          next_run_date         INT NOT NULL,
                          next_run_time         INT NOT NULL,
                          next_run_schedule_id  INT NOT NULL,
                          requested_to_run      INT NOT NULL, -- BOOL
                          request_source        INT NOT NULL,
                          request_source_id     sysname COLLATE database_default NULL,
                          running               INT NOT NULL, -- BOOL
                          current_step          INT NOT NULL,
                          current_retry_attempt INT NOT NULL,
                          job_state             INT NOT NULL)
INSERT INTO #xp_results
EXECUTE master.dbo.xp_sqlagent_enum_jobs 1, 'dbo'
-- job_state = 1 means the job is currently executing.
SELECT @JobState = job_state FROM #xp_results WHERE job_id = @JobID
DROP TABLE #xp_results
GO
declare @retstatus int
exec dba_admin..spReturnJobState 'E2682241-48BC-479B-BD99-795712292720', @retstatus output
if @retstatus = 1 -- executing
begin
    exec msdb..sp_stop_job 'UPLOAD_PROCESS_DATA_JOB'
end
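To have this fire automatically at 7am, the check can itself live in a
scheduled SQL Agent job. A sketch only, reusing the job ID and job name from
the example above; the stop job's own name is made up, so substitute whatever
fits your naming convention:

-- Sketch: create a job that runs daily at 07:00 and stops the merge
-- agent job if it is still executing. Job names here are illustrative.
USE msdb
GO
EXEC sp_add_job @job_name = 'Stop merge agent at 7am'

EXEC sp_add_jobstep
    @job_name  = 'Stop merge agent at 7am',
    @step_name = 'Stop merge agent if running',
    @subsystem = 'TSQL',
    @command   = 'DECLARE @state int
EXEC dba_admin..spReturnJobState ''E2682241-48BC-479B-BD99-795712292720'', @state OUTPUT
IF @state = 1 -- executing
    EXEC msdb..sp_stop_job ''UPLOAD_PROCESS_DATA_JOB'''

EXEC sp_add_jobschedule
    @job_name          = 'Stop merge agent at 7am',
    @name              = 'Daily at 7am',
    @freq_type         = 4,        -- daily
    @freq_interval     = 1,        -- every day
    @active_start_time = 70000     -- 07:00:00

-- Target the local server so SQL Server Agent will actually run the job.
EXEC sp_add_jobserver @job_name = 'Stop merge agent at 7am'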
|||Paul,
Thanks very much, I will use this.
Do you have any suggestions about my other problem: merge agents failing to
deliver updates to subscribers that have slow connections to the publisher,
while also putting a high load on the subscriber's CPU? Are there
configuration options I can set for slow connections, or to reduce the CPU
load?
Thanks
Laurence
"Paul Ibison" <Paul.Ibison@.Pygmalion.Com> wrote in message
news:uwGwsT8eFHA.3932@.TK2MSFTNGP12.phx.gbl...
> Laurence,
> you could have another job that calls sp_stop_job, and which runs at 7am.
> I have something similar for a job load which couldn't conflict with
> business hours. The problem is that if the main job isn't running, the
> sp_stop_job will fail and give the fail icon in EM. So, I check the job
> status prior to deciding whether to stop or not. The Proc I've created is
> below.
> Rgds,
> Paul Ibison SQL Server MVP, www.replicationanswers.com
> (recommended sql server 2000 replication book:
> http://www.nwsu.com/0974973602p.html)
> CREATE PROCEDURE spReturnJobState (@.JobID uniqueidentifier, @.JobState int
> OUTPUT)
> /************************************************** **************************************
> Description: Script to return the job state of the data load job so
> if it is running we can stop it.
> Returns: (None)
> Author: Paul Ibison (xt 26163)
> Date Created: 22/12/2004
> Revisions:
> ************************************************** ***************************************/
> AS
> CREATE TABLE #xp_results (job_id UNIQUEIDENTIFIER NOT NULL,
> last_run_date INT NOT NULL,
> last_run_time INT NOT NULL,
> next_run_date INT NOT NULL,
> next_run_time INT NOT NULL,
> next_run_schedule_id INT NOT NULL,
> requested_to_run INT NOT NULL, -- BOOL
> request_source INT NOT NULL,
> request_source_id sysname collate database_default
> null,
> running INT NOT NULL, -- BOOL
> current_step INT NOT NULL,
> current_retry_attempt INT NOT NULL,
> job_state INT NOT NULL)
> INSERT INTO #xp_results
> EXECUTE master.dbo.xp_sqlagent_enum_jobs 1, 'dbo'
> SELECT @.JobState = job_state FROM #xp_results WHERE job_id = @.JobID
> DROP TABLE #xp_results
> GO
>
> declare @.retstatus int
> exec dba_admin..spReturnJobState 'E2682241-48BC-479B-BD99-795712292720',
> @.retstatus output
> if @.retstatus = 1 --executing
> begin
> exec msdb..sp_stop_job 'UPLOAD_PROCESS_DATA_JOB'
> end
>
|||Laurence,
what is the error when there is a failure?
Cheers,
Paul
"Laurence Neville" <laurenceneville@.hotmail.com> wrote in message
news:OSUG9oBfFHA.1148@.TK2MSFTNGP12.phx.gbl...
> Paul,
> Thanks very much, I will use this.
> Do you have any suggestions about my other problem: merge agents failing
> to deliver updates to subscribers that have slow connections to the
> publisher, also causing a high load on the subscriber CPU at the same
> time? Are there configuration options I can set for slow connections, or
> to reduce the load on the CPU?
> Thanks
> Laurence
> "Paul Ibison" <Paul.Ibison@.Pygmalion.Com> wrote in message
> news:uwGwsT8eFHA.3932@.TK2MSFTNGP12.phx.gbl...
>
|||The merge agents' histories both show a long list of messages saying "100
updates downloaded" then "The process could not deliver update(s) at the
'Subscriber'. ... The merge process timed out while executing a query.
Reconfigure the QueryTimeout parameter and retry the operation."
"Paul Ibison" <Paul.Ibison@.Pygmalion.Com> wrote in message
news:ug9mi3BfFHA.3936@.TK2MSFTNGP14.phx.gbl...
> Laurence,
> what is the error when there is a failure?
> Cheers,
> Paul
> "Laurence Neville" <laurenceneville@.hotmail.com> wrote in message
> news:OSUG9oBfFHA.1148@.TK2MSFTNGP12.phx.gbl...
>
|||OK - try creating a custom merge agent profile with a large value of
QueryTimeout and resynchronizing.
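If you'd rather script it than use the profiles dialog in Enterprise Manager,
something along these lines should do it at the Distributor. A sketch only;
the profile name, description and the 1800-second value are illustrative, and
you'd still need to assign the new profile to the two problem merge agents
afterwards:

-- Run at the Distributor: create a custom merge agent profile and
-- give it a longer QueryTimeout. Name and value are illustrative.
DECLARE @profile_id int

EXEC sp_add_agent_profile
    @profile_id   = @profile_id OUTPUT,
    @profile_name = 'Slow link - long QueryTimeout',
    @agent_type   = 4,    -- 4 = Merge Agent
    @profile_type = 1,    -- 1 = custom profile
    @description  = 'Merge profile for the <128K DSL subscribers'

EXEC sp_add_agent_parameter
    @profile_id      = @profile_id,
    @parameter_name  = 'QueryTimeout',
    @parameter_value = '1800'    -- seconds

The new profile can then be selected for just the two slow-link agents,
leaving the fast subscribers on the default profile.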
Cheers,
Paul Ibison
|||OK, I am trying that out.
Question - like I said before, what I typically see in the agent history is a
long list of messages saying "100 updates downloaded", then "The process could
not deliver update(s) at the 'Subscriber'. ..." When the agent runs again, do
the updates that were previously downloaded to the subscriber still exist
there (I imagine them sitting in some "holding area", waiting to be applied to
the database), or does the download start back at the beginning? In other
words, will all the updates eventually be downloaded (even if they are not
successfully applied to the database each time), or does the process always go
back to square one when it fails?
Thanks
Laurence
"Paul Ibison" <Paul.Ibison@.Pygmalion.Com> wrote in message
news:eTdCnqCfFHA.2372@.TK2MSFTNGP14.phx.gbl...
> OK - try creating a custom merge agent profile with a large value of
> QueryTimeout and resynchronizing.
> Cheers,
> Paul Ibison
>
|||Laurence,
Merge downloads the changes in batches, and each batch is a group of records
applied as a single transaction. An interruption to a batch causes a rollback
of that particular batch only; previous batches have already been committed.
Resynchronization afterwards resumes at the start of the interrupted batch and
then works through all the subsequent batches.
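If the slow links can't commit a full batch within the timeout, the batch
sizes themselves can also be reduced via the same custom profile mechanism. A
sketch only, reusing the hypothetical @profile_id from the QueryTimeout
example earlier; the values are illustrative, not recommendations:

-- Smaller batches commit faster over a slow link, at the cost of more
-- round trips. These parameter values are illustrative only.
EXEC sp_add_agent_parameter
    @profile_id      = @profile_id,
    @parameter_name  = 'DownloadGenerationsPerBatch',
    @parameter_value = '10'    -- default is 100

EXEC sp_add_agent_parameter
    @profile_id      = @profile_id,
    @parameter_name  = 'DownloadWriteChangesPerBatch',
    @parameter_value = '25'    -- default is 100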
Rgds,
Paul Ibison SQL Server MVP, www.replicationanswers.com
(recommended sql server 2000 replication book:
http://www.nwsu.com/0974973602p.html)
|||When I see "100 updates downloaded" in the merge agent history, is that one
batch (i.e. 100 changes)? From reading Books Online that seems to be the case
("By default, the Merge Agent processes 100 generations in each batch uploaded
and downloaded between the Publisher and Subscriber.")
Next question: when I look at a given session's history and see 20 messages
saying "100 updates downloaded" followed by a single message saying "The
process could not deliver update(s) at the 'Subscriber'. ..." (either because
the merge process timed out, or because the job automatically gets stopped at
7AM - thanks for telling me how to do that!), does that mean that all 20
batches have failed to be applied and will be redownloaded in the next
session? If so, is there a way to limit the number of batches downloaded in a
session? I am looking for any technique that will result in some changes being
applied to the subscriber, however slowly! I am also trying out the
QueryTimeout parameter that you mentioned a few messages ago.
Thanks
Laurence
"Paul Ibison" <Paul.Ibison@.Pygmalion.Com> wrote in message
news:Or$gnsUfFHA.1148@.TK2MSFTNGP12.phx.gbl...
> Laurence,
> Merge downloads the changes in batches and these batches are groups of
> records which are applied as a transaction. An interruption to a batch
> will cause a rollback of that particular batch while previous ones have
> been already committed. Resynchronization will continue with the same
> batch afterwards at the start of the batch and then do all the subsequent
> batches.
> Rgds,
> Paul Ibison SQL Server MVP, www.replicationanswers.com
> (recommended sql server 2000 replication book:
> http://www.nwsu.com/0974973602p.html)
>