We have been burned a few times by high backlogs for one reason or another - DFS has an issue, or someone mass-changed permissions on a folder and left hundreds of thousands of files needing to replicate.
Long story short, I would like to be notified if there is a high backlog on any DFS server in the domain (we have 5), in any direction, for any replicated folder or replication group.
I found a fantastic script to check this. It takes about 40 minutes to run, and I can build in an alert (e.g. if the backlog count is greater than 1000, send an email) and then run the script once every couple of hours.
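The alert step described above could look roughly like the sketch below. It assumes the script's output lines keep the shape `<group> From: <sender> To: <receiver> Backlog: <count>`. The function name `Get-HighBacklog`, the server and group names, and the mail addresses are all made up for illustration, and the mail call is left commented out so nothing is sent by accident.

```powershell
# Hypothetical helper: returns one object per output line whose backlog exceeds the threshold.
function Get-HighBacklog {
    param(
        [string[]]$Lines,
        [int]$Threshold = 1000
    )
    foreach ($line in $Lines) {
        # Assumed line format: "<group> From: <sender> To: <receiver> Backlog: <count>"
        if ($line -match '^(?<Group>.+) From: (?<From>\S+) To: (?<To>\S+) Backlog: (?<Count>\d+)\s*$') {
            $count = [int]$Matches['Count']
            if ($count -gt $Threshold) {
                [pscustomobject]@{
                    Group   = $Matches['Group']
                    From    = $Matches['From']
                    To      = $Matches['To']
                    Backlog = $count
                }
            }
        }
    }
}

# Sample lines in the assumed output format (names are invented)
$sample = @(
    'Projects From: FS01 To: FS02 Backlog: 25',
    'Projects From: FS02 To: FS01 Backlog: 184213',
    'Elapsed Time: 2400 seconds'
)
$high = @(Get-HighBacklog -Lines $sample -Threshold 1000)

if ($high.Count -gt 0) {
    # One summary line per connection that is over the threshold
    $body = ($high | ForEach-Object {
        '{0}: {1} -> {2} = {3}' -f $_.Group, $_.From, $_.To, $_.Backlog
    }) -join "`n"
    # Placeholder addresses and SMTP server; adjust before uncommenting
    # Send-MailMessage -To 'admins@example.com' -From 'dfsr@example.com' -Subject 'High DFSR backlog' -Body $body -SmtpServer 'smtp.example.com'
    $body
}
```

In practice the `$sample` array would be replaced by the lines the backlog script emits, for example by capturing its output into a variable first.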
I assume we are not supposed to link to external websites, but if you go to a certain search engine and look for
"How do I get the current DFS replication backlog count? hkey" you will find the PowerShell script I am referring to. I had to apply a change from the comments on that page to make it work for me.
Credit to the author, Kamal, for this. My only changes are a timer to show how long the script took to run and the change mentioned above. The original lines are just commented out below.
So, back to my question: am I overthinking this? Is there an event in the event log that I can search for and alert on that indicates a high backlog count? Is there an easier way to be notified that there might be an issue?
PS - not sure why the forum treats the second half of the script as comments, but it pastes into the PowerShell editor and looks fine.
# Get start time
$startDTM = Get-Date
# Get all replication groups
$replicationgroups = dfsradmin rg list
# Reduce loop by 3 lines to filter out junk from dfsradmin
$i = 0
$imax = ($replicationgroups.Count - 3)
# Loop through each replication group
foreach ($replicationgroup in $replicationgroups) {
    # Exclude first and last two lines as junk, and exclude the domain system volume
    if (($i -ge 1) -and ($i -le $imax) -and ($replicationgroup -notlike "*domain system volume*")) {
        # Format replication group name
        $replicationgroup = $replicationgroup.Split(" ")
        $replicationgroup[-1] = ""
        $replicationgroup = ($replicationgroup.Trim() -join " ").Trim()
        # Get and format replication folder name
        # Note: only the first folder listed for the group ($replicationfolder[1]) is used
        $replicationfolder = & cmd /c ("dfsradmin rf list /rgname:`"{0}`"" -f $replicationgroup)
        $replicationfolder = (($replicationfolder[1].Split("\"))[0]).Trim()
        # Get servers for the current replication group
        $replicationservers = & cmd /c ("dfsradmin conn list /rgname:`"{0}`"" -f $replicationgroup)
        # Reduce loop by 3 lines to filter out junk from dfsradmin
        $j = 0
        $jmax = ($replicationservers.Count - 3)
        # Loop through each replication member server
        foreach ($replicationserver in $replicationservers) {
            # Exclude first and last two lines as junk
            if (($j -ge 1) -and ($j -le $jmax)) {
                # Format server names (original lines kept commented out)
                # $sendingserver = ($replicationserver.split(" "))[0].trim();
                # $receivingserver = ($replicationserver.split(" "))[2].trim();
                # Split on whitespace and drop empty tokens so column padding does not matter
                $sendingserver = ($replicationserver.Split() | Where-Object { $_ })[0].Trim()
                $receivingserver = ($replicationserver.Split() | Where-Object { $_ })[1].Trim()
                # Get backlog count with dfsrdiag
                $backlog = & cmd /c ("dfsrdiag backlog /rgname:`"{0}`" /rfname:`"{1}`" /smem:{2} /rmem:{3}" -f $replicationgroup, $replicationfolder, $sendingserver, $receivingserver)
                $backlogcount = ($backlog[1]).Split(":")[1]
                # Format backlog count; a line without a colon yields $null, which is treated as 0
                if ($null -ne $backlogcount) {
                    $backlogcount = $backlogcount.Trim()
                }
                else {
                    $backlogcount = 0
                }
                # Create output string: <replication group> From: <sending server> To: <receiving server> Backlog: <backlog count>
                $outline = $replicationgroup + " From: " + $sendingserver + " To: " + $receivingserver + " Backlog: " + $backlogcount
                $outline
            }
            $j = $j + 1
        }
    }
    $i = $i + 1
}
# Get end time
$endDTM = Get-Date
# Echo time elapsed
"Elapsed Time: $(($endDTM - $startDTM).TotalSeconds) seconds"