Channel: File Services and Storage forum

DFSR Stopping Replication Only on One Replication Folder - Help Please


Hello,
I could really do with some help with a very annoying problem we have with DFS Replication.
Sorry for the long post, but I wanted to give you all the information I could so you can hopefully help me.
I've exhausted all the troubleshooting I can do, and I am very close to resorting to robocopy for the replication.

I've searched as much as I can on TechNet and looked at all the DFSR tips and tricks, but haven't found a solution to my problem.

We have a very simple setup; all the details and what I've tried are below.
I will refer to the servers simply as SRV1 and SRV2, with the folder/share details below.

-------------------

Windows Setup

2 x Windows 2008 R2 SP1 servers (XenServer VMs with up-to-date tools), joined to the domain
The two servers are identical with regard to OS, updates, drives, etc.
C: - OS
D:\SharedDocs - Share1$
E:\ApplicationFiles - Share2$
F:\TSProfiles - TSProfiles$

10 Mbps link between the two sites
All DFS HFAs have been applied
KB2663685 is installed, with the registry value HKLM\System\CurrentControlSet\Services\DFSR\Parameters\StopReplicationOnAutoRecovery set to 0
Windows firewall is disabled
File screening is enabled on the D:\SharedDocs folder for desktop.ini/thumbs.db on both servers.
I have disabled 8.3 filename creation on all drives, following an MS KB article describing what I thought was the issue, but it hasn't helped.
The WMI permission threads I've seen for DFSR issues aren't relevant; I've checked and all is OK.
Antivirus software is disabled on the servers for troubleshooting.

----------------------

DFS Setup

SRV1 was the master/primary member and hosts the live file share where users access the documents; the data was initially seeded using robocopy.
SRV2 is the DR server, which is read-only, and its referral target is disabled.

We have one namespace with 3 folders.

Each server is a namespace server.
SRV1 referral status is enabled
SRV2 referral status is disabled

Folder Targets:

Share1$
  SRV1 - referral enabled - path \\srv1\share1$
  SRV2 - referral disabled - path \\srv2\share1$
Share2$
  SRV1 - referral enabled - path \\srv1\share2$
  SRV2 - referral disabled - path \\srv2\share2$
TSProfiles$
  SRV1 - referral enabled - path \\srv1\TSProfiles$
  SRV2 - referral disabled - path \\srv2\TSProfiles$

---------------------------------

Replication Groups:

SharedDocs and ApplicationFiles Replication

Replicated Folder: SharedDocs
 D:\SharedDocs - Enabled - SRV1 - SharedDocs - 20GB
 D:\SharedDocs - Enabled (read only) - SRV2 - SharedDocs - 20GB
File Filter: ~*, *.bak, *.tmp, desktop.ini, thumbs.db
Namespace path: \\domain\dfs\share1$

Replicated Folder: ApplicationFiles
 E:\ApplicationFiles - Enabled - SRV1 - ApplicationFiles - 10GB
 E:\ApplicationFiles - Enabled (read only) - SRV2 - ApplicationFiles - 10GB
File Filter: ~*, *.bak, *.tmp
Namespace path: \\domain\dfs\share2$

Schedule: 
Mon - Sun: 00:00 to 07:00, 20:00 to 00:00 Full
Mon - Sun: 07:00 to 20:00 4Mbps
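To make the schedule above concrete, here is a minimal Python sketch that models it as a lookup from clock time to bandwidth setting. The `SCHEDULE` table and `bandwidth_at` function are hypothetical illustrations, not part of DFSR; they simply encode the two windows listed above. One thing it makes visible: the period in which SharedDocs stalls (after the 22:00 propagation test, before 01:00) falls entirely in the Full-bandwidth window, not the 4Mbps one.

```python
from datetime import time

# Hypothetical model of the "SharedDocs and ApplicationFiles Replication"
# schedule described above (same every day, Mon - Sun).
# Each entry is (start, end, bandwidth); end is exclusive, and an end of
# time(0, 0) means "until midnight".
SCHEDULE = [
    (time(0, 0), time(7, 0), "Full"),
    (time(7, 0), time(20, 0), "4Mbps"),
    (time(20, 0), time(0, 0), "Full"),
]


def bandwidth_at(t: time) -> str:
    """Return the bandwidth setting that applies at clock time t."""
    for start, end, bandwidth in SCHEDULE:
        # A midnight end means the window runs to the end of the day.
        if start <= t and (end == time(0, 0) or t < end):
            return bandwidth
    raise ValueError(f"no schedule window covers {t}")


if __name__ == "__main__":
    # Check the hours around the reported stall.
    for hour in (20, 22, 23, 0, 1, 7):
        print(f"{hour:02d}:00 -> {bandwidth_at(time(hour, 0))}")
```

For example, `bandwidth_at(time(8, 0))` returns `"4Mbps"`, while 22:00 and 01:00 both return `"Full"`.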

TSProfiles
Replicated Folder: TSProfiles$
 F:\TSProfiles - Enabled - SRV1 - TSProfiles - 10GB
 F:\TSProfiles - Enabled (read only) - SRV2 - TSProfiles - 10GB
File Filter: ~*, *.bak, *.tmp, *.dat
Namespace path: \\domain\dfs\tsprofiles$

Schedule: 
Mon - Sun: 00:00 to 00:00 Full

----------------------------------

I have a scheduled job that runs daily and sends me a backlog report (get-dfsrbacklog.ps1), a DFS health report and a DFS propagation report.
Propagation reports also run every 3 hours.

------------
OK, here's the issue.
It was working fine until a while ago, and nothing major or obvious has changed that would cause this.

Issue: Replication Stops Working Each Day for SharedDocs

Replication for SharedDocs stops working each day, while the other two replicated folders are fine; I can tell from the propagation reports, which run every 3 hours.

The propagation reports show that the 22:00 test completes, but replication stops at some point after that, because the 01:00 test is delayed.
No backlog is shown either, and the only way to get SharedDocs replicating again is to restart the DFSR service, sometimes on both servers.
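The propagation reports effectively bracket when replication stops. A minimal Python sketch of that reasoning, assuming the 3-hourly test results can be reduced to (report time, completed) pairs: it returns the last completed test and the first delayed test after it, which is the window worth searching in the logs. The function name `stall_window` and the sample data are hypothetical, shaped to match the 22:00/01:00 pattern described above.

```python
from datetime import datetime


def stall_window(results):
    """Return (last_completed, first_delayed) for the first stall.

    results: (timestamp, completed) pairs sorted by timestamp.
    Returns None if no completed test is followed by a delayed one.
    """
    last_ok = None
    for ts, completed in results:
        if completed:
            last_ok = ts
        elif last_ok is not None:
            # First delayed test after a completed one: the stall began
            # somewhere between these two report times.
            return last_ok, ts
    return None


if __name__ == "__main__":
    # Hypothetical results matching the pattern described above.
    sample = [
        (datetime(2013, 9, 10, 19, 0), True),
        (datetime(2013, 9, 10, 22, 0), True),
        (datetime(2013, 9, 11, 1, 0), False),
        (datetime(2013, 9, 11, 4, 0), False),
    ]
    print(stall_window(sample))
```

The window is approximate: a test file created at 01:00 can be delayed even if replication stopped shortly after 01:00, so it is worth searching a little past the second timestamp.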

After the restart, the backlog from SRV1 to SRV2 appears, and the files and the propagation test finally replicate across.
So I have a script that restarts the DFSR service daily at 4am, but this isn't ideal.
I'm all out of ideas, so if anyone can help I'd be forever in your debt!!

Troubleshooting:
The VM (SRV1) is backed up each night using PHD (the PHD guest tools are installed, so VSS is working). During the backup, DFSR logs an event saying replication has stopped because the server is being backed up. I thought this might be the issue, so I disabled backups for a few nights, but the problem continued.
I built a new downstream server and removed the old one just in case; that hasn't changed anything.
There is nothing else in Event Viewer either.
As mentioned at the top, all HFAs have been applied, as has the KB2663685 fix with the StopReplicationOnAutoRecovery registry value set to 0, just to be sure.

The DFSR debug logs aren't the easiest to read either, and not knowing what time this occurs doesn't help my search.
The only errors I can see are: 

20130911 02:22:57.337 5008 CCTX  1875 [WARN] VolumeIdTable::GetNonClusteredVolumes (Ignored) Unable to retrieve the volume's serial number and filesystem name. Volume will not be added to the Volume Id Table. volPath:\\?\Volume{9d341127-bb81-11e1-906f-806e6f6e6963}\ Error:[Error:21(0x15) Util::GetVolumeInformationW fsutil.cpp:306 5008 W The device is not ready.]

Error:
+[Error:9024(0x2340) UpstreamTransport::OpenFile upstreamtransport.cpp:1238 504 C The file meta data is not synchronized with the file system]
+[Error:9024(0x2340) OutConnection::OpenFile outconnection.cpp:711 504 C The file meta data is not synchronized with the file system]
+[Error:9024(0x2340) OutConnectionContentSetContext::OpenFile outconnection.cpp:2583 504 C The file meta data is not synchronized with the file system]
+[Error:9024(0x2340) OutConnectionContentSetContext::GetReplicaReader outconnection.cpp:4000 504 C The file meta data is not synchronized with the file system]
+[Error:9024(0x2340) OutConnectionContentSetContext::GetReplicaReader outconnection.cpp:3995 504 C The file meta data is not synchronized with the file system] 
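Since the stall time is unknown, one way to narrow the search is to pull every error frame out of the debug logs along with its timestamp, then keep only those in a time window (for example, between the last completed and the first delayed propagation test). A minimal Python sketch, assuming the line layout seen in the excerpt above: timestamped records start with `YYYYMMDD HH:MM:SS.mmm`, and lines starting with `+` continue the most recent timestamped record. The `find_errors` function and both regular expressions are hypothetical and inferred only from these few lines; other log entries may be formatted differently.

```python
import re
from datetime import datetime

# Timestamped record, e.g. "20130911 02:22:57.337 5008 CCTX  1875 [WARN] ..."
# (format inferred from the excerpt; an assumption, not a documented spec)
RECORD_RE = re.compile(r"^(\d{8} \d{2}:\d{2}:\d{2}\.\d{3}) ")
# Error frame, e.g. "[Error:9024(0x2340) UpstreamTransport::OpenFile ..."
ERROR_RE = re.compile(r"\[Error:(\d+)\(0x[0-9A-Fa-f]+\) (\S+)")


def find_errors(lines, start=None, end=None):
    """Yield (timestamp, error_code, function) for each error frame.

    Lines starting with '+' are treated as continuations of the most
    recent timestamped record. If start/end are given, frames outside
    that window (or with no known timestamp) are skipped.
    """
    current_ts = None
    for line in lines:
        m = RECORD_RE.match(line)
        if m:
            # New record: remember its time for any '+' continuation lines.
            current_ts = datetime.strptime(m.group(1), "%Y%m%d %H:%M:%S.%f")
        elif not line.startswith("+"):
            # Neither a new record nor a continuation line: ignore it.
            continue
        if start is not None and (current_ts is None or current_ts < start):
            continue
        if end is not None and (current_ts is None or current_ts > end):
            continue
        for code, func in ERROR_RE.findall(line):
            yield current_ts, int(code), func


if __name__ == "__main__":
    import sys

    # Usage: python find_errors.py <decompressed DFSR debug log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for ts, code, func in find_errors(f):
            print(ts, code, func)
```

Filtering on the error code (for example, only 9024, "The file meta data is not synchronized with the file system") would then show whether those errors cluster in the same window each night.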

Any help would be greatly appreciated.

Regards

Matt

