I'm replicating between two servers at two sites (Server A: Windows Server 2012 R2 Standard; Server B: Windows Server 2008 R2) over a VPN through a SonicWall firewall. The initial replication does seem to be happening, but it is very slow (the folder in question is less than 3 GB). I'm seeing these events in Event Viewer every few minutes:
The DFS Replication service is stopping communication with partner PPIFTC for replication group FTC due to an error. The service will retry the connection periodically.
Additional Information:
Error: 1726 (The remote procedure call failed.)
and then, shortly after:
The DFS Replication service successfully established an inbound connection with partner PPIFTC for replication group FTC.
-------------------------------------------
Here are all my troubleshooting steps (keep in mind that our VPN goes through a SonicWall):
-Increased the SonicWall TCP timeout to 24 hours
-Added the following values on both the sending and receiving members, then rebooted the servers:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value = DisableTaskOffload
Type = DWORD
Data = 1
Value = EnableTCPChimney
Type = DWORD
Data = 0
Value = EnableTCPA
Type = DWORD
Data = 0
Value = EnableRSS
Type = DWORD
Data = 0
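Since the same values have to go on both members, keeping them in an importable .reg file can be less error-prone than adding them by hand. The Python sketch below only renders the .reg text (a `Windows Registry Editor Version 5.00` header, the key in brackets, and each value as `"Name"=dword:` followed by 8 hex digits) for the key and values listed above. The helper name `render_reg_file` is hypothetical; saving the file and importing it with regedit is left to you.

```python
# Hypothetical helper: render the TCP/IP values listed above as .reg file text.
# The key path and the value/data pairs are copied from the post.

TCPIP_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Value name -> DWORD data, exactly as listed above.
TCPIP_VALUES = {
    "DisableTaskOffload": 1,
    "EnableTCPChimney": 0,
    "EnableTCPA": 0,
    "EnableRSS": 0,
}


def render_reg_file(key: str, dword_values: dict) -> str:
    """Return .reg text that sets each entry as a REG_DWORD under `key`."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, data in dword_values.items():
        # .reg syntax for a DWORD: "Name"=dword:XXXXXXXX (8 lowercase hex digits).
        lines.append(f'"{name}"=dword:{data:08x}')
    # Windows tools conventionally use CRLF line endings; end with a newline.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(render_reg_file(TCPIP_KEY, TCPIP_VALUES), end="")
```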
---------------------------------more troubleshooting--------------------------
-Disabled antivirus on both members
-Made sure the DFSR TCP ports 135 and 5722 are open
-Installed all the 2008 R2 hotfixes listed in http://support.microsoft.com/kb/968429 and rebooted
-Ran netstat -anobp TCP; the results for the DFSR executable are listed below:
Sending Member:
[DFSRs.exe]
TCP 10.x.x.x:53 0.0.0.0:0 LISTENING 1692
[DFSRs.exe]
TCP 10.x.x.x:54669 10.x.x.x:5722 TIME_WAIT 0
TCP 10.x.x.x:54673 10.x.x.x:5722 ESTABLISHED 1656
[DFSRs.exe]
TCP 10.x.x.x:64773 10.x.x.x:389 ESTABLISHED 1692
[DFSRs.exe]
TCP 10.x.x.x:64787 10.x.x.x:389 ESTABLISHED 1656
[DFSRs.exe]
TCP 10.x.x.x:64795 10.x.x.x:389 ESTABLISHED 2104
Receiving Member:
[DFSRs.exe]
TCP 10.x.x.x:56683 10.x.x.x:389 ESTABLISHED 7472
[DFSRs.exe]
TCP 10.x.x.x:57625 10.x.x.x:54886 ESTABLISHED 2808
[DFSRs.exe]
TCP 10.x.x.x:61759 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61760 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61763 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61764 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61770 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61771 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61774 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61775 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61776 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61777 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61778 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61779 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61784 10.x.x.x:52757 ESTABLISHED 7472
[DFSRs.exe]
TCP 10.x.x.x:63661 10.x.x.x:63781 ESTABLISHED 4880
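With this many lines the pattern is easy to miss, so tallying connections by state and remote port can help. This Python sketch assumes the `netstat -ano`-style layout shown above (protocol, local address:port, remote address:port, state, PID) and skips anything else, such as the `[DFSRs.exe]` lines. The helper `summarize` is hypothetical, and the sample is a short excerpt of the receiving member's output.

```python
from collections import Counter

# Excerpt of the receiving member's output above (addresses masked as in the post).
SAMPLE = """\
Receiving Member:
[DFSRs.exe]
TCP 10.x.x.x:56683 10.x.x.x:389 ESTABLISHED 7472
TCP 10.x.x.x:57625 10.x.x.x:54886 ESTABLISHED 2808
TCP 10.x.x.x:61759 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61760 10.x.x.x:57625 TIME_WAIT 0
TCP 10.x.x.x:61784 10.x.x.x:52757 ESTABLISHED 7472
"""


def summarize(netstat_text: str) -> Counter:
    """Count (state, remote_port) pairs in netstat-style TCP lines."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect exactly: proto, local addr:port, remote addr:port, state, PID.
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        # Take the text after the last ":" so masked or IPv6 addresses still work.
        remote_port = int(fields[2].rsplit(":", 1)[1])
        counts[(fields[3], remote_port)] += 1
    return counts


if __name__ == "__main__":
    for (state, port), n in sorted(summarize(SAMPLE).items()):
        print(f"{state:12} remote port {port:5}: {n}")
```

Fed the full receiving-member listing above, the same tally would show 12 TIME_WAIT connections to remote port 57625.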
------------------------------more troubleshooting--------------------------
-Increased the staging quota to 32 GB
-Opened the ADSIEdit.msc console to verify that "Authenticated Users" has the default Read permission on the following objects:
a. The computer object of the DFS server
b. The DFSR-LocalSettings object under the DFS server computer object
-Ran ping 10.x.x.x -f -l 1472 and got replies from both servers
-AD replication is successful on all partners
-nslookup resolves correctly, so DNS is working
-Updated NIC drivers on both servers
-Ran the following to set the primary member:
dfsradmin Membership Set /RGName:<replication group name> /RFName:<replicated folder name> /MemName:<primary member> /IsPrimary:True
Then ran:
dfsrdiag PollAD /Member:<member name>
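The 1472-byte size in the ping test comes from simple arithmetic. With the don't-fragment flag (`-f`) set, a 1472-byte ICMP payload plus the 8-byte ICMP header and a 20-byte IPv4 header (assuming no IP options) makes a 1500-byte packet, the standard Ethernet MTU. A minimal Python sketch of that calculation follows; both helper names are hypothetical. If a smaller path MTU were in play, the largest payload that still gets replies would be correspondingly smaller.

```python
# Header sizes for an IPv4 ICMP echo request, assuming no IP options.
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def packet_size(payload_bytes: int) -> int:
    """Total IPv4 packet size for a ping with the given payload (ping -l)."""
    return payload_bytes + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def max_payload_for_mtu(mtu: int) -> int:
    """Largest ping -l payload that fits in `mtu` without fragmentation."""
    return mtu - ICMP_HEADER_BYTES - IPV4_HEADER_BYTES


if __name__ == "__main__":
    print(packet_size(1472))
    print(max_payload_for_mtu(1500))
```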
I'm seeing these errors in the DFSR debug logs:
20141014 19:28:17.746 9116 SRTR 957 [WARN] SERVER_EstablishSession Failed to establish a replicated folder session. connId:{45C8C309-4EDD-459A-A0BB-4C5FACD97D44} csId:{7AC7917F-F96F-411B-A4D8-6BB303B3C813}
Error:
+ [Error:9051(0x235b) UpstreamTransport::EstablishSession upstreamtransport.cpp:808 9116 C The content set is not ready]
+ [Error:9051(0x235b) OutConnection::EstablishSession outconnection.cpp:532 9116 C The content set is not ready]
+ [Error:9051(0x235b) OutConnection::EstablishSession outconnection.cpp:471 9116 C The content set is not ready]
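In a large debug log it can help to pull out only the error entries and see which codes repeat. This Python sketch assumes the entry layout in the excerpt above (`[Error:<decimal>(0x<hex>) <function> <file>:<line> <thread> <single-letter field> <message>]`). The regex, the `parse_errors` helper, and that field layout are inferred from these three lines and are not a documented DFSR log format; the single-letter field is matched but not interpreted.

```python
import re

# Inferred from the log excerpt above; not an official DFSR log grammar.
ERROR_ENTRY = re.compile(
    r"\[Error:(?P<code>\d+)\(0x(?P<hex>[0-9a-fA-F]+)\)\s+"
    r"(?P<function>\S+)\s+(?P<source>\S+)\s+\d+\s+\S+\s+(?P<message>[^\]]*)\]"
)

SAMPLE = """\
+ [Error:9051(0x235b) UpstreamTransport::EstablishSession upstreamtransport.cpp:808 9116 C The content set is not ready]
+ [Error:9051(0x235b) OutConnection::EstablishSession outconnection.cpp:532 9116 C The content set is not ready]
+ [Error:9051(0x235b) OutConnection::EstablishSession outconnection.cpp:471 9116 C The content set is not ready]
"""


def parse_errors(log_text: str) -> list:
    """Return one dict per [Error:...] entry found in log_text."""
    entries = []
    for match in ERROR_ENTRY.finditer(log_text):
        code = int(match["code"])
        # Each entry prints the code twice (decimal and hex); they should agree.
        if code != int(match["hex"], 16):
            raise ValueError(f"decimal/hex mismatch: {match.group(0)}")
        entries.append({
            "code": code,
            "function": match["function"],
            "source": match["source"],
            "message": match["message"].strip(),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_errors(SAMPLE):
        print(entry["code"], entry["function"], entry["source"], entry["message"])
```

On the three lines above it returns three entries, all error 9051 (0x235b), "The content set is not ready".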
---------------------------------------more troubleshooting-----------------------------
I've done a lot of research online, and most of it points to the same things I've already tried. Does anyone have other suggestions? Should I be looking somewhere else on the server side or the firewall side?
I also tried replicating from a 2012 R2 server to another 2012 server and got the same events in the event log, so maybe it's not a server-specific issue.
Some other things I'm wondering:
-Could it be the NIC speeds? Server A is a 2012 R2 server with Hyper-V installed. NIC teaming was set up initially, and because Hyper-V is installed, the adapter is a "vEthernet (Microsoft Network Adapter Multiplexor Driver Virtual Switch)" running at 10.0 Gbps, whereas Server B uses a single NIC at 1.0 Gbps.
-Could occasional ping timeouts cause the issue? I get a timeout from time to time, but not nearly as often as the events I'm seeing. Ping times are around 53 ms. The folder is only 3 GB, so it shouldn't take long to replicate, but it's been days. The replication schedule covers the whole day except our backup window, 11 PM to 5 AM. For the rest of the time I have the bandwidth set anywhere from 4 Mbps down to 64 Kbps. Server A is on a 5 Mbps circuit and Server B is on a 10 Mbps circuit.
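A back-of-the-envelope check of the schedule's bandwidth limits may help put "it's been days" in context. This Python sketch assumes the full 3 GB (decimal gigabytes) has to cross the link at a constant rate, and ignores RDC, compression, retransmits and protocol overhead. The rates and the 18-hour daily window (24 hours minus the 11 PM to 5 AM backup window) come from the post; the helper `transfer_hours` is hypothetical.

```python
SIZE_GB = 3  # folder size from the post, treated as decimal gigabytes
REPLICATION_HOURS_PER_DAY = 24 - 6  # schedule is off from 11 PM to 5 AM


def transfer_hours(size_gb: float, rate_bps: float) -> float:
    """Hours needed to move size_gb at a sustained rate_bps (bits per second)."""
    bits = size_gb * 1e9 * 8
    return bits / rate_bps / 3600


if __name__ == "__main__":
    rates = [
        ("5 Mbps circuit", 5e6),
        ("4 Mbps schedule limit", 4e6),
        ("64 Kbps schedule limit", 64e3),
    ]
    for label, rate in rates:
        hours = transfer_hours(SIZE_GB, rate)
        days = hours / REPLICATION_HOURS_PER_DAY
        print(f"{label}: {hours:.1f} h of transfer, {days:.1f} replication days")
```

Under these assumptions, 3 GB takes under two hours at 4 Mbps or more, but roughly 104 hours at 64 Kbps, which is close to six days of 18-hour windows.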