I've got a couple of new Windows Server 2016 servers in a DFS replication partnership, with data migrated from another server. One server is "primary" and has priority, while the second server is a backup in case the primary goes down. The data was pre-seeded to each new server with robocopy, following online documentation, and each server was then put in a partnership (one at a time) with the original server until DFS stabilized and replication was working between all three servers. The servers hold 4TB of data with over 1 million files spread across several different replication groups.
I'm not sure if it is related, but during initial replication each new target server reported a ton of "Conflicted" files and retransferred them. I wasn't able to figure out why: the file hashes matched completely, all the file properties looked the same, and I had followed the pre-seeding directions to the letter. So I let it complete on its own over several days and fix itself.
Now I've added a file-based, incremental-forever backup (Unitrends) to the primary server. It monitors the USN journal for file changes that need to be backed up. The incremental backups are inconsistent in size, because some kind of replication issue continues to cause tons of updates to the USN journal. The DFS servers appear to be "re-syncing" a large number of unchanged files, and they generate a large number of "Conflicted" files again. These are all static files that haven't been changed in years.
Some of the DFS debug logs were truncated, so I wasn't able to go back as far as I would like. What I have seen is that before some of the really big incremental backups, the primary DFS server was shut down improperly and went into DFS database consistency checks. This is what I have been able to piece together:
- During database consistency checks, the primary DFS server requested a large amount of update info from the backup server, and a large number of updates were sent. It doesn't appear any data was transferred, but some type of file info was sent. I see log
entries like this on the backup server:
20190205 12:19:16.666 7108 INCO 3364 InConnection::ReceiveVvUp Received VvUp connId:{02FDEEF7-6BF4-4843-BEAD-63913F05AF1C} csId:{9CC6A255-6567-4827-B69A-0FEAAC73604F} csName:Shared vvUp:{A46E672B-EAB9-4B12-AEF4-20C83853EE1A} |-> { 1880210..1880215, 1880218, 1880219, + 1880221..1880223, 1880230, 1880300, 2095605, 2095804, 2099486, + 2311562..2311928, 2313768, 2313769, 2313771, 2313778, 2313787, + 2313789, 2313791, 2313797, 2313800, 2313805, 2313807, 2313810, + 2313814, 2313818, 2313821, 2313836, 2313837, 2313841, 2313843, + 2313845, 2313847, 2313855, 2313858, 2313862, 2313865, 2313872..2313874,
20190205 12:19:19.494 4972 JOIN 1201 Join::SubmitUpdate LDB Updating ID Record:
20190205 12:19:19.510 4972 JOIN 1253 Join::SubmitUpdate Sent: uid:{47A2D072-9BC8-41C8-8F41-A66DD8BD22E9}-v5699713 gvsn:{47A2D072-9BC8-41C8-8F41-A66DD8BD22E9}-v5699713 name:removedforprivacy.PDF connId:{CD5C48CF-2662-499E-BA31-2DBFC77A1BF7} csId:{9CC6A255-6567-4827-B69A-0FEAAC73604F} csName:Shared
- During this consistency check, the next incremental backup increased in size to 375GB.
- After the successful consistency check (Event 2214, 2002), the backup server began sending a ton of files to the primary server. The primary server began generating a large number of "Conflicted" files that absolutely haven't been changed in years.
- The following incremental backup was 2TB.
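To put numbers on how many updates are being sent in a window like the one above, here is a rough Python sketch that counts `Join::SubmitUpdate Sent:` entries per hour and per replicated folder (`csName`). It assumes each entry sits on a single line in the shape pasted above; the function name and regex are my own and not part of any DFSR tooling, and real debug logs may wrap the fields differently.

```python
import re
from collections import Counter

# Matches lines shaped like the "Join::SubmitUpdate Sent:" entry quoted above.
# Assumption: the whole entry is on one line, as in the pasted excerpt.
SENT_RE = re.compile(
    r"^(?P<date>\d{8}) (?P<hour>\d{2}):\d{2}:\d{2}\.\d{3} \d+ JOIN\s+\d+ "
    r"Join::SubmitUpdate Sent:.*?csName:(?P<cs>\S+)"
)

def count_sent_updates(lines):
    """Count 'Sent' updates keyed by (date, hour, csName)."""
    counts = Counter()
    for line in lines:
        m = SENT_RE.match(line.strip())
        if m:
            counts[(m["date"], m["hour"], m["cs"])] += 1
    return counts

# The two JOIN lines from the excerpt above; only the second is a "Sent" entry.
sample = [
    "20190205 12:19:19.494 4972 JOIN 1201 Join::SubmitUpdate LDB Updating ID Record:",
    "20190205 12:19:19.510 4972 JOIN 1253 Join::SubmitUpdate Sent: "
    "uid:{47A2D072-9BC8-41C8-8F41-A66DD8BD22E9}-v5699713 "
    "gvsn:{47A2D072-9BC8-41C8-8F41-A66DD8BD22E9}-v5699713 "
    "name:removedforprivacy.PDF "
    "connId:{CD5C48CF-2662-499E-BA31-2DBFC77A1BF7} "
    "csId:{9CC6A255-6567-4827-B69A-0FEAAC73604F} csName:Shared",
]

for (date, hour, cs), n in sorted(count_sent_updates(sample).items()):
    print(f"{date} {hour}:00 {cs} {n}")
```

Feeding it the lines of a whole debug log (for example via `open(path, errors="replace")`) should make it easier to line up bursts of sent updates against the times of the oversized incremental backups.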
Prior to this, I also had a 2.1TB incremental backup, even though no servers were shut down improperly and no significant events occurred in the event logs between the two backups (the DFS debug logs were truncated). The only notable thing is that this was the backup immediately following the initial full backup (4TB), and that initial full backup took approximately 8 days to complete.
Why are these significant syncing events occurring that are generating large USN change logs?
Why are files that have not changed being re-synced / updated, and why are they being marked as "Conflicted" on the receiving end? What about the initial sync and all the "Conflicted" files - is that related?
Is there some kind of check I can run to see if the two servers think their files are consistent, and if they actually are, without triggering an entire resync? Is it possible the servers think they are in sync when they actually are not, and if so, how can I determine that?
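For the "are they actually consistent" half of that question, the only check I can think of that cannot trigger a resync is a read-only comparison done outside of DFS. Below is a minimal Python sketch of what I mean: it walks two copies of a replicated folder, skips the `DfsrPrivate` folder, and reports files that exist on only one side or whose size, modification time, or SHA-256 differ. The function names are my own. Note that this only compares file contents and basic metadata; it does not look at ACLs or alternate data streams, which I understand DFSR also takes into account, so a clean result here would not rule those out.

```python
import hashlib
import os

def snapshot(root):
    """Map relative path -> (size, mtime in whole seconds, SHA-256 hex digest)."""
    result = {}
    for dirpath, dirs, files in os.walk(root):
        # Skip DFSR's own working folder (staging, ConflictAndDeleted, etc.).
        dirs[:] = [d for d in dirs if d.lower() != "dfsrprivate"]
        for name in files:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            result[rel] = (st.st_size, int(st.st_mtime), digest.hexdigest())
    return result

def diff_snapshots(a, b):
    """Return (paths only in a, paths only in b, paths whose metadata or hash differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differ = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_a, only_b, differ
```

With 4TB of data, hashing everything over the network would be slow, so in practice I would run `snapshot` locally on each server against one replicated folder at a time and compare the two results afterwards.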