I set up a DFS system for my company about five years ago, and until recently it has worked flawlessly. We have one domain with four sites, and each site has two file storage servers. Each server stores about 2TB of data (all servers hold the same data) in a hub-and-spoke configuration, where our central site is the first to receive changes and the first to send them out. Server A in the central site is the hub for 1TB of the data, and Server B in the central site is the hub for the remaining 1TB (approximately). There are about 3 million files total in the replicated folders. We have 20Mbit (up and down) fiber connections running between the sites.
About a month ago I got a complaint about missing files on a server, which appeared to be due to slow replication. I checked the backlog on the folder and discovered that it had 1.4 million files backlogged and waiting to replicate. This is on a folder that only contains about 300,000 files. I monitored its progress, and after about 24 hours the backlog was empty, so I didn’t think anything more of it. The next morning I decided to check it again, and at some point overnight it had jumped to 600,000 files. I checked it 15 MINUTES later and it was up to 1.2 million!
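To put a number on that last jump, here is a rough Python sketch that turns timestamped backlog counts into a growth rate. The two counts are the ones quoted above; the clock times are placeholders I made up (only the 15-minute gap is real), and `backlog_growth_per_minute` is just an illustrative helper, not part of any DFS tooling.

```python
from datetime import datetime


def backlog_growth_per_minute(samples):
    """Return the backlog change per minute between consecutive samples.

    `samples` is a list of (timestamp, backlog_count) tuples sorted by time.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        minutes = (t1 - t0).total_seconds() / 60
        if minutes <= 0:
            raise ValueError("samples must be in strictly increasing time order")
        rates.append((c1 - c0) / minutes)
    return rates


# Placeholder date and times; only the 15-minute gap and the counts are real.
samples = [
    (datetime(2000, 1, 1, 8, 0), 600_000),
    (datetime(2000, 1, 1, 8, 15), 1_200_000),
]

for rate in backlog_growth_per_minute(samples):
    print(f"{rate:,.0f} backlog entries added per minute")
```

That works out to 40,000 new backlog entries per minute, and the 600,000 added entries are twice the number of files that exist in the folder. Logging a sample every few minutes and feeding it through something like this might at least show whether the growth comes in bursts at particular times.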
For troubleshooting, I have file auditing enabled on all the file servers. I went through the security logs and, using a few of the files from the backlog report, searched to see who or what was changing them. There is no mention of any of the files being changed in any way on any of the servers. Also, we don’t have AV running on any of the servers, and our backup uses a simple copy procedure that doesn’t set the archive bit or change the last-accessed date. I’m at a loss as to why these files keep replicating when it appears that they haven’t changed.
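I did that comparison by hand for a few files. A sketch of how it could be done in bulk is below, assuming the backlog file names and the paths from the audit events have each been exported to a plain list of paths first. The function name and the sample paths are hypothetical, invented for illustration.

```python
import ntpath


def split_by_audit_hits(backlog_paths, audited_paths):
    """Split backlog paths into those with and without a matching audit entry.

    Paths are compared case-insensitively and with slashes normalized,
    since Windows paths are not case sensitive.
    """
    audited = {ntpath.normcase(p) for p in audited_paths}
    explained, unexplained = [], []
    for path in backlog_paths:
        if ntpath.normcase(path) in audited:
            explained.append(path)
        else:
            unexplained.append(path)
    return explained, unexplained


# Hypothetical sample data, not taken from the real servers.
backlog = [r"D:\Shares\IT\readme.txt", r"D:\Shares\Sales\q1.xlsx"]
audit = [r"d:\shares\sales\Q1.xlsx"]

explained, unexplained = split_by_audit_hits(backlog, audit)
print("Backlogged with an audit entry:", explained)
print("Backlogged with no audit entry:", unexplained)
```

In my case every file I checked would land in the second list, which is exactly the part I can't explain.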
To top this all off, this morning I checked the backlogs and noticed a folder trying to replicate in an area that I’m very familiar with, as it’s dedicated to the IT department. The folder was deleted YEARS ago, and that deletion had successfully replicated to all servers. I know this because I’ve been in the parent directory on all eight servers within the last couple of months. This latest development is what prompted me to submit this question to the forum.
So has anyone seen anything like this? How do I troubleshoot this further? All this seemingly erroneous replication is causing a lot more file collisions and we’re starting to lose work on a regular basis.