Channel: File Services and Storage forum

DFSr supported cluster configurations - replication between shared storage


I have a very specific configuration for DFSr that appears to be suffering severe performance issues when hosted on a cluster, as part of a DFS replication group.

My configuration:

3 Physical machines (blades) within a physical quadrant.

3 Physical machines (blades) hosted within a separate physical quadrant

Both quadrants are extremely well connected, local, 10GBit/s fibre.
There is local storage in each quadrant, no storage replication takes place.

The 3 machines in the first quadrant are MS clustered with shared storage LUNs on a 3PAR filer.
The 3 machines in the second quadrant are also clustered with shared storage, but on a separate 3PAR device.
8 shared LUNs are presented to the cluster in the first quadrant, and an identical storage layout is connected in the second quadrant. Each LUN has an associated HAFS application, which can fail over onto any machine in the local cluster.

DFS replication groups have been set up for each LUN, and data is replicated from an "Active" cluster node entry point to a "Passive" cluster node that provides no entry point to the data via DFSn and holds a Read-Only copy on its shared cluster storage.
For the sake of argument, assume that all HAFS application instances in the first quadrant are "Active" in a read/write configuration, and all "Passive" instances of the HAFS applications in the other quadrant are Read-Only.

This guide: http://blogs.technet.com/b/filecab/archive/2009/06/29/deploying-dfs-replication-on-a-windows-failover-cluster-part-i.aspx defines how to add a clustered service to a replication group. It clearly shows "Shared storage" being used for the cluster, which is common sense: without it no application fail-over is possible, which removes the entire point of using a resilient cluster.

This article: http://technet.microsoft.com/en-us/library/cc773238(v=ws.10).aspx#BKMK_061 defines the following:
DFS Replication in Windows Server 2012 and Windows Server 2008 R2 includes the ability to add a failover cluster as a member of a replication group. The DFS Replication service on versions of Windows prior to Windows Server 2008 R2 is not designed to coordinate with a failover cluster, and the service will not fail over to another node.

It then goes on to state, quite incredibly: 
DFS Replication does not support replicating files on Cluster Shared Volumes.

Stating flatly that DFSr does not support Cluster Shared Volumes makes no sense after stating that clusters are supported in replication groups and providing a TechNet guide for setting up exactly this configuration. What possible use is a clustered HAFS solution that has no shared storage between the clustered nodes? None at all.

My question: I need some clarification. Is the text meant to read "between" Cluster Shared Volumes?
The storage must be shared in order to form a clustered service in the first place. What we are seeing in practice is a serious degradation of performance when replicating / writing data between two clusters running a HAFS configuration in a DFS replication group.

If, as a test, local / logical storage is mounted on a physical machine, a DFS replication group between the unshared, logical storage on the physical nodes approaches 15k small files per minute on initial write, and is even higher for file amendments. When replicating between two nodes in a cluster with shared clustered storage, the solution manages a weak 2,500 files per minute on initial write and only 260 files per minute when updating / amending files.
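To put those figures side by side, here is a rough sketch of the arithmetic. The constant and function names are my own labels, and the 15k figure is approximate, so treat the ratios as ballpark only. No number was recorded for amendments on local storage beyond "higher than 15k", so the amendment ratio is a lower bound.

```python
# Throughput figures quoted in this post, in files per minute.
LOCAL_INITIAL_WRITE = 15_000    # "approaching 15k", so approximate
CLUSTER_INITIAL_WRITE = 2_500
CLUSTER_AMEND = 260


def slowdown(baseline: float, observed: float) -> float:
    """Return how many times slower `observed` is than `baseline`."""
    if observed <= 0:
        raise ValueError("observed throughput must be positive")
    return baseline / observed


def minutes_for(file_count: int, files_per_minute: float) -> float:
    """Return the minutes needed to replicate `file_count` files at a given rate."""
    if files_per_minute <= 0:
        raise ValueError("files_per_minute must be positive")
    return file_count / files_per_minute


if __name__ == "__main__":
    initial = slowdown(LOCAL_INITIAL_WRITE, CLUSTER_INITIAL_WRITE)
    # Local amendments were faster than 15k/min, so this is a lower bound.
    amend = slowdown(LOCAL_INITIAL_WRITE, CLUSTER_AMEND)
    print(f"Initial write slowdown on cluster: about {initial:.1f}x")
    print(f"Amendment slowdown on cluster: at least {amend:.1f}x")

    # Hypothetical data set size, purely for illustration.
    files = 1_000_000
    for label, rate in [
        ("local initial write", LOCAL_INITIAL_WRITE),
        ("cluster initial write", CLUSTER_INITIAL_WRITE),
        ("cluster amendments", CLUSTER_AMEND),
    ]:
        hours = minutes_for(files, rate) / 60
        print(f"{label}: {hours:.1f} hours for {files:,} files")
```

In other words, initial writes are roughly six times slower on the clustered storage, and amendments are slower by a factor of more than fifty.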

By testing various configurations we have effectively ruled out the SAN, the storage, drivers, firmware, DFSr configuration, replication group configuration - the only factor left that makes any difference is replicating from shared clustered storage, to another shared clustered storage LUN.

So in summary:

Logical Volume ---> Logical Volume = Fast
Logical Volume ---> Clustered Shared Volume = ??
Clustered Shared Volume ---> Clustered Shared Volume = Pitifully slow

Can anyone explain why this might be?
The guidance in the article is in clear conflict with all the other material published on DFSr and clustering; however, it may hint at why we are seeing a real issue with replication performance.

Many thanks for your time and any help/replies that may be received.
Paul

