Tech Forum
Distributed File System (DFS)
Article summary
Distributed File System (DFS) is a very useful feature of Windows 2003 and above.
This article discusses the Windows 2003 DFS implementation. DFS allows
you to provide a virtual server path to users while storing files
on physically different servers. This way, file storage can be distributed
without users having to remember multiple paths for various purposes.
In spite of it being a very useful and elegant feature, I have
seen very few organisations using it. Therefore, I thought it was important to
raise awareness of the need for and utility of this feature.
What is DFS?
Usually, server resources and paths are shared using the
Universal Naming Convention (UNC). This has syntax like \\servername\sharename.
Most often such resources are scattered across the organisation. Each department
or each project may have a different server path. As a result, users need to
remember multiple file locations and paths. If a path changes for some reason,
many users will suddenly be unable to find files at the expected location.
We then have to take extra precautions to inform all users about such changes.
This is an operational nightmare.
DFS is a way of de-linking the physical file location and
the logical path used to access the files. There are two types of DFS:
stand-alone DFS and domain-based DFS. DFS runs on the Windows server. However,
the client Windows OS contains the DFS client to provide additional features
and caching.
The jargon for this de-linking of physical
and logical file locations is storage virtualization. Now
let us see how DFS works.
Features
DFS offers many features that make managing multiple file
servers much simpler and more effective.
DFS links multiple shared folders on multiple servers into
a folder hierarchy. To users, this hierarchy looks the same as a physical directory
structure on a single hard disk. However, in this case, each branch of the hierarchy
can be on any of the participating servers.
Even if the files are scattered across multiple servers,
users need to go to only one network location. This is a very powerful feature.
Users do not need to know if the actual file location has changed. There is
no need to inform everyone about using new paths or server names! Imagine how
much time and energy this can save. It reduces downtime required during server
renames, planned or unplanned shutdowns and so on.
As mentioned, during planned shutdowns, the file resources
can be temporarily made available from another standby server, without users
needing to be notified about it. This way, downtime related to maintenance
or disaster recovery tasks can be largely avoided.
This is especially useful for Web servers. The Web server
file locations can be configured in such a way that even when the physical location
of the files changes to another server, the HTML links continue to work without
breaking.
It is possible to replicate data to one or more servers within
the DFS structure. This way, if one server is down, files will be automatically
served from other replicated locations. What's more, users will not even
know the difference.
Load balancing is a conceptual extension of the replication feature. Now
that you can put copies of the same file in multiple locations, DFS can serve
the file from different locations when more than one user requests it at the
same time. This way, the load on one server is balanced across multiple
servers, which increases performance.
Users do not even come to know which replica on DFS a particular file
came from.
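To make the idea of serving a file from several replicas concrete, here is a
minimal sketch in Python. It is only an illustration: the class name ReplicaSet,
the share names and the simple round-robin rule are all assumptions made for this
example. The actual choice of replica is made by Windows and is not described here.

```python
from itertools import cycle


class ReplicaSet:
    """Illustrative model of one DFS link that has several replica shares."""

    def __init__(self, targets):
        self.targets = list(targets)      # hypothetical replica share paths
        self.down = set()                 # replicas that are currently unavailable
        self._ring = cycle(self.targets)  # endless round-robin iterator

    def next_target(self):
        """Return the next available replica, skipping servers that are down."""
        for _ in range(len(self.targets)):
            target = next(self._ring)
            if target not in self.down:
                return target
        raise RuntimeError("no replica is available")


replicas = ReplicaSet([r"\\S1\DOCS", r"\\S2\DOCS"])
first = replicas.next_target()   # served from S1
second = replicas.next_target()  # served from S2
replicas.down.add(r"\\S1\DOCS")  # simulate S1 going offline
third = replicas.next_target()   # S1 is skipped, so S2 answers
```

Successive requests alternate between the two shares, and a share marked as down
is simply skipped, which is the behaviour users experience as uninterrupted access.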
DFS utilises the same NTFS security and file sharing permissions.
Therefore, no special configuration is required to integrate base security with
DFS.
- Ongoing hard disk space management
What happens when your hard disk space is exhausted? You
typically add another hard disk. Now, this hard disk will have another name.
What if this disk is on another server? Things would get worse. With DFS, you
can keep adding new directories to the namespace on completely separate servers.
Users never have to bother about the physical server name. This way, you can
grow your storage in steps without having to worry about destabilising file
access by users.
- Unifying heterogeneous platforms
DFS also supports NetWare. This way, administrators can unify
data access by combining servers running heterogeneous operating systems from
a file access perspective.
- Fault tolerance or higher availability of data
DFS works with clustering services. This combination offers
higher availability than just using clustering.
Types of DFS
There are two types of DFS: Stand Alone DFS and Domain
DFS.
DFS is a virtual representation of a hard disk folder structure.
Therefore it has to start with a root directory, like a hard disk. This is
called the DFS root. This is actually a shared location on one of the servers.
In the case of a stand-alone DFS, if the server containing the DFS root folder is
not available, the rest of the DFS structure is not usable. This means that a
stand-alone DFS is not very fault tolerant.
Therefore, Domain DFS is also available. Domain DFS provides
features which can replicate the DFS root and thus provide much more fault tolerance.
Domain DFS information is stored in Active Directory. The path to access the
root or a link starts with the host domain name. A domain root can have multiple
root targets, which offer fault tolerance and load sharing at the root level.
Below the root, you have multiple links and targets. A link
can point to an actual file share or to another DFS root. Targets are the actual
shared file folders that a link points to.
Using these elements you can create a directory structure
of any complexity.
Creating a DFS system
The DFS console is available in the Programs > Administrative
Tools > Distributed File System menu.
1. Right-click on the Distributed file system MMC node and
choose New DFS Root.
2. Now a wizard appears. Choose the type of DFS first.
3. Now choose a host server. A host server is the server
that contains the shared folder which becomes the DFS root.
4. Now you are shown a list of existing shares on the selected
server. You can choose one of these or create a new share. This becomes the
new DFS root.
5. Now right-click on the root and choose Add new Link. Here you can choose
any file share on any of the servers available on the network and add it as
a link.
6. Please note that the actual share name and the link name
that we specify are completely different. Therefore, the users do not need to
know the underlying share and server name at all. They just look at this link
as an item in the DFS root based hierarchy.
7. Also notice the setting related to client-side caching. DFS is accessed through
the DFS client. This software must be running to use DFS files from servers. The
setting specifies the amount of time for which the DFS client will cache this
referral on the local machine.
8. In Windows 2003, you can enable or disable this mapping
of DFS name and actual share on demand. Just right-click on a link and choose
Enable or Disable referral.
9. To enable replication for a given share, right-click on
the link and choose Create Replica. Now you need to specify another share. You
also need to specify the replication information.
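The client-side caching setting mentioned in step 7 can be pictured as a small
time-limited lookup table. The sketch below is written in Python purely for
illustration. ReferralCache, the 1800-second value and the injected clock are
assumptions for this example and not part of any Windows API.

```python
import time


class ReferralCache:
    """Illustrative cache that remembers where a DFS link points for a limited time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the example can be tested without waiting
        self._entries = {}   # link path -> (target share, expiry time)

    def put(self, link, target):
        self._entries[link] = (target, self.clock() + self.ttl)

    def get(self, link):
        """Return the cached target, or None if it is missing or has expired."""
        entry = self._entries.get(link)
        if entry is None or self.clock() >= entry[1]:
            self._entries.pop(link, None)
            return None      # the client would now ask the DFS server again
        return entry[0]


now = [0]                    # fake clock, in seconds
cache = ReferralCache(ttl_seconds=1800, clock=lambda: now[0])
cache.put(r"\\A\MAIN\SUB", r"\\B\BSHARE")
fresh = cache.get(r"\\A\MAIN\SUB")    # still cached
now[0] = 1800
expired = cache.get(r"\\A\MAIN\SUB")  # cache time is over
```

A longer cache time means fewer lookups on the DFS server, but clients take longer
to notice that a link has been pointed at a different share.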
Replication
There are two types of replication: Automatic and Manual.
Automatic is available for Domain DFS only. For Stand Alone DFS, the files must
be replicated manually. The File Replication Service must be running on all target
servers for automatic replication to work.
There are four ways in which file replication can be achieved
between two or more DFS Roots or links.
1. Ring topology
2. Hub and Spoke
3. Mesh
4. Custom
The first three options are self-explanatory. Custom allows you
to specify a highly fine-tuned method of replication, which can be tailored
to suit specific needs.
For example, let us say you have one DFS root which replicates
to servers at four different locations. Two locations have high bandwidth, two
have low. You can specify that the replication to high-bandwidth servers must
be completed first. Only then will the low-bandwidth servers start replicating.
This level of customisation provides you with fine-grained control over the
replication process and helps you optimise bandwidth.
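The topologies listed above differ only in which pairs of servers replicate with
each other. The following Python sketch generates those pairs and also orders the
four sites of the custom example. The function names, server names and site names
are hypothetical and are used only to illustrate the idea.

```python
from itertools import combinations


def ring(servers):
    """Each server replicates with its two neighbours in the ring."""
    n = len(servers)
    if n < 2:
        return set()
    return {tuple(sorted((servers[i], servers[(i + 1) % n]))) for i in range(n)}


def hub_and_spoke(hub, spokes):
    """Every spoke replicates only with the central hub."""
    return {tuple(sorted((hub, spoke))) for spoke in spokes}


def mesh(servers):
    """Every server replicates with every other server."""
    return {tuple(sorted(pair)) for pair in combinations(servers, 2)}


def custom_order(bandwidth):
    """Put high-bandwidth sites first; sorting is stable, so ties keep their order."""
    return sorted(bandwidth, key=lambda site: bandwidth[site] != "high")


servers = ["A", "B", "C", "D"]
ring_links = ring(servers)                         # 4 connections
spoke_links = hub_and_spoke("A", ["B", "C", "D"])  # 3 connections
mesh_links = mesh(servers)                         # 6 connections
order = custom_order({"Site1": "low", "Site2": "high",
                      "Site3": "high", "Site4": "low"})
```

For four servers, the mesh needs six connections against three for hub and spoke,
which shows why the choice of topology matters for bandwidth.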
Further, you can enable and disable any replication connection
on demand. This can be useful in cases where you want to make bandwidth available
to some other application.
Replication can be scheduled at convenient times. You can
include or exclude specific folders from DFS replication.
Replication between two sites can be given a priority: high,
medium or low. Depending upon this configuration, the replication can be controlled
in a very sophisticated manner.
Once replication is set up, load balancing occurs automatically.
Security
Security is handled at the individual share level. If you have
access to a file share directly, you will have the same access through DFS as well.
Using DFS
For users, using DFS is as simple as using a standard file
share. In fact they do not even need to know that it is a DFS-based root.
Suppose I have servers A and B. A has a share called MAIN
which I have converted to a DFS root. B has a share called BSHARE. This is now
linked as a DFS link called SUB which is under the DFS root MAIN.
Now users access the DFS root by using the path \\A\MAIN.
When they open this share, they will see the existing regular
files and folders in the share MAIN. In addition they will see another folder
called SUB. This folder does not actually exist on server A. It is a virtual
representation of \\B\BSHARE.
Without DFS, the users would have to remember two
shares: \\A\MAIN and \\B\BSHARE.
In this way a complex hierarchy can be created seamlessly without
users having to worry about explicit paths.
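The mapping in this example can be sketched in a few lines of Python. This is only
a model of the idea and not how Windows implements it: the dictionary, the
function name resolve and the sample file names are assumptions made for this
illustration.

```python
# Hypothetical in-memory model of the namespace from the example above.
DFS_ROOT = r"\\A\MAIN"
DFS_LINKS = {"SUB": r"\\B\BSHARE"}  # link name -> share that actually holds the files


def resolve(path):
    """Translate the path a user types into the share that really serves it."""
    prefix, rest = path[:len(DFS_ROOT)], path[len(DFS_ROOT):]
    if prefix.upper() != DFS_ROOT.upper() or (rest and not rest.startswith("\\")):
        raise ValueError(f"{path} is not under {DFS_ROOT}")
    first, _, tail = rest.lstrip("\\").partition("\\")
    target = DFS_LINKS.get(first.upper())
    if target is None:
        return path  # a regular file or folder stored in the MAIN share itself
    return target + ("\\" + tail if tail else "")


linked = resolve(r"\\A\MAIN\SUB\report.doc")  # lands on server B
regular = resolve(r"\\A\MAIN\notes.txt")      # stays on server A
```

The user types a path under \\A\MAIN in both cases. Only the lookup table knows
that SUB lives on server B, so moving BSHARE to another server means changing one
table entry rather than retraining users.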
If a user actually wants to know whether a particular
folder is a regular one or a DFS-mapped folder, they just need to look at the
folder properties. The diagram below shows the DFS tab on the folder Properties
of folders that are DFS links.
Notes
1. DFS works only with the NTFS file system.
2. Antivirus software can interfere with replication.
3. The total file access path must contain less than 260
characters. Keep this in mind when you create DFS names.
4. DFS client must be loaded on machines that access DFS.
5. Users can find out the actual location of a DFS based
file by looking at the DFS tab of file properties.
6. You can create any number of Domain Roots. However, if
you create too many DFS namespaces, you are defeating the purpose of simplified
file access!
7. DFS offers many advantages. You first need to choose the
benefits you need and then plan accordingly.
8. Remember, DFS need not be planned all at one time. DFS
can grow as and when your needs grow. Unlike other features of Windows 2003,
DFS can be implemented in an incremental manner.
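The path length limit in note 3 is easy to check before a DFS name is finalised.
A minimal Python sketch follows. The function name fits_path_limit and the sample
path are hypothetical, and the limit is taken directly from the note above.

```python
MAX_PATH = 260  # limit stated in note 3: the full path must be shorter than this


def fits_path_limit(full_path, limit=MAX_PATH):
    """Return True if the complete access path stays under the limit."""
    return len(full_path) < limit


# Hypothetical path: DFS root, link name, then the folders and file below it.
sample = r"\\A\MAIN\SUB\projects\2003\report.doc"
ok = fits_path_limit(sample)
headroom = MAX_PATH - 1 - len(sample)  # characters still available for deeper folders
```

Short root and link names leave more of the limit for the folders and file names
that users create underneath them.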
When to consider using DFS?
1. You have multiple servers and you want to consolidate
server-based data access.
2. You want to add more file servers but do not want users
to learn more paths.
3. You are replacing existing servers but you do not want
users to stop accessing files from the familiar locations.
4. You have data on many sites but you want users to connect
to the nearest server.
Summary
I am sure you are already thinking about trying DFS out in
some scenario within your organisation. Go ahead and try it out. Of course,
do a pilot first. Do not just try things on a live server. If you need any help,
send me an email. However, please note that depending upon my workload, the
response time may vary.
About the Author: Dr Nitin Paranjape is the Chairman and MD of Maestros (Mediline).
He is a consultant with many organisations, covering appropriate technology
utilisation, business application of relevant technology, application architecture
and audit as well as knowledge transfer. He has authored more than 650 articles
on various technology-related subjects. He can be contacted at nitin@mediline.co.in