Sunday, September 9, 2007

Storage area network (SAN)


In computing, a storage area network (SAN) is an architecture to attach remote computer storage devices (such as disk arrays, tape libraries and optical jukeboxes) to servers in such a way that, to the operating system, the devices appear as locally attached. Although cost and complexity are dropping, as of 2007, SANs are still uncommon outside larger enterprises.

In contrast to a SAN, network-attached storage (NAS) uses file-based protocols such as NFS or SMB/CIFS, where it is clear that the storage is remote and computers request a portion of an abstract file rather than a disk block.
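The difference is visible even at the application level. As a rough sketch in Python (the device path and mount point below are hypothetical), block-level access reads raw sectors from what looks like a local disk, while file-based access names a file on a remote server:

    import os

    # Block-level (SAN): the LUN appears as a local disk device.
    # /dev/sdb is a hypothetical device name for a SAN-attached LUN.
    fd = os.open("/dev/sdb", os.O_RDONLY)
    os.lseek(fd, 512 * 2048, os.SEEK_SET)  # seek to block 2048 (512-byte blocks)
    block = os.read(fd, 512)               # read one raw block
    os.close(fd)

    # File-level (NAS): the client names a file on a remote server;
    # the NFS mount point below is hypothetical.
    with open("/mnt/nfs/reports/q3.txt", "rb") as f:
        f.seek(1024)
        data = f.read(4096)                # request a portion of an abstract file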


Network types
----------------------------------------------
Most storage networks use the SCSI protocol for communication between servers and disk drive devices. However, they do not use the SCSI low-level physical interface (e.g. cabling), as its bus topology is unsuitable for networking. Instead, a mapping layer carries SCSI over other low-level protocols to form a network (a simplified sketch of such a mapping follows the list):

- Fibre Channel Protocol (FCP), mapping SCSI over Fibre Channel. Currently the most common; it comes in 1 Gbit/s, 2 Gbit/s, 4 Gbit/s, 8 Gbit/s and 10 Gbit/s variants.
- iSCSI, mapping SCSI over TCP/IP.
- HyperSCSI, mapping SCSI over Ethernet.
- FICON, mapping the mainframe channel protocol over Fibre Channel (used by mainframe computers).
- ATA over Ethernet (AoE), mapping ATA over Ethernet.
- SCSI and/or TCP/IP mapping over InfiniBand (IB).
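To illustrate the idea of a mapping layer, here is a deliberately simplified Python sketch that frames a SCSI command descriptor block (CDB) for transport over TCP. This is not the real iSCSI PDU format, which defines a much more elaborate 48-byte header; it only shows the encapsulation principle, and the storage host name is invented:

    import socket
    import struct

    def frame_scsi_command(lun: int, cdb: bytes) -> bytes:
        # Toy framing: 2-byte LUN, 2-byte CDB length, then the CDB itself.
        # Real iSCSI uses a Basic Header Segment with many more fields.
        return struct.pack(">HH", lun, len(cdb)) + cdb

    # READ(10) CDB: opcode 0x28, read 1 block starting at LBA 2048.
    read10 = struct.pack(">BBIBHB", 0x28, 0, 2048, 0, 1, 0)

    # 3260 is the standard iSCSI port; the host below is hypothetical.
    sock = socket.create_connection(("storage.example.com", 3260))
    sock.sendall(frame_scsi_command(lun=0, cdb=read10))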

Storage sharing
----------------------------------------------
The driving force for the SAN market is the rapid growth of highly transactional data that require high-speed, block-level access to hard drives (such as data from email servers, databases, and high-usage file servers). Historically, enterprises first created "islands" of high-performance SCSI disk arrays. Each island was dedicated to a different application and was visible as a number of "virtual hard drives" (or LUNs).

A SAN essentially connects those storage islands together over a high-speed network.

However, an operating system still sees a SAN as a collection of LUNs and is expected to maintain its own file systems on them. The most reliable and most widely used file systems are still local ones, which cannot be shared among multiple hosts. If two independent local file systems resided on a shared LUN, they would be unaware of each other, would have no means of cache synchronization, and would eventually corrupt each other. Thus, sharing data between computers through a SAN requires advanced solutions, such as SAN file systems or clustered computing; a toy illustration of the corruption problem follows.
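In this sketch, two hosts each cache the same block of a shared LUN, update their private copies independently, and flush them back, with the last writer silently destroying the other's update:

    # Toy model: a LUN is a list of blocks; each host keeps a private cache.
    lun = ["v0"] * 8             # shared storage, 8 blocks

    cache_a = {3: lun[3]}        # host A caches block 3
    cache_b = {3: lun[3]}        # host B caches the same block, unaware of A

    cache_a[3] = "written-by-A"  # both hosts update their cached copies...
    cache_b[3] = "written-by-B"

    lun[3] = cache_a[3]          # ...and flush them back in some order
    lun[3] = cache_b[3]          # B's flush silently overwrites A's update

    print(lun[3])                # "written-by-B": A's update is lost

SAN and cluster file systems avoid this by coordinating hosts through a distributed lock manager, so that only one host holds a block's write lock at a time.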

Despite such issues, SANs help to increase storage capacity utilization, since multiple servers share the same growth reserve on disk arrays.
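A back-of-the-envelope illustration of the utilization argument (the figures below are made up): ten servers that each keep a private growth reserve tie up far more capacity than a shared pool covering the same realistic worst case, because the servers are unlikely to all grow at once:

    servers = 10
    reserve_per_server_gb = 200            # hypothetical private headroom

    isolated = servers * reserve_per_server_gb  # 2000 GB sit idle, fragmented
    pooled = 3 * reserve_per_server_gb          # assume at most 3 servers
                                                # grow at the same time
    print(f"isolated reserve: {isolated} GB, pooled reserve: {pooled} GB")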

In contrast, NAS allows many computers to access the same file system over the network and synchronizes their accesses. Lately, the introduction of NAS heads has allowed easy conversion of SAN storage to NAS.

Benefits
----------------------------------------------
Sharing storage usually simplifies storage administration and adds flexibility, since cables and storage devices do not have to be physically moved to shift storage from one server to another.

Other benefits include the ability for servers to boot from the SAN itself. This makes replacing a faulty server quick and easy, since the SAN can be reconfigured so that a replacement server uses the LUN of the faulty one. The process can take as little as half an hour and is a relatively new idea being pioneered in newer data centers. A number of emerging products are designed to facilitate and speed it up further; for example, Brocade offers an Application Resource Manager product which automatically provisions servers to boot from a SAN, with typical-case load times measured in minutes. While this area of technology is still new, many view it as the future of the enterprise datacenter.
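Conceptually, the swap amounts to an update of the array's LUN masking table, remapping the boot LUN from the failed server's world wide name (WWN) to the replacement's. A hypothetical sketch, not any vendor's actual API (the WWNs and LUN numbers are invented):

    # Hypothetical LUN masking table: WWN of a server's HBA -> LUN IDs it may see.
    lun_masking = {
        "10:00:00:00:c9:2a:11:01": [0],   # faulty server's boot LUN
        "10:00:00:00:c9:2a:11:02": [4, 5],
    }

    def replace_server(table, old_wwn, new_wwn):
        """Reassign a failed server's LUNs to its replacement."""
        table[new_wwn] = table.pop(old_wwn)

    replace_server(lun_masking, "10:00:00:00:c9:2a:11:01",
                   "10:00:00:00:c9:2a:11:03")  # replacement boots from LUN 0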

SANs also tend to enable more effective disaster recovery processes. A SAN can span a distant location containing a secondary storage array, enabling storage replication implemented by disk array controllers, by server software, or by specialized SAN devices. Since IP WANs are often the least costly method of long-distance transport, the Fibre Channel over IP (FCIP) and iSCSI protocols have been developed to allow SAN extension over IP networks. The traditional physical SCSI layer could only support a few meters of distance - not nearly enough to ensure business continuance in a disaster. Demand for this SAN application increased dramatically after the September 11th attacks in the United States and with the increased regulatory requirements associated with Sarbanes-Oxley and similar legislation.
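In outline, asynchronous replication over an IP WAN is a loop that ships not-yet-replicated blocks to the remote array in the background. A minimal sketch, with the two "arrays" modeled as byte buffers in memory:

    # Minimal model of asynchronous block replication: dirty blocks queue up
    # locally and are shipped to the secondary site later.
    BLOCK = 512
    primary = bytearray(BLOCK * 64)    # stand-in for the local array
    secondary = bytearray(BLOCK * 64)  # stand-in for the remote array
    dirty = set()                      # block numbers not yet replicated

    def write(block_no: int, data: bytes):
        primary[block_no * BLOCK:(block_no + 1) * BLOCK] = data
        dirty.add(block_no)            # acknowledge locally, replicate later

    def replicate_once():
        for block_no in sorted(dirty):
            secondary[block_no * BLOCK:(block_no + 1) * BLOCK] = \
                primary[block_no * BLOCK:(block_no + 1) * BLOCK]
        dirty.clear()

    write(7, b"x" * BLOCK)
    replicate_once()
    assert primary == secondary

The asynchronous variant trades a small window of potential data loss for insulation from WAN latency; synchronous replication would acknowledge a write only after the secondary site confirms it.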

Consolidation of disk arrays has economically accelerated the advancement of some of their advanced features, including I/O caching, snapshotting, and volume cloning (Business Continuance Volumes, or BCVs).
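Snapshotting, for instance, is commonly implemented as copy-on-write: the snapshot initially shares every block with the live volume, and a block is copied aside only when it is about to be overwritten. A minimal sketch:

    class Volume:
        """Toy copy-on-write snapshot: blocks are preserved only on overwrite."""

        def __init__(self, nblocks):
            self.blocks = [b"\x00"] * nblocks
            self.snapshot = {}           # block_no -> pre-snapshot contents

        def take_snapshot(self):
            self.snapshot = {}           # empty: everything is still shared

        def write(self, block_no, data):
            if block_no not in self.snapshot:
                self.snapshot[block_no] = self.blocks[block_no]  # save old copy
            self.blocks[block_no] = data

        def read_snapshot(self, block_no):
            return self.snapshot.get(block_no, self.blocks[block_no])

    vol = Volume(8)
    vol.write(2, b"before")
    vol.take_snapshot()
    vol.write(2, b"after")
    print(vol.read_snapshot(2))  # b"before" - the snapshot sees the old data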

SAN infrastructure
----------------------------------------------
SANs often utilize a Fibre Channel fabric topology - an infrastructure specially designed to handle storage communications. It provides faster and more reliable access than higher-level protocols used in NAS. A fabric is similar in concept to a network segment in a local area network. A typical Fibre Channel SAN fabric is made up of a number of Fibre Channel switches.

Today, all major SAN equipment vendors also offer some form of Fibre Channel routing solution, and these bring substantial scalability benefits to the SAN architecture by allowing data to cross between different fabrics without merging them. These offerings use proprietary protocol elements, and the top-level architectures being promoted are radically different. They often enable mapping Fibre Channel traffic over IP or over SONET/SDH.

Compatibility
----------------------------------------------
One of the early problems with Fibre Channel SANs was that switches and other hardware from different manufacturers were not entirely compatible. Although the basic storage protocol (FCP) was always quite standard, some of the higher-level functions did not interoperate well. Similarly, many host operating systems would react badly to other operating systems sharing the same fabric. Many solutions were pushed to market before standards were finalized, and vendors innovated around the standards.

The combined efforts of the members of the Storage Networking Industry Association (SNIA) improved the situation during 2002 and 2003. Today most vendor devices, from HBAs to switches and arrays, interoperate nicely, though there are still many high-level functions that do not work between different manufacturers’ hardware.

SANs at home
----------------------------------------------
SANs are primarily used in large scale, high performance enterprise storage operations. It would be unusual to find a single disk drive connected directly to a SAN. Instead, SANs are normally networks of large disk arrays. SAN equipment is relatively expensive; therefore, Fibre Channel host bus adapters are rare in desktop computers. The iSCSI SAN technology is expected to eventually produce cheap SANs, but it is unlikely that this technology will be used outside the enterprise data center environment. Desktop clients are expected to continue using NAS protocols such as CIFS and NFS. The exception to this may be remote storage replication.

SANs in the Media and Entertainment
----------------------------------------------
Video editing workgroups require very high data rates. Outside of the enterprise market, this is one area that greatly benefits from SANs.

Per-node bandwidth usage control, sometimes referred to as quality of service (QoS), is especially important in video workgroups, as it ensures fair and prioritized bandwidth usage across the network. Avid Unity and Tiger Technology MetaSAN are designed specifically for video networks and offer this functionality.
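One common way to implement such per-node control is a token bucket per client: each node may send only as fast as its bucket refills. A minimal sketch (the node names and rates are invented, not taken from any of the products above):

    import time

    class TokenBucket:
        """Per-node bandwidth cap: tokens accrue at `rate` bytes per second."""

        def __init__(self, rate_bytes_per_s, burst_bytes):
            self.rate = rate_bytes_per_s
            self.capacity = burst_bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def allow(self, nbytes):
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return True
            return False        # caller must wait and retry

    # Hypothetical policy: the editing workstation gets 100 MB/s, the
    # transcoding node 40 MB/s, so interactive playback is never starved.
    limits = {"edit-1": TokenBucket(100e6, 8e6),
              "transcode-1": TokenBucket(40e6, 8e6)}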

Storage virtualization and SANs
----------------------------------------------
Storage virtualization refers to the process of completely abstracting logical storage from physical storage. The physical storage resources are aggregated into storage pools, from which the logical storage is created. It presents the user with a logical space for data storage and transparently handles the process of mapping it to the actual physical location. This is naturally implemented inside each modern disk array, using the vendor's proprietary solution. However, the goal is to virtualize multiple disk arrays, made by different vendors and scattered over the network, into a single monolithic storage device that can be managed uniformly.
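At its core this is one level of indirection: a table that maps each virtual extent to a (physical array, physical extent) pair. A minimal sketch of such a mapping layer, with the array names invented:

    # Toy virtualization layer: one virtual volume spread over two arrays.
    EXTENT = 1024 * 1024  # 1 MiB extents

    # virtual extent number -> (backing array, physical extent number)
    mapping = {
        0: ("array-vendorA", 17),
        1: ("array-vendorA", 18),
        2: ("array-vendorB", 3),  # the pool spans arrays from different vendors
    }

    def locate(virtual_offset: int):
        """Translate a virtual byte offset to (array, physical byte offset)."""
        extent, offset_in_extent = divmod(virtual_offset, EXTENT)
        array, phys_extent = mapping[extent]
        return array, phys_extent * EXTENT + offset_in_extent

    print(locate(2 * EXTENT + 42))  # ('array-vendorB', 3145770)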

