
1776® Fault-Freedom™ II
FAQs (from integrators & technical end users)
Will it work with my hardware?
- Fault-Freedom II is an open platform. If the type of servers to be used in your cluster can run SCO Open Server 5.0.4 or 5.0.5+, and have sufficient performance for your application, then they are okay with Fault-Freedom II. That's what SCO management requested in specifications for the product.
- There are certain specific components needed. For example, in most cases, mirroring between servers requires a dedicated "private" network cable between the servers in a cluster, separate from the network used by workstations and peripherals. Also, installation and systems administration require SVGA color video on the consoles of BOTH servers in the cluster, with at least 800 x 600 resolution and 256 colors. Finally, we recommend (but do not require) use of a caching disk controller in both servers; this makes the cluster more fault-tolerant.
- There are almost certainly other specific hardware requirements and strong recommendations which depend upon your application. We want to make sure you get the most "systems insurance" for the money spent, and do not run into problems after installation begins. Thus, please review the Configuration Requirements document on this web site. Then, equally important, discuss your particular configuration with your supplier or a 1776 technical consultant to verify the hardware requirements.
Do the two servers in a cluster have to be identical?
System tuning will be simpler if you use the same type of system for both nodes in a cluster, but it is NOT required. The two systems can be different. For example, one server can be a multiprocessor system while the other server is a single processor system with less RAM.
There are really only four firm requirements with respect to system capacity:
- For each division (file system or raw partition) which you want to mirror between servers, the mirror-from and the mirror-to divisions on each server respectively must contain EXACTLY the same number of blocks. You may, however, have more PHYSICAL storage on one system than the other, so long as the logical divisions which are mirrored are the same size. The storage on each system does not have to be the same architecture. For example, you can use RAID on one server and not on the other.
- If there is a difference between the speed of the storage on each computer, the faster storage should be on the backup system, because otherwise the backup system will fall behind (unless disk I/O is light). If you use a RAID controller with cache on the primary system, you should also install a similar RAID controller with cache on the backup system.
- The backup system must have sufficient power and resources to handle the combined load of both systems after failure of the primary server. Likewise, if you will be installing a bi-directional configuration, then each server must have sufficient power to handle the combined load of both systems.
- The network interface cards (NICs) used in both computers should be the same model, for good mirroring performance.
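The four requirements above can be written down as a simple pre-installation checklist. The Python sketch below is only an illustration: the Node record, its field names and the load and speed figures are hypothetical and are not part of Fault-Freedom II. You would fill in the block counts and other values from your own inspection of each server.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical description of one server in the cluster."""
    name: str
    capacity: float      # load the server can sustain (any consistent unit)
    own_load: float      # load of the applications it normally runs
    disk_mb_s: float     # measured storage throughput in MB/s
    nic_model: str
    divisions: dict = field(default_factory=dict)  # division name -> block count


def check_cluster(primary: Node, backup: Node, bidirectional: bool = False) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # 1. Mirrored divisions must contain exactly the same number of blocks.
    for div, blocks in primary.divisions.items():
        other = backup.divisions.get(div)
        if other is None:
            problems.append(f"{div}: no mirror-to division on {backup.name}")
        elif other != blocks:
            problems.append(f"{div}: {blocks} blocks vs {other} blocks")

    # 2. The faster storage should be on the backup system.
    if backup.disk_mb_s < primary.disk_mb_s:
        problems.append(f"{backup.name} storage is slower than {primary.name}")

    # 3. The surviving server must handle the combined load of both systems.
    combined = primary.own_load + backup.own_load
    if backup.capacity < combined:
        problems.append(f"{backup.name} cannot carry combined load {combined}")
    if bidirectional and primary.capacity < combined:
        problems.append(f"{primary.name} cannot carry combined load {combined}")

    # 4. Both servers should use the same NIC model.
    if primary.nic_model != backup.nic_model:
        problems.append("NIC models differ")

    return problems
```

For example, check_cluster(a, b) returns an empty list when both nodes list the same block count for every mirrored division, the backup has the faster disks and enough capacity, and the NIC models match. Pass bidirectional=True to apply the capacity check to both servers.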
How far apart can the two computers in a cluster be? Can they be in separate cities?
The distance isn't the issue - the question is, do you want to use a WAN (Wide Area Network) to connect the servers in the cluster? If so, there are two problems:
- The bandwidth is low compared with the speed of local disk I/O, so mirroring to the backup server is likely to fall behind by more than a few seconds.
- It is inherently unreliable.
For these reasons, if you want to put the servers in a cluster in separate cities over a WAN, we do not recommend installation of automatic failover. Instead, you may use the second server as a remote data archive - to keep an online backup of critical data at a remote site for disaster tolerance. We call this a "disaster tolerant" configuration.
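The bandwidth problem is easy to estimate with some arithmetic. The Python sketch below is a rough model only: the function names are hypothetical, the write rate and link speed are figures you would supply yourself, and it assumes a constant write rate with no protocol overhead or compression.

```python
def mirror_backlog_mb(write_mb_s: float, link_mbit_s: float, seconds: float) -> float:
    """Unmirrored data (in MB) accumulated after `seconds` of sustained writes."""
    link_mb_s = link_mbit_s / 8.0        # megabits per second -> megabytes per second
    surplus = write_mb_s - link_mb_s     # data written faster than the link can carry
    return max(0.0, surplus * seconds)   # a fast enough link never builds a backlog


def drain_seconds(backlog_mb: float, link_mbit_s: float) -> float:
    """Time the link needs to send a backlog once disk writes stop."""
    return backlog_mb / (link_mbit_s / 8.0)
```

With a hypothetical 1 MB/s of sustained disk writes and a 2 Mbit/s WAN link, mirror_backlog_mb(1.0, 2.0, 60) gives a backlog of 45 MB after one minute, and drain_seconds(45.0, 2.0) shows the link would need 180 seconds to catch up. Over a 100 Mbit/s LAN the same write rate produces no backlog at all.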
If you want a remote disaster tolerant configuration, you may also be able to install "administrator-initiated failover". This will be possible if there is a way for the system's users to access the remote system. Please discuss this with your supplier or a 1776 technical consultant.
If you want to install automatic failover instead of "administrator-initiated
failover" then the two servers should be connected by a local area network
(LAN) under your control.
If the two servers are on a LAN (because you want automatic failover), you
can still accomplish a substantial amount of disaster tolerance. For example,
you can locate the second server down the hall, in a building across the street,
or even further away if a LAN cable of sufficient speed can be installed. This
amount of physical separation of the two servers offers substantial protection
against data loss due to fire, theft, water damage and other localized
disasters.
How much time does it take to failover to the second server?
Failover (whether automatic or administrator-initiated) takes anywhere from a few seconds to a few minutes, depending on the application. The actual process of failover takes less than 10 seconds! BUT. . . there is additional time required for Fault-Freedom II to ensure the integrity of the data. Usually this is just the amount of time which the Fault-Freedom II software needs to run a file system check (which is done automatically by the Fault-Freedom II software as part of a failover operation) and this varies depending on the amount of data you have. You may also optionally set up your application software or database software to automatically perform a data integrity operation (such as re-indexing or rollback operations) at the time of a failover to further ensure data integrity. The amount of extra time this requires at the time of a failover depends upon your application.
Note that if you install administrator-initiated failover instead of automatic failover, the amount of time to complete the failover operation is exactly the same, except that, with administrator-initiated failover, the administrator must first make the decision to initiate the failover operation. Once the administrator starts the failover operation, the entire failover operation will take place just like an automatic failover operation.
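The components of failover time described above simply add up. The Python sketch below shows the sum; the function and parameter names are hypothetical, the 10-second figure is the upper bound quoted above, and the file system check and application recovery times are values you would have to measure on your own system.

```python
FAILOVER_STEP_S = 10.0  # upper bound quoted above for the failover step itself


def estimate_failover_s(fsck_s: float,
                        app_recovery_s: float = 0.0,
                        decision_s: float = 0.0) -> float:
    """Rough worst-case seconds until users can log in again.

    fsck_s         - measured duration of the file system check
    app_recovery_s - optional re-indexing or rollback time for the application
    decision_s     - time the administrator takes to decide (0 for automatic failover)
    """
    return FAILOVER_STEP_S + fsck_s + app_recovery_s + decision_s
```

For instance, with a hypothetical 45-second file system check and no application recovery, estimate_failover_s(45) gives 55 seconds for automatic failover. Adding decision_s models administrator-initiated failover, where the only difference is the time taken to decide.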
Does the second computer just take over transparently? That's what I expected - no down time. No lost transactions!
Not exactly. Unlike (much more expensive) fault-tolerant computers, even though failover can be automatic with Fault-Freedom II software, there will be some seconds or minutes of downtime. Also, any transaction data which has not yet been written to disk (at the time of a computer failure) will not be saved. Users will log in again (when the second computer takes over) and re-enter the application. They will then have to re-do any transactions or other operations which were incomplete at the time of the failure. This is the nature of clustering software.
On the other hand, there are many advantages of clustering software over much more expensive fault-tolerant computers:
- Full redundancy of the data - no single point of failure and protection of the data from destruction due to physical damage to the storage subsystem.
- Ability to physically separate the servers for disaster tolerance.
- Ability to install "point-in-time" mirroring to protect against user or software errors that can corrupt data.
- Ability to do on-line system maintenance and on-line tape backup.
So what does a user experience when a computer fails?
If one computer dies, and the other server in a cluster takes over, it will be just like you rebooted the first server, but much faster. In other words, the users (on terminals or PC workstations) only need to log in again; they do not have to know which server is running the application. Some older PC workstations may have to be rebooted before they will recognize that the backup computer has taken over.
If data corruption causes a system to go down, won't the other system become corrupted, too?
This can happen, though rarely. Other causes of system failure are much more common. It is always better to have a backup copy of your data than to rely upon a single copy of the data.
Is the second computer just a standby system, or can both computers in a cluster actually be running applications?
Both computers can be running applications. There are two ways to do this:
- You can have a primary and a backup computer, but the backup computer also runs applications. For example, use the backup computer to run various kinds of communications applications like fax server, messaging, web site, etc. We call this an "integrated communications server".
- With Fault-Freedom II, you can run a mission-critical application on the second computer, and use the first computer to back up the second computer. In other words, each computer backs the other up. The two computers are equal and symmetric. We call this a "bi-directional" cluster.
In both cases, you are running applications on both computers, and therefore sharing the load between the two computers. This helps justify the purchase and installation of a cluster.
Does this mean we can run a single application on both computers, splitting the user load between the computers?
You cannot run the same application on both computers, if the two computers will access the same data files. The applications running on each computer must access different data files.
Can we use the mirror-to data on the second computer to run reports or other read-only tasks?
You can do this, but only in combination with "point-in-time" mirroring. In other words, you can run the reports (or other read-only tasks) only during the period of time when mirroring is turned off. This is actually a very useful way to set up a cluster. If you want continuous (rather than "point-in-time") mirroring, you can still run reports on the second system by using an NFS mount (running the reports on the second system, but accessing the data on the first system). You may want to discuss this further with your Fault-Freedom II supplier.
How long or how hard is it to install?
- Fault-Freedom II software itself can take just a few hours to install and configure. One of SCO's requirements was that the product be relatively easy to install, and compared to similar products on other platforms, it is.
- As a practical matter, however, you should probably allow two to five days for the first-time installation, depending on how complex your installation is. This gives time for reviewing the manual, getting the hardware set up properly (including the networks), fine-tuning the software configuration, and of course, testing the setup to see that it works just the way you want it to. If you are doing a particularly complex configuration, including optional feature modules (Process Monitoring and/or Bi-directional mirroring and failover), or if you are not familiar with the hardware to be used, then you should allow the full five days.
It is impossible to accurately predict the amount of time needed for installation, as there are many variables, including the possibility of unreliable hardware or networking. To help with first-time installation, 1776 Fault-Freedom II distributors (including 1776, Inc. itself, if you are in the U.S. or Canada) may offer an "Implementation Planning Session" - a telephone discussion of the steps needed for installation at a particular site - to reduce the time needed for installation. You may also choose to ask the distributor (or 1776, Inc. itself, if you are in the U.S. or Canada) to actually do the installation for you or to assist you by providing on-site training and support. (Note: available services depend on the particular 1776 distributor.)