Disaster Recovery
An important part of our system is disaster recovery. This includes backups, mirrors, and clustering services. This page explains how we implement each of these technologies so that we can recover from a failure in the shortest time possible.
To provide a workable disaster recovery solution for on-site catastrophes, we run tape backups every night. We currently use Veritas Backup Exec 9.0 to back up the servers and services we offer. The files are backed up to an IBM R3600 LTO tape library in the Computer Building. In the future we will take sets of tapes off-site and rotate the sets in the tape library.
How it works
A variety of backup jobs, both full and incremental, run during the week. This ensures that any daily changes are recoverable. Below is a list of all the data that is stored on the tape library, including the type of backup, the day it runs, and the estimated time to complete.
| Job Name | Backup Method | Day Executed | Time Executed | Drive Used | Est. GB (as of 1/30/03) | Est. time to complete, 1/30/03 (HH:MM) | Est. GB (as of 7/30/03) | Est. time to complete, 7/30/03 (HH:MM) | Est. GB (as of 10/20/03) | Est. time to complete, 10/20/03 (HH:MM) |
| Udrive A-J Full | Full | Saturday | 18:00 | IBM 1 | 186.29 | 17:27 | 248.39 | 16:40 | 321.9 | 21:08 |
| Udrive K-Z Full | Full | Saturday | 18:05 | IBM 2 | 184.43 | 18:21 | 243.12 | 16:12 | 318.7 | 21:11 |
| Udrive A-J Inc. | Incremental | Sunday-Friday | 04:30 | IBM 1 | 3.73 | 08:55 | 39.9 | 05:48 |  |  |
| Udrive K-Z Inc. | Incremental | Sunday-Friday | 04:31 | IBM 2 | 3.71 | 02:08 | 38.3 | 05:45 |  |  |
| Profiles A-J Full | Full | Sunday | 23:00 | IBM 1 | 81.49 | 41:28 | Total: 235.5 | 15:53 | 154.1 | 24:14 |
| Profiles K-Z Full | Full | Sunday | 23:00 | IBM 2 | 77.52 | 39:48 | (see Total above) |  | 152.6 | 21:22 |
|  | Full | Friday | 16:05 | IBM 1 | 167.91 | 30:00 | 79.5 | 01:35 | 260.4 | 07:17 |
|  | Incremental | Sunday-Thursday | 10:00 | IBM 1 | 6.58 | 00:11 | 26.0 | 00:44 |  |  |
The files that are included in the backups are self-explanatory. The Profiles are backed up from \\REMORA2\S: and the Udrive is backed up from \\REMORA1\Users on the REMORA NAS Cluster. You can see a list of the files backed up with the Servers Full and Servers Incremental jobs here.
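To illustrate why mixing full and incremental jobs keeps each day's changes recoverable, here is a minimal Python sketch. It assumes the Udrive schedule listed in the job table (a full backup on Saturday, incrementals Sunday through Friday) and that each incremental captures the changes since the previous backup. The function name `restore_chain` is hypothetical and is not part of Backup Exec.

```python
DAYS = ["Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

def restore_chain(target_day, full_day="Saturday"):
    """Return the backup sets needed to restore data as of target_day's backup.

    Assumes one full backup per week on full_day and an incremental on
    every other day, each capturing changes since the previous backup.
    """
    start = DAYS.index(full_day)
    # Rotate the week so it begins on the day of the full backup.
    week = DAYS[start:] + DAYS[:start]
    end = week.index(target_day)
    chain = [f"{full_day} full"]
    chain += [f"{day} incremental" for day in week[1:end + 1]]
    return chain

print(restore_chain("Tuesday"))
# ['Saturday full', 'Sunday incremental', 'Monday incremental', 'Tuesday incremental']
```

Under these assumptions, the later in the week a restore is needed, the more tape sets have to be read, which is one reason a weekly full backup is still worthwhile.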
The IBM R3600 LTO Tape Library has 20 slots that hold 100 GB LTO tapes. The speed of a backup job varies with the number of files and directories it backs up. The tape library has a web interface that we use for administration. It is connected to THING1 via Fibre Channel.
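As a rough capacity check, the sketch below estimates how many of the library's 20 slots the two Udrive full jobs would occupy at each estimate date in the job table. It assumes 100 GB per tape with no compression and that every job starts on a fresh tape; both are simplifying assumptions, not a description of how Backup Exec actually allocates media.

```python
import math

TAPE_GB = 100       # tape capacity stated for the library; compression ignored
LIBRARY_SLOTS = 20

# Estimated sizes in GB of the two Udrive full jobs, from the job table.
udrive_full_gb = {
    "1/30/03": [186.29, 184.43],
    "7/30/03": [248.39, 243.12],
    "10/20/03": [321.9, 318.7],
}

def tapes_needed(job_sizes_gb, tape_gb=TAPE_GB):
    """Tapes used if every job starts on a fresh tape (an assumption)."""
    return sum(math.ceil(size / tape_gb) for size in job_sizes_gb)

for date, sizes in udrive_full_gb.items():
    used = tapes_needed(sizes)
    print(f"{date}: {used} of {LIBRARY_SLOTS} slots for Udrive full backups")
```

On these figures the Udrive full jobs alone grow from 4 tapes to 8 tapes over the period covered by the table.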
We had to install several Backup Exec options for it to work correctly with our setup:
| Option Name | Description/Reason for using |
| Library Expansion Option | Needed to use the multiple drives and autoloading features of the R3600. |
| Remote Agent for Windows | Installed on the remote computers to increase the speed and reliability of backups. (Also stops the job from appearing as "failed" in the Backup Exec logs.) |
| SAN Shared Storage Option | Needed for the proper operation of the fibre controller. |
| Open File Option | Backs up files that are open. Previously, Backup Exec skipped them. |
Future Considerations
There are still issues with the time it takes to back up the Profiles directory on the udrive. The reason is that Profiles contains millions of small files (about 5 million) rather than a smaller number of larger ones (about 1.7 million in Udrive). One option we are looking into is the Intelligent Imaging option for Backup Exec. It backs up the metadata and then creates an image backup of the original data, so that when the backup job runs, only the metadata and the image of the actual data are backed up. According to Veritas, no temp space is backed up this way, which makes the job faster. There is one drawback, however: also according to Veritas, restores using this method take much longer than restores from regularly generated backups.
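The effect of file count on speed shows up in the job estimates themselves. The short sketch below converts the 1/30/03 estimates from the job table into an effective rate in GB per hour. The helper name `gb_per_hour` is hypothetical.

```python
def gb_per_hour(size_gb, duration_hhmm):
    """Effective backup rate given a size in GB and a duration as 'HH:MM'."""
    hours, minutes = duration_hhmm.split(":")
    return size_gb / (int(hours) + int(minutes) / 60)

# (size in GB, estimated duration) as of 1/30/03, from the job table.
jobs = {
    "Udrive A-J Full": (186.29, "17:27"),
    "Udrive K-Z Full": (184.43, "18:21"),
    "Profiles A-J Full": (81.49, "41:28"),
    "Profiles K-Z Full": (77.52, "39:48"),
}

for name, (size, duration) in jobs.items():
    print(f"{name}: {gb_per_hour(size, duration):.1f} GB/hour")
```

On these estimates the Udrive jobs move roughly 10 GB per hour, while the Profiles jobs manage about 2 GB per hour, even though the Profiles jobs are less than half the size.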
To provide a quicker, more efficient restoration procedure for the udrive and the profiles, we have recently started using the large IBM Enterprise Storage Server (ESS) SAN. You can read about it here.
Clustering is a large part of our systems. A cluster is defined as two or more computers working together as one. This has many benefits, high availability being the most important. With clustering we can update our servers without taking down the services they provide. Also, if one of the servers goes down, the service automatically fails over to the other server. This provides a seamless transition that the end user is usually not aware of, and it gives us time to look at the problem.
Because certain services are in demand 24 hours a day, we run a number of clusters:
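The failover behavior described above can be pictured with a toy model. The sketch below is illustrative only: the `Cluster` class, the node names, and the service name are all hypothetical, and it does not reflect how the clustering software on our servers is actually implemented.

```python
class Cluster:
    """Toy two-node active/passive cluster."""

    def __init__(self, nodes, services):
        self.up = {node: True for node in nodes}
        # Every service starts on the first node.
        self.owner = {service: nodes[0] for service in services}

    def fail(self, node):
        """Mark a node as down and move its services to a surviving node."""
        self.up[node] = False
        survivors = [n for n, healthy in self.up.items() if healthy]
        if not survivors:
            raise RuntimeError("no healthy node left; services are offline")
        for service, owner in self.owner.items():
            if owner == node:
                self.owner[service] = survivors[0]

cluster = Cluster(["NODE-A", "NODE-B"], ["FileShare"])
cluster.fail("NODE-A")              # for example, taken down for an update
print(cluster.owner["FileShare"])   # NODE-B
```

The same mechanism covers both planned maintenance and unexpected outages: as long as one node stays healthy, the service keeps an owner.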
| Cluster Name | Machine Type | Services Provided | Operating System |
| BOSS | (2) IBM xSeries 340 | SQL, Exchange | W2KAS |
| HOPS | (2) Dell PowerEdge 4600 | ProfilesUP 5, 6 | W2KAS |
| HAS | (2) IBM Netfinity 5600 | Pals, ConManServer, UserReg, Web server, SQL | W2KAS |
| SUDS | (2) IBM xSeries 360 | Udrive and ProfilesUP 3, 4 in WIN domain | W2KAS |
| THING | (2) Dell PowerEdge 2450 | Tape Backup, CD Images | W2003AS |
This site is maintained by the Classroom and Lab Computing group of Information Technology Services.
This page was last modified: 11/11/2003 9:49:31 AM.