IT Considerations For Disaster Recovery

Hedge funds: IT considerations for disaster recovery

Richard Seager, Director, RSM Moffat, London
Originally published in the May 2005 issue

The increased threat of terrorism in the UK since the events of September 11, the continuing focus from regulatory bodies such as the FSA and SEC (NY), and the increased reliance on technology have meant that planning for a disaster has become paramount to the hedge fund community.

The belief that 'it will not happen to us' or that insurance alone will cover the cost and will enable an organisation to recover effectively from a loss or incident is no longer an acceptable strategy. Insurance is a key component of an overall business continuity solution, but it does not win back the confidence of investors when a business is out of action and their investment is at risk. While it may provide for the financial aspects of a loss or incident, it does not provide the methodology to recover and rebuild the organisation or win back investors.

It is common for investors to ask for visibility of disaster recovery and business continuity plans, and the regulatory bodies have taken a more proactive approach to establishing guidelines for the financial services industry. Disaster recovery planning and testing are a critical part of the business process.

We have worked with various financial services clients, the majority of them hedge fund managers, on the planning, implementation and testing of disaster recovery scenarios.

The majority of our clients lease infrastructure, market data services and building space from a global leader in the industry such as Sungard or IBM. This proves to be the most cost-effective solution, as the cost of the equipment and space is shared between the subscribers. The downside is that the space is 'shared' and not dedicated. Each of these companies has its own way of managing this process, be it first come, first served or an equitable share.
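The difference between the two allocation policies can be illustrated with a minimal sketch. It assumes a hypothetical shared site with a fixed number of desks and several subscribers invoking at the same time; the function names and the figures used with them are illustrative only and do not describe any provider's actual scheme.

```python
def first_come_first_served(capacity, requests):
    """Allocate desks in order of invocation until the site is full.

    requests: list of (subscriber, desks_wanted) in invocation order.
    """
    allocation = {}
    remaining = capacity
    for subscriber, wanted in requests:
        granted = min(wanted, remaining)
        allocation[subscriber] = granted
        remaining -= granted
    return allocation


def equitable_share(capacity, requests):
    """Share desks evenly, never granting more than a subscriber requested."""
    wanted = dict(requests)
    allocation = {subscriber: 0 for subscriber in wanted}
    remaining = capacity
    # Hand out one desk per unsatisfied subscriber per round.
    while remaining > 0:
        unsatisfied = [s for s in allocation if allocation[s] < wanted[s]]
        if not unsatisfied:
            break
        for s in unsatisfied:
            if remaining == 0:
                break
            allocation[s] += 1
            remaining -= 1
    return allocation


# Hypothetical example: 30 desks, three subscribers invoking together.
requests = [("Fund A", 25), ("Fund B", 10), ("Fund C", 10)]
fcfs = first_come_first_served(30, requests)
shared = equitable_share(30, requests)
```

Under these assumed figures, the first subscriber to invoke receives all 25 desks it asked for under first come, first served, leaving 5 for the second and none for the third, whereas an equitable share gives each subscriber 10. A client's plan needs to allow for whichever policy its provider applies.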

Where reliance on technology is significant and downtime has to be kept to a minimum, clients purchase their own hardware, set up real-time replication and house the replicated servers at one of these sites. This gives the flexibility of having data safely off-site and up to date, with the option to rent desks or office space as required. With the development of secure home-working technology, being in the office may not be required, especially in a short-term disaster situation.

 

To provide for the technology aspect of disaster recovery, the business-critical IT components and their recovery time objectives (RTOs) have to be identified by the client in association with the IT provider, and a suitable solution implemented to meet these requirements.

For many companies the RTO allows for no outage and the solution has to be the installation of a High Availability (HA) infrastructure. These systems typically involve the provision of hardware in both the primary and disaster recovery sites, linked by direct high-speed IP bandwidth, which the reduction in telecoms costs has made more viable in the last 18 months. This connectivity acts as an extension to the corporate network and facilitates data replication in real time between sites using either a hardware or a software (Veritas Global Cluster or Double Take) solution.

The increasing amount of research delivered by email, and general increases in both email flow and data storage for files and databases, have meant that clients' data storage requirements have increased significantly in the last 12 months. This in turn has had a knock-on effect on how you tackle disaster recovery.

This continued expansion of data storage and its criticality to the business has meant that the older method of restoring data from tape in the event of a disaster is no longer a viable, timely solution. However, tape back-up is still essential to guard against data corruption and can provide a cost-effective restoration path for non-critical data.
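A rough calculation shows why restoring from tape struggles as data volumes grow. The sketch below is only an estimate under stated assumptions: the data volume, the sustained tape throughput and the fixed overhead for retrieving and loading media are illustrative figures, not measurements from any client environment.

```python
def tape_restore_hours(data_gb, throughput_mb_per_s, overhead_hours=0.0):
    """Estimate the hours needed to restore data_gb from tape.

    throughput_mb_per_s is the assumed sustained restore rate;
    overhead_hours covers fetching and loading tapes (also assumed).
    """
    transfer_seconds = (data_gb * 1024) / throughput_mb_per_s
    return overhead_hours + transfer_seconds / 3600


def meets_rto(restore_hours, rto_hours):
    """True if the estimated restore time fits within the RTO."""
    return restore_hours <= rto_hours


# Hypothetical figures: 900 GB at 64 MB/s, plus 2 hours to retrieve media.
estimate = tape_restore_hours(900, 64, overhead_hours=2)
acceptable_for_critical = meets_rto(estimate, rto_hours=4)
acceptable_for_archive = meets_rto(estimate, rto_hours=48)
```

With these assumed figures the transfer alone takes 4 hours and the full restore about 6, which would miss a 4-hour RTO for a critical system but sit comfortably within a 48-hour RTO for non-critical data.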

This data growth has led to the deployment of SANs (Storage Area Networks) and NAS (Network Attached Storage), which can provide terabytes of usable space in a flexible and expandable manner. Replicating this type of environment increases complexity and expense. These solutions allow clients to have a "live" or standby view of their data but lead to increased operational cost and an overall increase in the total cost of ownership (TCO) of the IT infrastructure. For businesses that implement this type of solution, the cost of provision is secondary to the loss of income should the IT environment not be available.

The illustration (left) shows the components and connectivity associated with a standby data replication solution housed in a shared disaster recovery site. Unless a client has invested in a dedicated suite/site for disaster recovery, this scenario would be our recommendation both from a fiscal and technical perspective.

In our experience the key is getting the business to focus on the really important components and applying the high-end solutions where appropriate. This ensures the business is operational in the shortest time; having a hierarchical approach to the service provision ensures costs are efficiently managed. Replicating the whole environment can be an expensive business and an unnecessary one.
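The hierarchical approach can be sketched as a simple classification of components by recovery time objective. The thresholds, tier descriptions and example systems below are hypothetical; in practice they would be agreed between the client and the IT provider.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    rto_hours: float  # maximum tolerable outage agreed with the business


def recovery_tier(component, standby_threshold_hours=24.0):
    """Map a component's RTO to a recovery approach (assumed thresholds)."""
    if component.rto_hours <= 0:
        # No outage tolerated: replicate in real time between sites.
        return "high availability (real-time replication)"
    if component.rto_hours <= standby_threshold_hours:
        return "standby replication at the recovery site"
    return "restore from tape back-up"


def plan(components):
    """Group component names by tier, most urgent RTO first."""
    tiers = {}
    for c in sorted(components, key=lambda c: c.rto_hours):
        tiers.setdefault(recovery_tier(c), []).append(c.name)
    return tiers


# Hypothetical inventory for illustration only.
inventory = [
    Component("Archived research", 72),
    Component("Order management", 0),
    Component("Email", 4),
]
tiers = plan(inventory)
```

Only the components that genuinely tolerate no outage end up in the most expensive tier, which is how the tiering keeps the overall cost of provision under control.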

Whilst the IT disaster recovery plan is an important element at a physical level, this is only one aspect of recovering the business. To be successful the IT recovery plan has to be written in conjunction with a business recovery plan which is typically written, and owned, by the client. This plan would include business processes, procedures, contact information, and a time-line of where the business expects to be at key points of the invocation.

Case study – an actual invocation!

For a London-based hedge fund the dividends of provisioning, testing and documenting for a disaster were recently proven.

In this instance all incoming circuits from BT were lost when work in the street cut through the main circuits. To add to the disaster, a water pipe was also dug through! The main servers, data and computer equipment were left intact in the client's building; however, no external connectivity remained.

The CEO took the decision to invoke disaster recovery and instructed us and all relevant staff to attend the disaster recovery site. Hardware was made available by the provider and we rebuilt the environment and re-established external connectivity. The client had phones and key market data services within two hours of invocation, and full restoration of IT systems within six hours.

Resolving the problem at the primary office took nearly a week and this client operated from their disaster recovery offices for the duration.

One key lesson learned in this invocation was that the process for returning to primary offices had not been clearly identified. It was one area that had not been tested during the twice-yearly disaster recovery tests!

The success of this invocation was due to the client ensuring their plans were well documented and frequently tested. The costs associated with providing this, at less than £7k per annum, were minimal compared to the loss of business that would have been incurred had no such plans been in place.

RSM Moffat is an IT consultancy and project management company which specialises in IT outsourcing solutions for hedge fund managers. RSM Moffat is a wholly owned subsidiary of RSM Robson Rhodes.