Natural disasters, such as floods or earthquakes, are what most people envision when considering a disaster recovery plan. However, any event that prevents you from accessing the data and systems necessary to conduct business should be taken into account. A regional power failure, a rapidly spreading computer virus, employee sabotage, external data fraud, devastating terrorist attacks or even an influenza pandemic should all be covered in a disaster recovery (DR) plan.
Outages are unacceptable for today’s businesses. Consider the impact if your trading systems were to go down, or your voice communications were interrupted during peak trading hours. Such disruptions are extremely costly for hedge funds pursuing sophisticated strategies that rely on the ability to detect and exploit short-lived inefficiencies and opportunities. They result in lost revenue, damaged client relationships, and reduced overall productivity.
Disaster recovery plans are also being evaluated by another critical audience – your prospective investors. Investors are adding DR plans to their due-diligence checklists and requiring that a comprehensive, tested disaster recovery plan be in place before they invest their money. DR plans are also closely tied to compliance and governance requirements, because investment firms are required to maintain and back up their data for regulatory reasons. A comprehensive DR and compliance plan is crucial to maintaining everyday operations and reporting activities. To monitor your fund activity at all levels and to be prepared, you need a disaster recovery plan that provides in-depth transparency into all of your systems.
With so much at stake, your company has no option but to implement a well-designed DR plan.
Disaster recovery planning, implementation and management together form a discipline requiring specialized talents that are often difficult to recruit and retain. Following are some of the initial considerations when looking to implement a DR system.
Data is a firm’s most crucial asset, and protecting it is one of the most important issues in maintaining continuous business operations. Relying solely on unstructured backup and archiving processes with unreliable media is unacceptable when dealing with your valuable data. Tape is an appropriate choice for day-to-day restoration, archiving or longer-term storage, but it is unsuited to the critical tasks involved in disaster recovery and business continuity.
Following are some of the uncertainties you have to consider when using tape backup:
Business-specific requirements for hedge fund firms will vary. Firms that adopt buy-and-hold “long” strategies have fewer trading requirements. Firms that pursue technical and sophisticated strategies to exploit inefficiencies are very sensitive to downtime, as their strategies require the ability to execute fast, high-volume trades. A firm’s disaster recovery preparations and strategies must reflect its underlying business strategy. These underlying requirements in turn directly shape capital-budget decisions.
Universal upfront costs include server hardware, software, connectivity and other resources, such as staff training. Collectively, these represent major investments of capital. More broadly, firms must consider whether outsourcing disaster recovery to a service provider or keeping it in-house is right for their business. Questions to consider when evaluating in-house versus outsourcing include “should you lease the real estate and procure, install and maintain all of that equipment yourself?” and “what are the capital-budget implications of outsourcing DR versus handling it in-house?”
The upfront capital costs of the two approaches, in-sourcing vs. outsourcing, are generally the same – but ongoing maintenance and management costs differ between them and should be given careful consideration.
Understandably, many firms are unenthusiastic about investing their valuable time in understanding, executing and managing a thoughtful, comprehensive disaster recovery plan. Most firms prefer instead to devote their time to the revenue-generating activities of the firm, and want to focus on trading strategies and investment opportunities. However, having a disaster recovery system is a crucial business operations component of any responsible investment firm.
Therefore, firms are increasingly outsourcing portions of the disaster recovery plan, or the entire plan, to qualified service providers who can bring infrastructure, expertise and focus to their DR requirements and challenges.
Assessing all of your critical systems and deciding which data, application and voice systems matter most are among the most important steps in formulating a disaster recovery strategy. A key objective in prioritizing your various applications, systems and data sources is determining the Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs).
An RPO is the targeted point in time to which systems and data must be recovered after an outage, and represents the maximum amount of data loss a business can incur in an outage. Organizations must first determine their RPO and then build a DR solution that meets it. For example, a trading application might have an RPO of 30 seconds. In the event of an outage and recovery, only the latest 30 seconds of data would be lost; everything up until that last 30 seconds would be available.
The RTO is the target for how long it takes to actually recover that lost data or service. In other words, how long are you willing to wait for recovery of your data? The RTO for mission-critical systems – such as trading or voice systems – might be extremely short, while the RTO for a general ledger system might be several hours. These choices carry significant implications in terms of the investments they require, so to make the right choices for your firm you need to carefully analyze the various tradeoffs.
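The relationship between the two objectives can be made concrete with a short sketch. The Python below is illustrative only: the class and function names, the 5-minute trading RTO and the 15-minute general ledger RPO are assumptions made for the example, while the 30-second trading RPO and the several-hour general ledger RTO follow the figures mentioned above. It compares the data loss and downtime observed in an outage (or a DR test) against a system's stated objectives.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    """Targets for one system (names and fields are hypothetical)."""
    system: str
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


@dataclass(frozen=True)
class OutageResult:
    """Timestamps observed during an outage or a DR test."""
    last_replicated_at: datetime   # last moment data was safely copied
    outage_started_at: datetime
    service_restored_at: datetime


def evaluate(objective: RecoveryObjective, result: OutageResult) -> dict:
    """Compare observed data loss and downtime against the objectives."""
    data_loss = result.outage_started_at - result.last_replicated_at
    downtime = result.service_restored_at - result.outage_started_at
    return {
        "system": objective.system,
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= objective.rpo,
        "rto_met": downtime <= objective.rto,
    }


# Illustrative objectives; the RTO for trading and the RPO for the
# general ledger are assumed values, not figures from the text.
TRADING = RecoveryObjective("trading", rpo=timedelta(seconds=30),
                            rto=timedelta(minutes=5))
GENERAL_LEDGER = RecoveryObjective("general ledger",
                                   rpo=timedelta(minutes=15),
                                   rto=timedelta(hours=4))

# A hypothetical outage: 20 seconds of unreplicated data, 12 minutes down.
outage = OutageResult(
    last_replicated_at=datetime(2024, 1, 8, 9, 59, 40),
    outage_started_at=datetime(2024, 1, 8, 10, 0, 0),
    service_restored_at=datetime(2024, 1, 8, 10, 12, 0),
)

trading_report = evaluate(TRADING, outage)
ledger_report = evaluate(GENERAL_LEDGER, outage)
```

In this hypothetical outage the trading system would meet its RPO but miss its RTO, while the general ledger would meet both. The same event can therefore be acceptable for one system and a failure for another, which is why objectives are set per system.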
A firm’s trading strategies can influence its RPOs and RTOs. If your firm, for example, is primarily engaged in high-complexity arbitrage or sophisticated quant strategies, your RTOs might be shorter than those of a firm that primarily follows buy-and-hold strategies.
The “key contributor” dimension is another consideration. Ensure that employees whose knowledge of real-time data is most crucial receive added emphasis and attention in the DR plan. It is key that the biggest revenue-producers (or, perhaps, portfolio managers) receive higher priority for service recovery.
A hot site is an offsite physical location where copies of a business’ critical systems, such as trading applications and data, are maintained. A hot site also includes an office space from which employees can work during an outage. The office space can include real estate with separate offices, cubes, desks, workstations, phones and additional office resources and infrastructure.
A hot site must be located within a reasonable distance of a firm’s primary location so employees can reach it quickly, but not so close that a single event affects both. An earthquake or hurricane could take a wide path and make both locations inaccessible. However, if a hot site is located too far away, employees may not be willing to travel the 50-100 miles to reach it – particularly at a time of a natural disaster or unrest that leaves their home or family vulnerable.
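As a rough illustration of this tradeoff, the sketch below estimates the straight-line distance between a primary office and a candidate hot site using the standard haversine formula, then checks it against a minimum and maximum. The function names and both thresholds are assumptions chosen for the example. The 50-mile ceiling simply reflects the concern above about 50-100 mile trips, and the 20-mile floor is an arbitrary placeholder. Each firm would set its own limits based on local hazards and commuting realities, and road distance will be longer than the straight-line figure.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1: float, lon1: float,
                   lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def site_distance_ok(miles: float,
                     min_miles: float = 20.0,   # assumed floor
                     max_miles: float = 50.0    # assumed ceiling
                     ) -> bool:
    """True if the site is far enough to avoid a shared event yet
    close enough for employees to reach. Thresholds are illustrative."""
    return min_miles <= miles <= max_miles


# Example: approximate city-center coordinates for New York and Philadelphia.
nyc_to_philly = distance_miles(40.7128, -74.0060, 39.9526, -75.1652)
too_far = not site_distance_ok(nyc_to_philly)
```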
Understanding that hot site facility operators “overbook” their facilities much like airlines is imperative. These facilities charge on a per-seat basis, so they regularly overbook their seats to maximize their profit. In the event of a far-reaching crisis, firms may end up competing with other hot site customers for the same facilities. It is important to understand your rights and access privileges.
A remote site, by contrast, provides a more efficient and concentrated set of services that are often more suitable for a hedge fund. Without physical desks and office infrastructure, a remote site provides a replica of a firm’s IT environment that employees can securely access through standard Internet connections. In most cases, this model provides several advantages:
Three additional important factors to consider when evaluating and selecting a disaster recovery system are: Infrastructure, Security and Testing.
The infrastructure of the remote site or hot site must have multiple levels of redundancy designed and built into each of the following aspects of the facility.
Since remote and hot sites have a constant flow of people mainly unaffiliated with your company, the remote site should have an even higher standard of physical security than a firm’s primary location. Important must-haves include:
Only disaster recovery plans that are regularly and rigorously tested are considered useful. When an outage occurs, firms do not want to rely on an untested DR plan where gaps, mistakes and failures could be encountered and leave employees without service. Regular testing allows a firm to find and amend gaps caused by technology changes or upgrades, and also trains employees so they are comfortable when the DR plan is actually executed.
The best testing technique is to start small and build up to a full, comprehensive test that includes an unannounced exercise. By starting small, employees can become familiar with the resources available to them during an outage. As testing requires the shutdown of various systems and components to ensure appropriate failovers occur, experienced individuals with training in DR solutions should lead these tests. Essential plan testing guidelines include:
In most instances, business continuity events arise from ordinary occurrences such as a local power failure or a water-main break. However, even mundane events such as these can seriously damage an investment firm’s operations. Careful analysis and planning can prepare a firm to “expect the unexpected” and help ensure business-as-usual operations in the event of an outage.
In closing, here is a brief checklist for building a disaster recovery plan: