Yesterday’s News

How headlines & trading data can help forecast the markets

Dow Jones Newswires

4:40pm, EST: FDA WARNS OF EYE INFECTIONS WITH CONTACT LENSES>BOL.

This Dow Jones Newswires headline on April 10, 2006, had traders scrambling for the “sell” key and drove Bausch & Lomb’s stock down 7.8% in after-hours trading. Two days later the stock had lost 22% of its value as the company struggled to reassure customers they would not go blind from using Bausch & Lomb products while at the same time trying to convince investors that an end to the crisis was near.

While news of a product recall like this is unexpected by nature, the results are predictable: investors panic, the stock falls and the company goes into damage control mode. How far the stock falls and how much collateral damage is done to its sector and suppliers are often the most difficult questions to answer.

To date, Wall Street has relied on its more experienced traders, supported by diligent research teams, to translate news into trading decisions. Soon, traders and portfolio managers will benefit from trading models and programs that can take streaming, real-time headlines and immediately assess how a company’s stock – as well as that of its competitors and suppliers – will react in the milliseconds after an important announcement is made. News archives are coming online, with the promise of more predictive power for the trading community.

Coding the news

Technical traders have long used and developed models based on numbers. If crude oil hits a new high, automotive stocks will be negatively affected. If Wal-Mart beats earnings estimates, the rest of the retail sector will likely rise on the news. These principles remain constant whether analyzed by a human being or a computer.

In the BOL case, a Dow Jones reporter covering the FDA spotted the warning about the contaminated lens solution on the agency’s website and immediately flashed a headline to hundreds of thousands of news terminals around the world. In the milliseconds between the reporter’s story hitting the wire and its global transmission, no fewer than seven categorization codes were added to the story. These were largely invisible to the reader, but the additional metadata meant a lot to computers on the receiving end. The BOL story carried markers indicating its relevance to “Health Industries,” “Medical Supplies” and “Consumer Issues,” among others.

If a trader were looking for arbitrage opportunities – such as BOL and a company offering a competing product – a program reading those codes could short BOL and buy its competition in the blink of an eye.

21:23 GMT *WSJ: HCA Inc. In Talks On $21 Billion Buyout

This July 23, 2006, story on the buyout of HCA Inc. by a private equity consortium, for example, had codes “Health Care Providers,” “High-Yield Issuers” and “Mergers & Acquisitions” added to it.

To the user, ticker symbols and categorization codes make the news easy to spot and easy to find later on. For computer programs, they are another data point to incorporate into the model, as are words from a headline or the body of a story. If a trader is accumulating hospital operators, the program could spot the words “hospital” and “buyout” and alert the trader to a strategy-shifting change in the industry.
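A rule of this kind can be sketched in a few lines of Python. This is only an illustration: the `NewsItem` class, its field names and the `matches` function are hypothetical and do not reflect any vendor's actual feed schema. The headlines and category codes are the ones quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class NewsItem:
    """One coded headline. Field names are illustrative, not a vendor schema."""
    headline: str
    tickers: set = field(default_factory=set)
    codes: set = field(default_factory=set)


def matches(item, required_codes, keywords):
    """Return True if the item carries every required code and every keyword."""
    text = item.headline.lower()
    has_codes = required_codes <= item.codes  # subset test on category codes
    has_words = all(word.lower() in text for word in keywords)
    return has_codes and has_words


hca = NewsItem(
    headline="WSJ: HCA Inc. In Talks On $21 Billion Buyout",
    tickers={"HCA"},
    codes={"Health Care Providers", "High-Yield Issuers", "Mergers & Acquisitions"},
)
bol = NewsItem(
    headline="FDA WARNS OF EYE INFECTIONS WITH CONTACT LENSES",
    tickers={"BOL"},
    codes={"Health Industries", "Medical Supplies", "Consumer Issues"},
)

# Hypothetical rule for a trader accumulating hospital operators:
# the category codes identify the sector, the keyword identifies the event.
rule_codes = {"Health Care Providers", "Mergers & Acquisitions"}
rule_words = ["buyout"]

alerts = [n.headline for n in (hca, bol) if matches(n, rule_codes, rule_words)]
print(alerts)
```

Here the category codes do the work of the word “hospital,” which never appears in the HCA headline itself; that is one reason metadata can be more dependable than keywords alone.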

With keywords and metadata applied to stories and headlines on stocks and bonds traded worldwide, as well as to the economic indicators that shape their performance – GDP, housing starts, unemployment rates and the CPI, to name a few – the combinations for back-testing seem endless. Extraordinary events such as Hurricane Katrina can also move stocks and economic indicators. Hedge fund managers looking to reduce weather-related risk might program weather and geographic location codes, as well as the word “refinery”, into their models. Analyst ratings also offer a wealth of opportunities as companies miss, reach or exceed market expectations.

More is definitely better

Funds seeking to build out algorithmic trading capabilities are looking at a fairly significant investment. Aside from robust technological infrastructure, funds will need high-speed real-time data, historical databases, back-testing facilities, analysis engines, monitoring tools and high-speed connectivity.

Weighing in at 300 gigabytes, Dow Jones News and Archives for Algorithmic Applications is the equivalent of 60,000 copies of the complete works of Shakespeare, delivered on a stack of CD-ROMs or a single external hard drive.

Once the news is copied onto the hard drive, the historical pricing data must be added. Matching these two information sources is complex, making it necessary to have a strong database for analysis. Finally, predicting the future is calculation-intensive, so one must expect to pay for processing power. Hedge fund managers crunching all this news and metadata will need as much help as they can get. If two funds have the same model in play, the one who processes the mountain of information first will win.
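The matching step can be illustrated with a small sketch that lines up one time-stamped headline with a price series and measures the move that followed. Everything here is an assumption for illustration: the `price_reaction` function, the tick layout and the prices (which are invented for a made-up stock, not taken from any real security). A production system would use a database and full tick history.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def price_reaction(news_time, ticks, horizon):
    """Percent change from the last trade at or before the news
    to the last trade at or before news_time + horizon.

    ticks: list of (timestamp, price) tuples sorted by timestamp.
    Returns None if no trade precedes the news.
    """
    times = [t for t, _ in ticks]
    before = bisect_right(times, news_time) - 1
    if before < 0:
        return None
    after = bisect_right(times, news_time + horizon) - 1
    p0 = ticks[before][1]
    p1 = ticks[after][1]
    return (p1 - p0) / p0 * 100.0


# Invented prices for a hypothetical stock around a 16:40 headline.
ticks = [
    (datetime(2006, 4, 10, 16, 39), 100.0),
    (datetime(2006, 4, 10, 16, 41), 96.0),
    (datetime(2006, 4, 10, 16, 50), 92.0),
    (datetime(2006, 4, 10, 17, 30), 93.0),
]
news_time = datetime(2006, 4, 10, 16, 40)

move = price_reaction(news_time, ticks, timedelta(minutes=15))
print(f"{move:.1f}% in the 15 minutes after the headline")
```

Using the last trade at or before each timestamp avoids peeking at prices the market had not yet printed, which is also why precise time-stamping on the news side matters so much.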

The most difficult aspect of integrating news into a firm’s algorithmic infrastructure is analyzing the various news events and understanding how they affect the price of a security. Once those relationships are established, integrating real-time news into the firm’s algorithmic platform is not immensely difficult.

“A hedge fund should look for an archive that is complete, fully time-stamped and captures the events and news that pertain to what they trade,” said Larry Tabb, Founder and CEO of The Tabb Group, a financial markets advisory firm focused on the intersection of the financial markets and technology. “The quality of the archive is very important because without a good history it is difficult to create the appropriate rules and models.”

Where hedge fund managers will earn their performance fees will be in the creative modeling process and delivering alpha. It’s not that finding relationships between the news and the pricing changes is so difficult, but being more innovative than everyone else is. What will separate the heroes from the has-beens will be their level of creativity and skill.

Upcoming challenges for the hedge fund community

In order to bring this new generation of technical trading into the mainstream, the hedge fund community will have a lot of additional work to do from a technological, mathematical and linguistic standpoint. All of which means there will be some very exciting – and challenging – conversations within the hedge fund community over the next few years.

Initially, the news events incorporated into models will be fairly simple earnings and economic announcements. These will be relatively easy to understand and analyze. As the need grows to build more sophisticated and unique programs, these same firms will have to be more creative, learning to decipher nuance in newsflow. It’s more difficult to determine how Bernanke’s comments about the economy will influence trading, and programs will have to grow in complexity.

The good news is that archive providers aren’t just leaving that stack of CD-ROMs on the doorstep and saying “have a nice day.” They’re pitching in to help, hiring experts to get hedge funds off the ground and encouraging test drives. Dow Jones Newswires, for example, provides a sample of its Archives and then offers guidance on how news is disseminated and how the information is assembled into the feed and then into the archives. Finally, Dow Jones provides direction on navigating the Archives’ fields of information and the real-time news feed, and on identifying the metadata fields for the programmers’ models.

As news-enabled algorithms evolve, hedge funds will be presented with some interesting problems and opportunities. The faster news is assimilated into stock prices, the more difficult it will become for people to trade on that news.

“As algorithms become more advanced we will see computers play a more dominant role in equity pricing,” said Tabb. While complex news events will take time to analyze, relatively simple events such as earnings releases and economic announcements can be quickly analyzed by computer and incorporated into equity pricing.

Evaluating an archive

There are four rules to keep in mind when evaluating a news archive:

  1. More is Better – Statisticians know the greater the N (or potential universe), the more reliable and precise the findings of a study will be. That doesn’t just mean the number of stories, although that is critical, but it also includes precision of time-stamping and the levels and consistency of metadata which can be cross-tabbed in studies and used as markers in relational databases.
  2. Consistent Quality – Journalism is not a precise science, but any archive will be judged in terms of the guidelines, standards, and reputation of its news provider. The degree of GIGO (Garbage In, Garbage Out) must always be taken into consideration when building models.
  3. No Revisionist History – Re-coding of news stories that have already run is not allowed; the archive should represent what was truly available to the trading community at the time of the story, and not be changed or altered to reflect an acquisition, bankruptcy or other corporate action. A programmer looking to model what happened in the automotive industry after the DaimlerChrysler merger will want to be able to isolate the relevant newsflow labeled with C or DAI rather than finding it all lumped together with the current ticker symbol, DCX.
  4. Format – Archives will have different formats. A standardized XML format will make it easier for the hedge fund’s programmers to derive their models.

Dow Jones News and Archives for Algorithmic Applications includes a real-time news feed and 20 years of Dow Jones data files; market-leading sources include Dow Jones Newswires, The Wall Street Journal and Barron’s. The information is delivered in standard XML formats and the metadata fields – date, timestamp, headline, full story, ticker symbol, and Dow Jones category codes – are organized consistently in the database and the real-time feed, making it easy to map breaking news to archival data.
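To give a sense of what working with an XML-formatted story might look like, here is a minimal sketch using Python's standard library. The element names (`story`, `date`, `timestamp`, `headline`, `ticker`, `code`) are assumptions chosen to mirror the fields listed above; they are not the actual Dow Jones schema, which a programmer would take from the vendor's documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; the real archive schema may differ.
SAMPLE = """\
<story>
  <date>2006-07-23</date>
  <timestamp>21:23</timestamp>
  <headline>WSJ: HCA Inc. In Talks On $21 Billion Buyout</headline>
  <ticker>HCA</ticker>
  <code>Health Care Providers</code>
  <code>High-Yield Issuers</code>
  <code>Mergers &amp; Acquisitions</code>
</story>
"""


def parse_story(xml_text):
    """Turn one XML story record into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "date": root.findtext("date"),
        "timestamp": root.findtext("timestamp"),
        "headline": root.findtext("headline"),
        "tickers": [e.text for e in root.findall("ticker")],
        "codes": [e.text for e in root.findall("code")],
    }


story = parse_story(SAMPLE)
print(story["tickers"], story["codes"])
```

If the archive and the real-time feed share one layout, the same parser can serve both back-testing and live trading, which is the practical benefit of consistent fields.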

The road ahead

Most funds are taking a gradual approach to entrusting their strategies to these new tools. Typically, a manager will set up models simply to broadcast alerts if new risks enter a trading scheme. Over time, the models will go beyond issuing warnings to actually hitting the brakes on a trade. Perhaps the day will come when complex multi-asset arbitrage strategies will be running on auto-pilot.

“As brokerage firms and investment managers become more adept at understanding the impact of news events on asset pricing,” says Tabb, “news analysis models will become more sophisticated in their utilization of these engines and integrate these strategies into other asset classes and markets.”

“If financial markets’ history teaches us anything,” Tabb continues, “it is that as we become more advanced the time between a news event and the assimilation of that event into asset prices declines, and electronically readable news is the next leap forward in this progression.”

As with any new tool, expect an arms race. Obvious strategies, such as unloading a stock after a major downgrade, will be profitable for a while as the computers read the news and hit the “sell” button faster than any human being ever could. High-impact but irregularly occurring news, such as comments from Federal Reserve Chairman Ben Bernanke about interest rates or inflation, will probably be the next challenge. While none of these tools will be immediately accepted, with time they may become the norm.

The hedge fund community, which has always been known for its creativity, competitive drive and risk taking, is well suited to take on this challenge. It has long responded to what investors are demanding, and is poised to deliver it.