Finding the Words to Move Markets

Algotrading digs deep into breaking news

ROB PASSARELLA, MANAGING DIRECTOR, INSTITUTIONAL MARKETS, DOW JONES & CO.
Originally published in the June 2010 issue

Academia – despite its rather fusty image at times – has a lot to impart to the world of financial services. Academics have used Dow Jones news for some time to extract data for their research – looking deep inside more than 30 years of archival data to discover patterns that repeat themselves again and again. One of many things we’ve learned from the ways academics use our news is that there is a great deal of value in quantifying text with numerics – shorthand analytics that describe the “word data” contained in news stories, paragraphs and headlines. We already know that news moves markets every day around the world. But executing trades based on news headlines is only part of the opportunity. What about the trends that lie behind the news, and beneath the headlines? What if market participants could add news to existing trading strategies or build entirely new ones by analysing hidden patterns in the news – the kind of discovery that merely reading headlines as they arrive can’t produce?

We recently launched a new product offering for traders, quants, research analysts and asset managers called Dow Jones Lexicon. The idea for it grew out of both developments in the academic world and conversations with customers. We found that traders had already picked up on what the academic community was doing, but lacked the tools to gain deeper insight into news data. So we decided to create a solution that could transform news into numbers and turn those hidden patterns into predictive and profitable trading strategies.

Lexicon is an XML feed tagged with metadata that analyses the tone of news items from all Dow Jones content sources, including the newswires and The Wall Street Journal, based on patterns of words used within each story. These include positive words such as “gain”, “mature”, “climb”, “accomplish” or “exceptional”, and negative words such as “weak”, “shortage”, “retreat”, “fail” or “miss”. Lexicon contains six dictionaries in all, which also cover finance-specific terms in categories denoting strength, weakness, uncertainty or litigious activity. These dictionaries were developed by Bill McDonald, a professor of finance at the University of Notre Dame, whose research focuses on capital markets and includes text analysis to generate customisable indicators that can be used in trading strategies.

Lexicon is powered by derived data technology, which quantifies hidden values and trends within an extremely large number of news stories. Lexicon has the flexibility to apply different sets of “content lists” to our news and to permission the output for specific customers. The feed can also be paired with Dow Jones’ real-time text feed or archival content to provide the raw material clients need to perform this kind of analysis.

The basic Lexicon service delivers data for the dictionaries we developed with Professor McDonald. But we also know that many firms have developed their own custom dictionaries that define terms in ways relevant to their specific trading strategies. We can host these custom dictionaries as well and process them against our news – a sub-second process that delivers the results directly to the customer. We will also make our dictionaries available to clients upon request – allowing them to understand our methodology and approach. We believe that the key to Lexicon is not just in the dictionaries themselves, but rather in the content, the metadata, the speed of the delivery and our ability to apply dictionaries to our 30-year archive.

Using news in trading models
News has been incorporated into automated trading for many years in relation to expected corporate and economic events. This style of trading is prevalent among hedge funds and others, and helps to feed the ongoing low-latency trading war. Lexicon represents the next phase, providing users with the tools to use news to analyse trends, uncover opportunities and also trade around unexpected events.

Trading models are based around numerical inputs such as price, volume and fundamental data, and the key is to provide news in a consistent, numerical format that users can both understand and trust. We provide a complete archive of our content, which allows researchers and quants to back-test the data fully and validate the contribution of news to their strategies.

Trading strategies around newsfeeds have involved various concepts, ranging from simple headline counting, which allows models to analyse noise around certain instruments, to standard pre-built sentiment indicators. All of these are valid strategies and are being used by a variety of investors. The benefit of Lexicon is that it offers greater flexibility than standard pre-built indicators.

Hedge fund managers are always looking for new ways to generate additional alpha. Integrating news is one of the highest priorities for market participants over the next 12-18 months. We are now seeing our traditional client base of hedge funds and banks expand to include traditional long-only asset managers. This is a growing trend in the US and is also developing in Europe.

Making news actionable data
The derived data technology in Lexicon enables users to look at content in a quantitative way. As a news story is published, its words are coded by sentiment (positive, negative or neutral), by strength (strong versus weak), and by whether they signal uncertainty or litigious activity. As these words appear in news stories, they are analysed to see how frequently and where they are used. Quants and traders then take these numbers to build their own indicators and use them as part of trading models.

Ultimately, a report is produced at the bottom of the story which gives not only a full count of the words but also a list of the words and an indication of where they appear in the story. The positioning of the words is crucial; for example, a word in the headline is likely to have more significance to the story than a word buried in the body of the text. A frequency count will also give a better indication of how illustrative the word is of the story. These are some of the factors that can be taken into account when analysing the system’s output.
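
As a rough illustration of this kind of report, the Python sketch below counts dictionary words in a story and records where each one occurs. It is not the Lexicon implementation: the two tiny dictionaries reuse only the example words quoted in this article, the function names are hypothetical, and matching is exact, so a form such as “gains” would not match “gain” without further processing.

```python
import re

# Hypothetical mini-dictionaries built from the example words quoted in the
# article; real dictionaries would be far larger.
DICTIONARIES = {
    "positive": {"gain", "mature", "climb", "accomplish", "exceptional"},
    "negative": {"weak", "shortage", "retreat", "fail", "miss"},
}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_report(headline, body):
    """Count dictionary words and record (word, section, position) for each hit."""
    report = {name: {"count": 0, "hits": []} for name in DICTIONARIES}
    for section, text in (("headline", headline), ("body", body)):
        for position, token in enumerate(tokenize(text)):
            for name, words in DICTIONARIES.items():
                if token in words:
                    report[name]["count"] += 1
                    report[name]["hits"].append((token, section, position))
    return report


# Invented story text, used only to exercise the function.
report = word_report(
    "Shares climb after exceptional quarter",
    "Analysts had expected a weak result, but sales gain momentum "
    "despite a parts shortage.",
)
for name, entry in report.items():
    print(name, entry["count"], entry["hits"])
```

Keeping the section and position alongside each hit leaves the choice of weighting to whoever consumes the report, rather than baking it into the count.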

Another example of derived data is the monthly Dow Jones Economic Sentiment Indicator, or ESI, launched in 2009. The ESI aims to predict the health of the US economy by analysing the coverage of 15 major American daily newspapers. Using a proprietary algorithm and deploying derived data technology, the ESI examines every article in each of the newspapers for positive and negative sentiment about the economy. The ESI’s back-testing to 1990 shows that the ESI clearly highlighted the risk that the US economy was sliding into recession in 2001 and 2008 and suggests the indicator can help predict economic turning points as much as seven months in advance of other indicators.

Identify words that signal trends
This comes back to the work of several academics, including McDonald, with whom we developed the custom dictionaries. All Dow Jones news content is scanned against these dictionaries to produce the output. In addition to the Dow Jones dictionaries, we can apply clients’ own in-house dictionaries, allowing them to draw on their own research. Relevant economic information goes beyond the accounting numbers, according to Professor McDonald. He notes that sentiment analysis provides a method to assess massive amounts of extant text in milliseconds, and that its success is critically linked to the quality of the underlying dictionaries.

[Figure 1]

News sentiment impacting markets
News, such as a takeover bid, can have an immediate effect on a stock price. At other times, sentiment builds over a longer period, which allows users to see trading opportunities. One example of research that we undertook looked at the impact of sentiment on the S&P 500 during 2009. We calculated the ratio of positive to negative words and then compared that ratio with the S&P 500 index. The result was a positive correlation between sentiment and the S&P 500, and this ratio actually anticipated the S&P 500 by roughly 1 to 1½ months in some cases. This type of correlation would open up clear trading opportunities, depending on the time horizon of the strategy (see Figure 1).
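
One way to test for this kind of lead is to correlate the sentiment ratio at one point in time with the index level a fixed number of periods later, and see which lag fits best. The Python sketch below is a minimal illustration under stated assumptions: the numbers are invented, the index is constructed to trail sentiment by exactly two periods so the result is known in advance, and the function names are hypothetical. It does not reproduce the 2009 study described above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def lagged_correlation(sentiment, index, lag):
    """Correlate sentiment at time t with the index at time t + lag."""
    if lag == 0:
        return pearson(sentiment, index)
    return pearson(sentiment[:-lag], index[lag:])


def best_lead(sentiment, index, max_lag):
    """Return the lag (in periods) at which sentiment best anticipates the index."""
    return max(range(max_lag + 1),
               key=lambda lag: lagged_correlation(sentiment, index, lag))


# Invented positive/negative word ratios, one value per period.
sentiment = [1.0, 1.2, 0.9, 1.4, 1.1, 1.6, 1.3, 1.8, 1.5, 1.7, 2.0, 1.6]
# Synthetic index built to follow sentiment two periods later.
index = [90.0, 95.0] + [100 * s for s in sentiment[:-2]]

lead = best_lead(sentiment, index, max_lag=4)
print(lead, round(lagged_correlation(sentiment, index, lead), 3))
```

On real data the best-fitting lag is rarely this clean, and a high in-sample correlation would still need out-of-sample back-testing before it informed a strategy.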

This is just one way that the news can be used. Another method might be to build news sentiment indices which can be used to weight various stocks at various periods throughout the year (such as in the run-up to year-end results), when traders may be able to gain insight into the tone of the results by analysing the tone of the news.

News sentiment can be used not only as a trading input to enhance returns but also as a risk management tool to enhance market risk calculations. The Lexicon analysis can be combined with a value-at-risk (VaR) calculation, which can lead to a quicker, more flexible and more accurate risk assessment. The system offers a consistent approach to extracting meaningful numeric content from text-based articles. A quant can then take this analysis of tone and sentiment and add weightings of importance to phrases or positions in the article. It is also possible to weight these stories with other inputs including price, volume and market activity, and even to factor in the time of year for potential seasonal effects.
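
To make the idea of position weighting concrete, the Python sketch below turns a list of word hits into a single net tone score in which headline words count more than body words. The weights, the scoring formula and the function name are all assumptions chosen for illustration; they are not part of Lexicon, and a quant would calibrate them through back-testing.

```python
# Hypothetical weights: a headline hit counts three times as much as a body hit.
SECTION_WEIGHTS = {"headline": 3.0, "body": 1.0}


def weighted_tone(hits, section_weights=SECTION_WEIGHTS):
    """Return a net tone in [-1, 1] from (category, section) hits.

    Tone is (weighted positives - weighted negatives) / (weighted total),
    or 0.0 when the story contains no dictionary words at all.
    """
    positive = sum(section_weights[section]
                   for category, section in hits if category == "positive")
    negative = sum(section_weights[section]
                   for category, section in hits if category == "negative")
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


# Invented hits: one positive word in the headline, one positive and two
# negative words in the body.
hits = [
    ("positive", "headline"),
    ("positive", "body"),
    ("negative", "body"),
    ("negative", "body"),
]

tone = weighted_tone(hits)
flat_tone = weighted_tone(hits, {"headline": 1.0, "body": 1.0})
print(round(tone, 3), flat_tone)
```

With equal weights the story scores as neutral, while the headline weighting tips it positive, which shows how much the choice of weights can drive the resulting signal.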