Big Data: Beyond the Hype
Wesley Chan, senior vice president and director of Stock Selection Research at Acadian Asset Management, argues that automation is no substitute for the hard work of analyzing information.
Surely by now you have heard about Big Data? Wikipedia defines it as “a broad term for data sets so large or complex that traditional data processing applications are inadequate.” The increase in logging, sensors, and mobile technology has caused the number of quantifiable things to explode, and advances in data storage, visualization and machine learning have made patterns easier to spot. That at least is the dominant narrative of our time.
Given the number of recent headlines, this topic has reached the saturation point for many. Popular opinion will take Big Data on its inevitable trajectory from universal cure-all to yesterday’s news as its applications become ubiquitous. Any thoughtful observer will increasingly conclude that Big Data is no panacea. Simply having a large volume of information does not make anyone’s job easier – in fact, it makes the job a great deal harder.
Let’s take a moment to look beyond the hype. In finance, there is no question that data availability is increasing exponentially and that the technology to manage it is getting faster and cheaper. Should we as investors care?
The answer is yes and no. Outside of high-frequency trading strategies, most investment strategies do not actually need Big Data. Or rather, it is not clear that the vast labor involved is justified by the resulting expected predictive power. This is because, at the moment, the best applications of Big Data are within firms, not across assets. For example, given the division- and location-level employee turnover of any one firm, I can tell you a great deal about the state of mind of its employees and the importance of various initiatives, but much less about whether or not the firm will beat a set of competitors. If I had the same information for many firms, I might do better. But such data is very hard to come by, inconsistently formatted and organized, and, without intuition about what to look for, likely to lead to an ill-conceived data-mining expedition.
For investors, the main goal will be mapping such data to investible assets, which is extremely difficult. Knitting disparate data sources together – and purging them of errors – is an art. Wikipedia further mentions that “Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy.” Recall that most internally generated data was never meant specifically to forecast asset returns. To put it bluntly, working with Big Data is manual, time-consuming, and expensive.
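To make the mapping problem concrete, here is a minimal Python sketch of one small piece of it: matching inconsistently written firm names to tradable tickers. Everything in it is an assumption made for illustration, including the firm names, the tickers, the `SECURITY_MASTER` lookup, and the suffix list. It is not a description of any particular firm's process.

```python
import re

# Hypothetical security master: normalized firm name -> ticker.
# Names and tickers are invented for illustration only.
SECURITY_MASTER = {
    "acme": "ACME",
    "globex": "GBX",
}

# Legal suffixes stripped during normalization (illustrative, far from complete).
_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "plc"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def map_to_tickers(raw_names):
    """Return (matched, unmatched): raw name -> ticker, plus names left for manual review."""
    matched, unmatched = {}, []
    for raw in raw_names:
        ticker = SECURITY_MASTER.get(normalize(raw))
        if ticker is None:
            unmatched.append(raw)  # no silent guessing: a person has to look at these
        else:
            matched[raw] = ticker
    return matched, unmatched


matched, unmatched = map_to_tickers(["Acme, Inc.", "GLOBEX Corp", "Initech Holdings"])
print(matched)
print(unmatched)
```

Even in this toy case one of the three names falls through to the manual-review list. That is the point: each new data source tends to bring its own naming quirks, and the residue has to be resolved by hand.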
Assuming the data is in good order, automation is no substitute for the hard work of analyzing information. At the moment, machine and human approaches are surprisingly far apart in their ability to identify “things.” Past studies have famously demonstrated how difficult it is for a computer to simply identify a chair, or a cat, from pictures. Current strategies include machine learning – training a program with repetitive examples – and environmental control, which means changing the problem to a format that makes it easier for a machine to handle. Both are illustrated by the fact that many firms have invested in humans to tag pictures by hand, for the purpose of training algorithms to do the same job on a wider scale. In investing, a major application is text, but the industry is still dealing with the same problem. There are countless ways to say “a firm is doing well [or poorly]” in the English language. For immediate news, one can rely on set reporting phrases, but not so for longer and more thoughtful forms of text. For example, my firm, Acadian Asset Management, recently sent a simple two-sentence section of a research report through four different text analysis software packages. The goal was to correctly identify places, people, and product names. As you might expect, we got four different results, and none without significant errors.
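The gap between set reporting phrases and more thoughtful text can be illustrated with a short Python sketch. The phrase lists and sample sentences below are hypothetical, chosen only to show the idea. This is not the software used in the test described above.

```python
# Hypothetical phrase lists; a real system would need far larger, curated ones.
POSITIVE_PHRASES = ("beats estimates", "raises guidance", "record profit")
NEGATIVE_PHRASES = ("misses estimates", "cuts guidance", "profit warning")


def classify(text: str) -> str:
    """Label text by matching fixed reporting phrases; anything else is 'unknown'."""
    lowered = text.lower()
    pos = any(phrase in lowered for phrase in POSITIVE_PHRASES)
    neg = any(phrase in lowered for phrase in NEGATIVE_PHRASES)
    if pos and not neg:
        return "positive"
    if neg and not pos:
        return "negative"
    return "unknown"  # no match, or conflicting matches


# Formulaic headlines are handled; a more reflective sentence is not.
print(classify("Acme beats estimates in third quarter"))
print(classify("Acme cuts guidance after weak sales"))
print(classify("Management appears to have turned a corner, "
               "though margins remain under pressure."))
```

The last sentence carries a clear view to a human reader, yet it contains none of the fixed phrases, so the matcher can only report that it does not know.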