Sentiment analysis: Not all paths are the same
In 1998, Merrill Lynch claimed that between 80% and 90% of all potentially usable business information originated in unstructured form.
This unstructured data, mostly text, comes from public filings, research reports and the internet, eg, public news and blogs. Extracting insight from textual data has long been a challenge that the buy side, the sell side and data vendors have all tried to address, either in house or with third-party products.
The principal approach is ‘sentiment and news analysis’, which uses a variety of algorithms to extract the polarity of opinion about an entity, eg, a company, from a piece of text. These algorithms have evolved from simple lists of positive and negative words to more mature rule-based and/or machine learning techniques. But limitations still exist.
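To make the simplest of these approaches concrete, the following is a minimal sketch of a word-list polarity scorer. It is an illustration only, not any vendor's method: the word lists, the function name `word_list_polarity` and the scoring formula (positive hits minus negative hits, divided by total hits) are all assumptions chosen for the example.

```python
import re

# Hypothetical, illustrative word lists; real lexicons are far larger
# and are usually tuned to the financial domain.
POSITIVE = {"beat", "growth", "upgrade", "profit", "strong"}
NEGATIVE = {"miss", "loss", "downgrade", "lawsuit", "weak"}


def word_list_polarity(text: str) -> float:
    """Return a score in [-1, 1]: (pos - neg) / (pos + neg), or 0.0 if no hits."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


if __name__ == "__main__":
    print(word_list_polarity("Strong profit growth this quarter"))
    print(word_list_polarity("Not a strong quarter"))
```

A scorer of this kind ignores negation and context, so "Not a strong quarter" is scored as positive because only the word "strong" matches. Weaknesses like this are part of what motivated the move towards rule-based and machine learning techniques.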
Vendor and in-house products attempt to explain how the indicator is derived, but the user never fully understands the detail of how the results have been calculated. The underlying drivers in the news flow that produced the result are simply not addressed.
It is difficult for business users, be they research analysts, economists, traders or portfolio managers, to interpret the results. As a consequence, they lack the confidence to make critical investment decisions based on this approach.
It is extremely difficult, if not impossible, to modify how the sentiment data is derived: doing so is very time consuming and typically involves expensive custom projects. Knowledge remains isolated between vendor and user.
Less alpha when used by more users
The most widely known sentiment products come from third-party vendors. Processing unstructured data requires significant collaborative effort and specialised skills and infrastructure, such as natural language processing (NLP), financial domain knowledge and massively parallel processing (MPP) systems. These are very expensive and difficult to build and maintain.
In reality, sentiment analytics vendors publish the same results to all their clients. Even where sentiment data does carry alpha, that alpha is diluted as more and more clients purchase the same data.
It sounds like a trivial exercise to distribute sentiment data in the same way as pricing or reference data and leave it for the user to consume, integrate and use.
This is simply not the case. Firstly, sentiment analytics is about processing and responding to real-time stories, events that are happening right now. The value of sentiment fades if the data takes a long time to consume.
Secondly, building up in-house consumption capability is costly. This obstructs the adoption of sentiment data, especially in financial institutions that are forced to spend most of their IT-related budgets on regulatory or compliance projects.