Three Ways to Analyze Social and Market Data - Recreating the Ear of the Markets Part II

Once market prices hit the screen or populate a spreadsheet, they are already history. In fact, markets move on sentiment fed by stories that can bubble up from almost anywhere. David Cox posed the question: What if we could track the evolution of stories – the noise from other pits – before they are baked into market prices? While the technology to create this revolution may be available today, the challenge is to identify the right pieces and pull them all together...

The Revolution in Data Management

Content technologies are typically designed to handle a single type of data, with little attention to the interactions between types. For example, databases handle structured information in a fixed schema, while document management systems store Word documents and PDFs, using metadata to organize them. Social content is even less structured: the sprawl of text, images and video online resides on systems that serve content but do not understand meaning or sentiment. At the extreme, tweets are a stream of consciousness, unstructured except for crude #hashtags.

The first step to recreating the ear of the market is listening to all of these sources, from evidentiary material (market data, news, press releases, filings, research) to less formal content (blogs, tweets, IMs and emails). Effective solutions sit at the nexus of all those signals, capable of speaking the language of each source. This may seem like adding more noise: opening the tap all the way when the bathtub is already full. In fact, one way to separate information from noise is to look for correlations between seemingly unrelated signals. Hence, the more comprehensive the list of sources, the more likely a system is to spot leading trends.
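As a minimal illustration of that idea (the series below are invented placeholders, not a prescribed method), one could test whether a social signal leads the market by correlating daily mention counts with next-day returns:

```python
# Illustrative sketch: does today's chatter say anything about tomorrow's move?
# Both series are made-up placeholders, not real data.
import numpy as np

mention_counts = np.array([120, 95, 300, 410, 150, 90, 620, 580])          # daily social mentions
closes = np.array([10.0, 10.1, 10.1, 10.6, 11.2, 11.1, 11.0, 11.9, 12.4])  # daily closing prices

next_day_returns = np.diff(closes) / closes[:-1]   # return that follows each mention count

# Pearson correlation between today's mentions and tomorrow's return.
corr = np.corrcoef(mention_counts, next_day_returns)[0, 1]
print(f"mentions vs. next-day return correlation: {corr:.2f}")
```

A persistently high correlation would suggest the chatter carries information; a correlation near zero suggests it is just noise.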

A certain amount of data normalization is required, such as converting instrument symbols to standard forms and identifying entities in text. After that, three technology paradigms may be used to process these signals: Search and Business Intelligence (BI), to analyze the past; Complex Event Processing (CEP), to react to the present in real time; and Predictive Analytics, to anticipate the future.
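As a rough sketch of the normalization step (the alias table, entity list and function names here are hypothetical), vendor-specific symbols can be mapped to a standard form and company names flagged in raw text:

```python
# Hypothetical normalization helpers. A production system would use reference
# data services and proper entity extraction rather than these toy lookup tables.

SYMBOL_ALIASES = {"VOD.L": "VOD LN", "VOD LN Equity": "VOD LN", "VODAFONE": "VOD LN"}
KNOWN_ENTITIES = {"vodafone": "VOD LN", "verizon": "VZ US"}

def normalize_symbol(raw: str) -> str:
    """Map a vendor-specific instrument symbol to a standard form."""
    return SYMBOL_ALIASES.get(raw.strip(), raw.strip())

def tag_entities(text: str) -> list[str]:
    """Return the standard symbols of known entities mentioned in a piece of text."""
    lowered = text.lower()
    return [symbol for name, symbol in KNOWN_ENTITIES.items() if name in lowered]

print(normalize_symbol("VOD.L"))                         # -> VOD LN
print(tag_entities("Vodafone rumoured to be in talks"))  # -> ['VOD LN']
```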

Analyzing the Past, Present and Future

Search and BI together can be used to better understand what moved the markets in the past, as well as to find little-publicized nuggets that could move sentiment in the future.

Sophisticated relevancy models, originally exclusive to web search engines, are now available to financial institutions and regulators. Once all the text is indexed, analysts have the tools to find the needle in the haystack. In addition, BI technology continues to develop, allowing users to process billions of data points and analyze the relationships between them.

With semantic analysis and unsupervised clustering, search engines are also capable of detecting resonant trends across signals in near real time. This can surface an important story that was not obvious in any single source, and serve as a leading indicator of market sentiment to come.
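As a rough sketch of that idea (scikit-learn is used here purely for brevity, and the snippets and source labels are invented), clustering items drawn from different feeds can reveal a theme that spans sources:

```python
# Sketch: cluster short items from several feeds and look for themes that
# span more than one source type. Requires scikit-learn; the data is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

items = [
    ("newswire", "Chip maker warns of component shortage"),
    ("blog",     "Supply chain chatter: chip shortages getting worse"),
    ("tweet",    "no chips anywhere, factories pausing lines"),
    ("newswire", "Central bank holds rates steady"),
    ("tweet",    "rates unchanged again, yawn"),
]

vectors = TfidfVectorizer(stop_words="english").fit_transform([text for _, text in items])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# A cluster that draws on several source types is a candidate "resonant" story.
for cluster in set(labels):
    sources = {items[i][0] for i in range(len(items)) if labels[i] == cluster}
    print(f"cluster {cluster}: sources = {sources}")
```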

For pure real-time analysis, CEP technologies are needed. These high-performance frameworks allow business rules to be triggered by information gushing from the tap at full throttle. CEP also provides the ability to calculate mathematical trends in-flow, without the need to pause and snapshot the data.
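The sketch below shows the in-flow idea in miniature (a hand-rolled exponentially weighted average, not any particular CEP product): every event updates the running trend the moment it arrives, and nothing is paused or snapshotted.

```python
# Minimal illustration of an "in-flow" calculation: an exponentially weighted
# moving average updated one event at a time, with no stored history.
class StreamingTrend:
    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha   # weight given to the newest observation
        self.value = None    # current trend estimate

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value
        return self.value

trend = StreamingTrend(alpha=0.2)
for tick in [101.0, 101.2, 100.9, 102.5, 103.1]:   # events arriving from the tap
    print(f"tick={tick:.1f}  trend={trend.update(tick):.2f}")
```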

For example, a rule could correlate sentiment derived from unstructured sources (such as news wires and social networks) with classic market data (trading prices, macro- and micro-economic forecasts). When defined thresholds are met, the rule could either trigger an automated response or start an analysis workflow.
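A toy version of such a rule might look like the following (the thresholds, the sentiment feed and the workflow hand-off are all hypothetical):

```python
# Toy CEP-style rule: fire when strongly negative sentiment coincides with a
# sharp price drop. sentiment_score is assumed to arrive in [-1, 1] from an
# upstream sentiment engine; price_change is the intraday move as a fraction.

SENTIMENT_THRESHOLD = -0.5     # hypothetical: strongly negative chatter
PRICE_DROP_THRESHOLD = -0.02   # hypothetical: a 2% intraday fall

def start_analysis_workflow(symbol: str, sentiment: float, move: float) -> None:
    # Placeholder for an automated response or a hand-off to an analyst queue.
    print(f"ALERT {symbol}: sentiment={sentiment:.2f}, move={move:.1%}")

def on_event(symbol: str, sentiment_score: float, price_change: float) -> None:
    if sentiment_score <= SENTIMENT_THRESHOLD and price_change <= PRICE_DROP_THRESHOLD:
        start_analysis_workflow(symbol, sentiment_score, price_change)

on_event("VOD LN", sentiment_score=-0.7, price_change=-0.03)   # triggers the alert
on_event("VOD LN", sentiment_score=-0.7, price_change=+0.01)   # no action
```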

Predictive Analytics is the algorithmic extension of the above. It uses empirical data as well as business rules to infer future market movements from current patterns. The key here, again, is to take a comprehensive view of all data, rather than just looking at lagging market prices. This is a clear way to gain a competitive advantage over other risk and trading systems.
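As a deliberately small illustration of the point (the features and the numbers are invented, and a real model would be far richer), a forecast can combine sentiment with price history rather than relying on price alone:

```python
# Sketch of predictive analytics over combined signals: a linear model that
# uses today's sentiment score and today's return to estimate tomorrow's return.
# Requires scikit-learn; every number below is invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Feature columns: [today's sentiment score, today's return]
X = np.array([[ 0.4,  0.010],
              [-0.6, -0.015],
              [ 0.1,  0.002],
              [ 0.7,  0.020],
              [-0.3, -0.005]])
y = np.array([0.012, -0.020, 0.001, 0.018, -0.004])    # next-day returns

model = LinearRegression().fit(X, y)
forecast = model.predict(np.array([[0.5, 0.008]]))[0]  # current pattern -> forecast
print(f"forecast next-day return: {forecast:.3%}")
```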

Last but not least, consuming all of this output can be a challenge for humans. Thanks to data visualization, we no longer need to rely on massive arrays of screens to absorb it. Instead of spreadsheets of numbers, users can work with dynamic, colorful charts and 3D models, reducing multiple trading screens to just a few.

A New World of Connected Information

All of the above technologies (sentiment analysis, data transformation, search, CEP, BI…) are already widely available in the market. The big shift now is the arrival of large-scale, affordable Cloud computing. Analysts and traders may previously have speculated about the correlation between blogs and stocks, but no bank or hedge fund realistically had the IT resources to collect the massive amounts of real-time unstructured data out there, let alone to perform the calculations that separate signal from noise.

With private and public clouds, analysis can begin on the desktop, then “burst” into high-performance resources when needed. Banks pay only for the duration of that calculation, without the overhead of running large data centers 24x7. Cloud also allows multiple technologies to coexist in a single virtual environment, so co-location is no longer an issue. This flexibility and accessibility are already revolutionizing many industries – and financial services are poised to be next.

The maturing of search, CEP and predictive analytics, combined with open data access, visualization and affordable Cloud resources, heralds a game-changing era for data processing. Whether for risk analysis or for designing more effective trading strategies, firms now have the possibility to once again tap into the ear of the market – where the “market” is no longer just the floor the trader stands on, but an ever-expanding world of interconnected information.