Tuesday, January 22, 2013

The Origins of Big Data

While sharing our thoughts on big data with our communications team, we were story tellers. The story around big data was impromptu! We realized the oft quoted Volume, Variety and Velocity actually can be mapped to Transactions, Interactions and Actions. I have represented it using a Visual.ly infographic background.


Here is a summary -

“The trend we observe is that the problems around big data are increasingly being spoken about more in business terms viz. Transactions, Interactions, Actions and less in technology terms viz. Volume, Variety, Velocity. These two represent complementary aspects and from a big data perspective promise better business-IT alignment in 2013, as business gets hungrier still for more actionable information.”

Volume - Transactions

More interestingly, as in a story, it flowed along time and we realize that big data appears on the scene as an IT challenge to grapple with when the Volume happens, which comes from either growing rate of transactions. Sometimes transactions occur in several hundreds per second, or as billions of database records required to process in a single batch were the volume is multiplied due to newer, more sophisticated models being applied as in the case of risk analysis. Big data appears on the scene as a serious IT challenge and project to deal with associated issues around scale and performance of large volumes of data. Typically, these are Operational in nature and internal facing.

These large volumes are often dealt with by relying on a public Cloud infrastructure such as Amazon, Rackspace, Azure, etc. or more sophisticated solutions involving 'big data appliances' that combine Terabyte scale RAM at hardware level with  in-memory processing software from large companies such as HP, Oracle, SAP, etc.

Variety - Interactions

The next level of big data problems surface when dealing with external facing information arising out of Interactions with customers, and other stakeholders. Here one is confronted with a huge variety of information, mostly textual, captured from customers interactions with call centers, emails, or meta data from these including videos, logs, etc. The challenge is in semantic analysis of huge volumes of text to determine either user intent or sentiment and project brand reputation, etc. However, despite ability to process this volume and variety, getting a reasonably accurate measurement that is 'good enough' still remains a daunting challenge.

Value - Transactions + Interactions

The third level of big data appears where some try to make sense of all the data that is available - structured and unstructured, transactions and interactions, current and historical, to enrich the information, pollinate the raw text by determining business entities extracted, linking them to richer structured data, linking to yet other sources of external information, to triangulate and derive a better view of the data for more Value.

Velocity - Actions

And finally, we deal with Velocity of the information as it happens. Could be for obvious aspects like Fraud detection, etc. but also to determine actionable insights before information goes stale. This requires addressing all aspects of big data to be addressed as it flows and within a highly crunched time frame. For example, an equity analyst or broker would like to be informed about trading anomalies or patterns detected as intraday trades happen.