Monday's Musings: The Three V's of Big Data

July 3, 2011

The Three V's Traditionally Define Big Data
Traditionally, big data describes data that's too large for existing systems to process. Over the past three years, experts and gurus in the space have added further characteristics to that definition. As big data enters mainstream usage, it's time to revisit the definition.

Volume. This original characteristic describes the size of the data relative to the processing capability. Today a large number may be 10 terabytes. In 12 months, 50 terabytes may be what constitutes big data, as processing capability keeps growing along the lines of Moore's Law. Overcoming the volume issue requires technologies that store vast amounts of data in a scalable fashion and provide distributed approaches to querying or finding that data. Two options exist today: Apache Hadoop-based solutions and massively parallel processing (MPP) databases such as Calpont, EXASOL, Greenplum, HP Vertica, IBM Netezza, Kognitio, ParAccel, and Teradata Kickfire.
Velocity. This characteristic describes the frequency at which data is generated, captured, and shared. The growth in sensor data from devices and in web-based click stream analysis now creates requirements for more real-time use cases. The velocity of large data streams powers the ability to parse text, detect sentiment, and identify new patterns. Real-time offers in a world of engagement require fast matching and immediate feedback loops so promotions align with geo-location data, customer purchase history, and current sentiment. Key technologies that address velocity include stream processing and complex event processing. NoSQL databases are used when relational approaches no longer make sense. In addition, in-memory databases (IMDB), columnar databases, and key-value stores help improve retrieval of pre-calculated data.
Variety. A proliferation of data from social, machine-to-machine, and mobile sources adds new data types to traditional transactional data. Data no longer fits into neat, easy-to-consume structures. New types include content, geo-spatial, hardware data points, location based, log data, machine data, metrics, mobile, physical data points, process, RFID, search, sentiment, streaming data, social, text, and web. The addition of unstructured data such as speech, text, and language increasingly complicates the ability to categorize data. Some technologies that deal with unstructured data include data mining, text analytics, and noisy text analytics.
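
To make the velocity point a little more concrete, here is a minimal Python sketch of the kind of rule a stream processing or complex event processing engine might evaluate: count the events seen in a sliding time window and flag the moments when the count reaches a threshold. The class name, function name, window size, threshold, and click timestamps are all hypothetical and chosen only for illustration. The sketch assumes events arrive in timestamp order and is not how any particular product works.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp: float) -> int:
        """Record one event and return how many events are in the window."""
        self._timestamps.append(timestamp)
        # Evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def detect_bursts(timestamps, window_seconds, threshold):
    """Return the timestamps at which the windowed count reaches the threshold."""
    counter = SlidingWindowCounter(window_seconds)
    return [t for t in timestamps if counter.add(t) >= threshold]


# Hypothetical click timestamps in seconds, assumed to arrive in order.
clicks = [0.0, 0.5, 1.0, 1.2, 1.4, 9.0, 9.1]
bursts = detect_bursts(clicks, window_seconds=2.0, threshold=4)
print(bursts)
```

Each event is processed once as it arrives and old events are discarded as they expire, so memory stays proportional to the window rather than to the whole stream. That is the property that matters when data shows up faster than it can be stored and queried later.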

The Bottom Line: Start With Your Business Objectives

In his book The 7 Habits of Highly Effective People, Stephen Covey names one of the habits "Begin with the End in Mind". For big data projects, ask the key questions. What patterns will you uncover that will change how you go to market or address fraud? Can you apply sentiment and location to create new customer experiences? What additional insights can help you create new and disruptive business models? Big data is just a technology and a tool. How you apply this tool to your business models and objectives will determine whether big data is a luxury or a necessity.
Your POV
What business problem will require you to start with Big Data? What are the key outcomes? Where do you expect to move the needle? Add your comments to the blog or send us a comment at R (at) SoftwareInsider (dot) org or R (at) ConstellationRG (dot) com
Reprints
Reprints can be purchased through Constellation Research, Inc. To request official reprints in PDF format, please contact Sales.
Disclosure
Although we work closely with many mega software vendors, we want you to trust us. For the full disclosure policy, stay tuned for the full client list on the Constellation Research website.
* Not responsible for any factual errors or omissions. However, happy to correct any errors upon email receipt.
Copyright © 2001-2011 R Wang and Insider Associates, LLC. All rights reserved.
