Revisiting the Three V’s of Big Data
It’s time to revisit the original post from July 4th, 2011 on the Three V’s of big data. Here’s the recap:
Traditionally, big data describes data that’s too large for existing systems to process. Over the past three years, experts and gurus in the space have added more characteristics to define big data. As big data enters the mainstream language, it’s time to revisit the definition (see Figure 1).
- Volume. This original characteristic describes the size of the data relative to the processing capability. Today a large number may be 10 terabytes. In 12 months 50 terabytes may constitute big data if we follow Moore’s Law. Overcoming the volume issue requires technologies that store vast amounts of data in a scalable fashion and provide distributed approaches to querying or finding that data. Two options exist today: Apache Hadoop-based solutions and massively parallel processing databases such as CalPont, EMC GreenPlum, EXASOL, HP Vertica, IBM Netezza, Kognitio, ParAccel, and Teradata Kickfire.
- Velocity. Velocity describes the frequency at which data is generated, captured, and shared. The growth in sensor data from devices and in web-based click stream analysis now creates requirements for more real-time use cases. The velocity of large data streams powers the ability to parse text, detect sentiment, and identify new patterns. Real-time offers in a world of engagement require fast matching and immediate feedback loops so promotions align with geo-location data, customer purchase history, and current sentiment. Key technologies that address velocity include stream processing and complex event processing. NoSQL databases are used when relational approaches no longer make sense. In addition, in-memory databases (IMDB), columnar databases, and key-value stores help improve retrieval of pre-calculated data.
- Variety. A proliferation of data types from social, machine-to-machine, and mobile sources adds new data types to traditional transactional data. Data no longer fits into neat, easy-to-consume structures. New types include content, geo-spatial, hardware data points, location-based, log data, machine data, metrics, mobile, physical data points, process, RFID, search, sentiment, streaming data, social, text, and web. The addition of unstructured data such as speech, text, and language increasingly complicates the ability to categorize data. Some technologies that deal with unstructured data include data mining, text analytics, and noisy text analytics.
Figure 1. The Three V’s of Big Data
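To make velocity a bit more concrete, here is a minimal sketch of one way to measure it: events per second over a sliding time window. The class name `SlidingRate`, the 10-second window, and the sample timestamps are illustrative assumptions, not part of any product named above.

```python
from collections import deque


class SlidingRate:
    """Events per second over the most recent `window_s` seconds (hypothetical helper)."""

    def __init__(self, window_s):
        if window_s <= 0:
            raise ValueError("window_s must be positive")
        self.window_s = window_s
        self.times = deque()

    def record(self, t):
        """Record an event at time t (seconds; assumed non-decreasing)."""
        self.times.append(t)
        self._evict(t)

    def rate(self, now):
        """Return events per second for the window ending at `now`."""
        self._evict(now)
        return len(self.times) / self.window_s

    def _evict(self, now):
        # Drop events that fall outside the window ending at `now`.
        while self.times and self.times[0] <= now - self.window_s:
            self.times.popleft()


# Made-up event timestamps, in seconds.
meter = SlidingRate(window_s=10)
for t in [0, 1, 2, 3, 12]:
    meter.record(t)
current = meter.rate(now=12)
```

A stream processing or complex event processing engine would keep many such windows in parallel. This sketch only shows the basic bookkeeping behind a single rate.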
Contextual Scenarios Require Two More V’s
In an age where we shift from transactions to engagement and then to experience, the forces of social, mobile, cloud, and unified communications add two more big data characteristics that should be considered when seeking insights. These characteristics highlight the importance and complexity required to solve context in big data.
- Viscosity – Viscosity measures the resistance to flow in the volume of data. This resistance can come from different data sources, friction from integration flow rates, and the processing required to turn the data into insight. Technologies to deal with viscosity include improved streaming, agile integration buses, and complex event processing.
- Virality – Virality describes how quickly information gets dispersed across people-to-people (P2P) networks. Virality measures how quickly data is spread and shared to each unique node. Time is a determining factor along with rate of spread.
Figure 2. The Five V’s of Big Data
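As a rough illustration of the virality definition above, the sketch below counts the unique nodes a piece of information reaches within a time window and divides by the window length. The function name `virality`, the `(timestamp, node_id)` tuple layout, and the per-hour unit are assumptions made for this example, not an established metric.

```python
from datetime import datetime, timedelta


def virality(shares, window):
    """Unique nodes reached per hour within `window` of the first share.

    `shares` is a list of (timestamp, node_id) tuples (an assumed layout).
    """
    hours = window.total_seconds() / 3600
    if hours <= 0:
        raise ValueError("window must be positive")
    if not shares:
        return 0.0
    start = min(ts for ts, _ in shares)
    cutoff = start + window
    # A set keeps each node once, so repeat shares do not inflate the count.
    reached = {node for ts, node in shares if ts <= cutoff}
    return len(reached) / hours


# Made-up share events for illustration.
t0 = datetime(2012, 1, 1, 12, 0)
shares = [
    (t0, "alice"),
    (t0 + timedelta(minutes=10), "bob"),
    (t0 + timedelta(minutes=20), "bob"),    # repeat share, counted once
    (t0 + timedelta(minutes=50), "carol"),
    (t0 + timedelta(hours=3), "dave"),      # outside the 1-hour window
]
rate = virality(shares, timedelta(hours=1))
```

Both factors named in the definition show up here: the window captures time, and the unique-node count captures the rate of spread.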
The Bottom Line: Big Data Provides The Key Element In Moving From Real Time To Right Time
Context represents the next frontier as we move to intelligent systems. Big data systems and techniques will provide the key infrastructure in delivering context within business processes, across relationships, by geo-spatial position, and within a time spectrum. As engagement systems make the shift to experiential systems, expect context to provide the key filter in improving signal-to-noise ratios. Big data provides the context required to move from real time to right time.
Catch Constellation’s Big Data Coverage From VP and Principal Analyst – Neil Raden
Upcoming Report: Analytics in the Organization: Types, Roles and Skills
“Analytics” is a critical component of enterprise architecture capabilities, though most organizations have only recently begun to develop experience using quantitative methods. This report discusses the role of analytics, why it is a difficult topic for many, and what actions you should take. It lays out the various meanings of analytics and provides a framework for aligning the various types of analytics with the associated roles and skill sets.
Blog Post: What Is a Data Scientist (and What Isn’t)
Big Data doesn’t happen by itself. Because the tools and techniques are different from traditional Data Warehousing/Business Intelligence approaches, Big Data requires different skills. This role has become known as the Data Scientist. Have a look at analyst Neil Raden’s take on the data scientist.
Watch for the following:
- Understanding Data: Mechanical MDM, Ontology, Machine Learning
- Future of IBM’s Watson
- Tainted Truth: How to Read Statistical Research
- NoSQL: The End of the Relational Database
- Analytical Platforms: Revenge of the Relational Database
- Next Wave of BI
- The Data Scientist
- Planning and Performance Management Supercharged with Analytics
- Hadoop vs. ETL vs. ELT
- CEP: From Product Class to Wider Application
- Real-Time Decision-Making: Where It Fits
- Are Rules-Based Management Systems Dead?
- Skills Checklist for Big Data
- Skills Checklist for Business Analytics
- Interactive Data Visualization
- Let the Gorillas Write the Script: Forget Requirements
- Data Warehouse Rescue: What to Do with your Legacy Warehouse
- BI Rescue: What to Do with your Legacy BI
Your POV
What business problem will require you to start with Big Data? What are the key outcomes? Where do you expect to move the needle? Add your comments to the blog or send us a comment at R (at) SoftwareInsider (dot) org or R (at) ConstellationRG (dot) com
Resources
- Monday’s Musings: The Three V’s of Big Data
- Research Report: Rethink Your Next Generation Business Intelligence Strategy
- Monday’s Musings: Balancing The Six S’s In Consumerization Of IT
- Monday’s Musings: A Working Vendor Landscape For Social Business
- Research Report: The Upcoming Battle For The Largest Share Of The Technology Budget Part 1
- Research Report: How The Five Pillars Of Consumer Tech Influence Enterprise Innovation
- Best Practices: Five Simple Rules For Social Business
Reprints
Reprints can be purchased through Constellation Research, Inc. To request official reprints in PDF format, please contact Sales.
Disclosure
Although we work closely with many mega software vendors, we want you to trust us. For the full disclosure policy, stay tuned for the full client list on the Constellation Research website.
* Not responsible for any factual errors or omissions. However, happy to correct any errors upon email receipt.
Copyright © 2001-2012 R Wang and Insider Associates, LLC. All rights reserved.
Contact the Sales team to purchase this report on an a la carte basis or join the Constellation Customer Experience!




6 Comments »
There might be two other elements here:
value: pieces of data have different value such as how connected to key performance indicators. Learning big data systems should have the ability to learn and pre-process data based on estimated data – this can help to process more data quickly
connections: big data tools need to understand the network effect – how data adds value to other data and how meaning spans multiple pieces of data across many sources and periods of time
BTW: you’ve got a typo on Figure 1
Ray, great note – here are a few other characteristics of data that are useful to consider:
1. Level of aggregation – many companies continue to store aggregated data and discard the original “raw data,” which undermines the kinds of analyses that Big Data solutions now permit.
2. Metadata – likewise, the extent and completeness of metadata (source, time, location, etc.) dramatically enhance the kinds and usefulness of analyses one can do.
3. Contextual data – with the appropriate metadata (esp. time and location), contextual data can be appended (weather, economic data, etc.), which makes it possible to “control for” what would otherwise appear as random fluctuations.
One other important characteristic, difficult to know in advance, is the Signal-to-noise ratio. All things being equal, “weak” or faint signals are more difficult to detect – in many areas, like fraud prevention, cybersecurity, healthcare and M2M, detecting weak signals sooner rather than later offers significant benefits.
Dr. Phil Hendrix, immr
@phil_hendrix
Phil – thanks for taking the time to comment. Signal to Noise is only getting worse. Hopefully big data helps us with that – Ray
Great to see the industry finally adopting the “3V”s of big data over 11 years after Gartner first published them. For future reference, and a copy of the original article I wrote in 2001, see: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/.
Since then, we’ve expanded the model to 12 dimensions, including viscosity among others (discussed at last year’s Gartner BI Summit).
–Doug Laney, VP Research, Gartner, @doug_laney
Doug – thanks for sharing your thoughts and historical POV on the 3V’s. From my understanding, the concepts behind the 3V’s (especially V1 and V2) have been around for over two decades but were not as fashionable =). It’d be nice to see your complete 12 dimensions outside the Gartner paywall and get your points of view. See you on the BI/Big Data circuit! – Ray