Big data, by its very name, conjures up notions of ever greater volumes of information. Yet as big data evolves, there is a growing sense that bigger is not always better. Several factors contribute to an understanding of the benefits and promise of big data.
An “ancient” (in big data terms) but important 2012 article by MIT’s Andrew McAfee and Erik Brynjolfsson, Big Data: The Management Revolution, identified the then “three Vs” of big data – volume, velocity and variety. Since then, an important fourth factor has been added to the list – veracity.
Volume: “Scale of data”
The article highlighted an astounding fact: the amount of data created each day is doubling “every 40 months or so.” Perhaps this early realization helped focus most attention on the volume pillar of big data, perpetuating the myth that big data and huge volumes of data are one and the same.
At the same time, it is hard to ignore the sheer volume of data that we collect. We create 2.5 quintillion bytes of data every day, and 90% of the data in existence today was created within the past two years!
Volume does matter.
The sheer number of data points available today gives companies a much greater opportunity to work with many petabytes of data, both from the internet and elsewhere. Walmart is building the world’s largest private cloud, with the goal of processing over 2.5 petabytes of customer transaction data hourly – the “equivalent of 20 million filing cabinets’ worth of text.”
Variety: “Different forms of data”
An annual survey conducted for the past six years by NewVantage Partners asked industry leaders what they felt were the most important drivers of big data success. Sixty-nine percent felt that increased data variety was the most important factor, twenty-five percent of the corporate executives chose volume, and only six percent felt velocity was the key driver. The consensus was that big data success will be achieved by “integrating more sources of data, not bigger amounts.”
Many see tremendous opportunity in recent programs designed to capture legacy data sources that were largely ignored in the past. These “long tail” data sets have, until now, lived beyond the grasp of traditional data warehouses, where processing has been aimed at numbers and variables, not the words and pictures found in text and documents.
There is a growing focus among firms on integrating this “unstructured data.” The newfound ability to process unstructured data will broaden the analytical reach of big data by combining quantitative metrics with qualitative content.
Early on, the capture of social media and behavioral data was undertaken by a few obvious firms that directly benefited from the results, such as eBay and Facebook. As firms develop more encompassing big data programs, however, we will see more of them take advantage of untapped social data opportunities.
For example, a growing number of firms are using feedback from social data, such as mobile device recommendations, which has the potential to yield immediate results. As more traditional companies commit to big data, it is likely that both the variety of data sources selected for analysis and the capability to analyze them will expand.
Velocity: “Analysis of streaming data”
No matter how much volume and variety of data a company is able to “gather,” the velocity with which that company can make valuable use of the data is key. If data can’t be used in a timely fashion, the result is a big data bottleneck.
Take an example. Using real-time alerting, “Walmart was able to identify a particular Halloween novelty cookie” as popular in the vast majority of its stores. Unfortunately, there were two locations where the cookie wasn’t selling at all. With this information, Walmart was able to promptly investigate and learn that the cookies were not even on the shelves at those stores due to a simple stocking error.
Had Walmart discovered this problem after Halloween, the information would have been worthless. A small example perhaps, but imagine how this type of problem plays out across the retail landscape every day. And data velocity isn’t just applicable to retail; the principle applies to almost every business model and operation.
Veracity: “Uncertainty of data”
One in three corporate leaders doesn’t trust the information used to make decisions, and poor data quality is estimated to cost the U.S. economy about $3.1 trillion a year. It is therefore not surprising that data veracity is seen as one of the Vs with the greatest need for improvement. This poses a big challenge for big data.
As noted above, with such a tremendous amount of data available, and with that amount continuing to grow exponentially, the ability to ensure that the data is both relevant and of high quality becomes paramount. As companies invest more in cybersecurity technology, that spending must be carefully planned, and a portion of it must also be aimed at data veracity.
Conclusion
In understanding the “four Vs,” it is important not to lose sight of the end game: how can big data help improve my product, cut my costs, satisfy my customers or earn the loyalty of my employees? In the end, the success of big data will be measured by how well it answers those questions.