Volume, Variety & Velocity
In the big data world, data is usually characterized by three properties, known as the 3 V’s of big data: Volume, Variety and Velocity. Let’s take a look at each in detail.
The price of storage per megabyte or gigabyte has dropped drastically over the last decade. Storage that cost, say, $1 ten years ago might cost $0.01 now. Storage reliability has also improved, helped in part by network storage architectures. As a result, far more data is being captured and stored. In other words, the ‘Volume’ of data, meaning the amount of data stored, has increased.
Previously, when storage was not so cheap, only the critical components of the data were stored, e.g., sales data and transactional data.
Once storage became abundant, data that was less critical to the business, and that used to be generated and then thrown away, started being stored as well, e.g., server logs and user behavior.
An example of a server log entry:
126.96.36.199 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
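Log lines like the one above only become useful once they are broken into fields. The following is a minimal sketch in Python, assuming the entry follows the common web server access log layout (host, identity, user, timestamp, request, status code, response size). The names LOG_PATTERN and parse_log_line are hypothetical and chosen only for illustration.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [timestamp] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the text fields that have a natural typed representation.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A "-" in the size column is treated here as zero bytes sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


sample = (
    '126.96.36.199 - - [07/Mar/2004:16:10:02 -0800] '
    '"GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291'
)
record = parse_log_line(sample)
print(record["method"], record["path"], record["status"], record["size"])
```

Each such line is small, but a busy server writes one per request, which is how this kind of previously discarded data adds up to a large volume.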
With IT transforming every field, storage demands from domains such as medicine, IoT, manufacturing and social media have also grown, further increasing the volume of data stored.
Data is available today in a very wide range of formats: text, image, audio, video or a combination of any of these. ‘Variety’ refers to this diversity in how data is represented.
One might argue that we do not need to store data in its original format. It can always be transformed into a single format before storing, which gets rid of the variety, but in doing so we lose information. For example, let’s say there is a 5-minute audio conversation. We can convert it to text and store that, but the text will not be as good as the original conversation. Signals such as the speaker’s emotions and tone, which carry meaning in vocal communication, are lost. And since storage keeps getting cheaper, it is beneficial to store varied data in its original form.
Velocity is the rate at which data is generated. Big internet companies generate data at rates on the order of terabytes per day or petabytes per week. We need to account for this and have systems in place that can store data at that velocity. Velocity also varies with the application in question: some data is generated quickly, some slowly.
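To get a feel for what such rates mean for a storage system, it helps to convert a daily volume into a sustained write rate. The sketch below is a back-of-the-envelope calculation in Python, assuming decimal units (1 TB = 1,000,000 MB) and a perfectly even flow of data over the day. The function name sustained_mb_per_second is hypothetical, and the input figures are illustrative rather than measurements from any particular company.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_mb_per_second(terabytes_per_day):
    """Average write rate in MB/s needed to keep up with a daily data volume.

    Assumes decimal units (1 TB = 1,000,000 MB) and a constant arrival rate.
    """
    megabytes_per_day = terabytes_per_day * 1_000_000
    return megabytes_per_day / SECONDS_PER_DAY


# Illustrative inputs: 1 TB/day, and 1 PB/week expressed as TB/day.
one_tb_per_day = sustained_mb_per_second(1)
one_pb_per_week = sustained_mb_per_second(1000 / 7)

print(f"1 TB/day  needs about {one_tb_per_day:.1f} MB/s sustained")
print(f"1 PB/week needs about {one_pb_per_week:.0f} MB/s sustained")
```

Real traffic is rarely even, so a system sized only for the average rate would fall behind during peaks. The average is a lower bound, not a design target.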
The 3 V’s characterize big data, and it is good practice to keep them in mind while devising a solution.
For any questions and inquiries, visit us at https://thinkitive.com/