Big Data and Analytics - Forward Leading
Every facet of business today is data-led, shaped and assessed by a level of data collection and analysis that would have seemed unfathomable just a handful of years ago. While the most sophisticated computer systems in operation still struggle to approximate human thought, even everyday smartphones far outstrip our ability when it comes to analytics. 
And when you reach the highest level of data analysis (big data, and the industry that has grown up around it), you run into the thorny problem of putting it all into context so that a human can grasp its significance. That's where the 4 Vs of big data come in: they're the high-level dimensions that data scientists use to break everything down.
But what are the 4 Vs, and how can you apply them to your data analysis to better understand the meaning and value of your data and the conclusions you derive from it? Let’s find out.
Variety
Ever-escalating levels of cross-platform and cross-channel integration ensure that more data is available on any given day than on the day before. Consequently, data scientists aren't limited to collecting data from just one source: they can collect it from numerous sources. Think about the potential of social platforms: drawing data not just from Facebook, but also from Twitter, Snapchat, Instagram, LinkedIn, Pinterest, YouTube, Twitch, Tumblr, and various others.
For big data, variety concerns the breadth of the types of data collected, going all the way from studies and sources that factor in just one data type (an Instagram post, for instance) to those that take many into account (tweets, Facebook updates, Pinterest pins, etc.). It’s an important dimension because it affects the significance of the inferences made from the data.
How to use this dimension for your data studies
When you’re collecting data to analyse for your business, think carefully about what you’re trying to learn from it. Are you trying to determine which social media channel drives the most conversions? Are you looking to see what people from certain demographics think of your brand? Not only do you need to decide how you’re going to pull in enough data to achieve your goal, but you also need to think about how platforms and channels differ — you’ll need to consider, for instance, that teenagers using LinkedIn are likely not directly comparable to teenagers on Snapchat.
Velocity
Unlike old-fashioned data studies, today's data science doesn't gather data over a period and then carry out a single analysis. Its analysis is live and ever-changing, driven by constant streams of data. Velocity concerns the rate at which this data is generated, distributed, and collected. The more sensors there are on IoT-enabled devices, and the more people use the internet, the higher the velocity of the data will be.
This dimension matters because the faster data can be acquired and processed, the more valuable it is, since more of its useful life remains once it has been analysed; but the system you use to analyse it must be up to the task, or you'll be left behind. Consider what has happened in the fintech industry, where banks and investment firms have spent vast sums developing systems that can parse and act on financial information fractionally faster than their rivals, allowing them to make money by buying and selling stocks in a fraction of a second.
How to use this dimension for your data studies
How pressing is your need for data analysis? If it truly cannot wait and must be live and in-depth, then so be it, but in many cases analysis doesn't need to be live, or even imminent. Sometimes it's more useful to collect data steadily and then examine it closely once you can genuinely factor everything in (something that data science tools struggle with). It's better to take your time and get it right than to be swept along in the hysteria and form ill-advised ideas about how to proceed.
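One way to answer that question is to measure how fast your data actually arrives before committing to a live pipeline. The following is a minimal Python sketch of a sliding-window rate meter; the `RateMeter` class and its methods are hypothetical, and timestamps are passed in explicitly (in seconds) so the example is deterministic.

```python
from collections import deque


class RateMeter:
    """Estimate data velocity as events per second over a sliding time window.

    Timestamps are supplied by the caller, in seconds and in non-decreasing
    order, which keeps the sketch deterministic; a live system might pass
    time.monotonic() instead.
    """

    def __init__(self, window: float = 60.0) -> None:
        self.window = window
        self._times = deque()  # timestamps of events still inside the window

    def record(self, ts: float) -> None:
        """Register one incoming event that arrived at time ts."""
        self._times.append(ts)
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Discard events at or before the start of the window.
        while self._times and self._times[0] <= now - self.window:
            self._times.popleft()

    def per_second(self, now: float) -> float:
        """Average arrival rate over the window ending at `now`."""
        self._evict(now)
        return len(self._times) / self.window


meter = RateMeter(window=10.0)
for t in range(10):            # one event per second for ten seconds
    meter.record(float(t))
print(meter.per_second(9.5))   # 1.0
print(meter.per_second(15.0))  # 0.4 (only the events at t = 6..9 remain)
```

If the measured rate is modest, a scheduled batch job over stored data may serve you better than a live system, and gives you room to look at everything carefully.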
Veracity
How much can you trust the quality and accuracy of the data you're relying on to drive valuable conclusions? That depends on various factors, including where the data comes from, how it's collected, and how it's analysed. The veracity of your data concerns how reliable and meaningful it really is, and you need that to be high. When analysing Twitter data, for instance, the data should be extracted directly from the site (through the API or not), not through a third-party system for collecting tweets, because you can't fully trust the latter.
Then there’s the data that’s collected accurately but doesn’t necessarily mean anything, such as data from poorly-designed surveys. Everyday analytics can easily get stuck on vanity or arbitrary metrics that don’t hold any significance, and big data is just as susceptible: while it’s hard for a computer to draw inaccurate conclusions, it’s easy for a person to fail to define the data range strictly enough, or to have mistaken assumptions about the quality of their data.
How to use this dimension for your data studies
This part is simple enough: be extremely careful about the data you collect! Vet it as thoroughly as you can before you do anything with it. Use native APIs wherever possible, run tests to ensure that everything is passing muster, and identify the metrics that really matter. Just because a given metric seems to be a great result, that doesn’t mean that it’s actually significant. If you’re not sure about the value of a metric, ignore or remove it.
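Vetting can start with simple automated checks run on every record before it enters your analysis. This is a minimal Python sketch, assuming each record is a dictionary; the required fields, the `"native_api"` source label and the 0 to 1 score range are hypothetical rules chosen for illustration, and `validate` and `vet` are not functions from any particular library.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical rules for illustration; replace them with your own.
REQUIRED_FIELDS = {"id": str, "source": str, "score": (int, float)}
TRUSTED_SOURCES = {"native_api"}  # e.g. data pulled via a platform's own API


def validate(record: Record) -> List[str]:
    """Return the problems found in one record; an empty list means it passed."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"bad type for {field}")
    if problems:
        return problems  # later checks assume the fields exist with sane types
    if record["source"] not in TRUSTED_SOURCES:
        problems.append("untrusted source")
    if not 0 <= record["score"] <= 1:
        problems.append("score out of range")
    return problems


def vet(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into those that pass every check and those that don't,
    keeping the reasons so rejections can be reviewed rather than silently lost."""
    clean, rejected = [], []
    for record in records:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected


records = [
    {"id": "1", "source": "native_api", "score": 0.8},
    {"id": "2", "source": "third_party_scraper", "score": 0.5},
    {"id": "3", "source": "native_api"},
    {"id": "4", "source": "native_api", "score": 7},
]
clean, rejected = vet(records)
print([r["id"] for r in clean])  # ['1']
for record, problems in rejected:
    print(record["id"], problems)
# 2 ['untrusted source']
# 3 ['missing score']
# 4 ['score out of range']
```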
Volume
Put simply, volume is how much data is being generated and collected all the time. It isn't just the pace that has increased astoundingly, but also the sheer amount of data. There are more than 2.2 billion active users on Facebook, many of them spending hours each day writing updates, liking posts, commenting on images, playing games, clicking on ads, and doing numerous other things that can be analysed. And that's just one social media site.
Imagine the level of analysis that goes into perfecting something like Black Friday marketing: how much data must be sourced from ecommerce sites, social media conversation, forum posts, identified trends, surveys, and (of course) standard retailers, all to figure out the perfect price points for flatscreen TVs across one long weekend. Now think about the volume of data that large enterprises and governments must handle when building predictive models. We're looking at absurd levels of data analysis, made possible only by supremely powerful computers.
How to use this dimension for your data studies
When you’re in the planning stages of a big data analysis campaign, know what kind of data volume you’re expecting, and take steps to ensure that your system can handle that much data. If you try to carry out a study but your system collapses under the weight of the traffic and data halfway through, you’ll end up with limited data at best, and complete campaign failure at worst.
To recap, the 4 Vs of big data are variety (how varied the data sources and types are), velocity (how quickly the data is being produced and collected), veracity (how accurate and valuable the data is), and volume (how much data there is).
Whether you’re aspiring to enterprise-level data science or merely dabbling in it at a lower level, these dimensions should prove critical for making sense of an immensely-complex campaign. Think about what you’re trying to achieve with your data analysis, and take steps accordingly.

Patrick Foster is a writer and ecommerce expert for Ecommerce Tips. He loves what big data can do, but doesn’t much like reading spreadsheets. Visit the blog, and check out the latest news on Twitter @myecommercetips.

It’s easy to get onboard and start benefitting instantly. Either enrol at a Summit or sign up for an annual membership.