Data Is Not the New Oil
If you work in data science or a related field, you have probably heard this quote before: “Data is the new oil.”
The quote goes back to 2006 and is credited to the mathematician Clive Humby, but it has recently picked up more steam after The Economist published a 2017 report titled “The World’s Most Valuable Resource Is No Longer Oil, but Data.”
This all sounds pretty exciting. But is it really true?
Equating data to oil might make sense at first glance, given the data-driven success of tech companies, but the analogy breaks down as soon as you dig a little deeper.
The thing with oil is that once an oil company finds it in the ground, it knows more or less which steps it has to follow to turn that oil into profits: drill, extract, refine, sell. Data comes with no such recipe: it is far from clear how exactly to turn data into profits.
If you run a business, and you want to do anything with your data, the first thing you need to do is create the infrastructure required to store and query that data. Data does not live in spreadsheets.
Let’s assume that you run a travel booking website. Data is generated any time someone searches, books a trip, clicks on an ad, or interacts in any other way with the content on the website. To capture all of that data, you need to hire data engineers and set up something like a Hadoop cluster that allows resilient data storage and rapid querying. This is a big investment you would need to make up front.
Making Sense of Data
Let’s assume you did all that. The database would tell you which user clicked on what and when, which flights and hotels they booked, where they are traveling, and when they are going. Maybe you even have user profiles that contain demographic information: where users live, how old they are, and so on. An incredibly rich dataset.
But what do you do with all that data?
Well, you need to hire data scientists to figure out ways to turn that data into business insights, which in turn might be leveraged for profits. Only then do you have a chance of getting a return on your investment.
Enter Data Science
Data is incredibly noisy. Data scientists are trained to make sense of noisy data by looking at it from the following angle: What hypothesis can I make about the process that generates the data? How can I test that hypothesis against the data? What insights can I deduce from the result of that test?
Notice how the workflow starts with an idea about the business process and only then goes to the data. This is because the data is too noisy to provide intrinsic value on its own. The data scientist’s workflow rarely starts with the data itself.
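To make the hypothesis-first workflow a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical question for the travel booking site described earlier: do users who were shown a particular ad book a trip more often than users who were not? The function name and the counts in the example are made up for illustration, and the two-proportion z-test is just one of many tests a data scientist might reach for.

```python
import math


def two_proportion_z_test(successes_a: int, total_a: int,
                          successes_b: int, total_b: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between two proportions.

    Hypothetical helper for illustration. Returns (z, p_value), using the
    pooled proportion under the null hypothesis that both groups share
    the same underlying rate.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("group sizes must be positive")
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Under the null hypothesis, both groups share one pooled rate.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("test is undefined when the pooled rate is 0 or 1")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: bookings among users who saw the ad (A) vs. did not (B).
    z, p = two_proportion_z_test(successes_a=120, total_a=1000,
                                 successes_b=90, total_b=1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up numbers the test gives z ≈ 2.19 and p ≈ 0.03, which would count as evidence against “the ad makes no difference” at the common 5% threshold. The order matters: the hypothesis came first, and the data only serves to test it.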