How Much Science Is There in a Data Science Project?
The recent hype around artificial intelligence (AI) and machine learning (ML) has prompted several people to ask me how much of a project is actual machine learning. Based on man-hours spent on the project, I estimate that only about 5% of the effort is spent directly on data-science-related activities. The rest is spent on four major topical areas:
- How to get data into and out of the data science software
- Information technology (IT) security and IT policies
- Presenting results, determining benefits, and attributing the earnings
- Project and change management
In my experience, the last item alone consumes more than 50% of the total effort. It is also where most errors are made, usually because it is not taken seriously enough.
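To make these proportions concrete, the following is a minimal sketch of the arithmetic, not a planning tool. It takes the two shares estimated above (about 5% for data science and at least 50% for project and change management) and computes what is left for the other three areas combined. The function name `effort_budget` and the 2,000-hour project size are hypothetical, chosen only for illustration; the text gives no split among the remaining three areas, so none is assumed here.

```python
# Effort shares taken from the estimates in the text; everything else is illustrative.
DATA_SCIENCE_SHARE = 0.05      # "only about 5%" of the effort
CHANGE_MGMT_MIN_SHARE = 0.50   # "more than 50%", treated here as a lower bound


def effort_budget(total_hours: float) -> dict:
    """Split a project's hours using the article's rough shares."""
    data_science = total_hours * DATA_SCIENCE_SHARE
    change_mgmt = total_hours * CHANGE_MGMT_MIN_SHARE
    # Whatever remains is the most that data integration, IT security and
    # policies, and results/benefit reporting can share between them.
    other_areas_max = total_hours - data_science - change_mgmt
    return {
        "data_science": data_science,
        "project_and_change_management_min": change_mgmt,
        "other_three_areas_max": other_areas_max,
    }


if __name__ == "__main__":
    # 2,000 hours is a hypothetical project size, not a figure from the article.
    for area, hours in effort_budget(2000).items():
        print(f"{area}: {hours:.0f} h")
```

Under these assumptions, a budget that plans only for the data science share would cover roughly one-twentieth of the work actually required.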
A technology necessarily relies on people and procedures to deliver a benefit. One cannot even try out a technology without trying out the full workflow.
So, when we are talking about an ML project, we are talking about a full project that is based on an ML method. While the ML method is pivotal to the whole project, the project will be mostly concerned with other things.
The final economic outcome will depend largely on the management aspects of the project. While the ML method must work, our industry's uncertainty about these methods stems only partially from skepticism toward the mathematics. It stems mostly from doubt that the ecosystem the ML method requires can be put in place to deliver the advertised benefit.
When operators talk about piloting ML technology, they are piloting new business models that rest on internal procedures functioning differently from before and people working in this new environment. They are talking about measuring benefits in a new way and, therefore, learning a lot about how business was done and could be done. The technology is the enabler and, therefore, the heart of the matter. It must work, and the science behind it must be examined.
As with the human heart, however, while it is important, it is but a small piece of the whole body and, by itself, is capable of nothing. The ecosystem of AI/ML or data science must be taken seriously in scheduling both effort and budget. Otherwise, these projects are doomed to fail even if the technology works fine.
Patrick Bangert is the founder and CEO of Algorithmica Technologies, a machine learning company specializing in oil and gas applications. He holds a PhD degree in applied mathematics from University College London. After a few research positions at Los Alamos National Laboratory and NASA’s Jet Propulsion Laboratory, Bangert became assistant professor of applied mathematics at Jacobs University Bremen in Germany. In 2005, he founded Algorithmica to bring the machine learning methods down from the ivory tower into practice. Bangert is publication chairman for SPE’s Digital Energy Technical Section.
Join the SPE Digital Energy Technical Section
If you are not yet a member of the Society of Petroleum Engineers, please join SPE. You will be a member of a vibrant community of more than 164,000 professionals in the oil and gas industry. SPE provides active value to its members through conferences, publications, webinars, and a huge industry network. Once you are an SPE member, you can join sections (geographically local groups) and technical sections (topical special interest groups). If you have an interest in all things digital, then please join the Digital Energy Technical Section (DETS), one of 16 SPE technical sections at present. You can join by going to your online membership profile and selecting “Professional Online Communities.” There you will have the opportunity to edit your selection. Below the technical communities, you will find the technical sections. Please select “Digital Energy” and be sure to click “Save” at the bottom. Welcome to DETS!