TWA Crowdsourcing

Download: Contest Winning Poster on Data Wrangling Workflow To Standardize Drilling Data Cleaning Process

The poster titled “Workflow Description and Data Wrangling Procedures To Enable Easy Application of Data Analytics To Improve Drilling Operations” won the Poster Competition under the Academia category at the recently held SPE Gulf Coast Section Data Science Convention organized by the section’s Data Analytics Study Group.

The convention focused on the theme of transforming the upstream oil and gas industry with advanced data science solutions. The winning poster was selected from among 40 submissions, which were screened for the relevance of the topic to the conference theme and for whether the proposed idea would solve a long-standing pain point in the industry.

“We did not want to restrict the submissions to just machine learning and analytics topics but were open to topics pertaining to data cleaning and adoption of cloud technologies that enable these machine learning algorithms,” said Anisha Kaul, a member of the convention committee who organized and screened the submitted abstracts. Judges Vikram Jayaram, Pioneer Natural Resources; Ana Krueger, Bluware; and Sribharath Kainkaryam, TGS, selected the winners based on these key criteria: Is the work reproducible at an industry level? How is it better than existing solutions? Do the methodology and the results provide enough evidence that the project is technically sound? Commercially heavy content was discouraged, and points were awarded for a well-structured presentation.

Click here to download the poster as a pdf.

Daniel Braga, the first author of the poster, shares the story behind the poster:

The idea for this poster came after I started working with drilling data and realized that the data cleaning part was taking much more time than estimated in our project timeline. During the project planning phase, we had our estimates in a Gantt chart, and at the time I thought, “I keep reading that data cleaning and preparation is by far the most time-consuming part of any data science project. And I’ve also heard that drilling data quality is not good. These people must be overreacting. I mean, besides some outlier cleaning and simple manipulations, we should be able to work with it easily in a couple of weeks.” A full 4 months passed, and I was still cleaning the data.

The team involved in the project includes me, a mechanical engineer with around 3 years of experience in offshore drilling, and Branum Stephan from Legacy Directional Drilling, who has been working for about as long as I have but with more focus on onshore directional drilling. We were backed by expert advisors from Louisiana State University and Legacy Directional Drilling, who combined have more than 100 years of drilling experience. The point we wanted to show is that you don’t need a floor full of data scientists in order to do applied data science, especially if they would have to be trained in the specific domains we operate in, such as drilling, reservoir, and production, which can take a lot of time and effort. Of course, their knowledge can be very helpful, and I am sure that it would have taken much less time to do what we did here if we had had a data scientist on our team. But in my opinion, the biggest bottleneck in our process was identifying what was wrong with the drilling data in the first place. We read a lot of papers, but most of them focus on the final (and most interesting) part: the analysis.

So, after the frustration of cleaning nonsense values like 100,000 gpm of pump flow rate, or trying to understand “why is the hole depth increasing if the bit depth is decreasing?”, we decided to publish our “recipe” for cleaning drilling data. We are not saying that this is the only way of doing it, nor that it is the best. But at least, if petroleum engineers working for a company have drilling data in their hands and want to get some insights from it, we think that they should be able to do it quickly. At the end of the day, most of the data problems are the same, no matter which electronic drilling recorder model you have.
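The two problems mentioned above can be illustrated with a minimal Python sketch. It is not the workflow from the poster. The field names (`flow_gpm`, `bit_depth_ft`, `hole_depth_ft`), the 2,000 gpm limit, and the sample records are all assumptions made up for illustration; a real limit would depend on the rig's pumps.

```python
# Hypothetical field names and threshold, chosen only for illustration.
MAX_FLOW_GPM = 2000.0  # assumed upper bound for a plausible pump flow rate


def flag_bad_rows(rows, max_flow_gpm=MAX_FLOW_GPM):
    """Return (row_index, reason) tuples for suspicious records.

    Each row is a dict with 'flow_gpm', 'bit_depth_ft', and 'hole_depth_ft'.
    Rows are assumed to be sorted by time.
    """
    flags = []
    for i, row in enumerate(rows):
        # Range check: flow rate outside the assumed physical range.
        if not 0.0 <= row["flow_gpm"] <= max_flow_gpm:
            flags.append((i, "flow_out_of_range"))
        if i > 0:
            prev = rows[i - 1]
            hole_increased = row["hole_depth_ft"] > prev["hole_depth_ft"]
            bit_decreased = row["bit_depth_ft"] < prev["bit_depth_ft"]
            # Consistency check: the hole should not get deeper
            # while the bit is moving up.
            if hole_increased and bit_decreased:
                flags.append((i, "hole_grows_while_bit_rises"))
    return flags


# Made-up sample records (not real drilling data).
sample = [
    {"flow_gpm": 600.0, "bit_depth_ft": 5000.0, "hole_depth_ft": 5000.0},
    {"flow_gpm": 100000.0, "bit_depth_ft": 5001.0, "hole_depth_ft": 5001.0},
    {"flow_gpm": 610.0, "bit_depth_ft": 4990.0, "hole_depth_ft": 5003.0},
    {"flow_gpm": 605.0, "bit_depth_ft": 4995.0, "hole_depth_ft": 5003.0},
]

for index, reason in flag_bad_rows(sample):
    print(index, reason)
```

The sketch only flags records rather than deleting them, so that someone with drilling knowledge can decide whether a flagged value is a sensor glitch or a real event.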

By publishing our workflow, we expect that people from the industry will read it, use it, or even better, improve it. Ideally, with backing from our industry experts, this would help standardize the drilling data cleaning process. After I wrote the poster, I saw the SPE paper authored by Paul Pastusek from ExxonMobil and 13 other experts saying the same thing: that we have to create open-source models for repetitive work like data cleaning. The recognition as the winning poster at the SPE Gulf Coast Section Data Science Convention tells us that initiatives like this one are welcome.

I would like to express my gratitude to Legacy Directional Drilling for their support on this project and permission to publish our work. If you would like to talk about it, feel free to contact me; I would love to hear your feedback.

Daniel Braga is a master’s student at Louisiana State University. His work interest is in the application of data science to drilling operations, and upon his graduation in 2019, Braga expects to apply his data science skills to help companies drill wells more efficiently. An active SPE member, Braga received the STAR Scholarship Award in 2012 and participated in the Latin American and Caribbean Region Student Paper Contest in 2014. Braga holds a BS in mechanical engineering from the University of Campinas.

