The Case for Collaboration: Data Science Is Done Best When an Operator Works With a Data Scientist


In the past year, the number of machine-learning presentations and papers submitted to SPE conferences and similar events in the oil and gas industry has skyrocketed. Whereas 2–3 years ago we could hardly find any papers related to machine learning, a conference now might have between 30 and 40% of its papers directly related to empirical model making, deployment, and use cases. What strikes the casual observer is that these projects fall into three rough categories:

  • An operating company analyzed its own data internally and piloted its results with some success.
  • A data-science company or university analyzed some data without an operator, celebrated its use case, and looked for an application.
  • An operator teamed up with a data-science company, created models, and deployed them in the field.

By now, a sufficient number of papers exists in each of these categories to draw some conclusions. Fig. 1 gives a partial list of the use cases discussed. In my opinion, most studies conducted by an operator alone have yielded some benefit but have used either substandard or outdated machine-learning methods. This means that money is being left on the table. It is also difficult to integrate homemade models, built in environments such as R or Python, into an operator's standard control-system architecture.

Fig. 1: Some of the machine-learning use cases presented in the past year at oil and gas conferences.

Studies conducted by a data-science company alone tend to lack domain knowledge and, thus, emphasize points that are often moot or unrealistic. The data used to model the phenomena often is synthetic because the authors do not have access to real empirical data. It is difficult, therefore, to trust the conclusions, and the models will need to go through substantial reality checks before anyone can adopt them.

Studies conducted through collaboration between an operator that knows the physical reality and a data-science company that knows the best machine-learning methods yield good practical results. This shows how difficult it is for a single company to possess expertise in both domains. The third component in the mix is the right software infrastructure to deploy the model in a control system so that it can actually deliver whatever benefit the theoretical analysis has identified. Although a custom architecture can be built, doing so can be time-consuming and error-prone.

A case in point: Artificial lift has been one of the main topics of the recent rise of machine learning in oil and gas. Many papers can be found on OnePetro and elsewhere that analyze the dynamometer card of a sucker-rod pump. Almost all of these papers fall into the first two categories mentioned previously. Reported model accuracies tend to be in the high 80% range, which is competitive with human beings. With few exceptions, these studies have remained offline and have not been operationalized anywhere, despite many authors making the business case. A notable exception is a study by Tatweer Petroleum, the operator of the Bahrain oil field, which worked in collaboration with data scientists, achieved a model accuracy of 99.9%, and deployed the model in the field while reaping concrete business benefits. This work is documented in papers SPE 194949 and SPE 195295.

Empirical evidence suggests that collaboration is worthwhile to obtain good models, achieve good business results, and operationalize the models obtained.

