Data-Driven Management Strategy Can Reduce Environmental Effect of Production Plants

This paper highlights the results of a test campaign for a tool designed to predict the short-term trends of energy-efficiency indices and optimal management of a production plant. The developed tool represents a step toward digital transformation of production plants through the integration of data analytics and machine-learning methodologies with expert domain knowledge.


The tool, called the Energy-Efficiency Predicting and Optimizing Digital-System Tool, was developed as an in-house product for the purpose of helping operators select a series of corrective measures and optimized management actions for an oil and gas production plant. The entire procedure relies on the definition of several key performance indicators (KPIs). The KPIs represent a combination of process parameters useful for understanding, summarizing, and comparing the performance of entire plants or individual equipment units. Comparing the actual KPI vs. past behavior, or a target value, allows operators to understand how the plant and equipment are performing and whether their energy performance can be improved.

Specifically, the tool consists of a machine-learning-based forecasting model and a series of aggregated analytics. The machine-learning model, based on a gradient-boosting regression (GBR) algorithm, predicts the global KPI known as the Stationary-Combustion CO2 Emission Index, allowing operators to estimate future energy efficiency. Along with the forecast, the tool shows aggregated statistics for KPIs of individual equipment units.

Materials and Methods

Case Study. The producing field considered in this project is located onshore in southern Europe. The central processing facility includes five production lines (trains), built in separate phases, to treat the multiphase flow from the wells.

The multiphase flow comes from 27 producers and consists of three main phases: gas, oil, and water. The composition of the oil differs according to the formation in which a well has been drilled. This oil has different characteristics from other concessions in Europe, including H2S ranging from 0.5 to 1.5% mol, and CO2 ranging from 5 to 30% mol.

The ultimate purpose of the plant is to produce stabilized oil, treated gas, and liquid sulfur, which are commercialized under strict specifications as follows:

  • The oil is sent by means of a 100-km-long pipeline to a refinery.
  • The gas is sold to the national gas grid, managed by a third party.
  • The liquid sulfur, with a purity of 99.9%, is sold to the pharmaceutical and explosives industries.

These energy conversions generate CO2 emissions from the stationary combustion of fuel gas.

In accordance with the Paris Agreement charge to limit the rise in global temperature to below 2°C compared with preindustrial levels, reducing CO2 emissions from this asset is critical. Real-time monitoring and prediction can help meet this goal.

Data Set. The data set is composed of sensor time series archived in the plant historian system, which allows data to be retrieved at different sampling frequencies. To retain the greatest amount of information, the authors retrieved the data set at the shortest available sampling interval, 5 minutes. The raw data set contained approximately 200,000 samples from different plant equipment units and covered a continuous time span of approximately 12 months.

Model-Development Work Flow

Two primary data-preparation steps have been established: a data-cleaning step to handle missing values and a resampling step.
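The two preparation steps can be illustrated with a minimal sketch. The synopsis does not state how missing values were handled or what the resampling interval was, so the forward fill and the hourly mean used here are assumptions, as are the function names and the sample readings.

```python
from datetime import datetime, timedelta

def forward_fill(values):
    """Replace None with the most recent valid reading (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def resample_hourly_mean(timestamps, values):
    """Average all valid readings that fall within the same clock hour."""
    buckets = {}
    for ts, v in zip(timestamps, values):
        if v is None:
            continue
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets.setdefault(hour, []).append(v)
    return {hour: sum(vs) / len(vs) for hour, vs in sorted(buckets.items())}

# Hypothetical 5-minute steam-flow readings (two hours) with two sensor dropouts.
start = datetime(2019, 1, 1, 0, 0)
timestamps = [start + timedelta(minutes=5 * i) for i in range(24)]
raw = [10.0] * 12 + [20.0] * 12
raw[3] = None
raw[15] = None

clean = forward_fill(raw)                          # data-cleaning step
hourly = resample_hourly_mean(timestamps, clean)   # resampling step
```

Forward filling suits slowly varying process signals; longer outages would normally be flagged or dropped rather than filled.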

Feature Engineering. This step is pivotal in every machine-learning project because it extracts the maximum information from the data for better forecasting performance. The multidisciplinary team of data scientists and process engineers chose the following transformation steps as appropriate for energy-consumption problems applied to time-series data.

Autoregressive Features Generation. Time-lagged feature extraction introduces the temporal relationship between observations: past values of a signal are added as inputs for predicting its current value.
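As an illustration, lagged features can be built by pairing each observation with the values that preceded it. The lag choices (1, 2, and 3 steps) and the sample KPI series are hypothetical; the synopsis does not list the lags used.

```python
def make_lagged_rows(series, lags):
    """Build (features, target) pairs where the features are past values of the series."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]  # value 1 step back, 2 steps back, ...
        rows.append((features, series[t]))
    return rows

# Hypothetical KPI readings at consecutive time steps.
kpi = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
rows = make_lagged_rows(kpi, lags=[1, 2, 3])
```

The first `max(lags)` observations are dropped because they have no complete history.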

Seasonal Features Generation. Trend information resulting from seasonal characteristics of the signals can be exploited by directly adding information such as hour of the day and day of the week.
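A minimal sketch of such calendar features follows. The hour of the day and day of the week come from the text; the sine and cosine encoding of the hour is an added assumption, commonly used so that 23:00 and 00:00 end up close together, and is not described in the synopsis.

```python
import math
from datetime import datetime

def seasonal_features(ts):
    """Derive calendar features from a sample timestamp."""
    hour = ts.hour
    return {
        "hour": hour,
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        # Assumed cyclical encoding of the hour (not stated in the source).
        "hour_sin": math.sin(2 * math.pi * hour / 24),
        "hour_cos": math.cos(2 * math.pi * hour / 24),
    }

features = seasonal_features(datetime(2019, 8, 1, 6, 0))  # a Thursday, 06:00
```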

Exogenous Features Generation. In energy-related problems, energy consumption is often correlated with several exogenous variables. Values of these variables were retrieved for the same dates as the data set, averaged, and then added to it.
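The averaging and joining can be sketched as follows. The synopsis does not name the exogenous variables or the averaging period, so ambient temperature and a daily mean are hypothetical choices, as are the field names.

```python
from collections import defaultdict
from datetime import datetime

def daily_mean(readings):
    """Average (timestamp, value) readings per calendar date."""
    by_date = defaultdict(list)
    for ts, value in readings:
        by_date[ts.date()].append(value)
    return {d: sum(vs) / len(vs) for d, vs in by_date.items()}

def add_exogenous(rows, exo_by_date, name):
    """Attach the daily-average exogenous value to each row; None if the date is missing."""
    return [{**row, name: exo_by_date.get(row["timestamp"].date())} for row in rows]

# Hypothetical ambient-temperature readings (degrees C) and plant samples.
ambient = [
    (datetime(2019, 1, 1, 6), 4.0),
    (datetime(2019, 1, 1, 18), 8.0),
    (datetime(2019, 1, 2, 12), 10.0),
]
plant_rows = [
    {"timestamp": datetime(2019, 1, 1, 9), "consumption": 120.0},
    {"timestamp": datetime(2019, 1, 2, 9), "consumption": 115.0},
    {"timestamp": datetime(2019, 1, 3, 9), "consumption": 118.0},
]
enriched = add_exogenous(plant_rows, daily_mean(ambient), "ambient_temp_mean")
```

Rows with no matching exogenous value are kept with `None` so that the gap stays visible to the data-cleaning step.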

Feature Selection. The primary objectives of feature selection are to reduce the complexity of the model by removing irrelevant and redundant features, thereby improving its prediction performance; to reduce data-set cardinality; and to gain a better understanding of the underlying process that generated the data.

Two feature-selection steps were performed, one before and one after feature engineering. Both used the gradient-boosting feature-selection algorithm. The algorithm, like other tree models, can score each feature according to how useful it was in the construction of the decision trees within the model. With the algorithm trained on the data set, the highest-scoring features were selected, reducing the number of features from approximately 200 to approximately 40.
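The scoring idea can be sketched with a deliberately simplified booster. This is not the authors' implementation: it uses depth-1 trees (stumps), credits each feature with the squared-error reduction of the splits made on it, and runs on synthetic data with hypothetical feature names.

```python
import random

def best_stump(X, residuals):
    """Find the (feature, threshold) split that most reduces squared error."""
    mean_r = sum(residuals) / len(residuals)
    base_sse = sum((r - mean_r) ** 2 for r in residuals)
    best = None  # (gain, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or base_sse - sse > best[0]:
                best = (base_sse - sse, j, thr, lm, rm)
    return best

def boosted_feature_scores(X, y, n_rounds=20, learning_rate=0.5):
    """Boost stumps on residuals; score each feature by its accumulated split gain."""
    pred = [sum(y) / len(y)] * len(y)
    scores = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(X, residuals)
        if stump is None or stump[0] <= 1e-12:
            break  # nothing left to explain
        gain, j, thr, lm, rm = stump
        scores[j] += gain
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for p, row in zip(pred, X)]
    total = sum(scores) or 1.0
    return [s / total for s in scores]  # normalized to sum to 1

def select_top_features(names, scores, k):
    """Keep the k highest-scoring feature names."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Synthetic data: the target depends strongly on the first feature,
# weakly on the second, and not at all on the third.
rng = random.Random(0)
names = ["steam_flow", "ambient_temp", "sensor_noise"]  # hypothetical names
X = [[rng.random(), rng.random(), rng.random()] for _ in range(80)]
y = [3.0 * row[0] + 0.3 * row[1] for row in X]

scores = boosted_feature_scores(X, y)
selected = select_top_features(names, scores, k=2)
```

Production libraries expose comparable importance scores on a fitted model, so a from-scratch booster like this one is useful only for showing where the scores come from.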

Data Training. During the exploratory analysis, many algorithms were tested. Ultimately, gradient-boosting tree regression (GBR) was chosen as the best tradeoff between accuracy and transparency. Boosting is an ensemble technique in which the predictors are built sequentially rather than independently. In particular, GBR fits each new predictor to the residual error of the previous predictor and uses gradient descent to minimize the loss function. The regression task in this work is to predict the plant's consumption KPI for the next 3 hours. Consumption is itself a KPI, defined as the sum of the energy consumption of all assets.
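The link between residual fitting and gradient descent can be written out for the squared-error loss. That loss is an assumption here, because the synopsis does not state which loss the authors used, and the symbols $F_m$ (the ensemble after $m$ predictors), $h_m$ (the $m$-th tree), and $\nu$ (the learning rate) are notation introduced only for illustration.

```latex
\begin{align}
L\bigl(y, F(x)\bigr) &= \tfrac{1}{2}\bigl(y - F(x)\bigr)^2, \\
r_{i,m} &= -\left.\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right|_{F = F_{m-1}}
         = y_i - F_{m-1}(x_i), \\
h_m &= \arg\min_{h} \sum_{i=1}^{N} \bigl(r_{i,m} - h(x_i)\bigr)^2, \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x), \qquad 0 < \nu \le 1.
\end{align}
```

Under this loss the negative gradient is exactly the residual, so fitting each new tree to the residuals is a gradient-descent step in the space of prediction functions.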

The final model was trained using the time series cross-validation technique. Evaluation metrics and the model dashboard are discussed in detail in the complete paper.
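Time-series cross-validation differs from ordinary cross-validation in that every training fold must precede its test fold in time. The synopsis does not describe the splitting scheme, so the expanding-window variant below, including its function name and parameters, is an assumed illustration.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) with training data always earlier than test data."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(0, test_start))                      # all history so far
        test = list(range(test_start, test_start + test_size))  # the next block in time
        yield train, test

splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
```

Each successive fold trains on a longer history, which mirrors how a deployed forecasting model is periodically retrained on all data available to date.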

Model-Development Pipeline

Plot Results and Comments. The model was used heavily for 1 month during normal operation to test its performance and user experience. The results were promising and aligned with expectations. By using the model, the site engineer saved approximately 1% of total stationary-combustion CO2 emissions. The digital tool highlighted anomalous energy consumption and provided recommendations on operating-parameter values to reduce that consumption and, thus, CO2 emissions.

The average saving achieved during this preliminary test was approximately 0.9% of the site’s total CO2 emissions per day, with a peak saving of 1.3% of total CO2 emissions per day.

The optimizations performed primarily involved steam reduction on gas sweetening and oil-stabilization reboilers. All sale-products specifications have been achieved while performing these optimization actions. Hence, the model supports the site team in their efforts to reduce energy consumption and preserve production targets.

The model can also be used to define the best setup after a change in operating conditions. Thus, it can be applied not only for energy savings but also to support faster definition and application of the best steam-flow rate for the various equipment units in new production scenarios. Moreover, the tool helped quickly identify suboptimal consumption after a transition, allowing a faster restoration of the optimal steady-state configuration per the digital-system recommendations.

Evaluation Metrics. The performance test was carried out with the digital model online for 4 weeks in a collaboration between production engineers, site operators, and data scientists. The general work flow can be summarized in four main steps:

  • The model output was evaluated frequently to identify the reboilers with atypical consumption.
  • Once identified, reboiler consumption and main process parameters were monitored and evaluated to confirm the possibility of optimizing the selected reboiler per digital model advice (this activity continues even after the final step).
  • Control-room operators worked with shift supervisors to modify operating parameters, optimizing energy consumption while maintaining unit performance within specifications.
  • Savings were calculated for each reboiler as the ratio of steam saved to product.
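The final step can be sketched as a small calculation. This is one reading of the ratio of steam saved to product, namely the drop in steam used per unit of product; the units (tonnes per hour), field names, and figures are hypothetical and are not taken from Table 1.

```python
def specific_steam(steam_t_per_h, product_t_per_h):
    """Steam consumed per unit of product."""
    return steam_t_per_h / product_t_per_h

def reboiler_saving(baseline, optimized):
    """Steam saved per unit of product, plus the saving relative to the baseline."""
    before = specific_steam(baseline["steam"], baseline["product"])
    after = specific_steam(optimized["steam"], optimized["product"])
    saved = before - after
    return {"saved_per_unit_product": saved, "relative_saving": saved / before}

# Hypothetical reboiler figures before and after the optimization action.
baseline = {"steam": 10.0, "product": 50.0}
optimized = {"steam": 9.0, "product": 50.0}
saving = reboiler_saving(baseline, optimized)
```

Normalizing by product keeps a saving from being credited when steam demand falls only because throughput fell.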

Savings were monitored and recorded continually. Table 1 shows the results achieved by the model during Week 3, when the model performed best. In the total raw savings identified at the bottom of the table, the hourly, daily, and yearly savings are calculated considering all four steps as ongoing, while the absolute savings is the actual savings, considering that the first three actions were ongoing since Week 2 and the action on L4 began at the end of the test.


This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 195615, “A Data-Driven Management Strategy To Reduce the Environmental Impact of Upstream Production Plants,” by Luca Cadei, SPE, Danilo Loffreno, Giuseppe Camarda, Marco Montini, Gianmarco Rossi, SPE, Piero Fier, Davide Lupica, Andrea Corneo, Lorenzo Lancia, Diletta Milana, Marco Carrettoni, and Elisabetta Purlalli, Eni, and Francesco Carduccu and Gustavo Sophia, The Boston Consulting Group GAMMA, prepared for the 2019 SPE Norway One Day Seminar, Bergen, Norway, 14 May. The paper has not been peer reviewed.

01 August 2019

Volume: 71 | Issue: 8