Water, Water Everywhere: Using ML and Game Theory To Win at Produced-Water Forecasting

In the Bakken-Three Forks play of the Williston Basin, many oil wells that once produced some water have become water wells that produce some oil, as average oil production has flatlined while water production continues to increase.

A complex interplay of stepping out beyond the core to where water saturations are higher, upsizing completions, tighter spacing, and dealing with greater parent-child effects and potential changes in relative permeability is significantly increasing volumes of produced water. Similar situations have been occurring in other basins.

This poses a serious threat to the US unconventional oil and gas industry, where produced water has become a $34-billion business that exposes operators to numerous operational, environmental, and economic risks.

Software developers are beginning to adapt advanced machine-learning (ML) methods that have proved successful in forecasting oil production and upgrading them with aspects of game theory to make quick and accurate work of understanding and predicting water production in unconventional plays. Novi Labs (Novi), an Austin-based software company focusing on unconventional shale, has developed a solution that is being applied successfully in the Williston Basin to help disentangle the complex interactions of larger completion designs, changing geology, and tighter spacing that contribute to increased water production. According to the firm’s model, fluid per foot ranks as the most important input variable for increasing water production, ranking above proppant per foot and geologic parameters.

Ted Cross, technical advisor at Novi and former senior geologist with ConocoPhillips, presented a paper that discussed the produced water challenge at the 2020 Unconventional Resources Technology Conference (URTeC).

“How often does an engineer dash off a simple water forecast, doing something like applying a flat water-to-oil ratio to their water prediction?” Cross asked rhetorically. In unconventional oil fields, water forecasting and pre-drill water predictions have not received attention commensurate with their economic importance. “Quick is still common,” he said.

Operators, regulators, and water-disposal companies often rely on simplistic water-cut ratios or basin-level extrapolations that ignore the complex interplay of geology, completions, and spacing decisions on water production. “They are starting to realize and incorporate the fact that water-to-oil ratios will evolve over a well’s life, usually increasing. But the critical thing is that the time they spend on water prediction is a tiny fraction of what they spend on oil, and the methods are less sophisticated and developed,” Cross said.
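
To make the contrast concrete, here is a minimal Python sketch of the two simple approaches Cross describes: a flat water-to-oil ratio (WOR) applied to an oil forecast, and a ratio that rises over the well's life. The oil volumes, the WOR values, the linear ramp, and the function names are all hypothetical illustrations, not figures from the paper.

```python
def flat_wor_forecast(oil_bbl, wor):
    """Water forecast from a single, constant water-to-oil ratio."""
    return [o * wor for o in oil_bbl]


def rising_wor_forecast(oil_bbl, wor_start, wor_end):
    """Water forecast with a WOR that ramps linearly from wor_start to wor_end."""
    n = len(oil_bbl)
    if n == 1:
        return [oil_bbl[0] * wor_start]
    step = (wor_end - wor_start) / (n - 1)
    return [o * (wor_start + i * step) for i, o in enumerate(oil_bbl)]


# Hypothetical monthly oil volumes (bbl) for a declining well.
oil = [30000, 24000, 20000, 17000, 15000, 13500]

flat = flat_wor_forecast(oil, wor=1.0)
rising = rising_wor_forecast(oil, wor_start=1.0, wor_end=2.0)

# Water volume the flat assumption fails to anticipate.
shortfall = sum(rising) - sum(flat)
print(f"Flat-WOR total water:   {sum(flat):,.0f} bbl")
print(f"Rising-WOR total water: {sum(rising):,.0f} bbl")
print(f"Water missed by the flat assumption: {shortfall:,.0f} bbl")
```

Even with identical oil forecasts, the flat ratio understates water in every month after the first, which is the kind of gap that can leave a pad short of disposal capacity.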

Part of the reason, he explained, is that in the early development of the Eagle Ford, Permian, and some core parts of the Williston, either the original wells didn’t produce large volumes of water or the existing infrastructure around conventional plays was sufficient to handle it. But with field evolution and changing completion designs, water has become a critical issue.

A Williston Tale

The Bakken-Three Forks play provides a useful example for studying unconventional water production because of its high-quality, publicly available well-level oil- and water-production data, fluid and proppant volumes, and stage counts, and its long history across multiple vintages of completion designs testing both the fringe and core of the play.

First commercial oil production from the Williston Basin began in 1951 and first production from the Bakken came in 1953. Approximately 450 million bbl of oil were produced from the Bakken and Three Forks formations between 2008 and 2013.

From 2012 through mid-2019, the average 90-day cumulative water production from Bakken and Three Forks wells increased nearly 400% (Fig. 1). Many factors contributed to this situation, including stepouts away from the play core, increasing completions intensity, and tighter spacing. The complexity and time-variant nature of average water production mean that commonly applied methods such as gridded water cuts may be improved upon with more sophisticated approaches such as machine learning that can disentangle complex variable interactions.

Fig. 1—Water vs. oil production in the Bakken and Three Forks formations in North Dakota. Source: URTeC 2756.

The Cost of Produced Water

A major question in the industry right now is what produced water ultimately costs, both today and going forward. Reuters recently referred to produced water, once managed individually by producers but now a $34 billion-per-year business, as private equity’s “new black gold” in US shale.

“It’s easy to ignore water, but unexpectedly high production can damage well economics, or—in the worst cases—force shut in if disposal capacity is full,” Cross said, noting that all stakeholders are beginning to pay attention and work to understand and mitigate the risks.

Water plays a larger role in unconventional shale plays than it does in conventional fields, because it factors into both the initiation stage—as a mode of proppant and additive transport and base fluid for hydraulic fracturing—and the production stream—as produced water. This duality combined with the relative immaturity of water-related infrastructure in key shale basins brings up key questions from an operational sustainability standpoint.

Operators face several costs for produced water. The incremental cost of disposal, usually through injection into a porous subsurface reservoir, typically ranges from $0.25 to $1.50 per barrel of water, with the distance that trucks must haul the water an important factor in that cost. Building gathering-and-transport infrastructure to handle the water can require hundreds of millions of dollars of investment, depending on the scale. Overbuilding or underbuilding water-handling facilities and infrastructure can add significant costs. If no options for gathering, transporting, and disposing of water are available, play development can slow or even stop.

Then there are what Cross calls the “ESG-type” costs: those that affect a company’s environmental, social, and governance ratings and, thus, its investment potential. Produced water may be toxic and damaging; disposing of it in saltwater disposal wells has been shown to induce seismicity and can create drilling hazards such as higher-than-expected pressure or even a saltwater kick when drilling through a shallow disposal zone. There is also risk of damage to the gathering-and-transport system, such as a pipe rupture that could ruin a farmer’s field. Because produced water often contains heavy metals, radioactive material, and high total dissolved solids, regulators and local stakeholders have taken a keen interest in ensuring its proper handling and disposal.
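
The per-barrel disposal costs quoted above translate into well-level costs with simple arithmetic. The sketch below assumes a hypothetical well producing 200,000 bbl of water in a year; only the $0.25 to $1.50 per barrel range comes from the article, and the function name is illustrative.

```python
def disposal_cost(water_bbl, cost_per_bbl):
    """Incremental disposal cost in dollars for a given water volume."""
    return water_bbl * cost_per_bbl


# Hypothetical annual produced-water volume for one well (assumption).
annual_water_bbl = 200_000

# Typical disposal cost range quoted in the article ($/bbl).
low_rate, high_rate = 0.25, 1.50

low_cost = disposal_cost(annual_water_bbl, low_rate)
high_cost = disposal_cost(annual_water_bbl, high_rate)
print(f"Annual disposal cost: ${low_cost:,.0f} to ${high_cost:,.0f}")
# prints "Annual disposal cost: $50,000 to $300,000"
```

Because the cost scales linearly with volume, any percentage error in the water forecast carries straight through to the disposal budget.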

One bit of relatively good news is that the current industry slowdown is presenting a “breather” for infrastructure to catch up to both oil and water uptake. “For new developments, including new areas or basins, this is critical,” said Cross.

Why an ML Model?

Forecasting water production is difficult in part because actual water production has been evolving over the life of unconventional fields. While oil forecasting has improved with advanced statistical methods including machine learning, water forecasting has lagged.

Reducing costs and mitigating risks require an understanding of how design choices impact water production. Software developers have spent several years developing ML techniques for forecasting oil production. Those techniques can be adapted to improve forecasting and understanding of water production and why it is increasing, so various stakeholders in handling and disposal can understand their options for impacting it, said Cross.

“We usually see that applying a flat water-to-oil ratio or some other basic curve on that ratio is about half as accurate as applying a very predictive ML model,” he said. “In some situations, the inaccuracy results in significantly underestimated water production that way exceeds existing infrastructure for a pad. There have been times when produced water has had to be put on trucks at costs as high as $4/bbl on the spot market and transported all the way across the basin to where disposal capacity was available,” he explained. And rarely, he continued, wells have had to be shut in until disposal capacity becomes available.

ML models identify analogs like a reservoir engineer would, and intelligently group and filter well sets at computerized speeds across many parameters simultaneously, learning as they go what drives production.

Williston is an interesting example for training a machine-learning model, Cross said, because it is ahead of other shale plays in terms of development history and has seen a wider range of well designs.

“You can look at Williston as representative of the future of some of the other plays. Some of the things that our models are picking up in the Williston are things operators in the Permian will need to think about as they move farther away from the core to develop Tier 2 and 3 acreage,” he said. “They will produce more water than they expected—more, even, than the parent wells drilled on those pieces of acreage.”

Novi built a multitarget, decision-tree-based machine-learning model and trained it using completions parameters, geology, and spacing parameters to predict water, gas, and oil production at 30-day increments for the first 2 years of a well’s production. The model uses a proprietary subsurface dataset built from publicly available completions information and well logs from the North Dakota Industrial Commission. The firm’s algorithms analyze and extract log data, conduct a principal component analysis, and use that information for model training. The models predict a 24-point vector going from initial production to 720 days, vs. a single-point estimate, which doesn’t capture the shape of the curve. When combined with Shapley Additive Explanations (SHAP) values, this workflow generates rock quality maps using machine learning. A model artifact called geoSHAP is produced to represent rock quality. The SHAP values are expressed in units of what the model is targeting—in this instance, barrels of water per day or per foot. And, said Cross, the process takes a day, not months.
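
As a rough illustration of what a 24-point target looks like, the following Python sketch turns a well's daily water volumes into values at 30-day increments out to 720 days. It assumes the targets are cumulative volumes, which the article does not state explicitly, and the well history and function name are hypothetical; it shows only how such a vector could be assembled, not Novi's data pipeline.

```python
from itertools import accumulate

INTERVAL_DAYS = 30
N_POINTS = 24  # 24 x 30 days = 720 days


def target_vector(daily_water_bbl):
    """Cumulative water (bbl) at day 30, 60, ..., 720 from daily volumes."""
    needed = INTERVAL_DAYS * N_POINTS
    if len(daily_water_bbl) < needed:
        raise ValueError(f"need at least {needed} days of history")
    cumulative = list(accumulate(daily_water_bbl[:needed]))
    # cumulative[k] is the total through day k + 1, so day 30 is index 29.
    return [cumulative[INTERVAL_DAYS * (i + 1) - 1] for i in range(N_POINTS)]


# Hypothetical well: 500 bbl/d of water for 180 days, then 300 bbl/d.
daily = [500.0] * 180 + [300.0] * 540
vector = target_vector(daily)

print(len(vector))   # 24
print(vector[0])     # 15000.0
print(vector[-1])    # 252000.0
```

A single-point estimate would keep only the last entry; the full vector preserves the early, steeper part of the curve that matters for sizing facilities.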

“As models are building trees, they are looking for variables and cutoffs that are predicted to discriminate between higher and lower producers of water across an ensemble of trees. We use Shapley values to explain how the model is coming up with these predictions,” said Cross. The error rate is generally consistent across the life of the well.

Game Theory—A Competitive Advantage

Shapley values come out of game theory and are used to explain how machine-learning models arrive at their predictions. They quantify how much each variable contributes to a model prediction and carry what the model has learned into a database that engineers can query for sensitivities to individual variables.

Shapley values were originally developed in cooperative game theory to divide a shared payout among the members of a team according to each member’s contribution. In the past few years, the machine-learning community developed the math to adapt that assessment to show how a model comes up with its decisions, based on the different information that it has been sorting through and analyzing. This has led to a major advancement in explaining machine-learning models.

“Imagine that the simplest model you can make for production from a well is to average all the production profiles. But as you learn more about each well, you can move that prediction away from the average by a certain number of barrels—water in this instance—by knowing things like amount of proppant, that can inform the prediction,” explained Cross.

Each variable that goes into the model has a different impact. The Shapley values are useful not only on the well level but also at regional scale, to answer such questions as “how does the impact of fluid loading on every well compare to proppant, or a geology feature, or well spacing?”
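
One common way to answer that regional question is to average the absolute SHAP value of each feature across all wells and rank the results. The Python sketch below does this for three hypothetical wells; the SHAP numbers and feature names are invented for illustration, and the article does not say that this is the exact aggregation Novi uses.

```python
# Hypothetical per-well SHAP values in bbl of water per ft (illustrative only).
shap_by_well = [
    {"fluid_per_ft": 6.0, "proppant_per_ft": 1.5, "well_spacing": -0.5},
    {"fluid_per_ft": -4.0, "proppant_per_ft": 2.5, "well_spacing": 1.0},
    {"fluid_per_ft": 8.0, "proppant_per_ft": -2.0, "well_spacing": -1.5},
]


def mean_abs_shap(rows):
    """Average magnitude of each feature's SHAP value across wells."""
    features = rows[0].keys()
    return {f: sum(abs(r[f]) for r in rows) / len(rows) for f in features}


importance = mean_abs_shap(shap_by_well)

# Rank features from largest to smallest average impact.
ranking = sorted(importance, key=importance.get, reverse=True)
for name in ranking:
    print(f"{name:16s} {importance[name]:.1f} bbl/ft")
```

Taking the absolute value matters: a feature that pushes some wells up and others down would otherwise average out to roughly zero and look unimportant.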

A common way to display SHAP values is in what Cross calls the SHAPnado chart. “It’s similar to a standard tornado chart, but with an extra layer of information,” he explained. Each dot represents a well, and each row represents a feature that goes into the model. The middle of the chart reflects zero impact. The right is positive; the left, negative.

“With our tree-based models, the simplest prediction you could make would be the average of the input data. With additional information, you can then move the prediction away from the average by some positive or negative barrels of water. SHAP values, then, are expressed in units of production and represent a delta away from the model set average,” said Cross (Fig. 2).

Fig. 2—This SHAPnado chart shows the impact of each variable on the water predictions. Red indicates high value for the variable. Position on the chart indicates relative importance. The spread of the cloud of wells shows data density. Moving down the chart, each variable moves the prediction up or down away from the average of the production dataset. The sum of the SHAP values is the difference between the prediction and the average. Source: URTeC 2756.
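
The additive property described in the caption can be checked on a toy example. The Python sketch below computes exact Shapley values by brute force for a made-up three-variable water model and confirms that they sum to the difference between the prediction and the baseline. The model, its coefficients, and the wells are hypothetical, and a single "average well" stands in for the dataset average, which is a simplification of what SHAP does for tree ensembles.

```python
from itertools import permutations


def toy_water_model(x):
    """Made-up model: water (bbl/ft) with a proppant-saturation interaction."""
    return (5.0 + 0.5 * x["fluid_per_ft"]
            + 0.01 * x["proppant_per_ft"] * x["water_saturation"])


def shapley_values(model, instance, background):
    """Exact Shapley values, averaged over every ordering of the features.

    Features not yet revealed keep their background value, so the empty
    coalition returns the model output at the background point.
    """
    names = list(instance)
    totals = {n: 0.0 for n in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(background)
        prev = model(current)
        for name in order:
            current[name] = instance[name]
            new = model(current)
            totals[name] += new - prev  # marginal contribution of this feature
            prev = new
    return {n: t / len(orderings) for n, t in totals.items()}


# Hypothetical "average well" and the well being explained.
background = {"fluid_per_ft": 10.0, "proppant_per_ft": 800.0, "water_saturation": 0.5}
well = {"fluid_per_ft": 20.0, "proppant_per_ft": 1000.0, "water_saturation": 0.6}

phi = shapley_values(toy_water_model, well, background)
baseline = toy_water_model(background)
prediction = toy_water_model(well)

for name, value in phi.items():
    print(f"{name:18s} {value:+.2f} bbl/ft")
print(f"baseline + sum = {baseline + sum(phi.values()):.2f}, prediction = {prediction:.2f}")
```

Brute force over orderings is only practical for a handful of variables; tree-specific algorithms such as the one in the Lundberg et al. reference exist to make the same calculation tractable for real models.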

Learning From the Model

Paper URTeC 2756 shows how a machine-learning model upgraded with aspects of game theory is providing produced-water prediction information and lessons for stakeholders in the Williston Basin that will likely be applicable to other unconventional basins as well.

  • Fluid per foot ranks as the most important input variable for increasing water production, ranking above proppant per foot and geologic parameters (Fig. 3).
  • Completions size and design can impact the evolution of water cut through time, with high-intensity completions causing dramatic increases during the first 180 days, followed by decreasing impact out to 2 years of production. Operators interested in correctly sizing their facilities or tailoring completions to minimize water-disposal costs can use this method, as can midstream companies looking to scenario-test water production from lease to basin level.
  • Where rock is very water-saturated, propping open a fracture will produce more water. The reverse is true in oil-prone rock.
  • As fracture design has shifted from crosslinker to slickwater, the average fracture job now uses approximately four times more water than it did in 2013.
  • Stage spacing has a much more muted impact than fluid or proppant loading, although interaction between spacing and fluid loading does exist, and proppant shows greater interaction with geology in large fracture designs.

Fig. 3—SHAP values show water loading contributes more to water prediction than does proppant loading—approximately 17 bbl/ft increase in 720-day cumulative water production as fluid loading goes from 0 to 20 bbl/ft. Source: URTeC 2756.


Will produced water volumes continue to increase? That will depend largely on operator choices, according to Cross. With operators stepping out beyond the core at tight well spacing, completions design has the potential to drive still larger increases.

“Of course, oil production and completion cost considerations will continue to dominate design choices, but operators, water-handling companies, and regulators must consider the likelihood of further increases to produced water volumes,” Cross said.

For Further Reading

Cross, T., Sathaye, K., Darnell, K., Niederhut, D., and Crifasi, K. 2020. Predicting Water Production in the Williston Basin Using a Machine-Learning Model. Paper presented at the Unconventional Resources Technology Conference (URTeC), Virtual conference, 20–22 July. URTeC 2756.

Basu, S., Cross, T., and Skvortsov, S. 2019. Salt Water Disposal Modeling of Dakota Sand, Williston Basin, to Drive Drilling Decisions. Paper presented at the Unconventional Resources Technology Conference (URTeC), Denver, 22–24 July. URTeC 488.

Lundberg, S., Erion, G., and Lee, S. 2018. Consistent Individualized Feature Attribution for Tree Ensembles. arXiv. Accessed 25 May 2020.

Sharma, A. and Thomasset, I. 2019. Data-Driven Approach To Quantify Oilfield Water Lifecycle and Economics in the Permian Basin. Paper presented at the Unconventional Resources Technology Conference (URTeC), Denver, 22–24 July. URTeC 968.

Miller, J. 2019. Wastewater–Private Equity’s New Black Gold in U.S. Shale. Reuters. Accessed 17 May 2020.

Bratvold, R. and Koch, F. 2011. Game Theory in the Oil and Gas Industry. The Way Ahead (published 15 January 2011).

Judy Feder, Technology Editor

01 September 2020

Volume: 72 | Issue: 9


