AI/machine learning

Physics Unlocks Big Data for Asset-Heavy Industries

There is often an assumption that big data, together with machine learning, will solve whatever problems asset-heavy industries such as oil and gas face. This is not the case; big data alone isn’t enough. We need something else to solve these problems, and the answer lies in the world of physics.


We often hear that machine learning (ML), artificial intelligence (AI), and hybrid ML will solve recurring industrial problems and the events that cause significant deferrals or production losses. Plainly put, those losses are money down the drain.

There is often an assumption that big data, together with ML, will solve whatever problems asset-heavy industries such as oil and gas face. This is not the case, simply because events are scarce and, when they do occur, human intervention alters their course. Traditional ML approaches don’t work in an asset-heavy context, regardless of whether they’re applied in industrial processes where failure is not an option (e.g., pressure safety valves) or in production optimization (e.g., optimizing hydrocarbon production from a decades-old oil field). Big data alone isn’t enough for asset-heavy industries. We need something else to solve these problems, and the answer lies in the world of physics.

The Shortcomings of Traditional Machine Learning

While we might have time-series data spanning years or even decades, the data are not enough. That’s because the events they describe rarely happen (a few times a year, if at all), and, even when they do occur, one event may be drastically different from the next. Complicating things further, in many cases, human intervention modifies the natural progression of an event, rendering traditional ML approaches ineffective.

Two examples come up again and again. The first involves big data and no (read: zero) events. There are many industrial processes where failure is simply not an option; were an event to occur, it would be a catastrophe. These processes typically involve safety-critical equipment. One example is pressure safety valves (PSVs), the last line of defense in averting major accidents in the petrochemical industry. In many cases, the equipment has been operating for years or decades (giving us our big data), but failure events are uncommon or, ideally, nonexistent. When safety-critical equipment is protected by a PSV, the potential failure of the valve must still be considered, and big data alone doesn’t tell the full story.

The second example involves production optimization. The problem with production optimization and big data is that they are practically contradictory. Let’s assume that we want to optimize the hydrocarbon production from an oil field that has been in operation for several decades. Here, again, we’re talking big data—massive data, even. The problem is that the oil field is constantly operating under transient conditions. So, we seldom, if ever, have data for optimal production conditions. Furthermore, as the reservoir is depleted, all its associated variables change (e.g., reservoir pressure and fluid composition). The older the data is, the less representative it is of the current conditions. And, of course, we have zero data for tomorrow’s operating conditions, which is exactly what production optimization is directed toward—moving the needle to operate in a region where we have not operated before. Having plenty of data in that region means that we are operating close to that optimal threshold and any improvements might not be that valuable.

Finding Value in Big Data for Asset-Heavy Industries

Innovative companies are now successfully combining ML algorithms with more-traditional methods to leverage big data. This is sometimes referred to as “hybrid ML.” I prefer to call it “physics-guided ML.” Why? Because it recognizes that physics does much of the heavy lifting in solving these complex problems. If we then apply traditional ML approaches on top of that foundation, we can generate powerful solutions.

Physics simulators. Physics simulators are widely used in industrial applications. They were the go-to solution before ML and data-driven algorithms emerged as an alternative. Simulators complement ML algorithms by filling data gaps, simulating parameters not available as sensor outputs, and enabling us to extrapolate to new operating conditions.

One example is the operational envelope of equipment fitted with PSVs. Using physics simulators, we obtained the full spectrum of conditions for safety-critical equipment and used it as input for hybrid ML algorithms. Suddenly, the PSV use case described earlier becomes solvable.
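
To make the idea concrete, here is a minimal sketch of simulator-augmented training. The simulate_relief_scenario function, the relief criterion, and all numbers are hypothetical stand-ins for a real physics simulator and real sensor history, not any particular implementation.

```python
# Minimal sketch: augmenting scarce sensor history with simulator output.
# `simulate_relief_scenario` is a hypothetical stand-in for a physics
# simulator that returns process conditions over a specified range,
# including regimes never seen in operation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def simulate_relief_scenario(n_cases):
    """Hypothetical simulator wrapper: returns (features, labels), where the
    label marks whether the simulated conditions would demand a PSV lift."""
    pressure = rng.uniform(5.0, 120.0, n_cases)      # bar, spans beyond history
    temperature = rng.uniform(20.0, 300.0, n_cases)  # deg C
    flow = rng.uniform(0.0, 50.0, n_cases)           # kg/s
    features = np.column_stack([pressure, temperature, flow])
    labels = (pressure > 100.0).astype(int)          # toy relief criterion
    return features, labels

# Historical sensor data: plenty of rows, (ideally) zero failure events.
X_hist = np.column_stack([
    rng.uniform(10.0, 60.0, 5000),    # pressure stays in the safe band
    rng.uniform(40.0, 150.0, 5000),
    rng.uniform(5.0, 30.0, 5000),
])
y_hist = np.zeros(len(X_hist), dtype=int)

# The simulator fills the gap: conditions approaching and beyond the relief point.
X_sim, y_sim = simulate_relief_scenario(2000)

X = np.vstack([X_hist, X_sim])
y = np.concatenate([y_hist, y_sim])

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# The model can now score live sensor readings against the full envelope.
print(model.predict_proba([[95.0, 210.0, 40.0]]))
```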

Physics-guided, data-driven ML algorithms. There are multiple ways to inject physics into ML algorithms. One is through physical principles. If you are an engineer in one of the traditional fields (e.g., civil, mechanical, or electrical), you have physical principles embedded in your subconscious: conservation of energy, mass, and momentum (and a few others I can’t remember off the top of my head). These principles cannot be violated. Unfortunately, pure ML algorithms aren’t engineers; they don’t know that. We need to tell them. By adding physical principles to our solutions, we constrain the predicted outcome, giving the algorithms a much higher probability of success.
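
As an illustration of constraining predictions with a physical principle, the sketch below adds a mass-balance penalty to a standard data-fit loss. The network, the separator mass balance it assumes, and the toy data are assumptions made for the example, not a production model.

```python
# Minimal sketch of injecting a physical principle into the loss function.
# Assumption: a network predicts oil and water outlet flow rates from
# sensor features, and conservation of mass says the two outlets must sum
# to the measured inlet flow. Names and data are illustrative only.
import torch
import torch.nn as nn

class OutletFlowModel(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, 2),          # [oil_out, water_out]
        )

    def forward(self, x):
        return self.net(x)

def physics_guided_loss(pred, target, inlet_flow, weight=10.0):
    """Data-fit term plus a penalty for violating the mass balance."""
    data_loss = nn.functional.mse_loss(pred, target)
    mass_residual = pred.sum(dim=1) - inlet_flow   # should be ~0 if mass is conserved
    physics_loss = (mass_residual ** 2).mean()
    return data_loss + weight * physics_loss

# Toy training step with random stand-in data.
torch.manual_seed(0)
x = torch.randn(64, 8)                  # sensor features
inlet = torch.rand(64) * 100.0          # measured inlet flow
target = torch.stack([inlet * 0.3, inlet * 0.7], dim=1)  # "true" outlet split

model = OutletFlowModel(n_features=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

pred = model(x)
loss = physics_guided_loss(pred, target, inlet)
loss.backward()
optimizer.step()
```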

Another way to inject physics is through physical quantities. Transforming available data from multiple sensors into physical quantities has proven to be a powerful tool. For example, some industrial processes involve the separation of mixed substances (e.g., oil/water emulsions). Companies are investing heavily in research and technology to continually improve these processes. Nevertheless, separation processes often fail to deliver the required quality, and identifying the root cause is a highly complex and time-consuming task for engineers working in processing plants. By enriching ML algorithms with physical quantities such as the formation and size distribution of droplets in separators or across choke points, we can significantly increase the accuracy of our models in identifying the root cause. This information can help engineers take action to fix problems with the separation process and avoid production deferrals.
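
The following sketch shows what such enrichment might look like in practice: deriving a pressure drop, a rough energy-dissipation proxy, and a Hinze-type maximum droplet diameter from raw sensor columns. The fluid properties, the dissipation proxy, and the column names are illustrative assumptions, not a validated separator model.

```python
# Minimal sketch of enriching raw sensor data with derived physical quantities.
# The droplet-size estimate uses a rough Hinze-type scaling
# (d_max ~ (sigma/rho)^0.6 * eps^-0.4); all constants are assumed values.
import pandas as pd

def add_physical_features(df):
    """Turn upstream/downstream pressures into physics-based features."""
    out = df.copy()
    rho = 850.0           # continuous-phase density, kg/m^3 (assumed)
    sigma = 0.025         # oil/water interfacial tension, N/m (assumed)
    residence_time = 0.5  # s, rough time scale across the choke (assumed)

    out["dp_choke"] = out["p_upstream"] - out["p_downstream"]   # Pa
    # Crude energy-dissipation-rate proxy: pressure work per unit mass and time.
    out["eps_proxy"] = out["dp_choke"] / (rho * residence_time)  # W/kg
    # Hinze-type maximum stable droplet diameter.
    out["d_max"] = 0.725 * (sigma / rho) ** 0.6 * out["eps_proxy"] ** -0.4
    return out

sensors = pd.DataFrame({
    "p_upstream": [12e5, 15e5, 20e5],   # Pa
    "p_downstream": [8e5, 9e5, 10e5],   # Pa
    "flow": [120.0, 140.0, 160.0],      # m^3/h
})
print(add_physical_features(sensors)[["dp_choke", "eps_proxy", "d_max"]])
```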

A third way to include physics is through feature engineering or dimensional analysis. We often deal with sensor data that describe or help us understand physical phenomena. For example, ambient pressure and temperature can be used to describe weather conditions or patterns. The same applies to industrial phenomena: pressure and temperature can help describe the flow inside a pipe. If we use our knowledge of the underlying physics, it is possible to reduce the dimensionality of the data. A well-known example in the field of fluid dynamics is two-phase (air/water) flow. We typically deal with about 8–12 parameters, but, by using feature engineering or dimensional analysis, we can reduce this to about four. With fewer degrees of freedom, the available data expose patterns that can be used to train ML algorithms.
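
As a rough sketch of this dimensionality reduction, the example below maps eight raw two-phase-flow parameters onto a few standard dimensionless groups (Reynolds, Froude, Weber, and a density ratio). The specific groups and input values are one common choice picked for illustration, not the definitive set for any particular problem.

```python
# Minimal sketch of dimensional analysis as feature engineering for
# two-phase (air/water) pipe flow: eight raw parameters collapse into a
# handful of standard dimensionless groups. Input values are illustrative.
import numpy as np

def dimensionless_features(u_liq, u_gas, rho_liq, rho_gas,
                           mu_liq, sigma, diameter, g=9.81):
    """Map raw flow parameters to dimensionless groups used as ML features."""
    reynolds = rho_liq * u_liq * diameter / mu_liq     # inertia vs. viscosity
    froude = (u_liq + u_gas) / np.sqrt(g * diameter)   # inertia vs. gravity
    weber = rho_liq * u_liq**2 * diameter / sigma      # inertia vs. surface tension
    density_ratio = rho_gas / rho_liq
    return np.array([reynolds, froude, weber, density_ratio])

# Eight raw inputs -> four features with clearer patterns for training.
features = dimensionless_features(
    u_liq=1.2, u_gas=4.0,        # superficial velocities, m/s
    rho_liq=998.0, rho_gas=1.2,  # densities, kg/m^3
    mu_liq=1.0e-3,               # liquid viscosity, Pa*s
    sigma=0.072,                 # surface tension, N/m
    diameter=0.1,                # pipe diameter, m
)
print(features)
```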

Physics-Guided ML: The Way Forward in Asset-Heavy Industries

We’re confident that big data can become Big Data for asset-heavy industries. But, for it to succeed, we must leverage proven concepts and tools, physics and chemistry principles, and pre-existing knowledge from subject-matter experts. That way, we can enrich the data available to us and feed the information to machine learning algorithms that solve complex problems and make asset-heavy industries more efficient, more sustainable, and safer for workers.


Gustavo A. Zarruk is a principal machine learning engineer at Cognite, responsible for driving the machine learning and data science development of solutions across multiple business verticals and integrating them with Cognite’s product core components and applications. He holds MS and PhD degrees in environmental fluid dynamics from Cornell University and has more than 20 years of international experience in academic and industrial research in the fields of multiphase flow (oil and gas), turbulence, ocean and coastal engineering, naval hydrodynamics, and hydrocarbon reservoirs. Zarruk has worked extensively in laboratory and field (onshore and offshore) experimental and data acquisition campaigns, providing him with a broad experience in data acquisition and measurement technology, sensors, time-series analysis, and data-quality assessment.