Data Management Evolves, But Challenges Persist


This article discusses the role of data management in the context of exploration and production (E&P). From the 1990s, when building centralized databases was the mainstream approach, to the late 2010s, new perspectives on data management have emerged, but some challenges and expectations have remained the same.

For this literature review, articles were selected that were effectively case studies or that involved surveys and interviews with experts from companies related to E&P. In doing so, it was possible to analyze the discourse of data managers, geologists, and geophysicists and to understand the trends and dominant ideas present over the years.

Main Topics Covered by Data Management Departments Over the Years

In 1997, interviews conducted with three oil companies demonstrated the kinds of activities expected from data-management teams (Beham et al. 1997). From the perspective of the Venezuelan company PDVSA, data management had a few main tasks: the creation of a common database; data migration (from one system to another or from one database to another); data validation, which can be understood as data quality; and putting in place a work flow to capture, load, and check the data and update interpretations of it. In addition, one of the interviewees explained that it was vital to maintain the role known as “data translators,” namely, employees able to speak the languages of both data management and geoscience.

According to the BP employees interviewed, structuring data was an important role of data management teams. They recognized, for example, maps and reservoir simulations as unstructured data. This kind of data was considered small in volume and very valuable but still a challenge to store and retrieve. Another task expected of such teams was improving the “rapid turnaround of data interpretation” in order to make the interpreted data available for the next project. The respondents also pointed out that problems related to multiple databases, the diversity of data formats, and putting integration and manipulation processes in place should be priorities. A second line of activities related, at least partially, to data management is the administration of specialist applications. It comprises mapping work flows and supportive applications, suggesting when some applications should be discontinued in order to avoid redundancy, and building the links between these applications. Data quality, again, was a hot topic. According to one of the interviewees, “What adds value is information about uncertainties inherent in the data, so that you know what are high-quality data and what are of lesser certainty.” Therefore, a solution to rationalize different types and qualities of data was mentioned because of the effect this would have on the overall reservoir model.

From the point of view of Conoco (Beham et al. 1997), the construction of a central database emerged as an important initiative of data management. As expected, quality control was mentioned as a challenge, and problems with data standards also were mentioned. Another task was related to the movement of data from the centralized database to the project database and how to ensure that the updates recorded in the project database would be propagated to the central database. It is clear that, at the beginning of the 1990s, a massive part of the data was not in digital form. Therefore, the biggest challenge for data management teams was the development of digital databases and the design of processes, procedures, and policies to guarantee the integration of the new technology with work flows.

The concern with interpreted data is very clear. The interviewees said that, even in a context of outsourcing (adopted by BP), if the data management function were closer to an interpretation role, then it would become more of a core function. To clarify, the interviewee gave an example of someone called a geodata specialist, who would be part of the exploration team and able to set up projects, generate maps, and load data. Interestingly, this role was part of the exploration process. Going further, another interviewee identified a lack of captured interpretations. By this, they meant that there was no explanation (documents) regarding interpretation criteria and, consequently, nobody could reconstruct an interpretation. Also interesting is that, 22 years ago, the data were already recognized as not just a resource but also evidence. Finally, from Conoco’s perspective, the vital task of data management was loading data and making it available for a project, with metadata attached, in a short period of time.

In 2004, an article about the Kuwait Oil Company (KOC) depicted the main activities of the data management function. In this case study, the company dealt with 50 years of legacy data (Harrison and Safar 2004). In addition, the implemented solutions needed to be sustainable to ensure current data flow. The data journey started in 1995 when the company decided to adopt a centralized database. In 1998, it began migrating all of the data siloed in disparate databases to the centralized one.

At the same time, the company figured out the necessity of constructing stable mechanisms and processes to ensure data capture and storage for the long term and not only for the duration of the project of loading and archiving the legacy data. Therefore, a data management department was established. The main tasks of the data management function in this context were

  • Constructing the centralized database
  • Identifying the legacy data, validating it (data quality), and loading it into the new database
  • Standardizing the names of objects (e.g., wells, fields, reservoirs)
  • Creating standardized forms for data entry and retrieval
  • Formulating data-flow processes recognizing each data class
  • Carrying out data modeling in order to plan the customization of the database model
  • Leading the construction of a solution for retrieving data and ensuring access control
  • Guaranteeing connectivity between databases and applications
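
The standardization task in the list above can be illustrated with a short sketch. The naming pattern, the target convention, and the example names below are invented for illustration; they are not KOC's actual standard.

```python
import re

def standardize_well_name(raw: str) -> str:
    """Collapse spacing/punctuation variants such as 'bg- 7' or 'BG  0007'
    into one invented convention: FIELD-NNNN (uppercase, zero-padded)."""
    m = re.match(r"\s*([A-Za-z]+)\s*-?\s*(\d+)\s*$", raw)
    if not m:
        # Names that do not fit the pattern need manual review by a data manager.
        raise ValueError(f"unrecognized well name: {raw!r}")
    field, number = m.group(1).upper(), int(m.group(2))
    return f"{field}-{number:04d}"

print(standardize_well_name("bg- 7"))    # BG-0007
print(standardize_well_name("BG  0007")) # BG-0007
```

In practice, such a rule would be applied during legacy-data loading so that the same well is never registered twice under spelling variants.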

The data management model of KOC was based on a solid foundation of formal mechanisms for planning, capturing, retrieving, and using the data. According to Harrison and Safar (2004), “The main challenges today are the human aspects of data management, the proactive sustenance of inflow of new data, and the transformation of available data into useful information and knowledge in support of the company’s goals.”

In 2011, a roundtable meeting promoted by Common Data Access, a subsidiary of Oil & Gas UK, and involving six oil and gas organizations (Apache, Dana Petroleum, Shell, BP, Fairfield Energy, and Total) was convened to discuss the challenges and opportunities in petroleum data management (Hawtin and Lecore 2011). The participants in the discussion agreed that minimizing the time interpreters spent chasing data was a necessity. This would also affect engineers working in the real-time domain. This left no question that this was an issue for data management teams.

Another topic covered by the panelists was the connection between the success of the business, new technological solutions, and data management. According to Dave Kemshell with Shell, the partnership between the data management function and the business function was vital. For Tom Ruijgrovk of Total, it was expected that data management would raise issues and propose new solutions. However, Ruijgrovk also pointed out that data management staff tended to come from a librarian background and that it could be challenging to promote a new mindset among them.

From a different perspective, the meeting recognized that the integration of data management and business should happen around the leadership table. Kemshell said, “You want the data management discipline to have that strong relationship with the business but also to have a strong relationship back with their function.” He said he also saw data management forums as an important way to solve data issues in an interdisciplinary group. Simon Hendry with BP said that, after a period of outsourcing data management, the company brought it back in-house and gave the function the authority to tell the company what was necessary to improve efficiency and to operate in a new license. This set of arguments, together with the belief that data management needs to be part of the exploration work flow, supports the idea, expressed by the panelists, that outsourcing is not a good practice or, at least, should be planned carefully before being applied in some areas of data management.

The necessity of storing the data produced by interpreters was also pointed out as being key. The capture of this value-added interpreted data (e.g., horizons, markers, faults, geological models, reservoir models, analyses) was recognized as having to be a combined effort between the explorationists and the data manager. The link between static and dynamic data, and the different ways of dealing with such data, also was said to be challenging. The data management function was expected to guarantee the reuse of the data and to propose where the data should be stored and which naming conventions to use. Moreover, the importance of controls and audits of data was mentioned, for example, in terms of the retention period (which could differ for raw and interpreted data) or cases in which certain data could not be found. It was also mentioned that the regulatory regime would influence data management, at least in terms of the time for which the operator would have to keep the data stored.

Regarding the data management model, at Total, the model took the form of a transverse domain. This means that geology, geophysics, reservoir, and information technology (IT) were all integrated. At Shell, they had work streams that covered all technical data. When a specific problem occurred, the issue would be addressed by a particular work stream (e.g., development, well, or project engineering). BP and Total had data management as a separate discipline, and, for Total, the data management teams were composed of geoscientists and IT professionals.

Finally, all of the companies recognized the importance of running some kind of checklist, especially in subsurface models, to determine the quality and level of documentation (metadata) and whether it was being stored correctly. The necessity of closing a project with all data completed, validated, and ready to be reused, at any time, was considered highly important. The data management function, they felt, should also be involved in documentation management, version control, and making the data available and visible.
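
A minimal sketch of the kind of close-out checklist the panelists describe follows. The required metadata fields and the project-record shape are hypothetical, invented for illustration rather than taken from any of the companies cited.

```python
# Hypothetical required metadata for archiving a closed project.
REQUIRED_METADATA = ["interpreter", "date_completed", "crs", "data_version"]

def closeout_issues(project: dict) -> list[str]:
    """Return a list of problems blocking archival; an empty list means
    the project is complete, validated, and ready to be reused."""
    issues = []
    for field in REQUIRED_METADATA:
        if not project.get(field):
            issues.append(f"missing metadata: {field}")
    for item in project.get("deliverables", []):
        if not item.get("validated"):
            issues.append(f"deliverable not validated: {item.get('name', '?')}")
    return issues

project = {
    "interpreter": "J. Doe",
    "date_completed": "2019-05-01",
    "crs": "ED50 / UTM 31N",
    "data_version": "v2",
    "deliverables": [{"name": "top_reservoir_horizon", "validated": True}],
}
print(closeout_issues(project))  # -> []
```

Running such a check at project close would make the quality and documentation status visible before the data are archived for reuse.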

Since 2010, several publications have established a relationship between data management and data analytics in the petroleum industry (Feblowitz 2013; Holdaway 2014; Perrons and Jensen 2015). The effect of data management on artificial intelligence and predictive analytics (AIPA) in the context of E&P was demonstrated by Bravo et al. (2014). In a survey, 72% of the respondents said that data management and integration was a challenge in data analysis. Moreover, the top five items identified as major challenges were a lack of integration between work processes, many manually driven tasks, long computation times, large volumes of data, and data management itself. Therefore, a new client (the data scientist) and new demands emerged because poor data management would affect not only the flow of geological simulation and modeling but also the potential represented by AIPA.

In 2018, a case study (Mikalsen and Monteiro 2018) was conducted in an international company’s exploration unit in order to discuss data handling in an exploration context. Twenty-five interviews were conducted, with interviewees including central data managers, project data managers, explorationists, process owners, and IT management. Moreover, the authors made direct observations on the basis of accessing documents, being part of presentations of systems and processes, and observing explorationists’ and data managers’ interactions.

The main activities of the data managers (understood as the data management function) were divided into five streams: keeping file identifiers synchronized, determining ownership of data, developing tools for monitoring the state of the data, developing contingent procedures, and negotiating data needs (Mikalsen and Monteiro 2018). The first task was related, in essence, to “divergent identifiers (IDs) for the same data (e.g., well data) across the databases.” Consequently, in order to identify the same data across dispersed databases, data managers must be skilled in using the data IDs across systems. The root cause of such problems is when two or more applications are able (by mistake) to record the same data in the corporate database or in different databases. This is very dangerous because technical reports can be filled with different data when, in reality, the data should be the same. One way to fix this is data migration, in which all records are centralized in one database.
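
The ID-synchronization problem can be sketched as a cross-reference table that maps each system-local identifier to one canonical ID. The system names, well IDs, and table layout below are invented for illustration, not taken from the case study.

```python
# Invented cross-reference table: (system, local ID) -> canonical ID.
XREF = {
    ("seismic_db", "W-15/9-A"): "WELL-0001",
    ("logs_db", "15_9_A"): "WELL-0001",
    ("corporate_db", "NO 15/9-A"): "WELL-0001",
}

def canonical_id(system: str, local_id: str) -> str:
    """Resolve a system-local ID to the canonical ID, or flag it for
    manual reconciliation by a data manager."""
    try:
        return XREF[(system, local_id)]
    except KeyError:
        raise KeyError(f"unmapped ID {local_id!r} in {system!r}; needs manual reconciliation")

# The same physical well, under divergent IDs, resolves to one record.
assert canonical_id("logs_db", "15_9_A") == canonical_id("seismic_db", "W-15/9-A")
```

Maintaining such a mapping is the day-to-day alternative to the full data migration mentioned above.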

The second activity can be understood as compliance. In order to share data with companies that are part of a consortium, the data manager has to investigate the historical access rights of the companies. According to Mikalsen and Monteiro (2018), “a license is granted by the government to a consortium of companies working together to share the costs and risks. To determine who can legally access the new survey, [a given company] had to determine if the partners in the new license had legal access rights to the existing surveys.” Similar issues are discussed in terms of the ownership of samples. Some oil and gas authorities claim the ownership of all samples collected by operators from wells, outcrops, or the seabed. In some cases, the ownership of the products generated from these samples (e.g., thin sections) can be controversial, and the data management function will have to ask for guidance from the regulatory body.
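
The access-rights check that Mikalsen and Monteiro describe reduces to a subset test: before sharing an existing survey with a new consortium, verify that every partner already holds legal access rights to it. The survey names and company names below are invented for illustration.

```python
# Invented register of which companies hold access rights to each survey.
SURVEY_RIGHTS = {
    "survey_2009_3d": {"CompanyA", "CompanyB"},
}

def can_share(survey: str, new_license_partners: set[str]) -> bool:
    """True only if every partner in the new license already has
    legal access rights to the existing survey."""
    return new_license_partners <= SURVEY_RIGHTS.get(survey, set())

print(can_share("survey_2009_3d", {"CompanyA", "CompanyB"}))  # True
print(can_share("survey_2009_3d", {"CompanyA", "CompanyC"}))  # False
```

In reality, the hard part is assembling the rights register from historical license records, not evaluating the test itself.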

The third task was the responsibility data managers had for archiving quality-controlled interpretations. In order to know which interpretation is the official one, the data manager has to “bother” the explorationist (Mikalsen and Monteiro 2018) to obtain the answer. This can be a source of tension between data managers and explorationists. A question that arises is why the explorationists cannot properly archive the official version of the interpreted data themselves. In addition to faults, horizons, and picks, archiving maps, models, tables, and marks is necessary to make them available for reuse. In this model, a data manager is assigned the task because the interpreter is likely to be very busy. Nevertheless, getting an answer regarding official data can be difficult. The irony in this scenario, on the other hand, is that the same explorationists who do not facilitate the storage of the data they generate will often be the ones requiring the availability of other information needed as inputs to their processes. Here, the data management function must be able to mediate among different profiles of explorationists to ensure the completeness of the data lifecycle. Coming back to the case described by Mikalsen and Monteiro (2018), as the responsibility for archiving the interpreted data lay with the data managers, they also had to develop a tool to control the status of the data: “The development of this tool shows how data management becomes increasingly entangled with exploration management and how data management does not only include using technology in innovative ways but also in developing new technology to solve relevant problems.”

The fourth task was described as the set of processes required to load data into a new project. These processes involved accessing databases, searching for data in working projects, filtering database results, and giving feedback to the explorationists (Mikalsen and Monteiro 2018). In this model, the project data manager would not only archive interpreted data but also prepare the entire project for the explorationist. Another person, called the “central data manager,” would be responsible for maintaining the core database management system. The person in this role would support the project data manager. The outcome of the task would be to have all the data loaded and to ensure the basic quality of the data.

The last task identified as being performed by the data management function was called negotiating data needs. This basically involved finding data that were not visible to the explorationists (Mikalsen and Monteiro 2018). Therefore, the data managers, in this scenario, would work closely with the explorationist to find petrophysical data, structural maps, samples, backstripping analyses, and stratigraphic models, for example, that were not being stored properly. According to Mikalsen and Monteiro (2018), “Far short of an automatized system that finds and displays relevant data, the data manager must both know the technology needed to provide an overview and use it to know as much about the data in an area as possible.”           

Discussion and Conclusions

Analyzing the literature reveals that the mission of the data management function has evolved over time, but not dramatically. The main activities and expectations observed in the 1990s still exist today. Reflecting on what is new and what remains the same, a good starting point is a new client of data management: the data scientist. In the 1990s, the necessity of putting data management into the exploration work flow was clear, so explorationists demanded a lot from the data managers. In the first decade of the 2000s, data analytics exposed the fragilities of petroleum data management, and data scientists started not only to make demands on data managers but also to take on some of their work (e.g., quality control). Despite the existence of this new client, it is not possible to identify from the literature whether new data management tasks will emerge from this interaction.

The data management function is responsible for constructing databases. This is mostly an IT task that should be performed in partnership with a business area. New technologies and approaches are changing the scenario. The task is the same (constructing and maintaining databases), but data lakes, the cloud, and other forms of hybrid data storage are bringing new challenges that will affect the current standards, procedures, and policies.

The necessity of attending to the data needs of explorationists remains the same. Two models can be identified by analyzing the literature. In the first, the data manager works to keep the database healthy by

  • Controlling the quality of the data
  • Ensuring that data are flowing from the source to the database
  • Designing and maintaining the work flow of unstructured data and metadata
  • Working to avoid the duplication of data
  • Structuring the data
  • Auditing and putting controls (dashboards) in place in order to guarantee the data lifecycle

In this approach, the data managers are concerned with guaranteeing the foundation of data management (as in the KOC example) and the data consumers will search, find, and archive the data themselves.

In the second model, the data management function provides technical assistance to the explorationists and the data managers work with them closely. The tasks involve searching for data, preparing for projects, and archiving the interpreted data. Clearly, these different models require data managers to have different skills. Nevertheless, both models expect data managers to have the ability to mediate between IT (the custodians of the data) and the domain specialists.    

Activities related to the development of systems and compliance were cited, but not as core tasks of the data management function. Here, one could argue that the contribution of data management, in terms of the management of specialist applications, can be peripheral, in the sense of pointing out whether applications are recording the same data in the database or whether one application overlaps with another in supporting the same work process. On the other hand, one could also argue that data management should be able to develop technologies in order to meet data needs and ensure some level of control. In terms of compliance, the data management function must be part of decisions regarding which data should be disclosed to authorities and which data should (or should not) be shared with other companies in a consortium, and it must contribute in cases of divestment, when data (including samples) must be transferred to another company. Therefore, for both activities, data management should contribute to, but not perform, the daily operations involved.


Beham, R., Brown, A., Mottershead, C., et al. 1997. Changing the Shape of E&P Data Management. Oilfield Review 9 (2): 21–33.

Bravo, C. E., Saputelli, L., Rivas et al. 2014. State of the Art of Artificial Intelligence and Predictive Analytics in the E&P Industry: A Technology Survey. SPE J. 19 (4): 547–563. SPE-150314-PA.

Feblowitz, J. 2013. Analytics in Oil and Gas: The Big Deal About Big Data. Presented at the SPE Digital Energy Conference and Exhibition, The Woodlands, Texas, USA, 5–7 March. SPE-163717-MS.

Harrison, G. H., and Safar, F. 2004. Modern E&P Data Management in Kuwait Oil Company. J. Pet. Sci. Eng. 42 (2–4): 79–93.

Hawtin, S., and Lecore, D. 2011. The Business Value Case for Data Management—A Study. CDA and Schlumberger. (Accessed 19 May 2019.)

Holdaway, K. R. 2014. Harness Oil and Gas Big Data With Analytics: Optimize Exploration and Production With Data-Driven Models. New York: John Wiley & Sons.

Mikalsen, M., and Monteiro, E. 2018. Data Handling in Knowledge Infrastructures: A Case Study From Oil Exploration. Proc. of the ACM on Human-Computer Interaction 2: 123.

Perrons, R. K., and Jensen, J. W. 2015. Data as an Asset: What the Oil and Gas Sector Can Learn From Other Industries About Big Data. Energy Policy 81: 117–121.

Dean Pereira de Melo, SPE, has worked in geosciences since 2007, most recently as a data steward for Petrobras, with experience in seismic interpretation, well-log analysis, and geochemistry data analysis. More recently, his role has included business analysis focusing on business process management, glossaries, conceptual data modeling, and data quality, all related to well operations and formation evaluation.
