Trustworthiness Casts a Shadow Over Data
In the expanding landscape of oilfield data analytics, analysts must trust their data. Some experts say that trust in the data is as important as, if not more important than, the data’s accuracy.
“You know your gas gauge is bad, right?” said Jay Hollingsworth at a recent data workshop, referring to the perennially inaccurate fuel gauges on cars. “Everybody knows, ‘Oh, it can go a little bit below this, it can go a little above that.’ You know that that is not an accurate reading. But, when it gets to the E, you still pull in to the gas station. Even though it’s not high-quality data, you act on it. … Trusted data isn’t the same as high-quality data.”
Hollingsworth, the chief technology officer for Energistics, spoke at a recent petroleum-data workshop held by the Professional Petroleum Data Management Association (PPDM). Energistics is the organization responsible for WITSML (wellsite information transfer standard markup language), the ubiquitous petroleum-data-transfer protocol.
PPDM Chief Executive Officer Trudy Curtis opened the workshop with a succinct summation of Hollingsworth’s point: “It’s about trust.” She pointed to PPDM’s Rules Library, which says, “Trust is based on transparency, consistency, and professionalism. Untrusted data results in wasted resources and suspect results.”
A recent study, however, suggests that executives are having a hard time trusting their data, and that can lead to a lot of problems. The “Guardians of Trust” report put together by KPMG shows that only 38% of the executives surveyed say they have a high level of confidence in their customer insights and only a third say they trust the analytics their businesses generate. Nonetheless, most respondents said these insights are critical to decision-making.
“You need to treat your data just like any other asset,” said Matt Becker, managing director at Sullexis. “A data-quality strategy isn’t just about the data quality itself, but it’s your data governance, it’s your data management. It’s going to be how you can report and monitor that. It’s going to be things like what tools are you going to use. … You can’t just assume the tool is going to fix the problem.”
In fact, Hollingsworth said that the tools used can be a source of mistrust in data. “Providing people tools actually expresses all of their data problems,” Hollingsworth said. He made his point with an anecdote from his earlier years in the industry:
“Somebody called me into their office and said ‘your program is crap,’” Hollingsworth said. “Why is the program crap?” he asked. “‘It shows I have a bunch of wells in the Bay of Guinea, west of São Tomé and Príncipe. I don’t have any wells offshore in the middle of the Bay of Guinea.’ Well, that’s true. You have a bunch of data whose latitude is zero and longitude is zero, and they’re plotted correctly right in the middle of the Bay of Guinea.”
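The kind of problem in that anecdote can often be caught before the data reaches a map. What follows is a minimal sketch, not anything Hollingsworth or PPDM described: the `WellRecord` type, the `has_suspect_location` helper, and the sample wells are all hypothetical, and it assumes that a location of exactly latitude 0, longitude 0 is far more likely to be a placeholder for missing data than a real wellsite.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WellRecord:
    # Hypothetical record layout, used only for illustration.
    name: str
    latitude: Optional[float]
    longitude: Optional[float]


def has_suspect_location(well: WellRecord) -> bool:
    """Return True if coordinates are missing, out of range, or exactly (0, 0)."""
    lat, lon = well.latitude, well.longitude
    if lat is None or lon is None:
        return True
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return True
    # Assumption: (0, 0) is a default value standing in for missing data.
    return lat == 0.0 and lon == 0.0


# Made-up sample data.
wells = [
    WellRecord("Well A", 31.97, -102.08),
    WellRecord("Well B", 0.0, 0.0),
    WellRecord("Well C", None, None),
]

suspect = [w.name for w in wells if has_suspect_location(w)]
print(suspect)  # ['Well B', 'Well C']
```

Flagging such records for review, rather than silently dropping or plotting them, keeps the problem visible to whoever owns the source system.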
Mistrust in data also stems from the work necessary to make the data useful. Hollingsworth pointed to a 1991 article that claimed people spent 60% of their time looking for data. “The truth is, people do not now, nor have they ever, spent 60% of their time looking for data,” he said. “But this number comes up over and over again. … What they’re really doing is validating the data, fixing it up …, but they call it ‘looking for data.’”
“And this is the real challenge for data managers: Delivering data where the end-users don’t spend all of this time in all this data prep. Now, if you give a user this set of data and they have to spend a month getting it ready so they can start doing their job, are they going to trust your source of data in the future?”
Building trust in data begins at the data’s origin and continues all the way to the end-user. “You need to look at data all the way from the point it is coming in to your organization. So, systems or records, those inputs,” Becker said.
“Many times, people will go—whether they still are using a data warehouse or they’ve got some sort of data lake, or data swamp, which is usually more the case—and they’re trying to fix the data there and they’re ignoring or they don’t want to deal with the root of the problem, which is the data in the source system,” he said. “You’ve got to be willing to take the time to look at that and have those difficult conversations about what do we mean when we say a ‘spud date,’ what do we mean when we say ‘completion date.’”
“What gives us trust in data,” Hollingsworth said, “is knowing where it came from, knowing what has happened to it since it was measured, and having confidence that it hasn’t been tampered with along the way by some person with malicious intent. If I know those things, I can trust the source of data. … I trust the data because I’m aware of its heritage. I know where it came from.”
Hollingsworth made a distinction between data trustworthiness and usefulness. “Trust in a data source is kind of a different thing than ‘all of my data is very good quality.’”
“You have the data you have,” he said. “The question is, can I use this data to make a decision? … Can I trust this data to make a decision? That’s what’s really important.”