The Hidden Risk of AI and Big Data

With recent advances in AI being enabled through access to so much big data and cheap computing power, there is incredible momentum in the field. Can big data really deliver on all this hype, and what can go wrong?

September 25, 2019

By KDnuggets

Data Science and Digital Engineering

Big data is suddenly everywhere. Where data (and information) was once scarce and hard to find, we now have a deluge. In recent years, the amount of available data has been growing at an exponential pace. This, in turn, is made possible by the immense growth in the number of devices recording data, as well as the connectivity between all these devices through the Internet of Things. Everyone seems to be collecting, analyzing, making money from, and celebrating (or fearing) the powers of big data. Combined with the power of modern computing, it promises to solve virtually any problem just by crunching numbers.

But can big data really deliver on all this hype? In some cases, yes; in others, maybe not. There is no doubt that big data has already had a critical impact in certain areas. For instance, almost every successful artificial intelligence solution involves some serious number crunching.

The first thing to note, however, is that although AI is currently very good at finding patterns and relationships within big data sets, it is still not very intelligent. Number crunching can effectively identify subtle patterns in our data, but it cannot directly tell us which of those correlations are actually meaningful.

Correlation vs. Causation

We all know that correlation does not imply causation. However, the human mind is hardwired to look for patterns, and when we see lines sloping together and apparent patterns in our data, it is hard for us to resist the urge to assign a reason.

Statistically, however, we can’t make that leap. A spurious relationship, or spurious correlation, is a mathematical relationship in which two or more events or variables are associated but not causally related, because of either coincidence or the presence of an unseen third factor (referred to as a “common response variable,” “confounding factor,” or “lurking variable”).
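The effect of a lurking variable can be illustrated with a small simulation. The sketch below is only an illustration under made-up assumptions: the variables `z`, `x`, and `y`, the noise levels, and the helper functions `pearson` and `residuals` are all hypothetical and do not come from any real data set. A hidden factor `z` drives both `x` and `y`, which have no direct influence on each other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, zs):
    """What is left of ys after removing its least-squares linear fit on zs."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Assumption: z is the unseen confounder; x and y each depend on z plus
# independent noise, and never on each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

raw_r = pearson(x, y)                                    # ignores z
partial_r = pearson(residuals(x, z), residuals(y, z))    # controls for z

print(f"correlation of x and y:               {raw_r:.2f}")
print(f"correlation after accounting for z:   {partial_r:.2f}")
```

In this setup, `x` and `y` appear clearly correlated even though neither causes the other, and the association should largely vanish once the confounder is accounted for. In real data the lurking variable is often not recorded at all, which is why the numbers alone cannot settle the question.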

The power and limits of correlations

With enough data, computing power, and statistical algorithms, patterns will be found. But are these patterns of any interest? Not all of them will be: the number of possible pairings grows much faster than the number of variables, so spurious patterns could easily outnumber the meaningful ones. Big data combined with algorithms can be an extremely useful tool when applied correctly to the right problems. However, no scientist thinks you can solve a problem by crunching data alone, no matter how powerful the statistical analysis. You should always ground your analysis in an underlying understanding of the problem you are trying to solve.
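To see how easily patterns emerge from nothing, consider the following sketch. It is a toy illustration, not a method from the article: the series are pure random noise, and the sizes (`n_series`, `n_points`) and the cutoff `threshold` are assumptions chosen for the example. The cutoff of 0.444 is approximately the value of |r| that a conventional two-sided 5% significance test would flag for 20 observations.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)  # fixed seed so the run is repeatable

n_series = 40      # assumption: 40 unrelated "variables"
n_points = 20      # assumption: 20 observations of each
threshold = 0.444  # approx. |r| flagged at the 5% level for 20 observations

# Every series is independent noise, so any correlation found is spurious.
series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]

pairs = list(combinations(range(n_series), 2))
flagged = [(i, j) for i, j in pairs
           if abs(pearson(series[i], series[j])) > threshold]

print(f"pairs examined: {len(pairs)}")
print(f"pairs that look 'significant': {len(flagged)}")
```

Forty variables already yield 780 pairs to compare, and roughly one in twenty of them can be expected to clear the cutoff by chance alone. None of the flagged pairs means anything, which is the reason an analysis needs to start from an understanding of the problem rather than from whatever the search turns up.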

Read the full story here.
