Dangerous Correlations

In one of its latest pieces about digital strategy, McKinsey reminds readers of the benefits of big data: ‘Mining data greatly enhances the power of analytics, which leads directly to dramatically higher levels of automation—both of processes and, ultimately, of decisions.’ Amen! Some data fundamentalists go as far as proclaiming nothing less than the end of science. According to them, algorithms will make the data speak for itself and provide robust rules for prediction and action. Knowing ‘why’ will become superfluous.


Beyond the technical challenges of managing the three Vs of big data (volume, velocity and variety), one of the significant pitfalls of data analytics is the treacherous relationship between correlation (to be understood as co-incidence) and causation. A website on ‘spurious correlations’ provides some entertaining illustrations: it shows, for example, that the total revenue generated by arcades correlates with computer science doctorates awarded in the US, or that the number of people who die by becoming tangled in their bedsheets almost perfectly correlates with per capita cheese consumption.


More seriously, a simple mathematical analysis shows that regressing a small set of randomly selected data can lead to the identification of a seemingly decent explanatory model. In fact, in a paper entitled ‘The deluge of spurious correlations in big data’, the authors demonstrate that the probability of finding misleading (i.e. random) correlations increases with the size of the available database. Worse, the majority of correlations are spurious when dealing with large sets of numbers. As is often heard and read in the world of big data, ‘Raw data should be cooked with care’.
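
The effect is easy to reproduce on a laptop. The Python sketch below is an illustration only, not the method of the paper cited above: it draws series of pure random noise, which by construction have nothing to do with one another, and reports the strongest correlation found between any two of them. The function names, the choice of 10 observations per series and the series counts are all assumptions made for the example.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_series, n_points, seed=0):
    """Largest |r| between any two of n_series independent noise series.

    Hypothetical helper for illustration: every series is Gaussian noise,
    so any correlation it reports is spurious by construction.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        for _ in range(n_series)
    ]
    return max(
        abs(pearson(a, b)) for a, b in itertools.combinations(series, 2)
    )


if __name__ == "__main__":
    # Same number of observations each time; only the number of
    # unrelated variables being compared grows.
    for n_series in (5, 20, 80):
        r = strongest_spurious_correlation(n_series, n_points=10)
        print(f"{n_series:3d} unrelated series -> strongest |r| = {r:.2f}")
```

With a fixed seed, each larger run contains the smaller one's series plus new ones, so the strongest correlation found can only stay the same or rise as more unrelated variables are added. Nothing about the data has changed; there are simply more pairs to choose from.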


Edward O. Wilson, a renowned American biologist and scholar who won two Pulitzers, provides in a WSJ article a powerful counterargument to those suggesting that data is everything: ‘Many of the most successful scientists in the world today are mathematically no more than semiliterate. […] Pioneers in science only rarely make discoveries by extracting ideas from pure mathematics.’ No mathematical prowess will lead to productivity gains without good common sense. Unfortunately, as noted by Voltaire, ‘Common sense is not so common’. Descartes would add that ‘[…] To be possessed of a vigorous mind is not enough; the prime requisite is rightly to apply it. The greatest minds, as they are capable of the highest excellencies, are open likewise to the greatest aberrations.’


The relationship between the quantity of data and the quality of decisions is as flawed as the one that exists between correlation and causation. More data without more common sense is unlikely to bring much good to the world.
