
Dangerous Correlations

In one of its latest pieces about digital strategy, McKinsey reminds readers of the benefits of big data: ‘Mining data greatly enhances the power of analytics, which leads directly to dramatically higher levels of automation, both of processes and, ultimately, of decisions.’ Amen! Some data fundamentalists go as far as proclaiming no less than the end of science. According to them, algorithms will make the data speak for itself and provide robust rules for prediction and action. Knowing ‘why’ will become superfluous.


Beyond the technical challenges of managing the three Vs of big data (volume, velocity and variety), one of the most significant pitfalls of data analytics is the treacherous relationship between correlation (to be understood as co-incidence) and causation. A website on ‘spurious correlations’ provides some entertaining illustrations. It shows, for example, that the total revenue generated by arcades correlates with the number of computer science doctorates awarded in the US, or that the number of people who die by becoming tangled in their bedsheets correlates almost perfectly with per capita cheese consumption.


More seriously, a simple mathematical analysis shows that regressing a small set of randomly selected data can easily yield a seemingly decent explanatory model. In fact, in a paper entitled ‘The deluge of spurious correlations in big data’, the authors demonstrate that the probability of finding misleading (i.e. random) correlations increases with the size of the available database. Worse, the majority of correlations are spurious when dealing with large sets of numbers. As is often heard and read in the world of big data, ‘Raw data should be cooked with care’.
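For readers who like to see this effect for themselves, here is a minimal sketch in Python. It is not the method used in the paper mentioned above, only a toy simulation under stated assumptions: the series are independent Gaussian noise, each has 10 observations, and a correlation counts as 'strong' when its absolute value exceeds 0.7. The function names `pearson` and `count_strong_pairs` and all parameter values are illustrative choices, not taken from any source.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def count_strong_pairs(n_series, n_points=10, threshold=0.7, seed=0):
    """Count pairs of independent random series with |r| above threshold.

    Every series is pure noise (an assumption of this sketch), so any
    pair that clears the threshold is spurious by construction.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)
    ]
    return sum(
        1
        for a, b in combinations(series, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    for n_series in (10, 50, 200):
        pairs = n_series * (n_series - 1) // 2
        hits = count_strong_pairs(n_series)
        print(f"{n_series} series, {pairs} pairs, {hits} with |r| > 0.7")
```

The number of pairs grows roughly with the square of the number of series, so the count of 'strong' but meaningless correlations grows with it, even though nothing in the data is related to anything else.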


In a WSJ article, Edward O. Wilson, a renowned American biologist and scholar who won two Pulitzers, provides a powerful counterargument to those suggesting that data is everything: ‘Many of the most successful scientists in the world today are mathematically no more than semiliterate. […] Pioneers in science only rarely make discoveries by extracting ideas from pure mathematics.’ No mathematical prowess will lead to productivity gains without good common sense. Unfortunately, as noted by Voltaire, ‘Common sense is not so common’. Descartes would add that ‘[…] To be possessed of a vigorous mind is not enough; the prime requisite is rightly to apply it. The greatest minds, as they are capable of the highest excellencies, are open likewise to the greatest aberrations.’


The relationship between the quantity of data and the quality of decisions is as flawed as the one that exists between correlation and causation. More data without more common sense is unlikely to bring much good to the world.
