The Book of Why: Chapter 2

This post continues the series of posts on Judea Pearl and Dana Mackenzie’s “The Book of Why.” This post covers Chapter 2: From Buccaneers to Guinea Pigs: The Genesis of Causal Inference.

Francis Galton

Galton’s development of regression arose from his interest in predicting the inheritance of traits between human generations. He first interpreted the patterns he found as causal. He eventually weakened his causal interpretation of regression to the mean by noting it occurred mathematically regardless of whether there was a cause or not.
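To see why regression to the mean needs no causal story, here is a minimal simulation sketch. The setup is my own assumption, not Galton's data: each parent and child height is a shared family component plus independent noise, in arbitrary standardized units, and the function names are made up for illustration.

```python
import random
from statistics import mean


def simulate_pairs(n=20000, seed=0):
    """Return (parent, child) pairs that share one component plus independent noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        shared = rng.gauss(0, 1)            # what parent and child have in common
        parent = shared + rng.gauss(0, 1)   # noise specific to the parent
        child = shared + rng.gauss(0, 1)    # noise specific to the child
        pairs.append((parent, child))
    return pairs


def extreme_group_means(pairs, select_index, threshold):
    """Select pairs whose chosen member exceeds the threshold.

    Returns (mean of the selected member, mean of the other member).
    """
    other_index = 1 - select_index
    selected = [p for p in pairs if p[select_index] > threshold]
    return (
        mean(p[select_index] for p in selected),
        mean(p[other_index] for p in selected),
    )


pairs = simulate_pairs()

# Select tall parents and look at their children.
tall_parent_mean, their_child_mean = extreme_group_means(pairs, 0, 2.0)

# Select tall children and look "backward" at their parents.
tall_child_mean, their_parent_mean = extreme_group_means(pairs, 1, 2.0)

print("tall parents:", tall_parent_mean, "-> their children:", their_child_mean)
print("tall children:", tall_child_mean, "-> their parents:", their_parent_mean)
```

In this sketch the children of tall parents are closer to the average than their parents, and the parents of tall children are closer to the average than their children. The effect runs in both directions, which is hard to square with a causal reading and easy to square with a purely statistical one.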

Karl Pearson

Karl Pearson, Galton’s disciple, extended Galton’s work by further focusing on correlation rather than causation. Pearson subscribed to a positivist philosophy that viewed the universe as a product of the human mind. Science, then, was incapable of observing the world absent the human mind and was instead a description of human thought about the world. Causation “as an objective process that happens in the world outside the human brain” is not compatible with the positivist philosophy (p. 67). Patterns of thought can be described by correlation, though.

Galton passed away and left money to establish a professorship in biometrics (statistics) at University College London, conditional on Pearson being the first holder. From that position of power, Pearson, who sounds like a controlling, domineering personality, led the development of the field of statistics for several decades, including supervising assistants like George Udny Yule.

Despite Pearson’s enthusiasm for correlation, he did write papers mentioning spurious correlation and admitted it was easy to find examples of silly correlations.

Udny Yule

Yule eventually broke with Pearson’s disdain for causal explanation after studying whether providing assistance to the poor in their own homes or in poorhouses affected poverty in London. The data showed districts with more at-home assistance were poorer, and Yule suspected the correlation was spurious because such districts might have more elderly people, who tended to be poor. But he then compared districts with similar proportions of elderly residents and found the same correlation. From this he concluded that higher poverty in such districts was due to more at-home poverty relief. But in a footnote, he hedged: “Strictly speaking, for ‘due to’ read ‘associated with.’”
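Yule's check, comparing only districts with similar proportions of elderly residents, can be sketched on synthetic data. Everything here is an assumption for illustration: the districts are simulated rather than taken from Yule's records, the elderly share is reduced to three bands, and `relief_effect` is a made-up knob for whether at-home relief has any direct link to poverty.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def make_districts(n, relief_effect, seed=1):
    """Simulate (elderly_band, relief, poverty) rows.

    The elderly band raises both relief and poverty; relief_effect is the
    direct contribution of relief to poverty (0.0 means none).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        elderly = rng.choice([0, 1, 2])  # low, medium, high share of elderly
        relief = elderly + rng.gauss(0, 1)
        poverty = elderly + relief_effect * relief + rng.gauss(0, 1)
        rows.append((elderly, relief, poverty))
    return rows


def overall_correlation(rows):
    return pearson([r[1] for r in rows], [r[2] for r in rows])


def within_band_correlations(rows):
    """Correlation of relief and poverty among districts in the same elderly band."""
    result = {}
    for band in sorted({r[0] for r in rows}):
        subset = [r for r in rows if r[0] == band]
        result[band] = pearson([r[1] for r in subset], [r[2] for r in subset])
    return result


# Scenario A: the correlation is spurious, driven only by the elderly share.
spurious = make_districts(3000, relief_effect=0.0)
# Scenario B: relief is directly linked to poverty as well.
linked = make_districts(3000, relief_effect=1.0)

for name, rows in [("spurious", spurious), ("linked", linked)]:
    print(name, "overall:", overall_correlation(rows))
    print(name, "within bands:", within_band_correlations(rows))
```

In the spurious scenario the overall correlation is clearly positive but largely vanishes within each band. In the linked scenario it survives within every band, which is the pattern Yule reported. Even then, a surviving correlation only rules out the one confounder that was held fixed, which may be why Yule added his footnote.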

Sewall Wright

Wright completed a doctorate in genetics in 1915 and took a US government job managing research guinea pigs. From that position, he built a career theorizing about evolution, differing from Darwin by arguing evolution could happen in rapid bursts rather than gradually.

He worked on explaining the inheritance of coat color in guinea pigs, which did not follow Mendelian inheritance rules. Even highly inbred lines never produced controllable coat color, contradicting the Mendelian rule that a trait should become “fixed” after multiple generations of inbreeding.

Wright theorized that genetics were not the only determinant of coat color. He thought developmental factors might play a role, and he developed some mathematical theory and graphical representations of how genes and developmental factors interacted to produce coat color. These graphics were the roots of today’s causal path diagrams. Wright published a paper in 1920 with what was probably the first causal path diagram to appear in the scientific literature.

Wright’s diagram married two mathematical languages that had been separate: the qualitative information carried by the arrows and the quantitative information carried by the data. Wright’s work also distinguished causation and correlation as separate constructs. Before Wright, causation had been treated as a special case of correlation, the limiting case where the correlation equals 1.
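Here is a rough sketch of what combining arrows with numbers looks like in practice. It assumes a toy three-variable diagram of my own (Z -> X, Z -> Y, X -> Y), standardized variables, linear relationships, and made-up path coefficients; none of this comes from Wright's guinea pig paper. Under those assumptions, the correlation between two variables is the sum, over the paths connecting them in the diagram, of the products of the coefficients along each path.

```python
import random

# Hypothetical diagram: each arrow (cause, effect) carries a path coefficient.
PATHS = {("Z", "X"): 0.6, ("Z", "Y"): 0.3, ("X", "Y"): 0.5}


def traced_correlations(paths):
    """Correlations implied by the diagram via path tracing."""
    a = paths[("Z", "X")]
    b = paths[("Z", "Y")]
    c = paths[("X", "Y")]
    return {
        ("Z", "X"): a,          # single path: Z -> X
        ("Z", "Y"): b + a * c,  # Z -> Y, plus Z -> X -> Y
        ("X", "Y"): c + a * b,  # X -> Y, plus X <- Z -> Y
    }


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulated_correlations(paths, n=50000, seed=2):
    """Generate data from the diagram and measure the correlations directly."""
    rng = random.Random(seed)
    a = paths[("Z", "X")]
    b = paths[("Z", "Y")]
    c = paths[("X", "Y")]
    # Noise scales chosen so that X and Y each have variance 1.
    noise_x = (1 - a * a) ** 0.5
    noise_y = (1 - (b * b + c * c + 2 * a * b * c)) ** 0.5
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = a * z + noise_x * rng.gauss(0, 1)
        y = b * z + c * x + noise_y * rng.gauss(0, 1)
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return {
        ("Z", "X"): pearson(zs, xs),
        ("Z", "Y"): pearson(zs, ys),
        ("X", "Y"): pearson(xs, ys),
    }


implied = traced_correlations(PATHS)
observed = simulated_correlations(PATHS)
for pair in implied:
    print(pair, "implied:", round(implied[pair], 3), "observed:", round(observed[pair], 3))
```

The point of the sketch is that the X and Y correlation mixes the direct arrow with the back-door route through Z. The data alone give only the total; the diagram is what says how to split it.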

Unfortunately, Wright’s work went unnoticed in academia for decades until sociologists and then economists rediscovered it and developed similar ideas as structural equation modeling (SEM) and simultaneous equation modeling. However, these methods tend to obscure the scientific causal logic Wright emphasized behind automated procedures for estimating path coefficients.

One source of resistance to Wright’s diagrams might have been how they highlight the subjectivity of objectivity. Path diagrams encode assumptions about causal processes that are then used to make claims about how processes work. Those who favor presenting objectivity as outside the human mind might not have been interested in path diagrams that undermine the image of objectivity.

However, in the past few decades, even the most objectivity-minded fields have started to embrace approaches that make subjective assumptions explicit, like Bayesian analysis.