The Book of Why: Chapter 1

This post continues my notes on Judea Pearl and Dana Mackenzie’s The Book of Why. This post covers Chapter 1: The Ladder of Causation.

The question of the book:

How can machines (and people) represent causal knowledge in a way that would enable them to access the necessary information swiftly, answer questions correctly, and do it with ease, as a three-year-old child can?

Page 37

Pearl begins the chapter with the Genesis story of the Garden of Eden. While writing his book Causality in the 1990s, he reread Genesis and noticed a crucial distinction he had missed in prior readings: the distinction between facts (the answers to “what” questions) and causes (the answers to “why” questions). Pearl describes how God asked Adam and Eve “what” questions, which Adam and Eve answered with causes, not facts.

Pearl claims three implications from the fact/cause distinction’s appearance in Genesis:

  1. Humans realized early in our evolution that the world is made up not only of facts but also causal explanations that connect facts together in cause-effect relationships.
  2. Causal explanations make up the bulk of human knowledge and should be the cornerstone of machine intelligence.
  3. “Our transition from processors of data to makers of explanations was not gradual; it was a leap that required an external push from an uncommon fruit.”

Beyond religion, evolutionary science also provides evidence for a “major unexplained transition.” Pearl claims that within the past 50,000 years of human evolution there is evidence for a cognitive revolution in which “humans acquired the ability to modify their environment and their own abilities at a dramatically faster rate.” Pearl’s evidence for this is that humans have created complex tools, and he refers to the transition through an engineering analogy: the “super-evolutionary speedup” in tool creation.

Evolution has endowed us with the ability to engineer our lives, a gift she has not bestowed on eagles and owls, and the question, again, is ‘Why?’ What computational facility did humans suddenly acquire that eagles did not?

Page 25

Pearl’s theory is that the transition occurred because humans leaped from fact-based to causal-based thinking, which enabled humans to make mental models of causes and effects and, consequently, plans.

The mental models humans suddenly could make are the causal diagrams Pearl believes should be “the computational core of the ‘causal inference engine’” at the heart of the causal revolution described in the Introduction.

Pearl proposes three nested levels of causation, forming the Ladder of Causation:

  1. Seeing
  2. Doing
  3. Imagining

Seeing nests within Doing, and both nest within Imagining. All three levels must be mastered for causal cognition.

Seeing involves collecting information from one’s environment. Doing involves predicting the outcomes of actions, given one’s understanding of the environment accumulated through Seeing. Imagining involves theorizing why actions produce outcomes and what to do when actions do not produce expected outcomes.

The ladder of causation starts with Seeing, where outcomes are viewed through association. One rung higher is Doing, where outcomes are viewed through intervention. The final rung is Imagining, where outcomes are viewed through counterfactual reasoning.

Statistics, along with machine learning, operates at the Seeing rung and deals with identifying associations between outcomes. Even the most advanced artificial intelligence deals only with associations, using observations about outcomes (Seeing) to calculate associations between those outcomes.

Pearl argues machine learning needs to experience the same “super-evolutionary speedup” humans seem to have experienced at some point in their evolutionary history.

Moving up to the Intervention rung requires knowledge of the different reasons behind different associations. Each reason is a possible intervention into the association between observed outcomes, and outcomes will change in different ways depending on the intervention that occurs. Association thinking does not consider these different interventions. It instead assumes that whatever interventions produced the observed outcomes will continue to operate in the future exactly as they did in the past.

The effect of an intervention on an outcome can be studied through experimental control. A question in this mode can be represented as P(outcome | do(intervention)). If no experiment is possible, interventions can be studied by specifying a causal model that substitutes for experimental control. A question in this mode can be represented as P(outcome | intervention), where the absence of the do() operator indicates a lack of experimental control over the intervention; instead, the intervention happened outside the researcher’s control.
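To make the gap between the two expressions concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: a single binary confounder Z, a binary intervention X, a binary outcome Y, and made-up probabilities. It computes both quantities exactly by enumeration rather than from data.

```python
from itertools import product

# Hypothetical toy model (all numbers are invented for illustration):
# Z is a confounder that makes both X and Y more likely.
P_Z = {0: 0.5, 1: 0.5}            # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(X = 1 | Z = z)


def p_y1(x, z):
    """P(Y = 1 | X = x, Z = z): X raises Y by 0.2, Z raises Y by 0.5."""
    return 0.1 + 0.2 * x + 0.5 * z


def joint(z, x, y):
    """P(Z = z, X = x, Y = y) when X is left to follow Z (no intervention)."""
    px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
    py = p_y1(x, z) if y == 1 else 1 - p_y1(x, z)
    return P_Z[z] * px * py


def p_y1_seeing(x):
    """P(Y = 1 | X = x): condition on having observed X = x."""
    numerator = sum(joint(z, x, 1) for z in (0, 1))
    denominator = sum(joint(z, x, y) for z, y in product((0, 1), repeat=2))
    return numerator / denominator


def p_y1_doing(x):
    """P(Y = 1 | do(X = x)): X is set by fiat, so Z keeps its own distribution."""
    return sum(P_Z[z] * p_y1(x, z) for z in (0, 1))


print(f"seeing: {p_y1_seeing(1):.2f}")  # prints "seeing: 0.70"
print(f"doing:  {p_y1_doing(1):.2f}")   # prints "doing:  0.55"
```

In this toy model, seeing overstates the effect because units with X = 1 are also more likely to have Z = 1. Doing removes that dependence by construction.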

While Doing questions can answer questions of what will happen if an action is taken, they cannot answer why that intervention causes that outcome. Answering why questions requires the third rung of the ladder on which we Imagine why interventions lead to outcomes.

Because Pearl’s ultimate goal is machines acquiring causal knowledge, he turns to considering how causal knowledge can be represented in machine-readable form.

How can machines (and people) represent causal knowledge in a way that would enable them to access the necessary information swiftly, answer questions correctly, and do it with ease, as a three-year-old child can?

Page 37

Pearl answers this question with the causal diagram. He asserts that human brains use causal diagrams to store and recall association, intervention, and causal information.

Causal diagrams can encode probabilistic relationships between causes and outcomes, but the structure of the diagram itself also encodes causal information independent of probabilities. More strongly, Pearl claims “causation is not reducible to probabilities.”

To use a causal diagram:

  1. Translate the story under investigation into a diagram.
  2. Listen to the query.
  3. Modify the diagram according to whether the query is counterfactual, interventional, or associational.
  4. Use the modified diagram to compute the answer to the query.
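Step 3 can be sketched in a few lines of Python. This is my own illustration, not code from the book: the diagram is stored as a dictionary mapping each node to its direct causes, the node names are hypothetical, and the function `modify_for_query` is an invented helper. For an interventional query the sketch deletes every arrow pointing into the intervened node; counterfactual queries need more machinery and are deliberately left out.

```python
# Hypothetical diagram: each node maps to the list of its direct causes.
DIAGRAM = {
    "confounder": [],
    "treatment": ["confounder"],
    "outcome": ["treatment", "confounder"],
}


def do(diagram, node):
    """Return a copy of the diagram with all arrows into `node` removed."""
    if node not in diagram:
        raise KeyError(f"unknown node: {node}")
    modified = {child: list(parents) for child, parents in diagram.items()}
    modified[node] = []  # the intervened node no longer listens to its causes
    return modified


def modify_for_query(diagram, query_type, target=None):
    """Step 3: pick the diagram to compute on, based on the kind of query."""
    if query_type == "associational":
        # Seeing: use the diagram exactly as drawn.
        return {child: list(parents) for child, parents in diagram.items()}
    if query_type == "interventional":
        # Doing: cut the arrows into the variable being set.
        return do(diagram, target)
    # Imagining: not covered by this sketch.
    raise NotImplementedError("counterfactual queries are out of scope here")


surgery = modify_for_query(DIAGRAM, "interventional", "treatment")
print(surgery["treatment"])  # prints []
```

The original diagram is left untouched, so the same story can be reused for several queries.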

The reason causation cannot be reduced to probabilities is that a probability-based causal approach does not distinguish association from intervention from causation. Because it cannot distinguish among them, it can only provide information about the most general of the three: association. And association, while sometimes trustworthy about causation, often gets causation wrong because it does not specify interventions and counterfactuals. The main problem is unobserved confounding, in which an unobserved cause produces an association between two outcomes even though neither outcome causes the other. Probabilistic reasoning might conclude that one observed outcome causes the other, when that is wrong.

For example, when many people get sick with colds, the number of frozen pipes that burst in houses also increases. But people getting sick does not cause burst pipes, and burst pipes do not cause colds. Instead, there is an unobserved common cause that makes both outcomes increase around the same time of year: winter.
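The colds-and-pipes story can be checked with a small exact calculation. The sketch below is an illustration only: the probabilities are invented, and the model assumes colds and burst pipes each depend on winter and on nothing else.

```python
# Hypothetical numbers: winter drives both outcomes; neither drives the other.
P_WINTER = 0.25
P_COLDS = {True: 0.7, False: 0.2}    # P(many colds | winter?)
P_PIPES = {True: 0.3, False: 0.01}   # P(burst pipes | winter?)


def joint(winter, colds, pipes):
    """P(winter, colds, pipes) under the common-cause model."""
    pw = P_WINTER if winter else 1 - P_WINTER
    pc = P_COLDS[winter] if colds else 1 - P_COLDS[winter]
    pp = P_PIPES[winter] if pipes else 1 - P_PIPES[winter]
    return pw * pc * pp


def p_pipes_given(colds, winter=None):
    """P(burst pipes | colds), optionally also conditioning on the season."""
    seasons = (True, False) if winter is None else (winter,)
    numerator = sum(joint(w, colds, True) for w in seasons)
    denominator = sum(joint(w, colds, p) for w in seasons for p in (True, False))
    return numerator / denominator


# Season ignored: colds appear to "predict" burst pipes.
print(p_pipes_given(True) > p_pipes_given(False))  # prints True

# Season held fixed: knowing about colds adds nothing.
gap = p_pipes_given(True, winter=True) - p_pipes_given(False, winter=True)
print(abs(gap) < 1e-12)  # prints True
```

When winter is unobserved, only the first comparison is available, which is exactly how association alone can mislead.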


In Pearl’s second implication from the Genesis story, machine intelligence comes into the discussion in a surprising and sudden way. There’s no argument for why, just a bald claim that machine intelligence should be based on causal explanations.

Pearl’s third implication from the Genesis story itself implies that humans began as processors of facts and became makers of causal explanations, and that this transition required divine intervention through Satan’s convincing Eve and Adam to consume fruit from the Tree of Knowledge. This claim could use a counterfactual: why does Pearl believe that prior to consuming the forbidden fruit, Eve and Adam only thought in terms of facts and not explanations?

The ladder of causation seems to rest on an implicit theory of progress in which humans have moved from Seeing (Association) to Doing (Intervention) and now to Imagining (Counterfactuals). It’s not clear where this progress is going, or if there is an end state.

Pearl writes that the probabilistic approach is always vulnerable to confounding, because it’s impossible to know which confounds exist to control for them. Though Pearl does not address it, this weakness also seems to apply to causal diagrams: how to know which causes to include in the diagrams? Pearl does say that experimental control avoids the problem of confounding, such that P(outcome | do(intervention)) avoids confounding. But this avoids the question of whether causal diagrams, absent experimental control, improve on the probabilistic approach. Further, if the answer to that is no, then why do we need counterfactuals if experiments are the only method for causation?

The Book of Why: Introduction

This post continues my notes on Judea Pearl and Dana Mackenzie’s The Book of Why. This post covers “Introduction: Mind Over Data.”

If I could sum up the message of this book in one pithy phrase, it would be that you are smarter than your data. Data do not understand causes and effects; humans do.

Pearl positions his argument by stating that “data are profoundly dumb,” that decades of statistical work assuming relationships could be understood from the data was misguided, and that we need to discipline data with causal thinking. “Data can tell you that the people who took a medicine recovered faster than those who did not take it, but they can’t tell you why.”

Causal inference is a “new science” that has “changed the way we distinguish facts from fiction.” Pearl claims causal inference unifies all past approaches to separating fact from fiction into a single framework that is only twenty years old. The framework is “a simple mathematical language to articulate causal relationships.” Pearl claims this mathematical language is completely new to human history. Never before have humans developed a language in which they can mathematically communicate information about causal relationships. (They were close around the development of statistics, with the work of Sewall Wright in the 1920s, and others, but didn’t quite get there). Equations of the past specify relationships, but they do not encode causality.

The human mind is the center of causal inference, but we can program a computer to become an “artificial scientist” using the same causal inference as human minds have used for thousands of years. “Once we really understand the logic behind causal thinking, we could emulate it on modern computers.” (Pearl is a professor of computer science. It makes sense he takes the discussion of causal inference toward computing and data science.)

Pearl claims that over the past two decades scientists discovered problems requiring a language that encodes causal relationships. Without this need, no language had been developed. Facing this need, scientists got to work creating the language as a tool to solve the new problem. “Scientific tools are developed to meet scientific needs.”

He claims the book introduces a new “calculus of causation” consisting of two languages:

  1. Causal diagrams to express what we know
  2. Symbolic “language of queries” to express what we want to know

Causal diagrams describe the data generation process, “the cause-effect forces that operate in the environment and shape the data generated.”

The symbolic language of queries expresses the question we want to answer. For example, if we want to know whether multinational corporations’ foreign direct investments cause conflict in the countries where the investments are made, we can encode this in the symbolic language as P(C | do(FDI)), which can be read as: What is the probability (P) of conflict (C) if multinational firms are made to do foreign direct investment (FDI)? The “made to do” component is crucial here, because use of do indicates control over which firms do and don’t engage in FDI. If there is no control over FDI and firms instead choose whether they engage in FDI, the symbolic language can communicate this as P(C | FDI). Absence of the do operator indicates absence of experimental control and the possible presence of various selection effects that could confound causal analysis. P(C | do(FDI)) could be completely different from P(C | FDI), and the difference can be thought of as the difference between doing and seeing, respectively.

One of the greatest achievements of the calculus of causation is allowing researchers to approximate P(C | do(FDI)), for which data are extremely rare, from P(C | FDI), for which data are extremely common.

We can approximate doing from seeing using counterfactual reasoning.
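One standard way of getting from seeing to doing is the back-door adjustment formula from Pearl's framework. As a sketch, assume there is a set of observed variables Z (hypothetical here, for example characteristics of a country before any investment) that blocks every back-door path between FDI and conflict. Under that assumption, which has to come from the causal diagram and not from the data:

```latex
P(C \mid do(FDI)) = \sum_{z} P(C \mid FDI, Z = z)\, P(Z = z)
```

Every term on the right-hand side is a seeing quantity that can be estimated from observational data. If no such Z is observed, this particular formula does not apply.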

Causal inference engines are introduced. They take three inputs and produce three outputs. The inputs are:

  • Assumptions
  • Queries
  • Data

The outputs are

  • Decision: Can the query be answered given the causal model and assuming perfect data?
  • Estimand: Mathematical formula to generate the answer from any hypothetical data
  • Estimate: The answer and some measure of the certainty of the answer

Pearl notes the importance of the causal model coming before data. He is a critic of current arguments in artificial intelligence that causality can come from data, rather than be specified in advance by theory.


Pearl’s book on counterfactual reasoning is a bit late, despite its claims to be the first to describe this “causal calculus.” Predecessors include:

  • Morgan, S. L., & Winship, C. (2007). Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research) (1st ed.). Cambridge, England: Cambridge University Press.
  • Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton University Press.

There’s a bit of squeakiness in the counterfactual reasoning approach to causation that makes it seem a bit less of an advance over prior ways of thinking than Pearl portrays. The squeakiness arises from counterfactuals being imaginary. They are supposed to represent what would have happened in the absence of an action. For example, take a multinational corporation that engages in foreign direct investment. Observe whether the investment is associated with conflict in the country into which the investment is made. The counterfactual for this is whether conflict would have occurred if all else had happened exactly the same except the firm had not engaged in foreign direct investment.

This assumes the human mind can somehow know what the counterfactual would have been. But it can’t. We can never observe or confirm counterfactuals. They are fundamentally assumptions about what the world would have looked like in the absence of some event that actually did happen in the world.

We are right back to where we were before the causal calculus, to a world in which arguments about causation are not based on evidence but on untestable claims about the nature of alternative worlds. We’re back to the authority of the speaker being critical to whether the causal argument is credible, and that is what research methods requiring data were supposed to get us away from. Claiming the human mind is ultimately what determines causation, rather than data, risks reversing the scientific revolution’s rejection of faith and authority as bases for claims.