The Book of Why: Chapter 1

This post continues my notes on Judea Pearl and Dana Mackenzie’s The Book of Why, covering Chapter 1: The Ladder of Causation.

The question of the book:

How can machines (and people) represent causal knowledge in a way that would enable them to access the necessary information swiftly, answer questions correctly, and do it with ease, as a three-year-old child can?

Page 37

Pearl begins the chapter by discussing the Garden of Eden story in the book of Genesis. While writing his book Causality in the 1990s, he reread Genesis and noticed a crucial distinction he had missed in prior readings. The distinction is between facts–the answers to “what” questions–and causes–the answers to “why” questions. Pearl describes how God asked Adam and Eve “what” questions that Adam and Eve answered with causes, not facts.

Pearl claims three implications from the fact/cause distinction’s appearance in Genesis:

  1. Humans realized early in our evolution that the world is made up not only of facts but also causal explanations that connect facts together in cause-effect relationships.
  2. Causal explanations make up the bulk of human knowledge and should be the cornerstone of machine intelligence.
  3. “Our transition from processors of data to makers of explanations was not gradual; it was a leap that required an external push from an uncommon fruit.”

Beyond religion, evolutionary science also provides evidence for a “major unexplained transition.” Pearl claims that in the past 50,000 years of human evolution, there is evidence for a cognitive revolution in which “humans acquired the ability to modify their environment and their own abilities at a dramatically faster rate.” Pearl’s evidence for this is that humans have created complex tools, and he refers to the transition through an engineering analogy: the “super-evolutionary speedup” in tool creation.

Evolution has endowed us with the ability to engineer our lives, a gift she has not bestowed on eagles and owls, and the question, again, is ‘Why?’ What computational facility did humans suddenly acquire that eagles did not?

Page 25

Pearl’s theory is that the transition occurred because humans leaped from fact-based to causal-based thinking, which enabled humans to make mental models of causes and effects and, consequently, plans.

The mental models humans suddenly could make are the causal diagrams Pearl believes should be “the computational core of the ‘causal inference engine’” at the heart of the causal revolution described in the Introduction.

Pearl proposes three nested levels of causation, forming the Ladder of Causation:

  1. Seeing
  2. Doing
  3. Imagining

Seeing nests within Doing, and both nest within Imagining. All three levels must be mastered for causal cognition.

Seeing involves collecting information from one’s environment. Doing involves predicting the outcomes of actions, given one’s understanding of the environment accumulated through Seeing. Imagining involves theorizing why actions produce outcomes and what to do when actions do not produce expected outcomes.

The ladder of causation starts with Seeing, where outcomes are viewed through association. One rung higher is Doing, where outcomes are viewed through intervention. The final rung is Imagining, where outcomes are viewed through counterfactual reasoning.

Statistics–and machine learning–occurs at the Seeing rung and deals with identifying associations between outcomes. Even the most advanced artificial intelligence deals only with associations, using observations about outcomes (Seeing) to calculate associations between those outcomes.

Pearl argues machine learning needs to experience the same “super-evolutionary speedup” humans seem to have experienced at some point in their evolutionary history.

Moving up to the Intervention rung requires knowledge of the different reasons behind different associations. Each reason is a possible intervention into the association between observed outcomes, and outcomes will change in different ways depending on which intervention occurs. Association thinking does not consider these different interventions. Instead, it assumes that whatever interventions produced the observed outcomes used to calculate associations will continue to happen in the future exactly as they did in the past.

The effect of an intervention on an outcome can be studied through experimental control. A question in this mode can be represented as P(outcome | do(intervention)). If no experiment is possible, interventions can be studied by specifying a causal model that substitutes for experimental control by relating the do() question to observational quantities. Those quantities take the form P(outcome | intervention), where the absence of the do() operator indicates a lack of experimental control over the intervention; instead, the intervention happened outside the researcher’s control and was merely observed.
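A small simulation can make the difference between P(outcome | do(intervention)) and P(outcome | intervention) concrete. The sketch below is not from the book: the toy model (a hidden variable Z that influences both the action X and the outcome Y), the probabilities, and the function names are all assumptions chosen for illustration.

```python
import random


def simulate(n=100_000, seed=0, force_treatment=None):
    """Draw n (x, y) samples from a toy model: Z -> X, Z -> Y, X -> Y.

    All probabilities are made up for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5  # hidden common cause, never recorded
        if force_treatment is None:
            # Observational regime: Z influences whether X happens.
            x = rng.random() < (0.8 if z else 0.2)
        else:
            # Experimental regime, do(X = x): the researcher sets X, ignoring Z.
            x = force_treatment
        p_y = 0.1 + 0.2 * x + 0.5 * z  # Y depends on both X and Z
        y = rng.random() < p_y
        rows.append((x, y))
    return rows


def p_y_given_x(rows, x_value):
    """Share of samples with Y true among those where X equals x_value."""
    matching = [y for x, y in rows if x == x_value]
    return sum(matching) / len(matching)


# Seeing: compare outcomes between units that happened to have X or not.
observed = simulate()
seeing = p_y_given_x(observed, True) - p_y_given_x(observed, False)

# Doing: compare outcomes when X is forced on versus forced off.
doing = (p_y_given_x(simulate(force_treatment=True), True)
         - p_y_given_x(simulate(force_treatment=False), False))

print(f"P(Y | X=1) - P(Y | X=0)         = {seeing:.2f}")
print(f"P(Y | do(X=1)) - P(Y | do(X=0)) = {doing:.2f}")
```

In this toy model the observational difference works out to about 0.5 while the interventional difference is about 0.2, because the hidden Z inflates the association between X and Y. Only the second number reflects what forcing X would actually change.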

While Doing questions can tell us what will happen if an action is taken, they cannot tell us why that intervention causes that outcome. Answering why questions requires the third rung of the ladder, on which we Imagine why interventions lead to outcomes.

Because Pearl’s ultimate goal is machines acquiring causal knowledge, he turns to considering how causal knowledge can be represented in machine-readable form.

How can machines (and people) represent causal knowledge in a way that would enable them to access the necessary information swiftly, answer questions correctly, and do it with ease, as a three-year-old child can?

Page 37

Pearl answers this question with the causal diagram. He asserts that human brains use causal diagrams to store and recall association, intervention, and causal information.

Causal diagrams can encode probabilistic relationships between causes and outcomes, but the structure of the diagram itself also encodes causal information independent of probabilities. More strongly, Pearl claims “causation is not reducible to probabilities.”

To use a causal diagram:

  1. Translate the story under investigation into a diagram.
  2. Listen to the query.
  3. Modify the diagram according to whether the query is counterfactual, interventional, or associational.
  4. Use the modified diagram to compute the answer to the query.
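Steps 1 and 3 above can be sketched in code for the interventional case. The representation below (a dictionary mapping each node to its direct causes), the node names, and the helper functions are hypothetical choices for illustration, not Pearl's notation. The sketch assumes that modifying the diagram for an intervention means deleting the arrows that point into the intervened variable, since the intervention overrides its usual causes. An associational query would leave the diagram unchanged, and counterfactual queries are not covered here.

```python
# Step 1: a toy diagram, stored as node -> set of direct causes (parents).
# Node names are placeholders, not taken from the book.
diagram = {
    "confounder": set(),
    "treatment": {"confounder"},
    "outcome": {"confounder", "treatment"},
}


def intervene(parents, node):
    """Step 3 for an interventional query: return a copy of the diagram
    in which every arrow pointing into `node` has been removed."""
    if node not in parents:
        raise KeyError(f"unknown node: {node}")
    modified = {name: set(causes) for name, causes in parents.items()}
    modified[node] = set()
    return modified


def ancestors(parents, node):
    """All direct and indirect causes of `node`."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def share_common_cause(parents, a, b):
    """True if some node is a cause of both `a` and `b`."""
    return bool(ancestors(parents, a) & ancestors(parents, b))


after_do = intervene(diagram, "treatment")
print(share_common_cause(diagram, "treatment", "outcome"))   # before surgery
print(share_common_cause(after_do, "treatment", "outcome"))  # after surgery
```

Before the modification, the treatment and the outcome share a common cause; afterward they do not, so any remaining dependence between them in the modified diagram runs through the arrow from treatment to outcome.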

The reason causation cannot be reduced to probabilities is that a probability-based approach to causation does not distinguish association from intervention from causation. Because such an approach cannot tell the three apart, it can only provide information about the most general of them: association. And association, while sometimes a trustworthy guide to causation, often gets causation wrong because it does not specify interventions and counterfactuals. The main problem is unobserved confounding, in which an unobserved cause produces an association between two outcomes even though neither outcome causes the other. Probabilistic reasoning might conclude that one observed outcome causes the other when it does not.

For example, when many people get sick with colds, the number of frozen pipes that burst in houses also increases. But people getting sick does not cause burst pipes, and burst pipes do not cause colds. Instead, an unobserved common cause makes both outcomes increase around the same time of year: winter.
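The colds and burst pipes example can be simulated to show how a common cause creates an association. All figures below are invented for illustration (weekly cold counts, weekly burst-pipe counts, and the share of weeks that fall in winter), and the variable names are hypothetical. In the simulated data, colds and pipes never influence each other; both depend only on the season.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)
weeks = []
for _ in range(5000):
    winter = rng.random() < 0.25                     # the common cause
    colds = rng.gauss(300 if winter else 100, 30)    # depends only on season
    pipes = rng.gauss(30 if winter else 10, 3)       # depends only on season
    weeks.append((winter, colds, pipes))

# Ignoring the season: colds and burst pipes look strongly related.
overall = pearson([c for _, c, _ in weeks], [p for _, _, p in weeks])

# Holding the season fixed: the relationship disappears.
winter_weeks = [(c, p) for w, c, p in weeks if w]
within_winter = pearson([c for c, _ in winter_weeks],
                        [p for _, p in winter_weeks])

print(f"correlation across all weeks:   {overall:.2f}")
print(f"correlation within winter only: {within_winter:.2f}")
```

The correlation across all weeks is high, while the correlation within winter weeks is close to zero. Conditioning on the season only works here because the simulation records it; when the common cause is unobserved, the data alone give no way to make this correction.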


In Pearl’s second implication from the Genesis story, machine intelligence enters the discussion suddenly and surprisingly. There’s no argument for why, just a bald claim that machine intelligence should be based on causal explanations.

Pearl’s third implication from the Genesis story itself implies that humans went from being processors of facts to being makers of causal explanations, and that this transition required an external intervention: Satan convincing Eve and Adam to consume fruit from the Tree of Knowledge. This claim could use a counterfactual: why does Pearl believe that prior to consuming the forbidden fruit, Eve and Adam only thought in terms of facts and not explanations?

The ladder of causation seems to rest on an implicit theory of progress in which humans have moved from Seeing (Association) to Doing (Intervention) and now to Imagining (Counterfactuals). It’s not clear where this progress is going, or if there is an end state.

Pearl writes that the probabilistic approach is always vulnerable to confounding, because it’s impossible to know which confounds exist to control for them. Though Pearl does not address it, this weakness also seems to apply to causal diagrams: how to know which causes to include in the diagrams? Pearl does say that experimental control avoids the problem of confounding, such that P(outcome | do(intervention)) avoids confounding. But this avoids the question of whether causal diagrams, absent experimental control, improve on the probabilistic approach. Further, if the answer to that is no, then why do we need counterfactuals if experiments are the only method for causation?