A causal diagram is a visual model of the cause-and-effect relationships between variables in a system of interest.[1] Such a system might comprise an activity, such as playing sport every weekend, an outcome it may affect, such as blood pressure, and other variables that are causally related to them.
For the research question ‘Does playing sport every weekend reduce the chance of high blood pressure?’, imagine that we analysed a sample of patient blood pressure measurements in which all patients, regardless of age, were asked whether they played sport every weekend.
A simplified system containing only three variables is shown in Figure 1 and describes how confounding might occur in this example. In this case, while playing sport might decrease the chance of high blood pressure, age may confound the observed relationship because older people are less likely to play weekend sport but more likely to have high blood pressure.
Figure 1. Simple causal diagram that describes possible confounding: age as a confounder of playing sport every weekend causing a change in blood pressure
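To make the confounding in Figure 1 concrete, the short Python simulation below generates hypothetical data in which age affects both weekend sport and high blood pressure, while sport itself lowers the chance of high blood pressure by 5 percentage points. All probabilities are invented for illustration and do not come from any study. Comparing sport players with non-players overall (the crude difference) exaggerates the benefit of sport, whereas comparing them within each age group recovers the effect built into the simulation.

```python
import random


def simulate(n=100_000, seed=1):
    """Generate (older, sport, high_bp) records from made-up probabilities."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        older = rng.random() < 0.5  # half the sample is in the older group
        # Older people are less likely to play weekend sport.
        sport = rng.random() < (0.2 if older else 0.6)
        # Older age raises, and sport lowers, the chance of high blood pressure.
        p_high = (0.4 if older else 0.1) - (0.05 if sport else 0.0)
        high_bp = rng.random() < p_high
        rows.append((older, sport, high_bp))
    return rows


def risk(rows, sport, older=None):
    """Proportion with high blood pressure for a given sport status,
    optionally restricted to one age group."""
    group = [bp for (o, s, bp) in rows if s == sport and (older is None or o == older)]
    return sum(group) / len(group)


rows = simulate()
crude = risk(rows, True) - risk(rows, False)
by_age = {older: risk(rows, True, older) - risk(rows, False, older)
          for older in (False, True)}

print(f"crude difference: {crude:+.3f}")
for older, diff in by_age.items():
    print(f"{'older' if older else 'younger'} group difference: {diff:+.3f}")
```

With these made-up numbers the crude difference should be close to -0.175, while each age-specific difference should be close to -0.05, the effect of sport that was actually simulated. The gap between the two is the bias introduced by ignoring age.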
Put simply, causal diagrams can make it easier to draw realistic causal inferences. They can help by prompting the identification of more potential confounders and sources of selection bias than might otherwise have been considered, and by making explicit the assumptions that are made when inferring a result from a statistical analysis.
The causal diagram in Figure 1 is also an example of a directed acyclic graph, or DAG, by far the most common type of causal diagram used in health research. Here, the word ‘graph’ takes its meaning from mathematical graph theory, a set of points some of which are connected by lines,[2] rather than a chart or plot as commonly used in data analysis.
A directed graph is one in which each connecting line has a direction, from one point to another. A directed acyclic graph is a directed graph in which it is not possible to start at a point, follow the directed lines (usually drawn as arrows) from point to point, and arrive back at the starting point. In other words, one cannot follow the arrows along a path that forms a closed loop, or cycle.
This property is necessary for a causal model because past events can cause future events, but future events cannot cause past events. Variables that affect each other, including through feedback loops, can still be represented by including each variable at several points in time. An example is shown in Figure 2.
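Whether a directed graph is acyclic can be checked mechanically. The sketch below uses one standard approach, Kahn's algorithm, which is not described in the original text: repeatedly remove any point that has no arrows coming into it; if every point can eventually be removed, the graph contains no closed loop. Graphs are written as lists of (cause, effect) pairs, and the variable names are hypothetical labels.

```python
from collections import deque


def is_acyclic(edges):
    """Return True if the directed graph given by (cause, effect) pairs has no cycle.

    Kahn's algorithm: repeatedly remove points with no incoming arrows.
    If every point can be removed, no closed loop exists.
    """
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}      # number of arrows pointing into each point
    children = {n: [] for n in nodes}     # points each point has arrows to
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Points left over all sit on, or downstream of, a cycle.
    return removed == len(nodes)


figure1 = [("age", "sport"), ("age", "blood_pressure"), ("sport", "blood_pressure")]
loop = [("a", "b"), ("b", "c"), ("c", "a")]
print(is_acyclic(figure1))  # True
print(is_acyclic(loop))     # False
```

The Figure 1 structure passes the check, while the three-point loop fails because every point in it always has an arrow coming in.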
Figure 2. Causal effect of home blood glucose measurement frequency on changes in mean blood glucose over time
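The idea of representing variables that affect each other at different points in time can be sketched in code. The example below assumes, purely for illustration, that home glucose testing frequency and mean blood glucose each influence the other in the next time period, and that each variable also influences its own next value; the variable names and this structure are hypothetical and are not read from Figure 2. Because every arrow runs from time t to time t + 1, the unrolled graph cannot contain a cycle.

```python
def unroll(feedback_pairs, periods):
    """Rewrite arrows between mutually influencing variables as time-indexed arrows.

    Each (cause, effect) pair becomes an arrow from cause at time t to effect at
    time t + 1, and every variable also gets an arrow to its own next value
    (an illustrative assumption, not a general rule).
    """
    variables = sorted({v for pair in feedback_pairs for v in pair})
    edges = []
    for t in range(periods - 1):
        for cause, effect in feedback_pairs:
            edges.append((f"{cause}_{t}", f"{effect}_{t + 1}"))
        for v in variables:
            edges.append((f"{v}_{t}", f"{v}_{t + 1}"))
    return edges


def time_of(node):
    """Read the time index from a name such as 'mean_glucose_2'."""
    return int(node.rsplit("_", 1)[1])


# Hypothetical feedback loop: testing frequency affects glucose, and glucose
# affects how often people test.
feedback = [("testing_frequency", "mean_glucose"),
            ("mean_glucose", "testing_frequency")]
edges = unroll(feedback, periods=3)
for cause, effect in edges:
    print(f"{cause} -> {effect}")

# Every arrow points forward in time, so no path can return to its start.
print(all(time_of(c) < time_of(e) for c, e in edges))  # True
```

The two-way relationship that would be a cycle if drawn between two single points becomes a chain of forward-pointing arrows once each variable appears at several times.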
DAGs are also commonly drawn with time flowing from left to right and the variables positioned accordingly.[3] This can make a DAG easier both to create and to understand, because it presents a causal story[4] that aligns with the intuition, shared by speakers of English and some other languages, that time flows from left to right.[5] Moreover, the dominant view in cognitive science is that people understand the world largely by mentally constructing causal narratives, or stories.[6-8]
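One simple way to lay out a DAG so that causal order flows from left to right is to place each variable in a column equal to the length of the longest chain of arrows leading into it (often called longest-path layering). This is a sketch of a general layout technique, not the method used to draw the figures here; it assumes the graph is already acyclic and uses the three variables of Figure 1 as hypothetical labels.

```python
def columns(edges):
    """Assign each variable a column so that every arrow points left to right.

    A variable's column is the length of the longest chain of arrows leading
    into it, so causes always sit to the left of their effects.
    Assumes an acyclic graph: a cycle would cause unbounded recursion.
    """
    parents = {}
    for cause, effect in edges:
        parents.setdefault(cause, [])
        parents.setdefault(effect, []).append(cause)

    col = {}

    def depth(node):
        # Variables with no causes go in column 0; others go one column
        # to the right of their right-most cause.
        if node not in col:
            col[node] = 1 + max((depth(p) for p in parents[node]), default=-1)
        return col[node]

    for node in parents:
        depth(node)
    return col


figure1 = [("age", "sport"), ("age", "blood_pressure"), ("sport", "blood_pressure")]
layout = columns(figure1)
for node, column in sorted(layout.items(), key=lambda item: item[1]):
    print(column, node)
# 0 age
# 1 sport
# 2 blood_pressure
```

Because each variable sits at least one column to the right of every variable with an arrow into it, all arrows point rightwards, matching the left-to-right reading of time.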
Unlike most introductions to causal diagrams in epidemiology, which include some of the formal language and procedures, we have attempted an alternative approach that avoids the mathematical terminology of DAGs wherever it might hinder an initial understanding. We hope that most of the concepts can initially be understood using everyday English and that, with fewer new words to hold in working memory while reading, the material will be easier to understand.[9]
This approach stemmed from the influence that cognitive ease has on the decisions people make, such as whether to continue learning about causal diagrams. Once the core concepts have been understood, the more formal terms, such as nodes, vertices, edges, d-separation and the back-door criterion, can easily be associated with those concepts.
This page was adapted from the PhD thesis “Understanding uncertainty and bias to improve causal inference in health intervention research” by Tim Watkins (2019). Available at http://hdl.handle.net/2123/20772

References
1. Greenland S, Pearl J, Robins JM. Causal Diagrams for Epidemiologic Research. Epidemiology. 1999;10(1):37-48. https://journals.lww.com/epidem/Abstract/1999/01000/Causal_Diagrams_for_Epidemiologic_Research.8.aspx
2. Everitt BS, Skrondal A. The Cambridge Dictionary of Statistics. 4th ed. Cambridge, UK, New York: Cambridge University Press; 2010.
3. Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC; 2020. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
4. Pearl J, Glymour M, Jewell NP. Causal inference in statistics: A primer. 1st ed. Hoboken, New Jersey: John Wiley & Sons; 2016.
5. Semin GR, Garrido MV, Farias AR. How Many Processes Does It Take to Ground a Concept? In: Sherman JW, Gawronski B, Trope Y, eds. Dual-process theories of the social mind. New York: The Guilford Press; 2014:542-559.
6. Kahneman D. Thinking, Fast and Slow. New York, NY: Farrar, Straus and Giroux; 2011.
7. Hastie R. Causal Thinking in Judgments. In: Keren G, Wu G, eds. The Wiley Blackwell Handbook of Judgment and Decision Making. Vol. 54. Chichester, UK: John Wiley & Sons, Ltd; 2015:590-628.
8. Sloman SA, Lagnado D. Causality in Thought. Annual Review of Psychology. 2015;66(1):223-247. doi:10.1146/annurev-psych-010814-015135
9. Pinker S. The Sense of Style: The Thinking Person’s Guide to Writing in the 21st Century. New York, NY: Penguin; 2014.