How to create and use a causal diagram (DAG)

Things for novices to consider

A causal diagram, or causal ‘directed acyclic graph’ (DAG), is a cognitive tool that can help you identify and avoid, or at least understand and acknowledge, some potential sources of bias that might alter your study’s findings.

As with other cognitive tools used in research, such as graphs and tables, the more thought that goes into constructing a DAG, the more useful it will be to your research and to the people you hope your research will have an impact on.

The following is a step-by-step guide to constructing a DAG. It is designed for beginners, with less jargon and more detail in each step. Links to some alternative guides and introductions are listed at the end.

Step 0
Choose the software you will use to create the DAG, at least initially. See the software guide for options.

Step 1
Specify/define the exposure (variable of interest) and the outcome as precisely as possible, including when their values have been or will be determined

Step 2
Specify/define all other variables for which data is available or is expected to become available

Step 3
For each variable, decide when the event that determined its value occurred for each person. For example:

sex is generally determined at conception and age is determined by a person’s date of birth
marital status is usually determined when provided by the person and may change
developmental disorders are mostly determined at a young age and remain for life
most infections are determined when a person becomes infected and resolve after a time
dementia is not determined at a single time point but may be treated as determined at first diagnosis (assuming the diagnosis is correct), after which it remains

Step 4
Using the diagramming software of choice (or pen/pencil and paper), create the exposure and outcome variables in the diagram

Step 5
Add all other variables and position them in the diagram so that those with data determined or recorded earlier in time are to the left of those determined later

Where they are positioned in relation to the exposure and outcome helps determine if they are potential confounders, mediators or colliders
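As a rough illustration of Steps 3 to 5, the timing decided for each variable can be written down and used to order the variables from left to right. The variable names and the ages below are hypothetical, chosen only to mirror the examples in Step 3, and the helper left_to_right is an assumption rather than part of any DAG software.

```python
# Hypothetical variables with the approximate age (in years) at which each
# value was determined. Names and numbers are illustrative assumptions only.
determined_at = {
    "marital_status": 40,
    "sex": 0,                      # treated here as fixed by birth
    "dementia_diagnosis": 70,      # first diagnosis
    "developmental_disorder": 3,
    "infection": 55,
}

def left_to_right(times):
    """Return variable names ordered from earliest to latest determination."""
    return sorted(times, key=times.get)

order = left_to_right(determined_at)

# Assign a simple horizontal position (0 = far left) to each variable.
x_position = {name: i for i, name in enumerate(order)}
print(order)
```

The resulting positions are only a starting layout; variables determined at similar times may still need to be arranged by judgement.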

Step 6
Draw an arrow between any two variables thought likely to be causally related, with the arrow pointing from cause to effect. If two variables affect each other over time and it is not clear which was determined earlier in the data, point the arrow in the direction of the stronger causal effect
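Because a DAG must not contain a loop of arrows, it can help to check a draft diagram for directed cycles. The sketch below assumes Python 3.9 or later (for the standard-library graphlib module) and uses hypothetical variable names; the function is_acyclic is an illustrative helper, not a feature of any particular DAG tool.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical arrows, written as (cause, effect) pairs.
arrows = [
    ("age", "exposure"),
    ("age", "outcome"),
    ("exposure", "mediator"),
    ("mediator", "outcome"),
    ("exposure", "outcome"),
]

def is_acyclic(edges):
    """Return True if the arrows contain no directed cycle."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    predecessors = {}
    for cause, effect in edges:
        predecessors.setdefault(effect, set()).add(cause)
        predecessors.setdefault(cause, set())
    try:
        # Consuming static_order() raises CycleError if a cycle exists.
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False

print(is_acyclic(arrows))
```

If the check fails, one option is to split the variables involved into separate time points so that each arrow points forward in time.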

Step 7
If the study is longitudinal and a prior value of the outcome Y affects the exposure X, which then affects the following Y, each instance of the exposure and each measurement of the outcome must be shown as separate variables, for example: X₀ → Y₀ → X₁ → Y₁
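The repeated pattern can be generated for any number of measurement waves. This is a minimal sketch assuming one exposure X and one outcome Y per wave; the function name unroll and the label format X_0, Y_0 are illustrative choices only.

```python
def unroll(n_waves):
    """Build the arrows X_0 -> Y_0 -> X_1 -> Y_1 ... for n_waves waves."""
    arrows = []
    for t in range(n_waves):
        # The exposure at wave t affects the outcome that follows it.
        arrows.append((f"X_{t}", f"Y_{t}"))
        # The outcome at wave t affects the exposure at the next wave.
        if t + 1 < n_waves:
            arrows.append((f"Y_{t}", f"X_{t + 1}"))
    return arrows

for cause, effect in unroll(2):
    print(cause, "->", effect)
```

Other arrows, for example from X_0 directly to X_1, would need to be added by hand where they are thought plausible.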

Step 8
Do not draw an arrow between two variables if available knowledge and the plausibility of potential mechanisms suggest it is unlikely that one causes a meaningful change in the other

Leaving out an arrow is itself an assumption: the research conclusions then rest, in part, on there being no causal relationship between the two variables

Step 9
Causes of just one variable currently in the diagram may optionally be included as additional (unmeasured) variables, but suspected common causes of two or more variables should be included

This includes suspected unknown common causes of two or more variables, in which case a symbol such as U might serve as a label

Step 10
Use the DAG to decide which variables are potential confounders and need to be conditioned on (adjusted for)
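As a first pass, each remaining variable can be labelled by where it sits relative to the exposure and the outcome. The sketch below is a simplified classification based only on which variables lie upstream or downstream of others; it is not a full implementation of formal adjustment-set rules, and the variable names and the functions descendants and classify are hypothetical.

```python
def descendants(arrows, node):
    """Return every variable reachable from node by following arrows."""
    children = {}
    for cause, effect in arrows:
        children.setdefault(cause, set()).add(effect)
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def classify(arrows, exposure, outcome):
    """Label each other variable as confounder, mediator, collider or other."""
    # Drop arrows leaving the exposure, so that "causes the outcome" below
    # means by a route that does not pass through the exposure.
    not_via_exposure = [a for a in arrows if a[0] != exposure]
    after_exposure = descendants(arrows, exposure)
    after_outcome = descendants(arrows, outcome)
    nodes = {v for arrow in arrows for v in arrow} - {exposure, outcome}
    roles = {}
    for v in sorted(nodes):
        if (exposure in descendants(arrows, v)
                and outcome in descendants(not_via_exposure, v)):
            roles[v] = "confounder"   # common cause of exposure and outcome
        elif v in after_exposure and outcome in descendants(arrows, v):
            roles[v] = "mediator"     # lies on a path from exposure to outcome
        elif v in after_exposure and v in after_outcome:
            roles[v] = "collider"     # common effect of exposure and outcome
        else:
            roles[v] = "other"
    return roles

# Hypothetical example diagram.
arrows = [
    ("age", "exposure"), ("age", "outcome"),
    ("exposure", "blood_pressure"), ("blood_pressure", "outcome"),
    ("exposure", "hospitalisation"), ("outcome", "hospitalisation"),
    ("exposure", "outcome"),
]
roles = classify(arrows, "exposure", "outcome")
print(roles)
```

In this example only age would be a candidate for adjustment; conditioning on the mediator or the collider could introduce bias rather than remove it. Dedicated DAG software generally applies more complete rules than this sketch.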

Other guides

Other guides and sources of information on how to create and use a causal diagram include:

Barnard-Mayers R, Kouser H, Cohen JA, et al. A case study and proposal for publishing directed acyclic graphs: The effectiveness of the quadrivalent human papillomavirus vaccine in perinatally HIV Infected girls. Journal of Clinical Epidemiology. 2022;144:127-135.

Tennant PW, Murray EJ, Arnold KF, et al. Use of directed acyclic graphs (DAGs) to identify confounders in applied health research: review and recommendations. International Journal of Epidemiology. 2021;50(2):620-632.

Digitale JC, Martin JN, Glymour MM. Tutorial on directed acyclic graphs. Journal of Clinical Epidemiology. 2022;142:264-267.

Williamson EJ, Aitken Z, Lawrie J, Dharmage SC, Burgess JA, Forbes AB. Introduction to causal diagrams for confounder selection. Respirology. 2014;19(3):303-311.

Williams TC, Bach CC, Matthiesen NB, Henriksen TB, Gagliardi L. Directed acyclic graphs: a tool for causal studies in paediatrics. Pediatric Research. 2018;84(4):487-493.

Ferguson KD, McCann M, Katikireddi SV, et al. Evidence synthesis for constructing directed acyclic graphs (ESC-DAGs): a novel and systematic method for building directed acyclic graphs. International Journal of Epidemiology. 2020;49(1):322-329.

Shrier I, Platt RW. Reducing bias through directed acyclic graphs. BMC Medical Research Methodology. 2008;8(1):70.

Suzuki E, Shinozaki T, Yamamoto E. Causal Diagrams: Pitfalls and Tips. Journal of Epidemiology. 2020;30(4):153-162.

An Introduction to Directed Acyclic Graphs - Malcolm Barrett

DAG resources - Murray Causal Decision Lab at Boston University (Eleanor (Ellie) Murray and her team)

Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC; 2020. https://www.hsph.harvard.edu/miguelhernan/causal-inference-book/