Commonly used classifications of research bias have two readily apparent problems:
- the same term can have very different meanings, for example ‘selection bias’
- the same type of bias is often known by different names, as illustrated in Table 1
This can lead both to misunderstandings between researchers and to confusion among students of epidemiology and biostatistics.1
Table 1. Cochrane bias domains and the corresponding epidemiologic terms2

| Cochrane Bias Domain | Epidemiologic Term |
| --- | --- |
| Selection bias | Confounding or selection bias |
| Performance bias | Biased direct effect or confounding |
| Detection bias | Measurement bias |
| Attrition bias | Selection bias |
| Reporting bias | Non-structural bias that cannot be represented in causal diagrams |
Language, however, is full of ambiguity.3 This feature of language may have evolved because of our need to communicate with the least effort possible, so that the meaning of what we say or write instead relies heavily on context.4 This natural ambiguity implies that any attempt to unify the meaning of the terms used for different types of bias is unlikely to succeed.
In a series of papers from 2002 to 2009,5-7 however, Miguel Hernán and colleagues took an entirely different approach: defining types of bias using causal diagrams. They did not avoid terminology, but they were able to give precise definitions for the standard epidemiological terms of confounding, selection bias, and measurement bias, calling this the “structural classification of bias”.8
Before defining the types of bias, we need to understand how to use a causal diagram once all the variables and arrows have been added. In a DAG, an arrow represents the belief that one variable causes another. In a DAG with many variables, a causal pathway can be traced by following the arrows from one variable to the next, which can indicate how one variable might influence another further down the pathway.
An association, on the other hand, has no direction: in a DAG, an association can exist between two variables whenever a path can be traced between them along arrows, regardless of the direction of those arrows.9
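
To make the distinction concrete, here is a minimal Python sketch of path tracing in a small, hypothetical three-variable DAG (the variable names and edges are illustrative and do not correspond to any of the figures). Following the arrows finds causal pathways; ignoring their direction finds paths that can carry an association.

```python
# A small, hypothetical DAG: a confounder C causes both the
# intervention A and the outcome Y, and there is no arrow A -> Y.
dag = {
    "C": ["A", "Y"],  # C -> A and C -> Y
    "A": [],
    "Y": [],
}

def has_directed_path(dag, src, dst, seen=None):
    """True if dst can be reached from src by following the arrows."""
    if src == dst:
        return True
    seen = set() if seen is None else seen
    seen.add(src)
    return any(has_directed_path(dag, child, dst, seen)
               for child in dag[src] if child not in seen)

def has_undirected_path(dag, src, dst):
    """True if src and dst are joined by a path when arrow direction is
    ignored (a simplification: blocking by conditioning is not checked)."""
    neighbours = {v: set() for v in dag}
    for parent, children in dag.items():
        for child in children:
            neighbours[parent].add(child)
            neighbours[child].add(parent)
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False

print(has_directed_path(dag, "A", "Y"))    # False: no causal pathway from A to Y
print(has_undirected_path(dag, "A", "Y"))  # True: the path A - C - Y can carry an association
```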
In terms of the structural definition of bias, an association between two variables in a study can be explained by one of three possible causal structures. With an intervention and an outcome as the two variables of interest, these are:10
- Cause and effect: The intervention caused changes in the outcome, or the outcome caused changes in the intervention, on average, in the study population
- For example, a randomised trial with a true causal effect (Figure 1); a simulation sketch of this structure appears after this list

- A shared cause: A third variable, a confounder, caused either receipt of the intervention or the type of intervention received, and also caused changes in the outcome
- For example, Figure 2 depicts an observational study in which poor health makes it more likely that a patient was given a particular intervention (say, an expensive treatment drug), but poor health also makes it more likely that the patient will die. This produces an association between receiving the intervention and the outcome (which may, in this case, cancel out an association produced by the intervention genuinely reducing the chance of death)

- A shared effect: A third variable that was conditioned on† was affected by both the intervention and the outcome. That is, a third variable, called a collider, was affected by receipt (or type) of the intervention and was also affected by the chance of experiencing the outcome; the resulting bias is called selection bias or collider bias
- For example, in the randomised controlled trial depicted in Figure 3, patients with poor health are more likely to die (the outcome), and receiving the treatment drug rather than the placebo (the intervention) is more likely to produce side effects (the shared effect) that lead to withdrawal from the study; analysing only the patients who remain is the same as conditioning on not withdrawing from the study
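
As a concrete illustration of the first structure, the following simulation is a minimal sketch with made-up numbers (Python and numpy are used purely for illustration; the figures are invented, not from any real trial): treatment is randomised and given a genuine protective effect, so the observed association reflects cause and effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # hypothetical trial size

# Structure 1 (Figure 1): randomised treatment with a genuine protective effect.
treated = rng.binomial(1, 0.5, n)             # randomisation: no shared causes of treatment and outcome
p_death = np.where(treated == 1, 0.10, 0.20)  # the drug halves the risk of death
death = rng.binomial(1, p_death)

risk_treated = death[treated == 1].mean()
risk_control = death[treated == 0].mean()
print(f"risk difference: {risk_treated - risk_control:+.3f}")  # close to -0.10, the true causal effect
```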

In Figure 2, the associational pathway between the intervention and the outcome can be blocked by conditioning on the confounder, typically by stratifying on it or by including it in a regression model.
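
The sketch below illustrates this with made-up numbers and no true treatment effect, following the structure of Figure 2: poor health makes both treatment and death more likely, so the crude risk difference is biased away from zero, while stratifying on poor health (conditioning on the confounder) gives stratum-specific differences close to zero. Including poor health in a regression model would achieve the same thing.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000  # hypothetical study size

# Shared cause (Figure 2): poor health -> treatment and poor health -> death,
# with no true effect of treatment on death.
poor_health = rng.binomial(1, 0.5, n)
treated = rng.binomial(1, np.where(poor_health == 1, 0.70, 0.30))  # sicker patients are more likely to get the drug
death = rng.binomial(1, np.where(poor_health == 1, 0.40, 0.10))    # sicker patients are more likely to die

def risk_difference(mask):
    """Risk of death in treated minus untreated, within the given subgroup."""
    d, t = death[mask], treated[mask]
    return d[t == 1].mean() - d[t == 0].mean()

print(f"crude:              {risk_difference(np.ones(n, dtype=bool)):+.3f}")  # about +0.12: pure confounding
print(f"within poor health: {risk_difference(poor_health == 1):+.3f}")        # close to zero
print(f"within good health: {risk_difference(poor_health == 0):+.3f}")        # close to zero
```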
A common practice with causal diagrams is to place a border around any variable that is conditioned on, as in Figure 4. The same is done in Figure 3, where the results of the study are conditioned on patients remaining in the study, hence the border around the variable ‘Withdrawal from study’. In this case, conditioning on withdrawal has the same effect on bias as conditioning on whether the patients experienced side effects. The bias is called collider bias because the arrows ‘collide’ at the conditioned variable, the collider.
In this example, however, the selection bias from dropout can be removed by conditioning on poor health, thereby blocking the associational pathway highlighted in red. Under the structural classification of bias, both selection bias and confounding result in a lack of exchangeability, or non-comparability, and statistical adjustment uses the same kinds of methods for both types of bias.11
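
The same point can be checked by simulation. The sketch below uses made-up numbers and no true treatment effect, and it assumes, consistent with the description of Figure 3, that withdrawal is made more likely by both side effects and poor health. Restricting the analysis to completers (conditioning on the collider) produces a spurious association, while additionally stratifying completers by poor health removes it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000  # hypothetical trial size

# Shared effect (Figure 3): randomised treatment with no true effect on death.
treated = rng.binomial(1, 0.5, n)
poor_health = rng.binomial(1, 0.5, n)
side_effects = rng.binomial(1, np.where(treated == 1, 0.60, 0.10))  # the drug causes side effects
death = rng.binomial(1, np.where(poor_health == 1, 0.40, 0.10))     # only poor health affects death

# Withdrawal (the collider) is caused by both side effects and poor health.
p_withdraw = 0.05 + 0.55 * side_effects + 0.35 * poor_health
completed = rng.binomial(1, 1 - p_withdraw) == 1  # analysing completers conditions on withdrawal

def risk_difference(mask):
    """Risk of death in treated minus untreated, within the given subgroup."""
    d, t = death[mask], treated[mask]
    return d[t == 1].mean() - d[t == 0].mean()

print(f"completers, crude:           {risk_difference(completed):+.3f}")                       # below zero: spurious benefit
print(f"completers with poor health: {risk_difference(completed & (poor_health == 1)):+.3f}")  # close to zero
print(f"completers with good health: {risk_difference(completed & (poor_health == 0)):+.3f}")  # close to zero
```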
The remaining type of bias is measurement bias, and Hernán and Cole (2009)12 identified four general types using causal diagrams. However, because there is no apparent confusion of terminology regarding measurement bias, we will not explore it any further.
Finally, there is sometimes confusion about the difference between confounding and effect modification,13 so an effect modifier has been added to the causal diagram in Figure 5. A fundamental difference is that confounding is a bias that we aim either to prevent by design or to remove by conditioning, whereas effect modification is a property of the causal effect being studied and, ideally, something we would like to estimate and describe.14
In the example in Figure 5, poor health is a suspected confounder of the relationship between taking the treatment drug and the chance of dying. However, it is also suspected that the causal effect of the drug varies with how quickly the drug is metabolised, which is determined by each patient’s genotype, though not in a way that can be tested. Hence, the drug’s metabolism in each patient does not affect their chance of receiving the treatment.
Effect modification is especially important for the generalisability of any findings, because if the intervention only works for, or is only safe for, some people, then such effect modifiers need to be identified. Hence, another term for effect modification is effect heterogeneity.15 An intervention is also likely to work better for some individuals than for others, and if the information needed to predict an individual’s outcome were available, this could lead to different decisions about whether to use the intervention.
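
To see what identifying an effect modifier might look like in practice, the sketch below uses made-up numbers for a scenario like that of Figure 5, but assumes, unlike that example, that each patient’s genotype can actually be measured. After conditioning on the confounder (poor health), the stratum-specific risk differences vary with genotype but not with health status, which is the heterogeneity we would want to report.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000  # hypothetical study size

# Hypothetical version of Figure 5: poor health confounds treatment and death,
# while genotype (slow vs fast metaboliser) modifies the causal effect of the drug.
poor_health = rng.binomial(1, 0.5, n)
slow_metab = rng.binomial(1, 0.3, n)                               # genotype: unrelated to treatment choice
treated = rng.binomial(1, np.where(poor_health == 1, 0.70, 0.30))  # sicker patients are more likely to get the drug

baseline = np.where(poor_health == 1, 0.40, 0.15)
effect = np.where(slow_metab == 1, -0.02, -0.10)                   # the drug helps fast metabolisers far more
death = rng.binomial(1, baseline + treated * effect)

def risk_difference(mask):
    """Risk of death in treated minus untreated, within the given subgroup."""
    d, t = death[mask], treated[mask]
    return d[t == 1].mean() - d[t == 0].mean()

# Condition on the confounder, then compare the effect across genotypes.
for ph in (0, 1):
    for sm in (0, 1):
        rd = risk_difference((poor_health == ph) & (slow_metab == sm))
        print(f"poor health={ph}, slow metaboliser={sm}: risk difference {rd:+.3f}")
# The estimates differ by genotype (about -0.10 vs -0.02) but not by health status:
# genotype is an effect modifier, whereas poor health is a confounder to adjust for.
```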
It is important to note, however, that causal diagrams are limited in how well they can portray effect modification: we usually cannot distinguish between multiple possible modifications of the effect.16 More generally, causal diagrams cannot show how variables might interact, although some work suggests exceptions may exist.17 There have also been proposals to modify causal diagrams so that interactions can be displayed, but the resulting diagrams would no longer be directed acyclic graphs.18
The main advantage of using the structural classification system to define biases such as confounding and selection bias is that, although terminology still plays a role, decisions about study design, analysis, and interpretation are guided by the causal diagram itself, so the terminology a researcher uses for these biases should not affect those decisions. In this way, the problem of ambiguity can be avoided.
But even if a researcher does not use causal diagrams, this classification system might provide the rigorous, formal definitions of confounding and selection bias that will appeal to some researchers, especially those unhappy with the uncertainty that can surround whether a bias should be called confounding or selection bias.
† The term ‘conditioned on’ or ‘conditional on’ derives from probability theory and intuitively means that the data or the results of the analysis depend on information contained in the variable(s) conditioned on. This might occur by restricting the data to a specific value of a variable, such as including only patients who did not withdraw from a study, or by adjusting the results of the analysis to remove the effect of (‘condition on’) confounding variables, usually by including the variables in a regression model or by stratifying. Conditioning on a variable can also be described as narrowing the scope of the discussion to those situations in which the variable takes a given value; in other words, where the variable is held constant.19
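
As a small, hypothetical illustration of these two routes to conditioning (the data frame and variable names below are invented), restriction keeps only the rows with a particular value, while stratification summarises the outcome within levels of the conditioning variable.

```python
import pandas as pd

# Invented example data with the variables discussed above.
df = pd.DataFrame({
    "withdrew":    [0, 0, 1, 0, 1, 0],
    "poor_health": [1, 0, 1, 0, 0, 1],
    "treated":     [1, 0, 1, 1, 0, 0],
    "died":        [1, 0, 0, 0, 0, 1],
})

# Conditioning by restriction: keep only patients who did not withdraw.
completers = df[df["withdrew"] == 0]

# Conditioning by stratification: summarise the outcome within levels of the confounder.
print(completers.groupby(["poor_health", "treated"])["died"].mean())
```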
1. Schwartz S, Campbell UB, Gatto NM, Gordon K. Toward a Clarification of the Taxonomy of “Bias” in Epidemiology Textbooks. Epidemiology. 2015;26(2):216-222. https://doi.org/10.1097/EDE.0000000000000224
2. Mansournia MA, Higgins JPT, Sterne JAC, Hernán MA. Biases in Randomized Trials: A Conversation Between Trialists and Epidemiologists. Epidemiology. 2017;28(1):54-59. https://doi.org/10.1097/EDE.0000000000000564
3. Saeed JI. Semantics. 4th ed. Chichester, West Sussex: Wiley Blackwell; 2016.
4. Piantadosi ST, Tily H, Gibson E. The communicative function of ambiguity in language. Cognition. 2012;122(3):280-291. https://doi.org/10.1016/j.cognition.2011.10.004
5. Hernán MA, Hernández-Díaz S, Werler MM, Mitchell AA. Causal Knowledge as a Prerequisite for Confounding Evaluation: An Application to Birth Defects Epidemiology. American Journal of Epidemiology. 2002;155(2):176-184. https://doi.org/10.1093/aje/155.2.176
6. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615-625. https://doi.org/10.1097/01.ede.0000135174.63482.43
7. Hernán MA, Cole SR. Invited Commentary: Causal Diagrams and Measurement Bias. American Journal of Epidemiology. 2009;170(8):959-962. https://doi.org/10.1093/aje/kwp293
8. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615-625. https://doi.org/10.1097/01.ede.0000135174.63482.43
9. Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC; 2020. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
10. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615-625. https://doi.org/10.1097/01.ede.0000135174.63482.43
11. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615-625. https://doi.org/10.1097/01.ede.0000135174.63482.43
12. Hernán MA, Cole SR. Invited Commentary: Causal Diagrams and Measurement Bias. American Journal of Epidemiology. 2009;170(8):959-962. https://doi.org/10.1093/aje/kwp293
13. Rothman KJ. Causes. American Journal of Epidemiology. 1976;104(6):587-592. https://doi.org/10.1093/oxfordjournals.aje.a112335
14. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. Philadelphia, PA: Lippincott, Williams & Wilkins; 2008.
15. VanderWeele TJ. Confounding and effect modification: distribution and measure. Epidemiologic Methods. 2012;1(1):55-82. https://doi.org/10.1515/2161-962X.1004
16. Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC; 2020. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
17. VanderWeele TJ. On the Distinction Between Interaction and Effect Modification. Epidemiology. 2009;20(6):863-871. https://doi.org/10.1097/EDE.0b013e3181ba333c
18. Weinberg CR. Can DAGs Clarify Effect Modification? Epidemiology. 2007;18(5):569-572. https://doi.org/10.1097/EDE.0b013e318126c11d
19. Greenland S, Pearl J. Causal Diagrams. In: Lovric M, ed. International Encyclopedia of Statistical Science. Berlin, Heidelberg: Springer; 2011:208-216.