Uncontrolled confounding due to unmeasured confounders biases causal inference in health science studies using observational and imperfect experimental designs. The adoption of methods for analyzing bias due to uncontrolled confounding has been slow, despite the increasing availability of such methods. Bias analysis for such uncontrolled confounding is most useful in big data studies and systematic reviews to gauge the extent to which extraneous preexposure variables that affect the exposure and the outcome can explain some or all of the reported exposure-outcome associations. We review methods that can be applied during or after data analysis to adjust for uncontrolled confounding for different outcomes, confounders, and study settings. We discuss relevant bias formulas and how to obtain the information required for applying them. Finally, we develop a new intuitive generalized bias analysis framework for simulating and adjusting for the amount of uncontrolled confounding due to not measuring and adjusting for one or more confounders.

INTRODUCTION

Observational studies play a central role in the health sciences (47-49). They are used for etiologic research, prediction research (e.g., to identify high-risk groups), prognostic research, and diagnostic research. Observational studies are becoming increasingly important for causal analysis, given the practical and ethical costs of conducting randomized trials coupled with the increasing availability of secondary big data (1, 24, 25, 27). Both clinical and public health researchers rely heavily on observational studies. With the advent of big data, large observational studies are becoming common, and their sample sizes relegate sampling error (random variation) to a secondary role relative to systematic error (bias). Uncontrolled confounding is one crucial source of systematic error.
It arises when variables that are not mediators of the effect under study, and that can explain part or all of the observed association between the study exposure and the outcome, are not measured and controlled for during study design or analysis (42, 49). Once the measured confounders have been adjusted for, an unmeasured variable that is not a mediator (or, more generally, not a consequence of the exposure) leads to uncontrolled confounding only if it either (a) is a cause of the outcome through a pathway other than the exposure and is also associated with the exposure, or (b) is a cause of the exposure and is associated with the outcome through a pathway other than the exposure. These criteria subsume the scenario in which the unmeasured variable is a common cause of the exposure and the outcome (35, 42, 65).

Scope

This review focuses on uncontrolled confounding in studies that consider the total effect of one or more interventions (3, 29, 49, 63). The methods discussed here can also be used in mediation settings, but for exposure-outcome confounding only; we refer the reader to the relevant literature for detailed considerations on bias analysis for uncontrolled mediator-outcome confounding (60-62). Bias analysis for unmeasured confounders in interaction analysis (64) is also not covered here. Specific methods for survival settings also exist and are the subject of ongoing research (26, 33, 61), but they are not discussed here, nor is the special case of uncontrolled confounding in multilevel or mixed-model settings (10, 32, 39). Throughout, we focus on study settings in which the exposure-outcome association or effect is quantified using the risk difference, mean difference, risk ratio (as in cohort studies), or odds ratio (as in case-control studies).

Definitions, Notation, and Assumptions

Let C, X, and Y stand respectively for measured confounder(s), exposure (or treatment), and outcome.
Unless stated otherwise, we assume binary C, X, and Y, although the methods apply more generally. We use U to denote an unmeasured confounding variable (or a set of such variables). U may or may not be a known confounder; we assume only that it is unmeasured and therefore uncontrolled in the ensuing analysis. Capital and lowercase letters represent variables and their realized values, respectively. The reference levels of U, C, and X are denoted by $u^*$, $c^*$, and $x^*$. Let $Y_x$ denote the potential value of the outcome when the exposure X is set to x, perhaps contrary to fact. Uncontrolled confounding is assumed to be present if $Y_x$ is independent of X given both C and U but not given C alone (41, 63, 66). That is, there is at least one open backdoor path between X and Y that is not closed by conditioning on C but could have been closed by conditioning on U had U been measured. Figures 1-3 show directed acyclic graphs (DAGs) that depict data-generating processes or causal structures in which U and C confound the effect of X on Y. [There are now accessible introductions and detailed resources on DAGs (22, 42, 49).] Figures 1-3 encode information about U in relation to the observed X, Y, and possibly C; in each, omitting U leads to uncontrolled confounding that can be addressed by bias analysis using some of the methods described here. Figure 4 represents an intractable scenario in which measuring and controlling for U in addition to C does not allow us to estimate the effect of X on Y without bias. In Figure 4, conditioning on U would control for confounding by U (via the path X←U→Y) but would introduce a new bias (called collider-stratification bias) by opening the path through the colliding arrowheads at U (X↔[U]↔Y) (22, 42).

Arah

[Figure 1. Directed acyclic graph (DAG) in which the exposure X and outcome Y share two common causes or confounders, namely a measured variable C and an unmeasured variable U.]
Estimating the total effect of X on Y in Figure 4 requires additional measurements beyond C and U to fully control for confounding and to eliminate any collider-stratification bias introduced by conditioning on U.

CONSEQUENCES OF UNCONTROLLED CONFOUNDING

Before considering how to adjust for the bias left by an unmeasured confounder U, it is instructive to see how not controlling for U biases more than just the effect of the exposure. Suppose we have study data on exposure X, outcome Y, and confounder C generated by the causal structure in Figure 1. Had U been known and measured, an investigator could have specified the following model and estimated the conditional risk difference for the effect of X on Y, $\alpha_X + \alpha_{XC}c$, when C = c:

$$E(Y \mid x, c, u) = \alpha_0 + \alpha_X x + \alpha_C c + \alpha_{XC}\,xc + \alpha_U u.$$

[Figure 2. Directed acyclic graph (DAG) in which the exposure X and outcome Y share a measured common cause or confounder C and an unmeasured confounder U that is a cause of Y but is associated with X only through an unmeasured common cause, depicted by a bidirectional dashed arrow.]

www.annualreviews.org • Analysis of Uncontrolled Confounding

[Figure 3. Directed acyclic graph (DAG) in which the exposure X and outcome Y share a measured common cause or confounder C and an unmeasured confounder U that is a cause of X but is associated with Y only through an unmeasured common cause, depicted by a bidirectional dashed arrow.]
In the absence of U, the investigator ends up fitting the following model, in which $\rho_X$ is a biased estimate of the conditional risk difference for the conditional total effect of X on Y given C:

$$\begin{aligned}
E(Y \mid x, c) &= \sum_u E(Y \mid x, c, u)P(u \mid x, c)\\
&= \alpha_0 + \alpha_X x + \alpha_C c + \alpha_{XC}\,xc + \alpha_U(\lambda_0 + \lambda_X x + \lambda_C c + \lambda_{XC}\,xc)\\
&= (\alpha_0 + \alpha_U\lambda_0) + (\alpha_X + \alpha_U\lambda_X)x + (\alpha_C + \alpha_U\lambda_C)c + (\alpha_{XC} + \alpha_U\lambda_{XC})xc\\
&= \rho_0 + \rho_X x + \rho_C c + \rho_{XC}\,xc.
\end{aligned}$$

Here, the model relating the unmeasured confounder U to the exposure X and measured confounder C is $E(U \mid x, c) = \lambda_0 + \lambda_X x + \lambda_C c + \lambda_{XC}\,xc$. In this model (assuming a binary U), $\lambda_X x + \lambda_{XC}\,xc$ is the risk difference in U due to a one-unit change in the exposure X (that is, moving from X = 0 to X = 1 for binary X, or from $X = x^*$ to X = x for any X) conditional on C = c. The model for Y given X and C only shows that $\rho_X$, the coefficient of X (the estimate of the conditional association between X and Y) in the model adjusting for C only, is biased for the true effect $\alpha_X$: in this case, $\rho_X = \alpha_X + \alpha_U\lambda_X$. Similarly, the coefficient of C, namely $\rho_C$, is biased for $\alpha_C$ because $\rho_C = \alpha_C + \alpha_U\lambda_C$. Note that $\alpha_C$, the true or unbiased coefficient of C, need not have a causal interpretation in the first place (63, 67). In Figure 3, for example, the association between C and Y conditional on X is not causal: the dashed bidirectional arc represents some unmeasured common cause of U and Y. The coefficient of the product term XC, $\rho_{XC}$, which captures the heterogeneity in the effect of X on Y across levels of C, is likewise affected by the lack of control for U.

[Figure 4. Directed acyclic graph (DAG) in which the exposure X and outcome Y share two common causes or confounders C (measured) and U (unmeasured), the effect of U on X is confounded by an unobserved common cause of U and X, and the effect of U on Y is confounded by an unobserved common cause of U and Y (each bidirectional dashed arrow represents unobserved common cause(s)).]

Table 1 Contingency table showing hypothetical study data on measured binary confounder C, exposure X, and outcome Y (with a binary unmeasured confounder U)(a)

                C = 1                     C = 0
         Y = 1   Y = 0   Total     Y = 1   Y = 0   Total
X = 1    2,464   2,656   5,120     1,248   4,032   5,280
X = 0    1,008   1,872   2,880     1,056   5,664   6,720
Total    3,472   4,528   8,000     2,304   9,696  12,000

(a) True data-generating process (N = 20,000): P(U = 1) = 0.6; P(C = 1) = 0.4; P(X = 1 | c, u) = 0.35 + 0.2c + 0.15u; P(Y = 1 | x, c, u) = 0.05 + 0.05x + 0.2c + 0.05xc + 0.2u.

Not controlling for U has at least two consequences for the estimates from the model regressing Y on X and C only: it leads to (a) confounding of the X→Y effect and (b) collider-stratification bias in the C-Y association or, where a causal interpretation is warranted, in the (direct) effect of C on Y not through X. The latter bias arises because, conditional on X, which is a collider (a variable with two arrowheads pointing into it) between C and U, C becomes additionally associated with Y through the pathway C→[X]←U→Y, where square brackets indicate that the variable is conditioned on. Table 1 presents illustrative study data with an unmeasured U, and Table 2 shows how the coefficients of X and C are biased by not controlling for U in conditional risk difference models. Table 3 additionally shows the consequences of uncontrolled confounding for various marginal risk differences.
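As a check on the illustration, the Table 1 counts can be regenerated from the data-generating process in its footnote. The minimal Python sketch below sums U out of N·P(c, u, x, y) and rounds to whole numbers. It assumes that U and C are independent, which is how we read the footnote's separate P(U = 1) and P(C = 1) and which is consistent with Figure 1; the helper names (`bern`, `p_x1`, `p_y1`, `expected_counts`) are illustrative only.

```python
from itertools import product

N = 20_000
P_U1, P_C1 = 0.6, 0.4  # P(U = 1) and P(C = 1) from the Table 1 footnote


def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1


def p_x1(c, u):
    """P(X = 1 | c, u) = 0.35 + 0.2c + 0.15u."""
    return 0.35 + 0.2 * c + 0.15 * u


def p_y1(x, c, u):
    """P(Y = 1 | x, c, u) = 0.05 + 0.05x + 0.2c + 0.05xc + 0.2u."""
    return 0.05 + 0.05 * x + 0.2 * c + 0.05 * x * c + 0.2 * u


def expected_counts():
    """Expected cell counts N * P(c, x, y), summing over U (assumed independent of C)."""
    counts = {}
    for c, x, y in product((1, 0), repeat=3):
        prob = sum(
            bern(P_C1, c) * bern(P_U1, u) * bern(p_x1(c, u), x) * bern(p_y1(x, c, u), y)
            for u in (0, 1)
        )
        counts[(c, x, y)] = round(N * prob)
    return counts


for (c, x, y), n in expected_counts().items():
    print(f"C={c} X={x} Y={y}: {n:,}")
```

Each printed count should match the corresponding cell of Table 1, which confirms that the table is the expected-value version of the stated process rather than a random draw from it.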
There is a third type of bias that can result from not controlling for U: amplification of the uncontrolled confounding in the X→Y effect if the investigator adds to the model an instrumental variable, that is, a preexposure variable that is a cause of, or is associated with, the exposure X but is not associated with Y except through X (40, 43). This third bias can be avoided by not adjusting for an instrumental variable as a covariate in the regression model.

Table 2 Conditional risk differences (95% confidence intervals) from linear binomial risk models relating exposure X to outcome Y, adjusted for confounding

                                      True (unbiased) model      Biased model not controlling
                                      adjusting for C and U(a)   for confounding by U(b)
Coefficient of X when C = 0           0.050 (0.037-0.063)        0.079 (0.065-0.094)
Coefficient of X when C = 1           0.100 (0.078-0.121)        0.131 (0.109-0.153)
Coefficient of C (conditional on X)   0.200 (0.189-0.211)        0.193 (0.173-0.212)
Coefficient of U                      0.200 (0.194-0.206)        Not applicable

(a) True (unbiased) model: P(Y = 1 | x, c, u) = 0.05 + 0.05x + 0.2c + 0.05xc + 0.2u.
(b) Biased model: P(Y = 1 | x, c) = 0.1571 + 0.0792x + 0.1929c + 0.052xc.

Table 3 Unbiased and biased conditional and marginal risk differences (RD) [95% confidence intervals (CI)] from linear binomial risk models for the effect of the exposure X on the outcome Y

                                               Unbiased RD (95% CI),     Biased RD (95% CI),
                                               controlling for C and U   controlling for C only
Conditional RD for the conditional total
  effect at C = 0                              0.050 (0.037-0.063)       0.079 (0.065-0.094)
Conditional RD at C = 1                        0.100 (0.078-0.121)       0.131 (0.109-0.153)
Marginal RD for the average treatment effect
  in the total population (ATE)                0.070 (0.058-0.083)       0.100 (0.088-0.112)
Marginal RD for the average treatment effect
  in the treated (ATT)                         0.075 (0.062-0.087)       0.105 (0.093-0.117)
Marginal RD for the average treatment effect
  in the untreated (ATU)                       0.065 (0.053-0.077)       0.095 (0.083-0.107)

To summarize, not controlling for a confounder like U in Figures 1-3 biases the estimate of the effect of the exposure X as well as estimates of the associations between measured confounders (such as C) and the outcome Y.

BIAS FORMULAS FOR UNCONTROLLED CONFOUNDING

One of the oldest and most commonly used methods for adjusting an association estimate for an unmeasured confounder is to use a bias formula to calculate a bias factor (3-11, 14, 17, 19, 20, 23, 32, 44, 53, 68). The calculated bias factor is then subtracted or otherwise removed from the biased (partially adjusted) estimate relating the exposure to the outcome (3, 29, 49, 50, 63) to obtain an externally adjusted estimate: the estimate that would have been obtained had the assumptions about the bias parameters used in calculating the bias factor held. In the example in Figure 1, uncontrolled confounding due to not measuring U is transmitted through the backdoor path from X to Y, X←U→Y. A bias formula quantifies the confounding transmitted via this backdoor path.
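The biased marginal risk differences in Table 3 (last column) are standardized versions of the C-specific associations. The Python sketch below, which uses only the Table 1 counts and empirical weights, averages the C-specific risk differences over P(c), P(c | X = 1), and P(c | X = 0) to reproduce the point estimates 0.100, 0.105, and 0.095; it does not compute the confidence intervals. The function names are illustrative.

```python
# Table 1 counts: (c, x) -> (number with Y = 1, total)
TABLE1 = {
    (1, 1): (2464, 5120), (1, 0): (1008, 2880),
    (0, 1): (1248, 5280), (0, 0): (1056, 6720),
}


def risk(c, x):
    """Observed P(Y = 1 | x, c)."""
    cases, total = TABLE1[(c, x)]
    return cases / total


def rd_given_c(c):
    """Observed X-Y risk difference within stratum C = c (adjusted for C only)."""
    return risk(c, 1) - risk(c, 0)


def standardized_rd(weights):
    """Average the C-specific risk differences over a distribution of C."""
    return sum(rd_given_c(c) * w for c, w in weights.items())


n_total = sum(total for _, total in TABLE1.values())
n_c = {c: TABLE1[(c, 1)][1] + TABLE1[(c, 0)][1] for c in (0, 1)}
n_x = {x: TABLE1[(0, x)][1] + TABLE1[(1, x)][1] for x in (0, 1)}

rd_total = standardized_rd({c: n_c[c] / n_total for c in (0, 1)})               # P(c)
rd_exposed = standardized_rd({c: TABLE1[(c, 1)][1] / n_x[1] for c in (0, 1)})   # P(c | X = 1)
rd_unexposed = standardized_rd({c: TABLE1[(c, 0)][1] / n_x[0] for c in (0, 1)}) # P(c | X = 0)
print(f"{rd_total:.3f} {rd_exposed:.3f} {rd_unexposed:.3f}")
```

Because only C is standardized over, these are the associational quantities that the bias formulas below compare with their U-adjusted counterparts.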
Assuming that C and U together were sufficient to control for confounding when estimating the effect of X on Y, the relevant conditional risk differences ($RD_{YX(\text{target population})|c}$, conditional on C but standardized to U in different target populations, namely the total, exposed (X = x), and unexposed ($X = x^*$) populations) and the marginal causal risk differences ($RD_{YX(\text{total})}$, $RD_{YX(x)}$, and $RD_{YX(x^*)}$, standardized to the joint distributions of C and U in the total, exposed, and unexposed populations, respectively) would be given, without bias, by

$$\begin{aligned}
RD_{YX(\text{total})|c} &= E(Y_x \mid c) - E(Y_{x^*} \mid c) = \sum_u E(Y \mid x, c, u)P(u \mid c) - \sum_u E(Y \mid x^*, c, u)P(u \mid c);\\
RD_{YX(x)|c} &= E(Y_x \mid x, c) - E(Y_{x^*} \mid x, c) = \sum_u E(Y \mid x, c, u)P(u \mid c, x) - \sum_u E(Y \mid x^*, c, u)P(u \mid c, x)\\
&= E(Y \mid x, c) - \sum_u E(Y \mid x^*, c, u)P(u \mid c, x);\\
RD_{YX(x^*)|c} &= E(Y_x \mid x^*, c) - E(Y_{x^*} \mid x^*, c) = \sum_u E(Y \mid x, c, u)P(u \mid c, x^*) - \sum_u E(Y \mid x^*, c, u)P(u \mid c, x^*)\\
&= \sum_u E(Y \mid x, c, u)P(u \mid c, x^*) - E(Y \mid x^*, c);\\
RD_{YX(\text{total})} &= E(Y_x) - E(Y_{x^*}) = \sum_{c,u} E(Y \mid x, c, u)P(u \mid c)P(c) - \sum_{c,u} E(Y \mid x^*, c, u)P(u \mid c)P(c);\\
RD_{YX(x)} &= E(Y_x \mid x) - E(Y_{x^*} \mid x) = \sum_{c,u} E(Y \mid x, c, u)P(u \mid c, x)P(c \mid x) - \sum_{c,u} E(Y \mid x^*, c, u)P(u \mid c, x)P(c \mid x)\\
&= E(Y \mid x) - \sum_{c,u} E(Y \mid x^*, c, u)P(u \mid c, x)P(c \mid x);\\
RD_{YX(x^*)} &= E(Y_x \mid x^*) - E(Y_{x^*} \mid x^*) = \sum_{c,u} E(Y \mid x, c, u)P(u \mid c, x^*)P(c \mid x^*) - \sum_{c,u} E(Y \mid x^*, c, u)P(u \mid c, x^*)P(c \mid x^*)\\
&= \sum_{c,u} E(Y \mid x, c, u)P(u \mid c, x^*)P(c \mid x^*) - E(Y \mid x^*).
\end{aligned}$$

In the causal inference literature, $RD_{YX(\text{total})}$, $RD_{YX(x)}$, and $RD_{YX(x^*)}$ represent the risk differences for the average treatment effect in the total population (ATE), the average treatment effect among the treated (ATT), and the average treatment effect among the untreated (ATU), respectively (3, 63). Alternatively, these causal contrasts could have been defined as risk or odds ratios (2, 16, 29).
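Because the data-generating process of the illustration is known (Table 1 footnote), the marginal standardization formulas above can be evaluated directly. The Python sketch below computes $RD_{YX(\text{total})}$, $RD_{YX(x)}$, and $RD_{YX(x^*)}$ with x = 1 and $x^* = 0$, weighting the true contrasts $E(Y \mid 1, c, u) - E(Y \mid 0, c, u)$ by $P(u \mid c)P(c)$ or by $P(u \mid c, x)P(c \mid x) = P(c, u \mid x)$. It assumes, as in Figure 1, that U and C are independent, and the helper names are hypothetical. It should return the unbiased ATE, ATT, and ATU of Table 3 (0.070, about 0.075, and 0.065).

```python
from itertools import product

P_U1, P_C1 = 0.6, 0.4  # P(U = 1), P(C = 1) from the Table 1 footnote


def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1


def e_y(x, c, u):
    """True E(Y | x, c, u) = 0.05 + 0.05x + 0.2c + 0.05xc + 0.2u."""
    return 0.05 + 0.05 * x + 0.2 * c + 0.05 * x * c + 0.2 * u


def p_cu(c, u, x_pop=None):
    """Weight for cell (c, u): P(c)P(u) in the total population, or the
    unnormalized P(c, u, X = x_pop) for a subpopulation (U independent of C)."""
    w = bern(P_C1, c) * bern(P_U1, u)
    if x_pop is not None:
        w *= bern(0.35 + 0.2 * c + 0.15 * u, x_pop)  # P(X = x_pop | c, u)
    return w


def causal_rd(x_pop=None, x=1, x_ref=0):
    """Standardized causal RD E(Y_x | pop) - E(Y_x* | pop)."""
    cells = list(product((0, 1), repeat=2))  # all (c, u) combinations
    weights = {cu: p_cu(*cu, x_pop) for cu in cells}
    total = sum(weights.values())
    return sum(
        (e_y(x, c, u) - e_y(x_ref, c, u)) * weights[(c, u)] / total for c, u in cells
    )


ate, att, atu = causal_rd(), causal_rd(x_pop=1), causal_rd(x_pop=0)
print(f"ATE={ate:.3f} ATT={att:.3f} ATU={atu:.3f}")
```

In this particular model the X-Y contrast does not depend on U, so the three estimands differ only through the distribution of C in each target population.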
For continuous U, integral signs and probability density functions replace the summation signs and probability mass functions, respectively, in these expressions. In the absence of U, the corresponding associational risk differences relating X to Y, adjusted for C or standardized to the distribution of C, are given by

$$\begin{aligned}
RD^{+}_{YX(\text{total})|c} &= RD^{+}_{YX(x)|c} = RD^{+}_{YX(x^*)|c} = E(Y \mid x, c) - E(Y \mid x^*, c)\\
&= \sum_u E(Y \mid x, c, u)P(u \mid c, x) - \sum_u E(Y \mid x^*, c, u)P(u \mid c, x^*);\\
RD^{+}_{YX(\text{total})} &= \sum_c E(Y \mid x, c)P(c) - \sum_c E(Y \mid x^*, c)P(c)\\
&= \sum_{c,u} E(Y \mid x, c, u)P(u \mid c, x)P(c) - \sum_{c,u} E(Y \mid x^*, c, u)P(u \mid c, x^*)P(c);\\
RD^{+}_{YX(x)} &= E(Y \mid x) - \sum_c E(Y \mid x^*, c)P(c \mid x) = \sum_c E(Y \mid x, c)P(c \mid x) - \sum_c E(Y \mid x^*, c)P(c \mid x)\\
&= \sum_{c,u} E(Y \mid x, c, u)P(u \mid c, x)P(c \mid x) - \sum_{c,u} E(Y \mid x^*, c, u)P(u \mid c, x^*)P(c \mid x);\\
RD^{+}_{YX(x^*)} &= \sum_c E(Y \mid x, c)P(c \mid x^*) - E(Y \mid x^*) = \sum_c E(Y \mid x, c)P(c \mid x^*) - \sum_c E(Y \mid x^*, c)P(c \mid x^*)\\
&= \sum_{c,u} E(Y \mid x, c, u)P(u \mid c, x)P(c \mid x^*) - \sum_{c,u} E(Y \mid x^*, c, u)P(u \mid c, x^*)P(c \mid x^*).
\end{aligned}$$

The bias due to not controlling for U in each of the risk differences, $\text{Bias}_{RD_{YX(\text{target population})}}$, is given in turn by the difference between the risk difference not adjusted for U and the risk difference adjusted for U:

$$\begin{aligned}
\text{Bias}_{RD_{YX(\text{total})|c}} &= RD^{+}_{YX(\text{total})|c} - RD_{YX(\text{total})|c}\\
&= \sum_u [E(Y \mid x, c, u) - E(Y \mid x, c, u^*)][P(u \mid c, x) - P(u \mid c)] - \sum_u [E(Y \mid x^*, c, u) - E(Y \mid x^*, c, u^*)][P(u \mid c, x^*) - P(u \mid c)];\\
\text{Bias}_{RD_{YX(x)|c}} &= RD^{+}_{YX(x)|c} - RD_{YX(x)|c} = \sum_u [E(Y \mid x^*, c, u) - E(Y \mid x^*, c, u^*)][P(u \mid c, x) - P(u \mid c, x^*)];\\
\text{Bias}_{RD_{YX(x^*)|c}} &= RD^{+}_{YX(x^*)|c} - RD_{YX(x^*)|c} = \sum_u [E(Y \mid x, c, u) - E(Y \mid x, c, u^*)][P(u \mid c, x) - P(u \mid c, x^*)];\\
\text{Bias}_{RD_{YX(\text{total})}} &= RD^{+}_{YX(\text{total})} - RD_{YX(\text{total})}\\
&= \sum_{c,u} [E(Y \mid x, c, u) - E(Y \mid x, c, u^*)][P(u \mid c, x) - P(u \mid c)]P(c) - \sum_{c,u} [E(Y \mid x^*, c, u) - E(Y \mid x^*, c, u^*)][P(u \mid c, x^*) - P(u \mid c)]P(c);\\
\text{Bias}_{RD_{YX(x)}} &= RD^{+}_{YX(x)} - RD_{YX(x)} = \sum_{c,u} [E(Y \mid x^*, c, u) - E(Y \mid x^*, c, u^*)][P(u \mid c, x) - P(u \mid c, x^*)]P(c \mid x);\\
\text{Bias}_{RD_{YX(x^*)}} &= RD^{+}_{YX(x^*)} - RD_{YX(x^*)} = \sum_{c,u} [E(Y \mid x, c, u) - E(Y \mid x, c, u^*)][P(u \mid c, x) - P(u \mid c, x^*)]P(c \mid x^*).
\end{aligned}$$

The contrast on the right-hand side of each of these expressions is usually presented as a bias formula that is used to obtain the numerical value, or bias factor, on the left-hand side (3, 45, 63). Several techniques based on the idea of bias formulas have been in use for more than half a century (3-11, 14, 17, 19, 20, 23, 32, 44, 53, 68) but were generalized only recently (3, 60). One advantage of these bias formulas is that they are general and can be used for general outcomes, exposures, and confounders (63).
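A quick numerical check of the conditional bias formula for the exposed, using the known process behind Table 1: the Python sketch below evaluates $\text{Bias}_{RD_{YX(x)|c}}$ with x = 1, $x^* = 0$, and $u^* = 0$, and confirms that $RD^{+}_{YX(x)|c} - \text{Bias}_{RD_{YX(x)|c}} = RD_{YX(x)|c}$ in each stratum of C. It obtains $P(u \mid c, x)$ by Bayes' rule and assumes U independent of C; the function names are hypothetical.

```python
P_U1 = 0.6  # P(U = 1); U assumed independent of C, as in Figure 1


def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1


def e_y(x, c, u):
    """True E(Y | x, c, u) from the Table 1 footnote."""
    return 0.05 + 0.05 * x + 0.2 * c + 0.05 * x * c + 0.2 * u


def p_u(u, c, x):
    """P(u | c, x) by Bayes' rule from P(u) and P(X = 1 | c, u)."""
    def joint(v):
        return bern(P_U1, v) * bern(0.35 + 0.2 * c + 0.15 * v, x)
    return joint(u) / (joint(0) + joint(1))


def bias_exposed(c, x=1, x_ref=0, u_ref=0):
    """Bias formula for RD_YX(x)|c:
    sum_u [E(Y|x*,c,u) - E(Y|x*,c,u*)] * [P(u|c,x) - P(u|c,x*)]."""
    return sum(
        (e_y(x_ref, c, u) - e_y(x_ref, c, u_ref)) * (p_u(u, c, x) - p_u(u, c, x_ref))
        for u in (0, 1)
    )


def rd_plus(c, x=1, x_ref=0):
    """Associational RD given C only: E(Y|x,c) - E(Y|x*,c)."""
    def e_y_given_xc(xx):
        return sum(e_y(xx, c, u) * p_u(u, c, xx) for u in (0, 1))
    return e_y_given_xc(x) - e_y_given_xc(x_ref)


def rd_exposed(c, x=1, x_ref=0):
    """Causal RD among the exposed in stratum c, standardized to P(u | c, x)."""
    return sum((e_y(x, c, u) - e_y(x_ref, c, u)) * p_u(u, c, x) for u in (0, 1))


for c in (0, 1):
    print(c, round(rd_plus(c), 4), round(bias_exposed(c), 4), round(rd_exposed(c), 4))
```

The identity holds exactly (up to floating-point error) because $\sum_u [P(u \mid c, x) - P(u \mid c, x^*)] = 0$, which is what allows the reference term $E(Y \mid x^*, c, u^*)$ to be subtracted inside the sum.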
These bias formulas generally require specifying the following bias parameters: (a) the relation between U and Y conditional on C and X [for example, $E(Y \mid x, c, u) - E(Y \mid x, c, u^*)$]; (b) the distribution of U conditional on C and X [$P(u \mid c, x)$ and $P(u \mid c, x^*)$]; and sometimes (c) the distribution of U conditional on C but not X [$P(u \mid c)$]. The second set of bias parameters, $P(u \mid c, x)$ and $P(u \mid c, x^*)$, relates the exposure X to the unmeasured confounder U, conditional on the measured confounder(s) C. The first bias parameter, $E(Y \mid x, c, u) - E(Y \mid x, c, u^*)$, relates U to Y conditional on X and C. Typically, the investigator specifies the bias parameters and plugs them into the relevant bias formula to quantify the bias factor (for example, $\text{Bias}_{RD_{YX(\text{total})|c}}$), which is then used to adjust the biased risk difference $RD^{+}_{YX(\text{total})|c}$ to obtain the U-adjusted risk difference $RD_{YX(\text{total})|c}$:

$$RD_{YX(\text{total})|c} = RD^{+}_{YX(\text{total})|c} - \text{Bias}_{RD_{YX(\text{total})|c}}.$$

Parallel formulas for the risk ratio, as well as approximate formulas for the odds ratio, have also been derived and are reported elsewhere (3, 63). The bias formulas for uncontrolled confounding can be simplified further in some cases if we are willing to make additional (usually parametric) assumptions, such as homogeneity of the bias parameters across levels of the exposure and measured confounders (3, 63).

To make the use of these formulas more concrete, consider the data in Table 1, in which U is assumed unobserved. The conditional linear risk model for estimating the effect of X on Y conditional on C and U is

$$E(Y \mid x, c, u) = \alpha_0 + \alpha_X x + \alpha_C c + \alpha_{XC}\,xc + \alpha_U u = 0.05 + 0.05x + 0.2c + 0.05xc + 0.2u.$$

The true, unbiased conditional X-Y risk difference would have been 0.05 at C = 0. In the absence of U, regressing Y on X and C only yields

$$E(Y \mid x, c) = \sum_u E(Y \mid x, c, u)P(u \mid x, c) = \rho_0 + \rho_X x + \rho_C c + \rho_{XC}\,xc = 0.157 + 0.079x + 0.193c + 0.052xc.$$
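Because the linear risk model with the XC product term is saturated in X and C, its fitted risks equal the observed cell proportions, so least squares gives the same point estimates as a linear binomial risk model here. The Python sketch below (assuming NumPy is available; the function name is illustrative) refits the C-only model to the Table 1 counts, weighting each cell by its count, and should recover the coefficients quoted above. It does not reproduce the confidence intervals in Table 2, which require a binomial variance model.

```python
import numpy as np

# Table 1 cell counts keyed by (c, x, y).
TABLE1 = {
    (1, 1, 1): 2464, (1, 1, 0): 2656, (1, 0, 1): 1008, (1, 0, 0): 1872,
    (0, 1, 1): 1248, (0, 1, 0): 4032, (0, 0, 1): 1056, (0, 0, 0): 5664,
}


def fit_linear_risk_model(table):
    """Least-squares fit of E(Y | x, c) = rho0 + rhoX*x + rhoC*c + rhoXC*xc.

    Rows are scaled by sqrt(count), which gives the same coefficients as
    ordinary least squares on the 20,000 individual records.
    """
    cells = list(table)
    counts = np.array([table[k] for k in cells], dtype=float)
    c, x, y = (np.array(col, dtype=float) for col in zip(*cells))
    design = np.column_stack([np.ones_like(c), x, c, x * c])
    w = np.sqrt(counts)
    coef, *_ = np.linalg.lstsq(design * w[:, None], y * w, rcond=None)
    return coef


rho0, rho_x, rho_c, rho_xc = fit_linear_risk_model(TABLE1)
print(f"E(Y|x,c) = {rho0:.3f} + {rho_x:.3f}x + {rho_c:.3f}c + {rho_xc:.3f}xc")
```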
At C = 0, the estimated conditional risk difference $RD^{+}_{YX(x)|c}$ for the association between X and Y is 0.079, which is biased for the true conditional risk difference of 0.05. The appropriate bias formula for a conditional linear risk model can be used to estimate the bias factor, which can then be subtracted from the biased estimate 0.079 to obtain the U-adjusted risk difference estimate. We use the formula for $\text{Bias}_{RD_{YX(x)|c}}$ with x = 1, $x^* = 0$, u = 1, and $u^* = 0$. This formula requires the following bias parameters: (a) the risk difference $E(Y \mid x^*, c, u) - E(Y \mid x^*, c, u^*)$ relating U to Y conditional on the exposure X and measured confounder C, which, using U = 0 as the reference, is $E(Y \mid X = 0, c, U = 1) - E(Y \mid X = 0, c, U = 0)$ at U = 1 and $E(Y \mid X = 0, c, U = 0) - E(Y \mid X = 0, c, U = 0) = 0$ at U = 0; and (b) the prevalence of each level of U among those with X = x and C = c, which can be expressed generally as $P(U = 1 \mid c, x) = \lambda_0 + \lambda_X x + \lambda_C c + \lambda_{XC}\,xc$. In this case, we secretly know that $P(U = 1 \mid c, x) = 0.536 + 0.146x - 0.036c + 0.010xc$. In real applications, this bias parameter model will not be known and must be obtained from an external source, as discussed in the next section. To apply the bias formula in this illustration, recall that, for the binary U, X, and Y used here,

$$E(Y \mid x^*, c, u) - E(Y \mid x^*, c, u^*) = (\alpha_0 + \alpha_X x^* + \alpha_C c + \alpha_{XC}\,x^* c + \alpha_U u) - (\alpha_0 + \alpha_X x^* + \alpha_C c + \alpha_{XC}\,x^* c + \alpha_U u^*) = \alpha_U(u - u^*) = \alpha_U.$$
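The "secretly known" model for $P(U = 1 \mid c, x)$ can be recovered here from the Table 1 footnote by Bayes' rule; in practice these values would have to come from an external source. The Python sketch below does so, assuming U independent of C as in Figure 1, and also evaluates the linear-model bias factor $\alpha_U(\lambda_X + \lambda_{XC}c)$ at C = 0 together with the corresponding adjusted risk difference. The variable names are illustrative.

```python
P_U1 = 0.6  # P(U = 1) from the Table 1 footnote; U assumed independent of C


def p_u1_given(c, x):
    """P(U = 1 | c, x) by Bayes' rule, using P(X = 1 | c, u) = 0.35 + 0.2c + 0.15u."""
    def joint(u):
        p_x1 = 0.35 + 0.2 * c + 0.15 * u
        p_u = P_U1 if u == 1 else 1.0 - P_U1
        return p_u * (p_x1 if x == 1 else 1.0 - p_x1)
    return joint(1) / (joint(0) + joint(1))


# Saturated linear model: P(U = 1 | c, x) = lam0 + lam_x*x + lam_c*c + lam_xc*xc
lam0 = p_u1_given(0, 0)
lam_x = p_u1_given(0, 1) - lam0
lam_c = p_u1_given(1, 0) - lam0
lam_xc = p_u1_given(1, 1) - p_u1_given(1, 0) - p_u1_given(0, 1) + lam0

alpha_u = 0.2                          # U-Y risk difference in the true outcome model
rho_x = 1248 / 5280 - 1056 / 6720      # observed X-Y risk difference at C = 0 (Table 1)
bias_c0 = alpha_u * lam_x              # alpha_U * (lambda_X + lambda_XC * 0)
print(f"lambda: {lam0:.3f} {lam_x:.3f} {lam_c:.3f} {lam_xc:.3f}")
print(f"bias = {bias_c0:.3f}, adjusted RD = {rho_x - bias_c0:.3f}")
```

The adjusted value comes out at 0.050, the true $\alpha_X$, because with the exact $\lambda_X$ the identity $\rho_X = \alpha_X + \alpha_U\lambda_X$ holds exactly.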
Therefore,

$$\begin{aligned}
\text{Bias}_{RD_{YX(x)|c}} &= \sum_u [E(Y \mid x^*, c, u) - E(Y \mid x^*, c, u^*)][P(u \mid c, x) - P(u \mid c, x^*)]\\
&= [E(Y \mid X = 0, c, U = 1) - E(Y \mid X = 0, c, U = 0)][P(U = 1 \mid c, X = 1) - P(U = 1 \mid c, X = 0)]\\
&\quad + [E(Y \mid X = 0, c, U = 0) - E(Y \mid X = 0, c, U = 0)][P(U = 0 \mid c, X = 1) - P(U = 0 \mid c, X = 0)]\\
&= \alpha_U [P(U = 1 \mid c, X = 1) - P(U = 1 \mid c, X = 0)] + 0 \cdot [P(U = 0 \mid c, X = 1) - P(U = 0 \mid c, X = 0)]\\
&= \alpha_U [P(U = 1 \mid c, X = 1) - P(U = 1 \mid c, X = 0)]\\
&= \alpha_U [(\lambda_0 + \lambda_X \cdot 1 + \lambda_C c + \lambda_{XC} \cdot 1 \cdot c) - (\lambda_0 + \lambda_X \cdot 0 + \lambda_C c + \lambda_{XC} \cdot 0 \cdot c)]\\
&= \alpha_U(\lambda_X + \lambda_{XC}\, c)\\
&= \alpha_U(\lambda_X + \lambda_{XC} \cdot 0) = \alpha_U \lambda_X = 0.2 \times 0.146 = 0.029 \qquad (\text{at } C = 0).
\end{aligned}$$

The bias-adjusted X-Y risk difference $RD_{YX(x)|c}$ is then obtained as

$$RD_{YX(x)|c} = RD^{+}_{YX(x)|c} - \text{Bias}_{RD_{YX(x)|c}} = 0.079 - 0.029 = 0.050.$$

OBTAINING THE VALUES OF THE BIAS PARAMETERS

Specifying values for the bias parameters can be formidable without deep prior knowledge or external data. In particular, values for $P(u \mid c, x)$ and $P(u \mid c, x^*)$ can be hard to determine, being usually less intuitive than the association between U and Y conditional on X and C, namely $E(Y \mid x, c, u) - E(Y \mid x, c, u^*)$. Therefore, a particular challenge in applying bias formulas and related methods is obtaining the magnitude and direction of the bias parameters relating U to X and C, as well as of those relating U to Y conditional on X and C (3, 29, 49, 55, 57, 63). Several sources can supply the bias parameters for use in the bias formulas. Beyond the investigator's background knowledge, a validation (sub)study that is internal or external to the primary study can be a source of bias parameters (12, 20, 30, 38, 52, 54, 56). An internal validation substudy is specific to the primary study and can be especially useful if it can devote more resources to improving and expanding measurements that can inform the larger primary study.
The validation study collects invaluable data that can be used to address selection bias (especially due to selective nonresponse), measurement error in variables, and uncontrolled confounding due to confounders not measured in the larger primary study (20, 29, 30, 56, 57). An external validation study is not a substudy of the primary study and can be used similarly where appropriate. External validation data can come from other published studies, systematic reviews, and meta-analyses. Although it can supply bias parameter values for one or more unmeasured confounders more readily than an investigator's background knowledge or intelligent guesses, a validation (sub)study can still be prone to sources of bias similar to those affecting the primary study. It is therefore important not to be overly optimistic about the value of the validation (sub)study, and the investigator should allow for such uncertainty when using bias parameters from it (20, 30, 49).

FIXED VERSUS PROBABILISTIC BIAS ANALYSIS

After obtaining the bias parameters, the investigator can use them in the bias formulas in several ways. First, a simple fixed analysis, involving a fixed (one-time) assignment of values to the bias parameters, can be used to obtain single bias-adjusted estimates of the exposure-outcome association. This approach accounts neither for random error nor for uncertainty in the values of the bias parameters; it is sometimes referred to as simple sensitivity analysis (29, 30, 49). Second, the investigator can repeat the simple bias analysis for several different fixed values of the bias parameters and report several bias-adjusted exposure-outcome association estimates. As before, this so-called multidimensional bias analysis does not account for random error or for uncertainty in each fixed bias parameter value.
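A minimal sketch of simple (fixed) and multidimensional bias analysis for the Table 1 illustration, using the linear-model shortcut $\text{Bias}_{RD_{YX(x)|c}} = \alpha_U\lambda_X$ at C = 0 derived above. The grid values other than $\alpha_U = 0.2$ and $\lambda_X = 0.146$ are hypothetical choices for illustration only, and, as noted in the text, neither analysis accounts for random error or for uncertainty within each fixed value.

```python
from itertools import product

RD_PLUS_C0 = 0.079  # biased X-Y risk difference at C = 0 (Table 2)


def fixed_bias_adjusted(rd_plus, alpha_u, lambda_x):
    """Simple (fixed) bias analysis with the linear-model shortcut
    Bias = alpha_U * lambda_X, so that RD = RD+ - alpha_U * lambda_X."""
    return rd_plus - alpha_u * lambda_x


# Simple bias analysis: one fixed assignment of the bias parameters.
print(round(fixed_bias_adjusted(RD_PLUS_C0, 0.2, 0.146), 3))

# Multidimensional bias analysis: repeat over a grid of (hypothetical) values.
alpha_grid = (0.1, 0.2, 0.3)       # assumed U-Y risk differences
lambda_grid = (0.05, 0.146, 0.25)  # assumed differences in P(U = 1) between X = 1 and X = 0
grid = {
    (a, lam): round(fixed_bias_adjusted(RD_PLUS_C0, a, lam), 3)
    for a, lam in product(alpha_grid, lambda_grid)
}
for (a, lam), rd in grid.items():
    print(f"alpha_U={a:.1f}  lambda_X={lam:.3f}  adjusted RD={rd:.3f}")
```

Reporting the whole grid shows how strong the U-X and U-Y relations would have to be to move the estimate toward the null; a probabilistic analysis would instead draw $(\alpha_U, \lambda_X)$ from distributions and add sampling error.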
Third, to overcome the shortcomings of the preceding approaches, probability distributions, rather than fixed values, can be assigned to the bias parameters in what is called probabilistic bias analysis, yielding a distribution of bias-adjusted exposure-outcome association estimates that accounts for study random error and for uncertainty in the choice of bias parameters (3, 29, 30, 49).

OTHER BIAS ANALYSIS METHODS

Scholars have developed methods other than bias formulas for adjusting for uncontrolled confounding. These include (a) direct specification of the bias factor (that is, the numerical value from the bias formula, without specifying the underlying bias parameters relating U to X given C and relating U to Y given X and C) (21) or related methods (47, 48); (b) simulation or imputation of U using external information (subsumed under missing data methods) (12, 20, 28); (c) propensity calibration using validation data (51, 56, 57); (d) intensity scores (9, 46); (e) negative controls (34); and (f) bounding techniques (13, 18, 31, 36), among others. Some of these techniques are still evolving and have the potential to become routine in bias analysis for uncontrolled confounding.

MULTIPLE UNMEASURED CONFOUNDERS

With a few exceptions, such as propensity calibration and Bayesian bias analysis, many existing methods are not easily amenable to multiple unmeasured confounders (51, 54, 57, 59). Nonetheless, the bias formulas discussed in this article can be extended to settings with multiple unmeasured confounders by treating U as a set of variables and adapting the formulas to reflect the implied joint distribution of the multiple Us. This could substantially increase the number of bias parameters needed. More work is needed in this area.
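To illustrate treating U as a set, the sketch below applies the $\text{Bias}_{RD_{YX(x)|c}}$ formula within a single stratum of C to two binary unmeasured confounders, $U = (U_1, U_2)$, whose four joint levels play the role of the levels of U, with reference level $u^* = (0, 0)$. All numerical inputs (the joint distribution of $(U_1, U_2)$, the exposure model, and the outcome model with a $U_1U_2$ term) are hypothetical. The final line checks that the associational risk difference minus the bias factor equals the causal risk difference among the exposed.

```python
# Hypothetical inputs within one stratum of C; U = (U1, U2) is treated as one
# four-level variable with reference level u* = (0, 0).
P_U = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.25, (1, 1): 0.25}
U_REF = (0, 0)


def p_x1(u):
    """Hypothetical P(X = 1 | u1, u2)."""
    u1, u2 = u
    return 0.2 + 0.2 * u1 + 0.1 * u2


def e_y(x, u):
    """Hypothetical E(Y | x, u1, u2), including a U1*U2 interaction."""
    u1, u2 = u
    return 0.1 + 0.05 * x + 0.15 * u1 + 0.1 * u2 + 0.05 * u1 * u2


def p_u_given_x(u, x):
    """P(u1, u2 | x) by Bayes' rule."""
    def joint(v):
        return P_U[v] * (p_x1(v) if x == 1 else 1.0 - p_x1(v))
    return joint(u) / sum(joint(v) for v in P_U)


def bias_exposed(x=1, x_ref=0):
    """Sum over joint levels u of [E(Y|x*,u) - E(Y|x*,u*)] * [P(u|x) - P(u|x*)]."""
    return sum(
        (e_y(x_ref, u) - e_y(x_ref, U_REF)) * (p_u_given_x(u, x) - p_u_given_x(u, x_ref))
        for u in P_U
    )


rd_plus = sum(e_y(1, u) * p_u_given_x(u, 1) - e_y(0, u) * p_u_given_x(u, 0) for u in P_U)
rd_exposed = sum((e_y(1, u) - e_y(0, u)) * p_u_given_x(u, 1) for u in P_U)
print(round(rd_plus, 4), round(bias_exposed(), 4), round(rd_exposed, 4))
```

Even this small case needs a four-cell joint distribution for U and three U-Y contrasts instead of one of each, which is the growth in bias parameters noted in the text.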
GENERALIZED BIAS ANALYSIS FRAMEWORK

To overcome some of the challenges facing the existing methods described above, we propose a novel generalized framework for bias analysis that simulates the amount of uncontrolled confounding due to one or more unmeasured confounders under one or more scenarios in which the exposure X has no effect or some effect on the outcome Y given U and C; Figure 5 provides an example. This generalized bias analysis using simulated confounding is intuitive because it allows the investigator to reason in the direction of the arrows in the DAG, or of the information flow in the assumed data-generating process, to quantify the amount of uncontrolled confounding implied by the specified values of the assumed bias parameters. For example, instead of reasoning about bias parameters backward from X and C to U, as in the bias formula approach, the simulated confounding approach reasons forward from U and C to X to obtain a new simulated exposure $X_{\text{sim}}$ from $P(x_{\text{sim}} \mid c, u)$, from which $P(u \mid c, x_{\text{sim}})$ can be estimated by regressing U on $X_{\text{sim}}$ and C. The new $X_{\text{sim}}$ can also be used in a bias formula if so desired (although this is not required). Similarly, U, C, and $X_{\text{sim}}$ are used to generate a new outcome $Y_{\text{sim}}$, assuming a null $X_{\text{sim}}$-$Y_{\text{sim}}$ association. Regressing $Y_{\text{sim}}$ on $X_{\text{sim}}$ and C, but not U, yields a non-null association between the exposure $X_{\text{sim}}$ and the outcome $Y_{\text{sim}}$ that is due to uncontrolled confounding by U.

[Figure 5. Directed acyclic graph (DAG) in which the exposure X and outcome Y share two common causes C (measured) and U (unmeasured), but X is assumed to have no effect on Y.]

Overall, this simulated confounding framework entails the following algorithm:

Step 1: Simulate the unmeasured confounder $U_{\text{sim}}$ from its marginal distribution $\hat{P}(u_{\text{sim}})$; for example, simulate a binary $U_{\text{sim}}$ from $\hat{P}(U_{\text{sim}} = 1) = \hat{\mu}_U$, where $0 < \hat{\mu}_U < 1$.
Step 2: Using the observed study data, obtain the parameters of the conditional distribution $\hat{P}(x \mid c)$; for example, regress X on C to obtain $\hat{P}(X = 1 \mid c) = \hat{\delta}_0 + \hat{\delta}_C c$.

Step 3: Simulate $X_{\text{sim}}$ from $P(x_{\text{sim}} \mid c, u_{\text{sim}})$ using the parameter $\hat{\mu}_U$ and $U_{\text{sim}}$ from step 1, the parameters $\hat{\delta}_0$ and $\hat{\delta}_C$ from step 2, and an externally obtained parameter $\hat{\delta}_U$ for the assumed U-X associational risk difference given C; for example, simulate a binary $X_{\text{sim}}$ from a Bernoulli trial using $\hat{P}(X_{\text{sim}} = 1 \mid c, u_{\text{sim}}) = \hat{\delta}_0 + \hat{\delta}_C c + \hat{\delta}_U u_{\text{sim}} - \hat{\delta}_U \hat{\mu}_U$, where the constant term $\hat{\delta}_U \hat{\mu}_U$ offsets the intercept to account for marginalizing over the unobserved U in the observed data used to obtain the parameters in step 2.

Step 4: Use the observed study data to obtain the parameters of the conditional expression $\hat{P}(y \mid x, c)$; for example, regress Y on X and C to obtain $\hat{P}(Y = 1 \mid x, c) = \hat{\rho}_0 + \hat{\rho}_X x + \hat{\rho}_C c$.

Step 5: Simulate $Y_{\text{sim}}$ from $P(y_{\text{sim}} \mid x_{\text{sim}}, c, u_{\text{sim}})$ using $U_{\text{sim}}$ from step 1, $X_{\text{sim}}$ from step 3, and an externally obtained parameter (risk difference) $\hat{\alpha}_U$ for the assumed conditional U-Y associational risk difference given X and C in the unobserved model $E(Y \mid x, c, u) = \alpha_0 + \alpha_X x + \alpha_C c + \alpha_{XC}\,xc + \alpha_U u$. For example, simulate a binary $Y_{\text{sim}}$ from a Bernoulli trial using $\hat{P}(Y_{\text{sim}} = 1 \mid x_{\text{sim}}, c, u_{\text{sim}}) = \hat{\rho}_0 + 0 \cdot x_{\text{sim}} + \hat{\rho}_C c + \hat{\alpha}_U u_{\text{sim}}$.

Step 6: Regress $Y_{\text{sim}}$ on $X_{\text{sim}}$ and C to obtain $\hat{P}(Y_{\text{sim}} = 1 \mid x_{\text{sim}}, c) = \hat{\gamma}_0 + \hat{\gamma}_{X_{\text{sim}}} x_{\text{sim}} + \hat{\gamma}_C c$, and read off the coefficient $\hat{\gamma}_{X_{\text{sim}}}$ of $X_{\text{sim}}$ as the amount of confounding due to omitting $U_{\text{sim}}$. Note that $\hat{\gamma}_{X_{\text{sim}}}$ is also an estimate of the bias factor for the conditional association model.

Step 7: Repeat step 4 and use the simulated confounding estimate or bias factor $\hat{\gamma}_{X_{\text{sim}}}$ to offset the U-biased coefficient of X in the model for $P(Y = 1 \mid x, c)$ that is based only on the observed data, thus obtaining the U-bias-adjusted X-Y association.
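The seven steps can be sketched in Python as follows, treating the Table 1 counts as the observed data. This is a minimal illustration under stated assumptions, not a reference implementation: it uses NumPy least squares for the linear risk models, takes the bias parameters $\hat{\mu}_U = 0.6$, $\hat{\delta}_U = 0.15$, and $\hat{\alpha}_U = 0.2$ from the (normally unknown) process in the Table 1 footnote, and averages $\hat{\gamma}_{X_{\text{sim}}}$ over repeated simulations (the `reps` argument, a choice made here) to reduce Monte Carlo error. The function and argument names are hypothetical.

```python
import numpy as np

# Table 1 cell counts keyed by (c, x, y); these play the role of the observed data.
TABLE1 = {
    (1, 1, 1): 2464, (1, 1, 0): 2656, (1, 0, 1): 1008, (1, 0, 0): 1872,
    (0, 1, 1): 1248, (0, 1, 0): 4032, (0, 0, 1): 1056, (0, 0, 0): 5664,
}


def expand(table):
    """Expand cell counts into individual-level float arrays c, x, y."""
    rows = [cell for cell, n in table.items() for _ in range(n)]
    c, x, y = (np.array(col, dtype=float) for col in zip(*rows))
    return c, x, y


def ols(outcome, *covariates):
    """Least-squares coefficients of a linear (risk) model with an intercept."""
    design = np.column_stack([np.ones_like(outcome), *covariates])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


def simulated_confounding(c, x, y, mu_u, delta_u, alpha_u, reps=50, seed=1):
    """Steps 1-7 of the simulated-confounding algorithm for a binary U.

    mu_u, delta_u, and alpha_u are the externally supplied bias parameters.
    Returns (observed coefficient of X, bias factor, bias-adjusted coefficient).
    """
    rng = np.random.default_rng(seed)
    d0, d_c = ols(x, c)                      # Step 2: P(X=1|c) = d0 + d_c*c
    r0, r_x, r_c = ols(y, x, c)              # Step 4: P(Y=1|x,c) = r0 + r_x*x + r_c*c
    gammas = []
    for _ in range(reps):                    # repeated to reduce Monte Carlo error
        u_sim = rng.binomial(1, mu_u, size=c.size)                    # Step 1
        p_x = d0 + d_c * c + delta_u * u_sim - delta_u * mu_u         # Step 3
        x_sim = rng.binomial(1, p_x).astype(float)
        p_y = r0 + 0.0 * x_sim + r_c * c + alpha_u * u_sim            # Step 5: null X effect
        y_sim = rng.binomial(1, p_y).astype(float)
        gammas.append(ols(y_sim, x_sim, c)[1])                        # Step 6
    bias = float(np.mean(gammas))
    return float(r_x), bias, float(r_x) - bias                        # Step 7


c, x, y = expand(TABLE1)
r_x, bias, adjusted = simulated_confounding(c, x, y, mu_u=0.6, delta_u=0.15, alpha_u=0.2)
print(f"observed {r_x:.4f}, simulated bias {bias:.3f}, adjusted {adjusted:.3f}")
```

Because step 4 omits the XC product term, the observed coefficient is a C-averaged association (0.0992 for these data). The simulated bias factor should come out close to 0.03, leaving an adjusted coefficient near 0.069, which is the corresponding regression-weighted average of the true stratum-specific effects (0.05 at C = 0 and 0.10 at C = 1). A probabilistic version would draw the three bias parameters from distributions on each repetition, possibly combined with bootstrap resampling of the observed data.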
The programming of steps 3 and 5 can be simplified when there are many measured covariates C by omitting the coefficient(s) of the set C while maintaining the coefficients of U and X at the levels they would have attained under the omitted coefficient set. This simulated confounding approach is quite general: it can be used for difference and ratio measures; for general outcomes, exposures, and confounders; and for multiple unmeasured confounders, and it can incorporate different functional forms into the models for the exposure and outcome. The new algorithm simulates only U, X, and Y, using parameters taken from the observed study data, covariates from the observed data (optionally), and externally obtained bias parameters (not unlike the bias formula technique, although this method is more intuitive). The algorithm can be repeated in combination with bootstrapping or programmed into a more extensive probabilistic sensitivity analysis. This new algorithm differs from the semi-automated sensitivity analysis of Lash and Fink, which simulates U as a function of the observed X and Y (28). Their method therefore also involves the challenge of reasoning backward from X and Y to U, as seen in the bias formulas discussed earlier in this article. The generalized bias analysis method introduced here avoids that challenge by appealing to the intuition encoded in the assumed DAG and reasoning forward from U to X and Y.

CONCLUSION

Bias analysis for uncontrolled confounding is crucial for causal inference studies that rely on observational data or less-than-perfect randomized controlled trials, such as those seen in the health sciences. Concerns about uncontrolled confounding should accompany any covariate selection for confounding control in empirical quantitative analysis (2, 4, 15, 17, 23, 42, 58).
Methodological development of bias analysis for uncontrolled confounding has been ongoing for more than half a century. General methods and software have become more readily available in the last decade (3, 20, 29, 30, 37, 49, 55, 56, 63). Although adoption has been slow, the need for and application of bias analysis are likely to rise. Drivers include the growth of big data, computational platforms, and causal inference tools; demands for replication and data sharing; and journal peer-review and reporting requirements. This article has provided a broad overview of the key approaches and introduced a new framework that will hopefully contribute to faster adoption and further methodological development and refinement.
DISCLOSURE STATEMENT
The author is not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.
ACKNOWLEDGMENTS
This work was partly supported by European Commission FP7 grant 241822, NIDDK grant R01DK095668-02, and NICHD grant R01HD072296-01A1. I thank Amy Chai, Vahe Khachadourian, Roch Nianogo, and the anonymous reviewers for all the feedback that helped improve this article.
LITERATURE CITED
1. Angus DC. 2015. Fusing randomized trials with big data: the key to self-learning health care systems? JAMA 314(8):767–68
2. Arah OA. 2008. The role of causal reasoning in understanding Simpson's paradox, Lord's paradox, and the suppression effect: covariate selection in the analysis of observational studies. Emerg. Themes Epidemiol. 5:5
3. Arah OA, Chiba Y, Greenland S. 2008. Bias formulas for external adjustment and sensitivity analysis of unmeasured confounders. Ann. Epidemiol. 18(8):637–46
4. Arah OA, Sudan M, Olsen J, Kheifets L. 2013. Marginal structural models, doubly robust estimation, and bias analysis in perinatal and paediatric epidemiology. Paediatr. Perinat. Epidemiol. 27(3):263–65
5. Axelson O, Steenland K. 1988. Indirect methods of assessing the effects of tobacco use in occupational studies. Am. J. Ind. Med. 13(1):105–18
6. Breslow NE, Day NE. 1980. Statistical Methods in Cancer Research, Vol. 1: The Analysis of Case-Control Studies. Int. Agency Res. Cancer Sci. Publ. 32. Lyon, Fr.: Int. Agency Res. Cancer
7. Bross IDJ. 1966. Spurious effects from an extraneous variable. J. Chronic Dis. 19(6):637–47
8. Bross IDJ. 1967. Pertinency of an extraneous variable. J. Chronic Dis. 20(7):487–95
9. Brumback B, Greenland S, Redman M, Kiviat N, Diehr P. 2003. The intensity-score approach to adjusting for confounding. Biometrics 59(2):274–85
10. Cai Z, Brumback BA. 2015. Model-based standardization to adjust for unmeasured cluster-level confounders with complex survey data. Stat. Med. 34(15):2368–80
11. Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL. 1959. Smoking and lung cancer: recent evidence and a discussion of some questions. J. Natl. Cancer Inst. 22:173–203
12. Faries D, Peng X, Pawaskar M, Price K, Stamey JD, Seaman JW. Evaluating the impact of unmeasured confounding with internal validation data: an example cost evaluation in type 2 diabetes. Value Health 16(2):259–66
13. Flanders WD, Khoury MJ. 1990. Indirect assessment of confounding: graphic description and limits on effect of adjusting for covariates. Epidemiology 1(3):239–46
14. Gail MH, Wacholder S, Lubin JH. 1988. Indirect corrections for confounding under multiplicative and additive risk models. Am. J. Ind. Med. 13(1):119–30
15. Goto A, Arah OA, Goto M, Terauchi Y, Noda M. 2013. Severe hypoglycaemia and cardiovascular disease: systematic review and meta-analysis with bias analysis. BMJ 347:f4533
16. Greenland S. 1996. Basic methods for sensitivity analysis of biases. Int. J. Epidemiol. 25(6):1107–16
17. Greenland S. 2003. The impact of prior distributions for uncontrolled confounding and response bias. J. Am. Stat. Assoc. 98(461):47–54
18. Greenland S. 2004. Bounding analysis as an inadequately specified methodology. Risk Anal. 24(5):1085–92
19. Greenland S. 2005. Multiple-bias modelling for analysis of observational data (with discussion). J. R. Stat. Soc. Ser. A 168(2):267–306
20. Greenland S. 2009. Bayesian perspectives for epidemiologic research. III: Bias analysis via missing-data methods. Int. J. Epidemiol. 38(6):1662–73
21. Greenland S. 2014. Sensitivity analysis and bias analysis. In Handbook of Epidemiology, ed. W Ahrens, I Pigeot, pp. 685–706. New York: Springer. 2nd ed.
22. Greenland S, Pearl J, Robins JM. 1999. Causal diagrams for epidemiologic research. Epidemiology 10(1):37–48
23. Helmich E, Boerebach BCM, Arah OA, Lingard L. 2015. Beyond limitations: improving how we handle uncertainty in health professions education research. Med. Teach. 37(11):1–8
24. Jain SH, Rosenblatt M, Duke J. 2014. Is big data the new frontier for academic-industry collaboration? JAMA 311(21):2171
25. Kaufmann SHE, Fletcher HA, Guzman CA, Ottenhoff THM. 2015. Big data in vaccinology: introduction and section summaries. Vaccine 33(40):5237–40
26. Klungsøyr O, Sexton J, Sandanger I, Nygård JF. 2009. Sensitivity analysis for unmeasured confounding in a marginal structural Cox proportional hazards model. Lifetime Data Anal. 15(2):278–94
27. Larson EB. 2013. Building trust in the power of "big data" research to serve the public good. JAMA 309(23):2443–44
28. Lash TL, Fink AK. 2003. Semi-automated sensitivity analysis to assess systematic errors in observational data. Epidemiology 14(4):451–58
29. Lash TL, Fox MP, Fink AK. 2011. Applying Quantitative Bias Analysis to Epidemiologic Data. New York: Springer
30. Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC, Greenland S. 2014. Good practices for quantitative bias analysis. Int. J. Epidemiol. 43(6):1969–85
31. Lee W-C. 2011. Bounding the bias of unmeasured factors with confounding and effect-modifying potentials. Stat. Med. 30(9):1007–17
32. Li L, Brumback BA, Weppelmann TA, Morris JG, Ali A. 2016. Adjusting for unmeasured confounding due to either of two crossed factors with a logistic regression model. Stat. Med. 35(18):3179–88
33. Lin DY, Psaty BM, Kronmal RA. 1998. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54(3):948
34. Lipsitch M, Tchetgen ET, Cohen T. 2010. Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology 21(3):383–88
35. Luna X De, Waernbaum I, Richardson TS. 2011. Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika 98(4):861–75
36. MacLehose RF, Kaufman S, Kaufman JS, Poole C. 2005. Bounding causal effects under uncontrolled confounding using counterfactuals. Epidemiology 16(4):548–55
37. McCandless LC, Gustafson P, Levy A. 2007. Bayesian sensitivity analysis for unmeasured confounding in observational studies. Stat. Med. 26(11):2331–47
38. McCandless LC, Richardson S, Best N. 2012. Adjustment for missing confounders using external validation data and propensity scores. J. Am. Stat. Assoc. 107(497):40–51
39. McCulloch CE, Searle SR, Neuhaus JM. 2009. Generalized, Linear, and Mixed Models. Hoboken, NJ: John Wiley & Sons
40. Myers JA, Rassen JA, Gagne JJ, Huybrechts KF, Schneeweiss S, et al. 2011. Effects of adjusting for instrumental variables on bias and precision of effect estimates. Am. J. Epidemiol. 174(11):1213–22
41. Pearl J. 2009. Causal inference in statistics: an overview. Stat. Surv. 3:96–146
42. Pearl J. 2009. Causality: Models, Reasoning and Inference. New York: Cambridge Univ. Press. 2nd ed.
43. Pearl J. 2011. Invited commentary: understanding bias amplification. Am. J. Epidemiol. 174(11):1223–27
44. Phillips CV. 2003. Quantifying and reporting uncertainty from systematic errors. Epidemiology 14(4):459–66
45. Porta M, ed. 2014. A Dictionary of Epidemiology. New York: Oxford Univ. Press. 6th ed.
46. Robins JM, Rotnitzky A, Scharfstein DO. 2000. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, ed. ME Halloran, D Berry, pp. 1–94. New York: Springer
47. Rosenbaum PR. 2002. Observational Studies. New York: Springer. 2nd ed.
48. Rosenbaum PR. 2010. Design of Observational Studies. New York: Springer
49. Rothman KJ, Greenland S, Lash TL. 2008. Modern Epidemiology. Philadelphia: Lippincott Williams & Wilkins. 3rd ed.
50. Schlesselman JJ. 1978. Assessing effects of confounding variables. Am. J. Epidemiol. 108(1):3–8
51. Schneeweiss S. 2006. Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics. Pharmacoepidemiol. Drug Saf. 15(5):291–303
52. Stamey JD, Beavers DP, Faries D, Price KL, Seaman JW. 2014. Bayesian modeling of cost-effectiveness studies with unmeasured confounding: a simulation study. Pharm. Stat. 13(1):94–100
53. Steenland K. 2004. Monte Carlo sensitivity analysis and Bayesian analysis of smoking as an unmeasured confounder in a study of silica and lung cancer. Am. J. Epidemiol. 160(4):384–92
54. Stürmer T, Glynn RJ, Rothman KJ, Avorn J, Schneeweiss S. 2007. Adjustments for unmeasured confounders in pharmacoepidemiologic database studies using external information. Med. Care 45(10 Suppl. 2):S158–65
55. Stürmer T, Rothman KJ, Avorn J, Glynn RJ. 2010. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution–a simulation study. Am. J. Epidemiol. 172(7):843–54
56. Stürmer T, Schneeweiss S, Avorn J, Glynn RJ. 2005. Adjusting effect estimates for unmeasured confounding with validation data using propensity score calibration. Am. J. Epidemiol. 162(3):279–89
57. Stürmer T, Schneeweiss S, Rothman KJ, Avorn J, Glynn RJ. 2007. Performance of propensity score calibration–a simulation study. Am. J. Epidemiol. 165(10):1110–18
58. Sudan M, Kheifets L, Arah OA, Olsen J. 2013. Cell phone exposures and hearing loss in children in the Danish national birth cohort. Paediatr. Perinat. Epidemiol. 27(3):247–57
59. Uddin MJ, Groenwold RHH, Ali MS, de Boer A, Roes KCB, et al. 2016. Methods to control for unmeasured confounding in pharmacoepidemiology: an overview. Int. J. Clin. Pharm. 38(3):714–23
60. VanderWeele TJ. 2010. Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology 21(4):540–51
61. VanderWeele TJ. 2013. Unmeasured confounding and hazard scales: sensitivity analysis for total, direct, and indirect effects. Eur. J. Epidemiol. 28(2):113–17
62. VanderWeele TJ. 2016. Mediation analysis: a practitioner's guide. Annu. Rev. Public Health 37:17–32
63. VanderWeele TJ, Arah OA. 2011. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology 22(1):42–52
64. VanderWeele TJ, Mukherjee B, Chen J. 2012. Sensitivity analysis for interactions under unmeasured confounding. Stat. Med. 31(22):2552–64
65. VanderWeele TJ, Shpitser I. 2011. A new criterion for confounder selection. Biometrics 67(4):1406–13
66. VanderWeele TJ, Shpitser I. 2013. On the definition of a confounder. Ann. Stat. 41(1):196–220
67. Westreich D, Greenland S. 2013. The table 2 fallacy: presenting and interpreting confounder and modifier coefficients. Am. J. Epidemiol. 177(4):292–98
68. Yanagawa T. 1984. Case-control studies: assessing the effect of a confounding factor. Biometrika 71(1):191–94
Annual Review of Public Health – Annual Reviews
Published: Mar 20, 2017