Get 20M+ Full-Text Papers For Less Than $1.50/day. Subscribe now for You or Your Team.

Learn More →

Comparing performance between log-binomial and robust Poisson regression models for estimating risk ratios under model misspecification

Comparing performance between log-binomial and robust Poisson regression models for estimating... Background: Log-binomial and robust (modified) Poisson regression models are popular approaches to estimate risk ratios for binary response variables. Previous studies have shown that comparatively they produce similar point estimates and standard errors. However, their performance under model misspecification is poorly understood. Methods: In this simulation study, the statistical performance of the two models was compared when the log link function was misspecified or the response depended on predictors through a non-linear relationship (i.e. truncated response). Results: Point estimates from log-binomial models were biased when the link function was misspecified or when the probability distribution of the response variable was truncated at the right tail. The percentage of truncated observations was positively associated with the presence of bias, and the bias was larger if the observations came from a population with a lower response rate given that the other parameters being examined were fixed. In contrast, point estimates from the robust Poisson models were unbiased. Conclusion: Under model misspecification, the robust Poisson model was generally preferable because it provided unbiased estimates of risk ratios. Keywords: Log-binomial regression, Robust (modified) Poisson regression, Model misspecification, Risk ratio, Link function misspecification Background regression analysis were comparable to risk ratios [4]. Introduction However, it was found that these methods produced Logistic regression is the most widely-used modeling ap- prevalence greater than one [5]. Greenland [6] brought ex- proach for studying associations between exposures and tensive literature on valid model-based estimation of rela- binary outcomes. For rare events, the odds ratio estimated tive risks and other measures to readers’ attention. Of the from logistic regression approximates the risk ratio (RR). model-based approaches, log-binomial [5, 7, 8] and robust However, when events are common, odds ratios always (or modified) Poisson regression models [7, 9]are most overestimate risk ratios [1] Zhang and Yu [2]suggested a frequently applied to estimate risk ratios for common bin- correction for odds ratios to give a risk ratio in studies of ary outcomes. Barros et al. [7]and Zou[9]showedhow common outcomes. This method was subsequently shown risk ratios can be estimated by using robust Poisson re- to result in inconsistent point estimates as well as invalid gression with a robust error variance. confidence intervals [3]. Efforts were made to modify the In medical and public health research, log-binomial data so that estimated odds ratios from a logistic and robust Poisson regression models are widely used to directly estimate risk ratios for both common and rare outcomes. For example, they can be used to estimate the * Correspondence: Wansu.Chen@KP.org effect of clinical characteristics (e.g. obesity, smoking, Kaiser Permanente Southern California, Department of Research and Evaluation, 100 S. Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA history of stroke, exercise, or diet) on a health condition Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 2 of 12 (e.g. a cardiovascular event, mortality, or hospital admis- controlling for age, gender, race/ethnicity, number of sion). Our own Medline search of articles published be- aeroallergen sensitivities, clinical center, FEV %pre- tween 2005 and 2014 demonstrated that the popularity dicted (≥80% vs. < 80%) and ACT score (> 19, 16–19, of both models has increased significantly in the past < 16). Age remained as a continuous variable in the decade (Additional file 1). model. Model misspecification can lead to biased estimates, Based on the robust Poisson model, the risk ratio of resulting in erroneous and misleading conclusions. Re- having ≥7 SABA canisters over the past 12 months was gression models require that the relationship between 2.05 (95% CI, 1.03–4.05) in patients whose FeNO value the response and the explanatory variables conforms to was in the second quartile, compared to patients whose a particular functional form. Omitting important ex- FeNO value was in the first quartile (Additional file 2). planatory variables, failing to account for non-linear However, when the analysis was repeated by using the components or critical interaction terms, or making log-binomial model, the corresponding risk ratio was measurement errors can cause model misspecification 1.67 (95% CI, 0.83–3.57), a 19% reduction from the ro- and thus bias the parameter estimators of one or more bust Poisson estimate. Although the point estimates for of the predictors in a regression model. Under correct FeNO from the two models were in the same direction, model specification, the log-binomial and robust Poisson the interpretation of the results varied because the 95% methods have been shown to yield comparable point es- confidence interval of the estimated RR from the robust timates and standard errors [7, 9–12]. For instance, in a Poisson covered one, yet the log-binomial model did simulation study with sample size of 1000, both the not. The RR for patients whose FeNO values were in the log-binomial and robust Poisson models yielded similar third and fourth quartiles compared to the patients in unbiased estimates with good coverage probability [12]. the first quartile were also different between the two However, the results of the following example revealed a models, although the interpretations remained the same different paradigm. at the 95% level (see Additional file 2). The differences in point estimates of RR between the two models were A motivating example in the range of 12–19%. A cross-sectional study was conducted at Kaiser Perma- nente Southern California to assess the association of Current study fractional exhaled nitric oxide (FeNO), a marker of air- The inconsistent conclusions between the aforemen- way inflammation, and asthma burden among persistent tioned simulation studies [7, 9–12] and the results from asthma patients who were treated with inhaled cortico- the motivating example led us to further investigate steroids (ICS). It was hypothesized that high FeNO levels whether model misspecification and/or characteristics of were associated with greater asthma burden. In this the data resulted in such large discrepancies between study, the primary binary outcome of interest was log-binomial and the robust Poisson regression models. whether seven or more short-acting beta-agonist (SABA) An initial attempt was made to understand the impact canisters were dispensed in the past 12 months, which is of model misspecification on the performance of the two a validated administrative data surrogate for asthma im- regression models when an important explanatory vari- pairment. Previous studies based on similar patient pop- able was omitted, a higher order term of non-linear ex- ulations revealed that the prevalence of asthma planatory variable was ignored, or an interaction term impairment among persistent asthma patients was high was overlooked (Kaiser Permanente, Pasadena; Univer- [13]. The study population consisted of asthma patients sity of Southern California, Los Angeles; unpublished re- 12 to 56 years of age who had aeroallergen sensitization sults. Additional file 3). Although certain types of model and regular ICS treatment for at least one month were misspecification did bias the point estimates, both the enrolled during an allergy department visit (index visit). magnitude and the direction of the biases were compar- Information on patient demographics, FeNO, forced ex- able between the two models. The current simulation piratory volume in one second (FEV % predicted) and study was subsequently designed to examine the impact asthma control test (ACT) score was collected during of misspecified link functions and truncated probabil- the index visit, while the information on aeroallergen ities, one type of linear predictor misspecification. To sensitization and SABA canisters dispensed came from our knowledge, the extent of such an inconsistency electronic medical records. between the two models has not been systematically ex- FeNO was categorized into four quartiles. Multivari- amined. Inspired by the differences in the estimating able log-binomial and robust Poisson regression models equations of the two models, we also examined the im- were applied to estimate the risk ratio of having seven or pact of having observations with large probabilities of more SABA canisters in each of the FeNO quartiles developing the response on the performance of the two (using the lowest quartile as the reference group), models when the overall response rate varies. Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 3 of 12 The paper is organized as follows: The “Methods” −1 ðÞ tþ1 0 β ¼ðÞ X WX X Wz ð2Þ section presents the theory behind the two models to explain the differences in the estimation methods that ðÞ t could result in variations in the estimates. The "Methods" Y−P β ðtÞ n;k where z ¼ Xβ þ ; X ¼ x ∈R ; i; j section also provides the details of the simulation design. ðÞ t P β The "Results" section shows the results of simulation under various scenarios. Lastly, we summarize the find- Y−PðÞ β y −p ðÞ β y −p ðÞ β y −p ðÞ β 1 1 2 2 n n ¼ ; ; …; ; ings and provide recommendations for the use of these PðÞ β p ðÞ β p ðÞ β p ðÞ β 1 2 n models in future studies in the “Discussion” section. p ðÞ β and weight W ¼ Diag ; 1−p ðÞ β Methods i ¼ 1; 2; …; n; j ¼ 1; 2; …; k: Generalized linear models (GLM) originate from a signifi- cant extension of traditional linear regression models [14]. The iteration process continues until β stabilizes. The They consist of a random component that specifies the weights W used in the iterative process contain p (β)in conditional distribution of the response variable (Y)from the numerator and 1 – p (β) in the denominator, where an exponential family given the values of the explanatory p (β) = exp (x β) with a range from 0 to 1. When p (β)is i i variables X X ···,X , a linear predictor (or systematic) com- 1, 2, k a very small number, the weight approximates p (β). ponent that is a linear function of the predictors, When p (β) approaches one, the weight approaches in- ƞ=β +β X +β X +···+β X ,where β=(β ,β ,...,β ) is the 0 1 1 2 2 k k 0 1 k finity. This suggested that the IRLS approach is highly vector of the parameters, and a smooth invertible link influenced by observations that have large p (β). More- function that transforms the expectation of the re- over, the impact is also influenced by the average p (β), sponse variable, μ ≡ E(Y), to the linear predictors: or the average weight (lower average p (β) is associated g(μ)=ƞ=β +β X +β X +···β X . For example, the most 0 1 1 2 2 k k with lower average weight). For illustration, we con- common link for binary outcomes is the logit (i.e. log structed two hypothetical samples each with five obser- (μ/(1-μ))) in a logistic model, the log (μ) in a Poisson vations having the following probabilities: sample 1 model, or a log-binomial model. = {0.1, 0.3, 0.4, 0.5, 0.95} and sample 2 = {0.02, 0.03, 0.08, In the descriptions below, Y and x ¼ð1; x ; x ; …; x Þ i i1 i2 ik 0.15, 0.95}. The corresponding weights for the two sam- denote the binary outcome and the row vector comprised of ples were {0.25, 0.43, 0.67, 1, 19} and {0.02, 0.03, 0.09, th k predictors for the i individual (i = 1,2,…n), respectively. 0.18, 19}, respectively. In sample 2, the observation with The observations from the n individuals are independent. weight 19 will impact the point estimate more, com- pared the observation in sample 1 with the same weight. Log-binomial regression In the GLM framework, the conditional distribution of Y Robust Poisson regression given the predictor variables is binomial, with the mean re- In robust Poisson regression, a quasi-likelihood (QL) sponse related to the predictors by the link function log model can be applied to fit the data with a binary out- (μ ). In log-binomial regression, μ is often denoted as p,be- i i i come [14–18]. Quasi-likelihood was first introduced by cause E(Y ) is a probability with a value between zero and Wedderburn (1974) as a function that has properties one. Although there are other methods to obtain efficient analog to those of log-likelihood functions [18]. Similar estimators, the maximum likelihood approach is used to to ML method, maximum QL method can be used to es- generate asymptotically efficient estimators (maximum like- timate the QL estimates. In a maximum QL model, only lihood estimates (MLE)) in log-binomial regressions [5, 8]. the relationship between the mean and the variance (i.e. The MLE of log-binomial models are derived from an the variance is a function of the mean) needs to be spe- iteratively reweighted least squares (IRLS) approach [15]. cified instead of the underlying distribution of the data In a log-binomial regression, logðP ðβÞÞ ¼ x β where [15–19]. It can be shown that when Y comes from the p (β)=Pr(y =1|x ),0 ≤ p ≤ 1, and x β < 0 (constrained). i i i i exponential family, the quasi-score function is identical The log-likelihood is given by to the score function associated with the maximum like- lihood of the GLM. n n X X When the Poisson distribution is chosen, the 84%ℓðÞ β ¼ y logðÞ p ðÞ β þ ðÞ 1−y logðÞ 1−p ðÞ β i i i i quasi-score function can be simplified to S ðβÞ¼ i¼1 i¼1 ϕ i¼1 ðy −μ Þx , resulting in quasi-score estimating equations ð1Þ ij i i S ðβÞ¼ ðy −μ Þx ¼ 0, which is the same as the es- j ij i¼1 i i It can be proven that the MLE for β can be found by timating equations of the Poisson regression models. In the following iteration (Additional file 4) the two equations above, ϕ is the dispersion parameter, Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 4 of 12 and j = 1,2,…k. The final estimate from the Relative bias was defined as the average of the 1000 esti- mated RR in log scale minus the log of the true RR quasi-scoring procedure satisfies the condition S ðβÞ¼ 0 ^ divided by the log of the true RR. For θ , the estimated and β is a consistent and asymptotically unbiased esti- th log RR from the m dataset using either the mate of β [20]. β does not depend on ϕ. log-binomial model or the robust Poisson model, the rela- The quasi-likelihood estimators are not maximally and 1;000 θ − logðtrueRRÞ 1 m asymptotically efficient [14]. The robust Poisson regres- tive bias was defined as  100%. 1;000 logðtrueRRÞ m¼1 sion model uses the classical sandwich estimator under Standard error was defined as the empirical SE of the esti- the generalized estimation equation (GEE) framework to mated risk ratio in log scale over all 1000 simulations. The provide accurate standard errors for the elements [19–21]. MSE was calculated by taking the sum of the squared bias The variance-covariance matrix is in log scale and the variances, in which the bias was speci- 1;000 "#"#"# −1 −1 −1 n nhi n fied as θ −logðtrueRRÞ. X X X 1;000 m¼1 EI½ ðÞ β E ðS ðÞ β S ðÞ β EI½ ðÞ β i i i i Because both SE and MSE depended on the sample i¼1 i¼1 i¼1 size, the process described above was repeated for sam- ð3Þ ple of size 500 for all scenarios with RR = 3. ∂S ðβÞ where I ðβÞ¼ − is the information matrix [22]. A ∂β Simulated datasets consistent estimate of the variance can be obtained by Let Y be a common binary outcome (Y = 1 for disease evaluating the variance-covariance matrix at β. and Y = 0 for non-disease) and X be a binary exposure variable (X = 1 for exposure and X = 0 for non-exposure). Implementation First, uncorrelated random variables Z and Z following 1 2 Both regression models were implemented in SAS [23] the Bernoulli (0.5) and the Uniform [0, 1] distributions, (SAS Software Version 9.3 of the SAS System for Unix. respectively, were generated for 1000 subjects. These Cary, NC. SAS Institute Inc. 2011). The SAS codes can be distributions were chosen for their simplicity in the de- found in Additional file 5. For the log-binomial model, − 4 sign. Then, the exposure variable X based on the was set as the initial value of the intercept. For both subject-specific probability of exposure, defined by the models, the weighted least squares estimates (default) equation logit (P (X =1| Z , Z )) = − 1.0 + Z + Z with 1 2 1 2 were used as initial values of parameters. The convergence E(P(X =1| Z , Z )) = 0.5, was created for each subject. − 4 1 2 criterion was 10 (default). A well-known issue of All of the outcome variables defined below were condi- log-binomial models is failure to converge when the MLE tional on the exposure status and the covariates. For is located on the boundary of the parameter space (i.e. the exposed subjects, P (Y =1| X =1, Z , Z )=3× P (Y =1| 1 2 predicted probability of the outcome is equal to 1). To X =0, Z , Z ). The adjusted RR (i.e. P (Y =1| X =1, Z , 1 2 1 minimize the convergence issue, the COPY method was Z )) / P (Y =1| X =0, Z , Z )) was fixed at 3.0, chosen to 2 1 2 applied [24, 25] in which the number of virtual copies was reflect the effect size commonly seen in real-world set- set to 10,000. To ensure a fair comparison between the tings, and strong enough to yield observable differences in log-binomial and robust Poisson models, the evaluation performance between the two regression models. was conducted by only using the results based on exactly the same simulated data. If the COPY method did not converge for a dataset, the same dataset was then removed Scenarios to study the impact of truncation before the performance of the robust Poison models was The equations to generate Y took four different forms (Y , evaluated. The exclusion of datasets was very rare in this Y , Y , Y ) to enable the examination of the impact of trun- 2 3 4 study. Details on the number of excluded datasets can be cation. Unlike Y , which always had a perfect linear associ- found in the “Discussion” section. ation with its predictors (i.e. not truncated), Y , Y and Y 2 3, 4 were truncated such that the values of P (Y =1| X =0, Z , k 1 Measures of model performance Z ) depended on whether or not “Z + (beta of Z )× Z ” 2 1 2 2 For each simulated scenario, the simulation process was reached a threshold (k = 2, 3, and 4) (Table 1). For ex- repeated 1000 times. In each of the 1000 simulated data- ample, in Scenario I-2, the threshold was set at 0.15, sets, the log risk ratio was estimated from the requiring “Z +3× Z ” to be greater than 0.15 to impact 1 2 log-binomial model and the robust Poisson model, re- P (Y =1| X =0, Z , Z )). The defined threshold varied by 2 1 2 spectively. For each scenario, the relative bias, standard scenario and was chosen such that the percentages of ex- error (SE), and mean square error (MSE) in log scale for posed subjects at the maximum P (Y =1| X, Z , Z )) could 1 2 all three measures were calculated by summarizing the be controlled within the range of 1.4–5.8%. A truncation results from the 1000 datasets for each regression model. yielded a spike of observations at the maximum P (Y =1| Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 5 of 12 Table 1 Design of the simulation data Scenario Scenario Models to generate simulation datasets Model misspe-cified? Max P of Beta of Z Link % of exposed exposed Function at Max P I-1 0.75 3 log 0 log (P (Y =1| X =0, Z , Z )) = −1.38 – Z –3* Z No 1 1 2 1 2 I-2 0.75 3 log 1.4 log (P (Y =1| X =0, Z , Z )) = − 1.23 – max (Z +3 * Z , 0.15) Yes 2 1 2 1 2 I-3 0.75 3 log 2.8 log (P (Y =1| X =0, Z , Z )) = − 1.08 – max (Z +3 * Z , 0.30) Yes 3 1 2 1 2 I-4 0.75 3 log 5.8 log (P (Y =1| X =0, Z , Z )) = − 0.78 – max (Z +3 * Z , 0.60) Yes 4 1 2 1 2 II-1 0.85 3 log 0 log (P (Y =1| X =0, Z , Z )) = − 1.26 – Z –3* Z No 1 1 2 1 2 II-2 0.85 3 log 1.4 log (P (Y =1| X =0, Z , Z )) = − 1.11 – max (Z +3 * Z , 0.15) Yes 2 1 2 1 2 II-3 0.85 3 log 2.8 log (P (Y =1| X =0, Z , Z )) = − 0.96 – max (Z +3 * Z , 0.30) Yes 3 1 2 1 2 II-4 0.85 3 log 5.8 log (P (Y =1| X =0, Z , Z )) = − 0.66 – max (Z +3 * Z , 0.60) Yes 4 1 2 1 2 III-1 0.95 3 log 0 log (P (Y =1| X =0, Z , Z )) = − 1.15 – Z –3* Z No 1 1 2 1 2 III-2 0.95 3 log 1.4 log (P (Y =1| X =0, Z , Z )) = − 1.00 – max (Z +3 * Z , 0.15) Yes 2 1 2 1 2 III-3 0.95 3 log 2.8 log (P (Y =1| X =0, Z , Z )) = − 0.85 – max (Z +3 * Z , 0.30) Yes 3 1 2 1 2 III-4 0.95 3 log 5.8 log (P (Y =1| X =0, Z , Z )) = − 0.55 – max (Z +3 * Z , 0.60) Yes 4 1 2 1 2 IV-1 0.95 2 log 0 log (P (Y =1| X =0, Z , Z )) = − 1.15 – Z –2* Z No 1 1 2 1 2 IV-2 0.95 2 log 1.4 log (P (Y =1| X =0, Z , Z )) = − 1.05 – max (Z +2 * Z , 0.10) Yes 2 1 2 1 2 IV-3 0.95 2 log 2.8 log (P (Y =1| X =0, Z , Z )) = − 0.95 – max (Z +2 * Z , 0.20) Yes 3 1 2 1 2 IV-4 0.95 2 log 5.8 log (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.75 – max (Z1 + 2 * Z2, 0.40) Yes V-1 0.95 4 log 0 log (P (Y =1| X =0, Z , Z )) = − 1.15 – Z –4* Z No 1 1 2 1 2 V-2 0.95 4 log 1.4 log (P (Y =1| X =0, Z , Z )) = − 0.95 – max (Z +4 * Z , 0.20) Yes 2 1 2 1 2 V-3 0.95 4 log 2.8 log (P (Y =1| X =0, Z , Z )) = − 0.75 – max (Z +4 * Z , 0.40) Yes 3 1 2 1 2 V-4 0.95 4 log 5.8 log (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.35 – max (Z1 + 4 * Z2, 0.80) Yes VI-1 0.95 3 logit 0 logit (P (Y =1| X =0, Z , Z )) = − 0.76 – Z –3* Z Yes 1 1 2 1 2 VI-2 0.95 3 logit 1.4 logit (P (Y =1| X =0, Z , Z )) = − 0.61 – max (Z +3 * Z , 0.15) Yes 2 1 2 1 2 VI-3 0.95 3 logit 2.8 logit (P (Y =1| X =0, Z , Z )) = − 0.46 – max (Z +3 * Z , 0.30) Yes 3 1 2 1 2 VI-4 0.95 3 logit 5.8 logit (P (Y =1| X =0, Z , Z )) = − 0.16 – max (Z +3 * Z , 0.60) Yes 4 1 2 1 2 VII-1 0.95 3 probit 0 probit (P (Y =1| X =0, Z , Z )) = − 0.48 – Z –3* Z Yes 1 1 2 1 2 VII-2 0.95 3 probit 1.4 probit (P (Y =1| X =0, Z , Z )) = − 0.33 – max (Z +3 * Z , 0.15) Yes 2 1 2 1 2 VII-3 0.95 3 probit 2.8 probit (P (Y =1| X =0, Z , Z )) = − 0.18 – max (Z +3 * Z , 0.30) Yes 3 1 2 1 2 VII-4 0.95 3 probit 5.8 probit (P (Y =1| X =0, Z , Z )) = − 0.12 – max (Z +3 * Z , 0.60) Yes 4 1 2 1 2 Maximum P (Y =1| X =1, Z , Z ) k 1 2 Models to generate Y for unexposed subjects. For exposed subjects, P (Y =1| X =1, Z , Z ) =3*P (Y =1| X =0, Z , Z ). k = 1, 2, 3, and 4 k k 1 2 k 1 2 Model was defined as misspecified when the link function was not ‘log’ or the % of exposed at maximum P (Y =1| X =0, Z , Z ) was greater than 0 k 1 2 X, Z , Z )) for both exposed and unexposed subjects. values set above (0.75, 0.85, and 0.95) were 1.4, 2.8, and 1 2 When other parameters were fixed, a spike goes higher 5.8%. These values were derived by simulation and rep- with the increase of the threshold. This allowed us to resented the various levels of alteration of the linear study how the volume of large P (Y =1| X, Z , Z )) im- predictors. 1 2 pacted model performance. The intercepts were manually calculated to satisfy P (Y =1 | X =0, Z , Z ) ≤ 0.75/3, 0.85/3 and 0.95/3, re- k 1 2 Scenarios to study the impact of maximum P(Y = 1) spectively. For example, for the equation log (P (Y =1| First, scenarios I-1, II-1, and III-1 were created using the X =0, Z , Z )) = α – Z –3* Z in scenario I-1, α = 1 2 1 2 log link function. The maximum values of P (Y =1| X = log(0.75/3) = − 1.38 since the logarithm is an increasing 1, Z , Z )) were set to 0.75, 0.85 and 0.95, respectively, function and hence the maximum of P (Y =1| X =0, Z , 1 2 1 1 to study the impact of maximum P (Y =1| X =1, Z , Z ) is achieved when Z = 0 and Z = 0. For the same 1 1 2 1 2 Z )) (Table 1). The selected thresholds were set at 0.15, reason, for the equation log (P (Y =1| X =0, Z , Z )) = α 2 2 1 2 0.30, and 0.60 for Y , Y and Y , respectively. The per- – Z – max (Z +3* Z , 0.15) in scenario I-2, α = 2 3, 4 1 1 2 centages of exposed subjects who were at the maximum log(0.75/3) + 0.15 = − 1.23. The 12 scenarios designed to Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 6 of 12 study the impact of large P(Y=1) were listed in the first level of misspecification, measured by the percentage of section of Table 1 (in the first 12 rows). exposed subjects at maximum P, increased. Large P(Y = 1) was associated with an increased level of bias when the percentage of exposed subjects having Scenarios to study the impact of coefficient of Z the max P (Y =1| X =1, Z , Z ) was fixed; refer to the k 1 2 To study the impact of the entire distribution of first three rows of Table 2 and Fig. 1, Panel A. For ex- P(Y = 1) when large P(Y = 1) existed, eight more scenar- ample, when the percentage of exposed subjects whose ios were produced, with the beta coefficient of Z being maximum P (Y =1| X =1, Z , Z ) was fixed at 5.8, an k 1 2 2 and 4 (shown in the middle section of Table 1), to join increase of maximum P (Y =1| X =1, Z , Z ) from 0.75 k 1 2 the four scenarios for which the beta coefficient of Z (scenario I-4) to 0.95 (scenario III-4) resulted a change was set to 3 (i.e. III-1, III-2, III-3, and III-4). The distri- in relative bias from − 4.1% to − 11.7%. bution of P(Y = 1) was shifted towards zero as the beta The impact of average P(Y = 1) was displayed from the coefficient of Z increased. Thus, these scenarios allowed 4th to the 6th rows of Table 2 and in Panel B of Fig. 1. us to study the impact of the outcome distribution, or When the percentage of exposed subjects having the the average P(Y = 1). max P (Y =1| X =1, Z , Z ) was fixed, the log-binomial k 1 2 The intercepts and the thresholds were generated models were more vulnerable (larger absolute value of using the same approach as described in the previous relative biases) when the beta coefficient of Z increased section. Because Z follows the uniform distribution, the (i.e. average P (Y =1| X =1, Z , Z ) decreased). For ex- 1 2 thresholds increase proportionally with the beta coeffi- ample, when the percentage of exposed subjects whose cients. For example, the threshold to make 1.4% of ex- maximum P (Y =1| X =1, Z , Z ) = 2.8, the relative bias k 1 2 posed subjects reached the maximum P (Y =1| X =1, changed from − 3.8% to − 8.5% when the beta coefficient Z , Z ) = 0.1 when the beta of Z was 2 and increased to 1 2 2 increased from 2 to 4. This indicates that the value of 0.2 when the beta of Z was 4. the average P(Y = 1) impacts the performance of the log-binomial models. When there were enough large Scenarios to study the impact of misspecified link functions P(Y = 1), a low average P(Y = 1) away from large The link function was altered from log to logit and pro- P(Y = 1) was associated with a large relative bias. bit in scenarios VI and VII to assess the model perform- When the underlying distribution of data was logit, ance when the link functions were misspecified; refer to misspecifying the link function as ‘log’ did not signifi- the last section of Table 1. For scenarios VI-2, VI-3, cantly influence relative biases. However, when the VI-4, VII-2, VII-3, and VII-4, not only were the link underlying distribution of data was probit, the bias (− functions misspecified, but also the responses depended 5.9%) was noticeable even when the linear predictors on covariates with truncated probabilities. were properly specified (i.e. percentage of exposed sub- jects having the max P (Y =1| X =1, Z , Z ) was zero). k 1 2 Scenarios with a weaker association between exposure and Refer to the last three rows of Table 2 and Panel C of outcome (RR = 2) Fig. 1. Misspecifying the link function from log to probit To understand the impact of misspecified link functions in the presence of misspecified linear predictors had a and truncation when RR is different from 3.0, we also serious consequence. The relative bias was almost − 18% generated scenarios with parameters identical to those in when the percentage of exposed subjects at the max- III-1, III-2, III-3, III-4, VII-1, VII-2, VII-3 and VII-4, ex- imum P was only 2.8. cept that this time RR = 2.0 instead of 3.0. Standard error Results In all simulation scenarios, the SE of the two models Relative bias were comparable (Table 3). At the 2nd decimal point, The relative biases of the estimated RR in log scale from the SE derived from the log-binomial models were either the two models in each of the 28 scenarios when n = the same or slightly smaller compared to those of robust 1000 are shown in Table 2 and Fig. 1. Poisson models. The largest difference, 0.03 (=0.23– As expected, both models accurately estimated β or log 0.20), occurred when the data distribution was probit, (RR) when they were correctly specified, regardless of the maximum P (Y =1| X =1, Z , Z ) was 0.95, and the beta 1 2 value of the maximum P. When the models were misspe- of Z was 3 (scenario VII-1). When the models correctly cified (as shown in the shaded areas in Table 2), the rela- specified (in the unshaded area of Table 3), the SE of the tive biases of the robust Poisson models were negligible, two models were the same except for one scenario (V-1) while those of log-binomial models tended to negatively in which the SE from log-binomial models was smaller bias away from null. For the log-binomial models, the by 0.01. The SE of the log-binomial models were always magnitude of biases increased in all scenarios when the smaller compared to those of robust Poisson models at Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 7 of 12 Table 2 Relative bias (%) in log scale with and without model misspecification (n = 1000) Scenario 1 2 3 4 LB RP LB RP LB RP LB RP I- 0.4 0.6 1.0 1.5 -1.4 0.7 -4.1 1.1 II- 0.4 0.3 -0.5 0.8 -3.0 1.0 -7.1 0.9 III- 0.9 1.4 -2.3 1.0 -6.1 0.8 -11.7 0.5 IV- 0.2 0.6 -1.5 0.4 -3.8 0.2 -6.4 1.0 III- 0.9 1.4 -2.3 1.0 -6.1 0.8 -11.7 0.5 V- 0.1 0.7 -4.6 -0.2 -8.5 1.2 -15.4 1.2 III- 0.9 1.4 -2.3 1.0 -6.1 0.8 -11.7 0.5 VI- -1.7 0.6 -5.0 0.0 -6.8 1.2 -11.0 0.9 VII- -5.9 2.2 -12.8 0.9 -18.0 0.4 -21.9 1.0 Unshaded: Models were correctly specified Light shaded: Misspecified linear predictors or misspecified link function Dark shaded: Misspecified linear predictors and misspecified link function LB: log-binomial, RP: robust Poisson Change of scenarios: Increasing intercept: I→ II→ III; Increasing coefficient of β :IV→ III→ V; Change of link function: III (log), VI (logit), VII (probit) the 3rd decimal place, even if the models were correctly between the two models becoming larger from IV-3 to specified (data not shown). III-3, and to V-3. For scenarios III-4 to VII-4, the robust Poisson models consistently outperformed the Mean square error log-binomial models. For scenarios of VII (in which the As expected, when the models were correctly specified (in underlying data were generated using a probit link), the the unshaded area in Table 4), log-binomial models MSE of log-binomial models, compared to those of robust yielded the same or marginally smaller MSE compared to Poisson models, were slightly smaller when the linear pre- robust Poisson models. When the underlying distribution dictors were properly specified (such as in scenario VII-1) of data had a log or logit link (scenarios I-VI) and the per- and significantly larger when the linear predictors were centage of exposed subjects having the max P (Y =1| X = improperly specified, even when level of misspecification 1, Z , Z ) was 1.4% or 2.8% (in the columns labeled as ‘2’ of the linear predictors was small. For example, when the 1 2 and ‘3’ in Table 4), the MSE of the two models were still percentage of exposed subjects at the 0.95 (max P) was comparable, except for one scenario. The exception oc- only 1.4, the MSE were 0.051 and 0.038, from the curred in scenario V-3, where the superiority of the robust log-binomial model and the robust Poisson model, Poisson models was quite noticeable; the difference in respectively. MSE between the two models was 0.05. Recall that the average P(Y = 1) for V-3 was smallest among all three sce- Distribution of P(Y = 1) narios (III-3, IV-3 and V-3) and followed by that of III-3. The distributions of P(Y = 1) for all simulated data (one Thus, it is not surprising to observe the difference million data points for each scenario) are shown in Fig. 2. Fig. 1 Percentage bias in log(RR) scale. a From left to right: increasing intercept(scenario I→II →III); b From left to right: increasing coefficient of β (scenario IV→III→V); c From left to right: change of link function (scenarios III, VI and VII). Red lines: Robust Poison; Blue lines: Log-binomial 2 Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 8 of 12 Table 3 Standard error in log scale with and without model misspecification (n = 1000) Scenario 1 2 3 4 LB RP LB RP LB RP LB RP 0.20 0.20 0.18 0.18 0.16 0.17 0.14 0.15 I- II- 0.18 0.18 0.16 0.17 0.15 0.15 0.13 0.14 III- 0.17 0.17 0.15 0.16 0.14 0.15 0.12 0.13 IV- 0.13 0.13 0.13 0.13 0.12 0.13 0.11 0.12 III- 0.17 0.17 0.15 0.16 0.14 0.15 0.12 0.12 V- 0.18 0.19 0.15 0.16 0.15 0.16 0.12 0.13 III- 0.17 0.17 0.15 0.16 0.14 0.15 0.12 0.13 VI- 0.14 0.15 0.13 0.14 0.13 0.14 0.11 0.12 VII- 0.20 0.23 0.18 0.19 0.16 0.17 0.14 0.15 Unshaded: Models were correctly specified Light shaded: Misspecified linear predictors or misspecified link function Dark shaded: Misspecified linear predictors and misspecified link function LB: log-binomial, RP: robust Poisson Change of scenarios: Increasing intercept: I→ II→ III; Increasing coefficient of β :IV→ III→ V; Change of link function: III (log), VI (logit), VII (probit) Panel A shows the distribution of P(Y = 1) with varying and unexposed subjects increased in all scenarios when maximum P(Y = 1). When the link function and beta of the percentage of exposed subjects at the max P(Y = 1) Z were fixed at ‘log’ and 3, respectively, an increase of increased from 1.4, 2.8 to 5.8%, indicating a higher level max P(Y = 1) from 0.75 (scenario I) to 0.95 (scenario III) of violation of the linearity assumption. The distribu- stretched the spikes to the right (Fig. 2 Panel A). tions of P(Y = 1) for data with log and logit link func- Panel B shows the distribution with varying beta of Z . tions were similar; however, the distributions of P(Y = 1) When the beta of Z increased from 2 (scenario VI) to 4 for data with probit link function distinguished them- (scenario V) while the max P(Y=1) was fixed at 0.95 and selves significantly from those of log or logit. This obser- the linkfunctionwas set as ‘log,’ the distribution of P(Y = 1) vation explains the larger biases seen in Fig. 1 Panel C, became taller, thinner, and shifted towards zero. This when the underlying distribution of data were probit, provides the evidence that attributes the increase of biases compared to those of log and logit distributions, even to the downward shift of the distribution of P(Y=1) (i.e. when the predictors were perfectly linear. decrease in the prevalence of the outcome variable Y). Panel C shows the distribution with varying link func- Results of moderate sample size (n = 500) tion. Spikes (i.e. vertical thin lines in all figures except When the simulation was conducted based on samples those labeled with “0% at the max P”) for both exposed of moderate size (n = 500), the same pattern was observed Table 4 Mean square error (MSE) in log scale with and without model misspecification (n = 1000) Scenario 1 2 3 4 LB RP LB RP LB RP LB RP I- 0.040 0.040 0.032 0.033 0.026 0.028 0.022 0.022 II- 0.032 0.033 0.026 0.028 0.023 0.024 0.023 0.019 III- 0.028 0.031 0.022 0.025 0.023 0.021 0.031 0.016 IV- 0.017 0.018 0.016 0.018 0.016 0.016 0.017 0.014 III- 0.028 0.031 0.022 0.025 0.023 0.021 0.031 0.016 V- 0.032 0.036 0.026 0.027 0.031 0.026 0.044 0.018 III- 0.028 0.031 0.022 0.025 0.023 0.021 0.031 0.016 VI- 0.019 0.022 0.020 0.019 0.021 0.019 0.027 0.014 VII- 0.046 0.052 0.051 0.038 0.063 0.028 0.076 0.021 Unshaded: Models were correctly specified Light shaded: Misspecified linear predictors or misspecified link function Dark shaded: Misspecified linear predictors and misspecified link function LB: log-binomial, RP: robust Poisson Change of scenarios: Increasing intercept: I→ II→ III; Increasing coefficient of β :IV→ III→ V; Change of link function: III (log), VI (logit), VII (probit) 2 Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 9 of 12 Fig. 2 Distribution of P(Y = 1). Y-axis: Percent; X-axis: P(Y = 1). a From left to right: increasing intercept (scenario I→II→III); b From left of right: increasing coefficient of β (scenarion IV→III→V); c From left to right: change of link function (scenarios III, VI and VII) in terms of relative biases and SE (Additional file 6, Tables RR for common binary outcomes were examined when AF6.1-AF6.2). As expected, the SE based on the samples the link function was misspecified or when the probabil- of moderate size was larger compared to those derived ity of the response variables was truncated at the right from samples with n = 1000. The same pattern was also tail. Our findings suggest that point estimates from observed for MSE (Additional file 6, Table AF6.3); how- log-binomial models were biased when the link function ever, the differences between the two models were not as was misspecified or when the probability distribution of substantial as seen in the samples of sizes 1000. the response variable was truncated for even a small proportion of observations. For log-binomial models, the Results with a weaker association between exposure and percentage of truncated observations was positively asso- outcome (RR = 2) ciated with the presence of the bias. The bias was more When the simulation was conducted based on simula- significant if these observations came from a population tion datasets with RR = 2, the relative biases were similar in which the response rate was lower, given the other to those of the corresponding scenarios with RR = 3 parameters being examined were fixed. (data not shown). The standard errors derived from the For MLE based methods, misspecification can cause simulation datasets with RR = 2 were 12–36% smaller inconsistent estimators of parameters [20]. Lumley et al. compared to those of the corresponding scenarios with (2006) pointed out that compared to robust Poisson and RR = 3; however, the pattern remained the same. That is, other non-MLE based models, log-binomial models the standard errors of the two models were comparable (MLE based) have very large weights when p (referred with those from the robust Poisson models being only by authors as μ) is large ([26] Fig. 1). The same authors slightly larger than those from the log-binomial models. also pointed out that for log-binomial models, “a single The MSE yielded from the simulation datasets with point with μ close to 1 can have arbitrarily large influ- RR = 2 were 16–47% smaller compared to those of the ence despite having bounded covariate values”. Our ob- matching datasets with RR = 3. Nevertheless, similar to servation was consistent with that of Lumley et al. We the patterns observed for RR = 3, the robust Poisson had demonstrated that when the percentage of observations lower MSE compared to the log-binomial models, and the with large P increased, the magnitude of bias also in- differences were more dramatic when a probit link was creased. This may explain the less optimal performance used (versus a log link) and when the data had a higher of the model applied to data generated using a probit percentage of truncation. link compared to that of log or logit link when other pa- rameters were fixed (Panel C of Fig. 2). It is well known Discussion that log-binomial models may fail to convergence or In this study, the statistical performances of the two generate incorrect estimates when the covariate values most popular model-based approaches used to estimate in the data are not bounded by 1 [3, 5, 8]. However, we Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 10 of 12 believe that this is a different issue than what we have Although we did not evaluate data based on other link focused on, which has been the impact of large Ps. In functions that are also suitable for modeling binary out- Scenario VII-1, truncation was not applied and none of comes (e.g. complementary log-log or log-log), it is ex- the observations had predicted probabilities > 1, yet the pected that the results would have similar patterns. A point estimate was still biased. truncated distribution appears in many real-life datasets On the other hand, the point estimates from the robust where the collection of data is limited to those that are Poisson models were nearly unbiased in all the scenarios above or under a threshold. For example, a typical scale examined, including when they were applied to the data used in clinics or hospitals can measure height up to that were generated using a probit link, which yielded 200 cm and weight up to 250 kg. Subjects exceeding quite different probability distributions compared to those these values would be truncated to these limits. In the from a log link, and/or when the distribution of 5.8% of simulated datasets, the distributions of approximately the exposed subjects were altered. In Chen et al. [27], both 1.4, 2.8, and 5.8% of the exposed subjects were truncated the MLE generated by log-binomial models and the in that they no longer followed the distribution specified quasi-likelihood estimators produced by robust Poisson by the link function through a combination of linear models deteriorated when outliers were introduced [27]. predictors. The truncation rates (1.4, 2.8, and 5.8%) for However, in the current study, the biases in point esti- the exposed subjects were plausible values that can be mates based on robust Poisson models were negligible, related to real-life applications. even when both link functions and predictors were incor- In contrast to Chen et al. [27], in which no differences rectly specified. This interesting contrast can be explained were found at the second decimal point when the data by a major difference in the design of the two studies. In were not contaminated with outliers, we found small dif- the previous study [27], the association between the ex- ferences in the variances at the second decimal point be- posure and the outcome was weakened when the “out- tween the log-binomial and robust Poisson models under liers” were introduced, and thus the negative biases were some of the scenarios for both samples (n =1000 and n = observed for the robust Poisson models. Nevertheless, in 500) when the models were correctly specified. This find- the current study, the true RR was maintained at 3.0, (or ing is consistent with that of Petersen and Deddens [11], 2.0 for some scenarios), even when the link function was which was based on a sample with 100 observations and a misspecified and/or when the probabilities were trun- single independent variable with a uniform distribution. cated. Our simulations demonstrated that for robust Kauermann and Carroll [28] showed that variances of Poisson regression, the misspecification of the link func- sandwich estimators were generally less efficient than tion did not hinder its ability to find the true RR. This is variance estimates derived from parametric models. This likely due to the fact that the quasi-likelihood method en- weakness impacts the coverage probability, the probabil- ables regression coefficient estimation without fully spe- ity that a confidence interval covers the true RR, and cifying the distribution of the observed data. We thus the ability to reject a null when the alternative is examined exposure-outcome associations with RR 3.0 true. Hence, log-binomial models are preferred over the and 2.0. The magnitude of the observed bias in our simu- robust Poisson models when the log-binomial models lation results did not change much when the association are correctly specified. was reduced from 3.0 to 2.0; however, it is conceivable The COPY method was reported to have convergence that the bias could be reduced in scenarios when the as- issue when there are continuous covariates in the model sociation is smaller than 2.0. [11]. However, convergence was barely an issue in this Model misspecification does not always yield differences study as it converged completely (i.e. 1000 out of 1000 in point estimates between the two models. In fact, in a simulations) in 23 out of 28 scenarios when the sample previous examination (Additional file 3), we found when size was 1000, and 21 out of 28 scenarios when the sam- an important explanatory variable was omitted, a higher ple size was 500. In the 12 scenarios (five for sample size order term of non-linear explanatory variable was ignored, 1000 and seven for sample size 500) for which the COPY or an interaction term was overlooked, the two models method did not completely converge in all 1000 simula- produced comparable results regardless of the outcome tions, there was only one out of 1000 simulations that rate, risk ratio or the strength of association between the failed to converge for each scenario. The number of vir- exposure and the confounder or between the outcome tual copies used in the study, 10,000, was reported to be and the confounder. Only in the scenario where an inter- accurate to three decimal places [25]. action term was ignored did the models yield large biases. Misspecification tests were developed [29, 30]and This highlights the relative importance of observations proven to be able to maintain reasonable size across vari- with large weights, since in the previous examination, the ous settings in simulation when they were applied to logis- number of observations with large probabilities of having tic and beta-binomial regression models [30]. However, the response was small. the power to detect the types of alternatives commonly Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 11 of 12 observed in practice (e.g. alternative link functions) was Additional file 2: Comparison of robust Poisson and log-binomial low [30]. Blizzard and Hosmer [10] assessed model-fit of models in estimating risk ratio (RR) of ≥ 7 SABA canisters dispensed in the past year. (DOCX 12 kb) log-binomial models by applying the Hosmer-Lemeshow Additional file 3: Comparing robustness to model misspecification test (a commonly used goodness-of-fit test for logistic re- between robust Poisson and log-binomial models for estimating risk gression models), the Pearson chi-square test, and the un- ratios: Initial simulation study. (DOCX 45 kb) weighted sum of squares test, finding that all three tests Additional file 4: Using Fish scoring (iteration) to estimate β. Proof of exhibited acceptable Type I errors yet low-to-moderate the iteration equation in Methods section. (DOCX 45 kb) power. Due to the lack of powerful diagnostic tools to de- Additional file 5: SAS codes used to estimate β for each regression model. (DOCX 12 kb) tect any forms of model misspecification, the robust Pois- Additional file 6: Results on simulated data when n = 500. (DOCX 18 kb) son model may be considered a good choice because of its ability to produce unbiased risk ratios. Efforts to establish Abbreviations efficient and robust parameter estimators are ongoing. A ACT: Asthma control test; FeNO: Fractional exhaled nitric oxide; FEV: Forced recent publication summarized issues with the current ap- expiratory volume; GEE: Generalized estimation equation; GLM: Generalized linear proaches within the GLM family to estimate relative risks models; ICS: Inhaled corticosteroids; IRLS: Iterative reweighted least squares; MLE: Maximum likelihood estimator; MSE: Mean square error; QL: Quasi-likelihood; and risk differences, and provided a possible alternative to RR: Risk ratio; SABA: Short-acting beta-agonist; SE: Standard error estimate relative risks and risk differences using a non-GLM approach [31]. The authors proposed to model Acknowledgements The authors would like to thank Dr. Mark Krailo for his valuable input on the relative risks as functions of baseline covariates. Validation methodology. of this approach is needed to determine its applicability to studies such as those presented here. Availability of data and materials The program to generate the simulated data can be made available from the corresponding author on reasonable request. Conclusions Authors’ contributions Given the vulnerability of log-binomial models when they WC conceivedand carriedout thestudy, and drafted the manuscript. LQ are misspecified, a robust Poisson model should be con- participated in the design, data generation and interpretation of the analyses. JS participated in the design and interpretation of the analyses. MF participated in the sidered the preferred choice for estimating risk ratios. This design and provided guidance. All authors read and approved the final version. is especially the case when the prevalence of the outcome is low and the model contains continuous covariates. If Ethics approval and consent to participate The asthma study mentioned in the “A motivating example” Section was approved the result of a robust Poisson model approaches border- by the Institutional Review Board of Kaiser Permanente Southern California. line significance, consider performing a log-binomial re- gression as well, as the increased efficiency of the Consent for publication Not applicable. log-binomial model may increase the probability of detect- ing the effect with a given significance level. If the point Competing interests estimates of the two models are inconsistent and the The authors declare that they have no competing interests. log-binomial model is preferred, categorize continuous variables and re-fit the model. If the data contain trun- Publisher’sNote Springer Nature remains neutral with regard to jurisdictional claims in cated values, examine the distribution of the data carefully published maps and institutional affiliations. and consider converting them into categorical variables if such a conversion is clinically meaningful. The robust Author details Kaiser Permanente Southern California, Department of Research and Poisson model does not work well for samples that are Evaluation, 100 S. Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA. very small because the sample-based sandwich estimators Department of Preventive Medicine, Keck School of Medicine, University of tend to underestimate the true standard errors [32]. Southern California, 1975 Zonal Ave, Los Angeles, CA 90033, USA. In summary, we found evidence to favor the robust Received: 25 August 2017 Accepted: 8 June 2018 Poisson model under various scenarios when models were misspecified. Future studies to develop model misspecifi- cation and/or goodness-of-fit tests that are powerful and References 1. Greenland S. Interpretation and choice of effect measures in epidemiologic convenient to apply for log-binomial models are analyses. Am J Epidemiol. 1987;125(5):761–8. warranted. 2. Zhang J, Yu KF. What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA. 1998;280(19):1690–1. 3. McNutt LA, Wu C, Xue X, Hafner JP. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol. 2003; Additional files 157(10):940–3. 4. Schouten EG, Dekker JM, Kok FJ, Le Cessie S, Van Houwelingen HC, Pool J, Vanderbroucke JP. Risk ratio and rate ratio estimation in case-cohort Additional file 1: Popularity of log-binomial and robust Poisson designs: hypertension and cardiovascular mortality. Stat Med. 1993;12(18): regression models – A Medline search. (DOCX 18 kb) 1733–45. Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 12 of 12 5. Skov T, Deddens J, Petersen MR, Endahl L. Prevalence proportion ratios: estimation and hypothesis testing. Int J Epidemiol. 1998;27(1):91–5. 6. Greenland S. Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case- control studies. Am J Epidemiol. 2004;160(4):301–5. 7. Barros AJ, Hirakata VN. Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Med Res Methodol. 2003;3:21. 8. Wacholder S. Binomial regression in GLIM: estimating risk ratios and risk differences. Am J Epidemiol. 1986;123(1):174–84. 9. Zou G. A modified poisson regression approach to prospective studies with binary data. Am J Epidemiol. 2004;159(7):702–6. 10. Blizzard L, Hosmer DW. Parameter estimation and goodness-of-fit in log binomial regression. Biom J. 2006;48(1):5–22. 11. Petersen MR, Deddens JA. A comparison of two methods for estimating prevalence ratios. BMC Med Res Methodol. 2008;8:9. 12. Knol MJ, Le Cessie S, Algra A, Vandenbroucke JP, Groenwold RH. Overestimation of risk ratios by odds ratios in trials and cohort studies: alternatives to logistic regression. CMAJ. 2012;184(8):895–9. 13. Schatz M, Nakahiro R, Crawford W, Mendoza G, Mosen D, Stibolt TB. Asthma quality-of-care markers using administrative data. Chest. 2005;128(4):1968–73. 14. McCullagh P, Nelder JA. Generalized linear models. Second ed. Boca Raton: Chapman and Hall/CRC; 1989. 15. Green PJ. Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives. J R Stat Soc Series B Stat Methodol. 1984;46(2):149–92. 16. Agresti A. Foundations of linear and generalized linear models. Hoboken: Wiley; 2015. 17. McCullagh P. Quasi-likelihood functions. Ann Stat. 1983;1:59–67. 18. Wedderburn RW. Quasi-likelihood functions, generalized linear models, and the gauss—Newton method. Biometrika. 1974;61(3):439–47. 19. Huber PJ. The behavior of maximum likelihood estimates under non- standard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. 1967;1:221–33. 20. White H. Maximum likelihood estimation of misspecified models. Econometrica. 1982:1–25. 21. Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42(1):121–30. 22. Carter RE, Lipsitz SR, Tilley BC. Quasi-likelihood estimation for relative risk regression models. Biostatistics (Oxford, England). 2005;6(1):39–44. 23. SAS. Software Version 9.3 of the SAS System for Unix. Cary: SAS Institute Inc; 24. Deddens JA, Petersen MR, Lei X. Estimation of prevalence ratios when PROC GENMOD does not converge. Proceedings of the 28th annual SAS users group international conference. 2003; Mar 30;30:270–28. 25. Petersen M, Deddens J. A revised SAS macro for maximum likelihood estimation of prevalence ratios using the COPY method. Occup Environ Med. 2009;66(9):639. 26. Lumley T, Kronmal R, Ma S. Relative risk regression in medical research: models, contrasts, estimators and algorithms. In: UW biostatistics working paper series 293. Seattle: University of Washington. http://www.bepress. com/uwbiostat/paper293. Accessed 13 June 2018. 27. Chen W, Shi J, Qian L, Azen SP. Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study. BMC Med Res Methodol. 2014;14:82. 28. Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J Am Stat Assoc. 2001;96(456):1387–96. 29. Presnell B, Boos DD. The IOS test for model misspecification. J Am Stat Assoc. 2004;99(465):216–27. 30. Capanu M, Presnell B. Misspecification tests for binomial and beta-binomial models. Stat Med. 2008;27(14):2536–54. 31. Richardson TS, Robins JM, Wang L. On modeling and estimation for the relative risk and risk difference. J Am Stat Assoc. 2017;112(519):1121–30. 32. Carroll R, Wang S, Simpson D, Stromberg A, Ruppert D. The sandwich (robust covariance matrix) estimator. Unpublished manuscript. 1998; http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Medical Research Methodology Springer Journals

Comparing performance between log-binomial and robust Poisson regression models for estimating risk ratios under model misspecification

Loading next page...
 
/lp/springer-journals/comparing-performance-between-log-binomial-and-robust-poisson-0mO7C3FgSF

References (35)

Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s).
Subject
Medicine & Public Health; Theory of Medicine/Bioethics; Statistical Theory and Methods; Statistics for Life Sciences, Medicine, Health Sciences
eISSN
1471-2288
DOI
10.1186/s12874-018-0519-5
Publisher site
See Article on Publisher Site

Abstract

Background: Log-binomial and robust (modified) Poisson regression models are popular approaches to estimate risk ratios for binary response variables. Previous studies have shown that comparatively they produce similar point estimates and standard errors. However, their performance under model misspecification is poorly understood. Methods: In this simulation study, the statistical performance of the two models was compared when the log link function was misspecified or the response depended on predictors through a non-linear relationship (i.e. truncated response). Results: Point estimates from log-binomial models were biased when the link function was misspecified or when the probability distribution of the response variable was truncated at the right tail. The percentage of truncated observations was positively associated with the presence of bias, and the bias was larger if the observations came from a population with a lower response rate given that the other parameters being examined were fixed. In contrast, point estimates from the robust Poisson models were unbiased. Conclusion: Under model misspecification, the robust Poisson model was generally preferable because it provided unbiased estimates of risk ratios. Keywords: Log-binomial regression, Robust (modified) Poisson regression, Model misspecification, Risk ratio, Link function misspecification Background regression analysis were comparable to risk ratios [4]. Introduction However, it was found that these methods produced Logistic regression is the most widely-used modeling ap- prevalence greater than one [5]. Greenland [6] brought ex- proach for studying associations between exposures and tensive literature on valid model-based estimation of rela- binary outcomes. For rare events, the odds ratio estimated tive risks and other measures to readers’ attention. Of the from logistic regression approximates the risk ratio (RR). model-based approaches, log-binomial [5, 7, 8] and robust However, when events are common, odds ratios always (or modified) Poisson regression models [7, 9]are most overestimate risk ratios [1] Zhang and Yu [2]suggested a frequently applied to estimate risk ratios for common bin- correction for odds ratios to give a risk ratio in studies of ary outcomes. Barros et al. [7]and Zou[9]showedhow common outcomes. This method was subsequently shown risk ratios can be estimated by using robust Poisson re- to result in inconsistent point estimates as well as invalid gression with a robust error variance. confidence intervals [3]. Efforts were made to modify the In medical and public health research, log-binomial data so that estimated odds ratios from a logistic and robust Poisson regression models are widely used to directly estimate risk ratios for both common and rare outcomes. For example, they can be used to estimate the * Correspondence: Wansu.Chen@KP.org effect of clinical characteristics (e.g. obesity, smoking, Kaiser Permanente Southern California, Department of Research and Evaluation, 100 S. Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA history of stroke, exercise, or diet) on a health condition Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 2 of 12 (e.g. a cardiovascular event, mortality, or hospital admis- controlling for age, gender, race/ethnicity, number of sion). Our own Medline search of articles published be- aeroallergen sensitivities, clinical center, FEV %pre- tween 2005 and 2014 demonstrated that the popularity dicted (≥80% vs. < 80%) and ACT score (> 19, 16–19, of both models has increased significantly in the past < 16). Age remained as a continuous variable in the decade (Additional file 1). model. Model misspecification can lead to biased estimates, Based on the robust Poisson model, the risk ratio of resulting in erroneous and misleading conclusions. Re- having ≥7 SABA canisters over the past 12 months was gression models require that the relationship between 2.05 (95% CI, 1.03–4.05) in patients whose FeNO value the response and the explanatory variables conforms to was in the second quartile, compared to patients whose a particular functional form. Omitting important ex- FeNO value was in the first quartile (Additional file 2). planatory variables, failing to account for non-linear However, when the analysis was repeated by using the components or critical interaction terms, or making log-binomial model, the corresponding risk ratio was measurement errors can cause model misspecification 1.67 (95% CI, 0.83–3.57), a 19% reduction from the ro- and thus bias the parameter estimators of one or more bust Poisson estimate. Although the point estimates for of the predictors in a regression model. Under correct FeNO from the two models were in the same direction, model specification, the log-binomial and robust Poisson the interpretation of the results varied because the 95% methods have been shown to yield comparable point es- confidence interval of the estimated RR from the robust timates and standard errors [7, 9–12]. For instance, in a Poisson covered one, yet the log-binomial model did simulation study with sample size of 1000, both the not. The RR for patients whose FeNO values were in the log-binomial and robust Poisson models yielded similar third and fourth quartiles compared to the patients in unbiased estimates with good coverage probability [12]. the first quartile were also different between the two However, the results of the following example revealed a models, although the interpretations remained the same different paradigm. at the 95% level (see Additional file 2). The differences in point estimates of RR between the two models were A motivating example in the range of 12–19%. A cross-sectional study was conducted at Kaiser Perma- nente Southern California to assess the association of Current study fractional exhaled nitric oxide (FeNO), a marker of air- The inconsistent conclusions between the aforemen- way inflammation, and asthma burden among persistent tioned simulation studies [7, 9–12] and the results from asthma patients who were treated with inhaled cortico- the motivating example led us to further investigate steroids (ICS). It was hypothesized that high FeNO levels whether model misspecification and/or characteristics of were associated with greater asthma burden. In this the data resulted in such large discrepancies between study, the primary binary outcome of interest was log-binomial and the robust Poisson regression models. whether seven or more short-acting beta-agonist (SABA) An initial attempt was made to understand the impact canisters were dispensed in the past 12 months, which is of model misspecification on the performance of the two a validated administrative data surrogate for asthma im- regression models when an important explanatory vari- pairment. Previous studies based on similar patient pop- able was omitted, a higher order term of non-linear ex- ulations revealed that the prevalence of asthma planatory variable was ignored, or an interaction term impairment among persistent asthma patients was high was overlooked (Kaiser Permanente, Pasadena; Univer- [13]. The study population consisted of asthma patients sity of Southern California, Los Angeles; unpublished re- 12 to 56 years of age who had aeroallergen sensitization sults. Additional file 3). Although certain types of model and regular ICS treatment for at least one month were misspecification did bias the point estimates, both the enrolled during an allergy department visit (index visit). magnitude and the direction of the biases were compar- Information on patient demographics, FeNO, forced ex- able between the two models. The current simulation piratory volume in one second (FEV % predicted) and study was subsequently designed to examine the impact asthma control test (ACT) score was collected during of misspecified link functions and truncated probabil- the index visit, while the information on aeroallergen ities, one type of linear predictor misspecification. To sensitization and SABA canisters dispensed came from our knowledge, the extent of such an inconsistency electronic medical records. between the two models has not been systematically ex- FeNO was categorized into four quartiles. Multivari- amined. Inspired by the differences in the estimating able log-binomial and robust Poisson regression models equations of the two models, we also examined the im- were applied to estimate the risk ratio of having seven or pact of having observations with large probabilities of more SABA canisters in each of the FeNO quartiles developing the response on the performance of the two (using the lowest quartile as the reference group), models when the overall response rate varies. Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 3 of 12 The paper is organized as follows: The “Methods” −1 ðÞ tþ1 0 β ¼ðÞ X WX X Wz ð2Þ section presents the theory behind the two models to explain the differences in the estimation methods that ðÞ t could result in variations in the estimates. The "Methods" Y−P β ðtÞ n;k where z ¼ Xβ þ ; X ¼ x ∈R ; i; j section also provides the details of the simulation design. ðÞ t P β The "Results" section shows the results of simulation under various scenarios. Lastly, we summarize the find- Y−PðÞ β y −p ðÞ β y −p ðÞ β y −p ðÞ β 1 1 2 2 n n ¼ ; ; …; ; ings and provide recommendations for the use of these PðÞ β p ðÞ β p ðÞ β p ðÞ β 1 2 n models in future studies in the “Discussion” section. p ðÞ β and weight W ¼ Diag ; 1−p ðÞ β Methods i ¼ 1; 2; …; n; j ¼ 1; 2; …; k: Generalized linear models (GLM) originate from a signifi- cant extension of traditional linear regression models [14]. The iteration process continues until β stabilizes. The They consist of a random component that specifies the weights W used in the iterative process contain p (β)in conditional distribution of the response variable (Y)from the numerator and 1 – p (β) in the denominator, where an exponential family given the values of the explanatory p (β) = exp (x β) with a range from 0 to 1. When p (β)is i i variables X X ···,X , a linear predictor (or systematic) com- 1, 2, k a very small number, the weight approximates p (β). ponent that is a linear function of the predictors, When p (β) approaches one, the weight approaches in- ƞ=β +β X +β X +···+β X ,where β=(β ,β ,...,β ) is the 0 1 1 2 2 k k 0 1 k finity. This suggested that the IRLS approach is highly vector of the parameters, and a smooth invertible link influenced by observations that have large p (β). More- function that transforms the expectation of the re- over, the impact is also influenced by the average p (β), sponse variable, μ ≡ E(Y), to the linear predictors: or the average weight (lower average p (β) is associated g(μ)=ƞ=β +β X +β X +···β X . For example, the most 0 1 1 2 2 k k with lower average weight). For illustration, we con- common link for binary outcomes is the logit (i.e. log structed two hypothetical samples each with five obser- (μ/(1-μ))) in a logistic model, the log (μ) in a Poisson vations having the following probabilities: sample 1 model, or a log-binomial model. = {0.1, 0.3, 0.4, 0.5, 0.95} and sample 2 = {0.02, 0.03, 0.08, In the descriptions below, Y and x ¼ð1; x ; x ; …; x Þ i i1 i2 ik 0.15, 0.95}. The corresponding weights for the two sam- denote the binary outcome and the row vector comprised of ples were {0.25, 0.43, 0.67, 1, 19} and {0.02, 0.03, 0.09, th k predictors for the i individual (i = 1,2,…n), respectively. 0.18, 19}, respectively. In sample 2, the observation with The observations from the n individuals are independent. weight 19 will impact the point estimate more, com- pared the observation in sample 1 with the same weight. Log-binomial regression In the GLM framework, the conditional distribution of Y Robust Poisson regression given the predictor variables is binomial, with the mean re- In robust Poisson regression, a quasi-likelihood (QL) sponse related to the predictors by the link function log model can be applied to fit the data with a binary out- (μ ). In log-binomial regression, μ is often denoted as p,be- i i i come [14–18]. Quasi-likelihood was first introduced by cause E(Y ) is a probability with a value between zero and Wedderburn (1974) as a function that has properties one. Although there are other methods to obtain efficient analog to those of log-likelihood functions [18]. Similar estimators, the maximum likelihood approach is used to to ML method, maximum QL method can be used to es- generate asymptotically efficient estimators (maximum like- timate the QL estimates. In a maximum QL model, only lihood estimates (MLE)) in log-binomial regressions [5, 8]. the relationship between the mean and the variance (i.e. The MLE of log-binomial models are derived from an the variance is a function of the mean) needs to be spe- iteratively reweighted least squares (IRLS) approach [15]. cified instead of the underlying distribution of the data In a log-binomial regression, logðP ðβÞÞ ¼ x β where [15–19]. It can be shown that when Y comes from the p (β)=Pr(y =1|x ),0 ≤ p ≤ 1, and x β < 0 (constrained). i i i i exponential family, the quasi-score function is identical The log-likelihood is given by to the score function associated with the maximum like- lihood of the GLM. n n X X When the Poisson distribution is chosen, the 84%ℓðÞ β ¼ y logðÞ p ðÞ β þ ðÞ 1−y logðÞ 1−p ðÞ β i i i i quasi-score function can be simplified to S ðβÞ¼ i¼1 i¼1 ϕ i¼1 ðy −μ Þx , resulting in quasi-score estimating equations ð1Þ ij i i S ðβÞ¼ ðy −μ Þx ¼ 0, which is the same as the es- j ij i¼1 i i It can be proven that the MLE for β can be found by timating equations of the Poisson regression models. In the following iteration (Additional file 4) the two equations above, ϕ is the dispersion parameter, Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 4 of 12 and j = 1,2,…k. The final estimate from the Relative bias was defined as the average of the 1000 esti- mated RR in log scale minus the log of the true RR quasi-scoring procedure satisfies the condition S ðβÞ¼ 0 ^ divided by the log of the true RR. For θ , the estimated and β is a consistent and asymptotically unbiased esti- th log RR from the m dataset using either the mate of β [20]. β does not depend on ϕ. log-binomial model or the robust Poisson model, the rela- The quasi-likelihood estimators are not maximally and 1;000 θ − logðtrueRRÞ 1 m asymptotically efficient [14]. The robust Poisson regres- tive bias was defined as  100%. 1;000 logðtrueRRÞ m¼1 sion model uses the classical sandwich estimator under Standard error was defined as the empirical SE of the esti- the generalized estimation equation (GEE) framework to mated risk ratio in log scale over all 1000 simulations. The provide accurate standard errors for the elements [19–21]. MSE was calculated by taking the sum of the squared bias The variance-covariance matrix is in log scale and the variances, in which the bias was speci- 1;000 "#"#"# −1 −1 −1 n nhi n fied as θ −logðtrueRRÞ. X X X 1;000 m¼1 EI½ ðÞ β E ðS ðÞ β S ðÞ β EI½ ðÞ β i i i i Because both SE and MSE depended on the sample i¼1 i¼1 i¼1 size, the process described above was repeated for sam- ð3Þ ple of size 500 for all scenarios with RR = 3. ∂S ðβÞ where I ðβÞ¼ − is the information matrix [22]. A ∂β Simulated datasets consistent estimate of the variance can be obtained by Let Y be a common binary outcome (Y = 1 for disease evaluating the variance-covariance matrix at β. and Y = 0 for non-disease) and X be a binary exposure variable (X = 1 for exposure and X = 0 for non-exposure). Implementation First, uncorrelated random variables Z and Z following 1 2 Both regression models were implemented in SAS [23] the Bernoulli (0.5) and the Uniform [0, 1] distributions, (SAS Software Version 9.3 of the SAS System for Unix. respectively, were generated for 1000 subjects. These Cary, NC. SAS Institute Inc. 2011). The SAS codes can be distributions were chosen for their simplicity in the de- found in Additional file 5. For the log-binomial model, − 4 sign. Then, the exposure variable X based on the was set as the initial value of the intercept. For both subject-specific probability of exposure, defined by the models, the weighted least squares estimates (default) equation logit (P (X =1| Z , Z )) = − 1.0 + Z + Z with 1 2 1 2 were used as initial values of parameters. The convergence E(P(X =1| Z , Z )) = 0.5, was created for each subject. − 4 1 2 criterion was 10 (default). A well-known issue of All of the outcome variables defined below were condi- log-binomial models is failure to converge when the MLE tional on the exposure status and the covariates. For is located on the boundary of the parameter space (i.e. the exposed subjects, P (Y =1| X =1, Z , Z )=3× P (Y =1| 1 2 predicted probability of the outcome is equal to 1). To X =0, Z , Z ). The adjusted RR (i.e. P (Y =1| X =1, Z , 1 2 1 minimize the convergence issue, the COPY method was Z )) / P (Y =1| X =0, Z , Z )) was fixed at 3.0, chosen to 2 1 2 applied [24, 25] in which the number of virtual copies was reflect the effect size commonly seen in real-world set- set to 10,000. To ensure a fair comparison between the tings, and strong enough to yield observable differences in log-binomial and robust Poisson models, the evaluation performance between the two regression models. was conducted by only using the results based on exactly the same simulated data. If the COPY method did not converge for a dataset, the same dataset was then removed Scenarios to study the impact of truncation before the performance of the robust Poison models was The equations to generate Y took four different forms (Y , evaluated. The exclusion of datasets was very rare in this Y , Y , Y ) to enable the examination of the impact of trun- 2 3 4 study. Details on the number of excluded datasets can be cation. Unlike Y , which always had a perfect linear associ- found in the “Discussion” section. ation with its predictors (i.e. not truncated), Y , Y and Y 2 3, 4 were truncated such that the values of P (Y =1| X =0, Z , k 1 Measures of model performance Z ) depended on whether or not “Z + (beta of Z )× Z ” 2 1 2 2 For each simulated scenario, the simulation process was reached a threshold (k = 2, 3, and 4) (Table 1). For ex- repeated 1000 times. In each of the 1000 simulated data- ample, in Scenario I-2, the threshold was set at 0.15, sets, the log risk ratio was estimated from the requiring “Z +3× Z ” to be greater than 0.15 to impact 1 2 log-binomial model and the robust Poisson model, re- P (Y =1| X =0, Z , Z )). The defined threshold varied by 2 1 2 spectively. For each scenario, the relative bias, standard scenario and was chosen such that the percentages of ex- error (SE), and mean square error (MSE) in log scale for posed subjects at the maximum P (Y =1| X, Z , Z )) could 1 2 all three measures were calculated by summarizing the be controlled within the range of 1.4–5.8%. A truncation results from the 1000 datasets for each regression model. yielded a spike of observations at the maximum P (Y =1| Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 5 of 12 Table 1 Design of the simulation data Scenario Scenario Models to generate simulation datasets Model misspe-cified? Max P of Beta of Z Link % of exposed exposed Function at Max P I-1 0.75 3 log 0 log (P (Y =1| X =0, Z , Z )) = −1.38 – Z –3* Z No 1 1 2 1 2 I-2 0.75 3 log 1.4 log (P (Y =1| X =0, Z , Z )) = − 1.23 – max (Z +3 * Z , 0.15) Yes 2 1 2 1 2 I-3 0.75 3 log 2.8 log (P (Y =1| X =0, Z , Z )) = − 1.08 – max (Z +3 * Z , 0.30) Yes 3 1 2 1 2 I-4 0.75 3 log 5.8 log (P (Y =1| X =0, Z , Z )) = − 0.78 – max (Z +3 * Z , 0.60) Yes 4 1 2 1 2 II-1 0.85 3 log 0 log (P (Y =1| X =0, Z , Z )) = − 1.26 – Z –3* Z No 1 1 2 1 2 II-2 0.85 3 log 1.4 log (P (Y =1| X =0, Z , Z )) = − 1.11 – max (Z +3 * Z , 0.15) Yes 2 1 2 1 2 II-3 0.85 3 log 2.8 log (P (Y =1| X =0, Z , Z )) = − 0.96 – max (Z +3 * Z , 0.30) Yes 3 1 2 1 2 II-4 0.85 3 log 5.8 log (P (Y =1| X =0, Z , Z )) = − 0.66 – max (Z +3 * Z , 0.60) Yes 4 1 2 1 2 III-1 0.95 3 log 0 log (P (Y =1| X =0, Z , Z )) = − 1.15 – Z –3* Z No 1 1 2 1 2 III-2 0.95 3 log 1.4 log (P (Y =1| X =0, Z , Z )) = − 1.00 – max (Z +3 * Z , 0.15) Yes 2 1 2 1 2 III-3 0.95 3 log 2.8 log (P (Y =1| X =0, Z , Z )) = − 0.85 – max (Z +3 * Z , 0.30) Yes 3 1 2 1 2 III-4 0.95 3 log 5.8 log (P (Y =1| X =0, Z , Z )) = − 0.55 – max (Z +3 * Z , 0.60) Yes 4 1 2 1 2 IV-1 0.95 2 log 0 log (P (Y =1| X =0, Z , Z )) = − 1.15 – Z –2* Z No 1 1 2 1 2 IV-2 0.95 2 log 1.4 log (P (Y =1| X =0, Z , Z )) = − 1.05 – max (Z +2 * Z , 0.10) Yes 2 1 2 1 2 IV-3 0.95 2 log 2.8 log (P (Y =1| X =0, Z , Z )) = − 0.95 – max (Z +2 * Z , 0.20) Yes 3 1 2 1 2 IV-4 0.95 2 log 5.8 log (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.75 – max (Z1 + 2 * Z2, 0.40) Yes V-1 0.95 4 log 0 log (P (Y =1| X =0, Z , Z )) = − 1.15 – Z –4* Z No 1 1 2 1 2 V-2 0.95 4 log 1.4 log (P (Y =1| X =0, Z , Z )) = − 0.95 – max (Z +4 * Z , 0.20) Yes 2 1 2 1 2 V-3 0.95 4 log 2.8 log (P (Y =1| X =0, Z , Z )) = − 0.75 – max (Z +4 * Z , 0.40) Yes 3 1 2 1 2 V-4 0.95 4 log 5.8 log (P (Y4 = 1| X = 0, Z1, Z2)) = − 0.35 – max (Z1 + 4 * Z2, 0.80) Yes VI-1 0.95 3 logit 0 logit (P (Y =1| X =0, Z , Z )) = − 0.76 – Z –3* Z Yes 1 1 2 1 2 VI-2 0.95 3 logit 1.4 logit (P (Y =1| X =0, Z , Z )) = − 0.61 – max (Z +3 * Z , 0.15) Yes 2 1 2 1 2 VI-3 0.95 3 logit 2.8 logit (P (Y =1| X =0, Z , Z )) = − 0.46 – max (Z +3 * Z , 0.30) Yes 3 1 2 1 2 VI-4 0.95 3 logit 5.8 logit (P (Y =1| X =0, Z , Z )) = − 0.16 – max (Z +3 * Z , 0.60) Yes 4 1 2 1 2 VII-1 0.95 3 probit 0 probit (P (Y =1| X =0, Z , Z )) = − 0.48 – Z –3* Z Yes 1 1 2 1 2 VII-2 0.95 3 probit 1.4 probit (P (Y =1| X =0, Z , Z )) = − 0.33 – max (Z +3 * Z , 0.15) Yes 2 1 2 1 2 VII-3 0.95 3 probit 2.8 probit (P (Y =1| X =0, Z , Z )) = − 0.18 – max (Z +3 * Z , 0.30) Yes 3 1 2 1 2 VII-4 0.95 3 probit 5.8 probit (P (Y =1| X =0, Z , Z )) = − 0.12 – max (Z +3 * Z , 0.60) Yes 4 1 2 1 2 Maximum P (Y =1| X =1, Z , Z ) k 1 2 Models to generate Y for unexposed subjects. For exposed subjects, P (Y =1| X =1, Z , Z ) =3*P (Y =1| X =0, Z , Z ). k = 1, 2, 3, and 4 k k 1 2 k 1 2 Model was defined as misspecified when the link function was not ‘log’ or the % of exposed at maximum P (Y =1| X =0, Z , Z ) was greater than 0 k 1 2 X, Z , Z )) for both exposed and unexposed subjects. values set above (0.75, 0.85, and 0.95) were 1.4, 2.8, and 1 2 When other parameters were fixed, a spike goes higher 5.8%. These values were derived by simulation and rep- with the increase of the threshold. This allowed us to resented the various levels of alteration of the linear study how the volume of large P (Y =1| X, Z , Z )) im- predictors. 1 2 pacted model performance. The intercepts were manually calculated to satisfy P (Y =1 | X =0, Z , Z ) ≤ 0.75/3, 0.85/3 and 0.95/3, re- k 1 2 Scenarios to study the impact of maximum P(Y = 1) spectively. For example, for the equation log (P (Y =1| First, scenarios I-1, II-1, and III-1 were created using the X =0, Z , Z )) = α – Z –3* Z in scenario I-1, α = 1 2 1 2 log link function. The maximum values of P (Y =1| X = log(0.75/3) = − 1.38 since the logarithm is an increasing 1, Z , Z )) were set to 0.75, 0.85 and 0.95, respectively, function and hence the maximum of P (Y =1| X =0, Z , 1 2 1 1 to study the impact of maximum P (Y =1| X =1, Z , Z ) is achieved when Z = 0 and Z = 0. For the same 1 1 2 1 2 Z )) (Table 1). The selected thresholds were set at 0.15, reason, for the equation log (P (Y =1| X =0, Z , Z )) = α 2 2 1 2 0.30, and 0.60 for Y , Y and Y , respectively. The per- – Z – max (Z +3* Z , 0.15) in scenario I-2, α = 2 3, 4 1 1 2 centages of exposed subjects who were at the maximum log(0.75/3) + 0.15 = − 1.23. The 12 scenarios designed to Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 6 of 12 study the impact of large P(Y=1) were listed in the first level of misspecification, measured by the percentage of section of Table 1 (in the first 12 rows). exposed subjects at maximum P, increased. Large P(Y = 1) was associated with an increased level of bias when the percentage of exposed subjects having Scenarios to study the impact of coefficient of Z the max P (Y =1| X =1, Z , Z ) was fixed; refer to the k 1 2 To study the impact of the entire distribution of first three rows of Table 2 and Fig. 1, Panel A. For ex- P(Y = 1) when large P(Y = 1) existed, eight more scenar- ample, when the percentage of exposed subjects whose ios were produced, with the beta coefficient of Z being maximum P (Y =1| X =1, Z , Z ) was fixed at 5.8, an k 1 2 2 and 4 (shown in the middle section of Table 1), to join increase of maximum P (Y =1| X =1, Z , Z ) from 0.75 k 1 2 the four scenarios for which the beta coefficient of Z (scenario I-4) to 0.95 (scenario III-4) resulted a change was set to 3 (i.e. III-1, III-2, III-3, and III-4). The distri- in relative bias from − 4.1% to − 11.7%. bution of P(Y = 1) was shifted towards zero as the beta The impact of average P(Y = 1) was displayed from the coefficient of Z increased. Thus, these scenarios allowed 4th to the 6th rows of Table 2 and in Panel B of Fig. 1. us to study the impact of the outcome distribution, or When the percentage of exposed subjects having the the average P(Y = 1). max P (Y =1| X =1, Z , Z ) was fixed, the log-binomial k 1 2 The intercepts and the thresholds were generated models were more vulnerable (larger absolute value of using the same approach as described in the previous relative biases) when the beta coefficient of Z increased section. Because Z follows the uniform distribution, the (i.e. average P (Y =1| X =1, Z , Z ) decreased). For ex- 1 2 thresholds increase proportionally with the beta coeffi- ample, when the percentage of exposed subjects whose cients. For example, the threshold to make 1.4% of ex- maximum P (Y =1| X =1, Z , Z ) = 2.8, the relative bias k 1 2 posed subjects reached the maximum P (Y =1| X =1, changed from − 3.8% to − 8.5% when the beta coefficient Z , Z ) = 0.1 when the beta of Z was 2 and increased to 1 2 2 increased from 2 to 4. This indicates that the value of 0.2 when the beta of Z was 4. the average P(Y = 1) impacts the performance of the log-binomial models. When there were enough large Scenarios to study the impact of misspecified link functions P(Y = 1), a low average P(Y = 1) away from large The link function was altered from log to logit and pro- P(Y = 1) was associated with a large relative bias. bit in scenarios VI and VII to assess the model perform- When the underlying distribution of data was logit, ance when the link functions were misspecified; refer to misspecifying the link function as ‘log’ did not signifi- the last section of Table 1. For scenarios VI-2, VI-3, cantly influence relative biases. However, when the VI-4, VII-2, VII-3, and VII-4, not only were the link underlying distribution of data was probit, the bias (− functions misspecified, but also the responses depended 5.9%) was noticeable even when the linear predictors on covariates with truncated probabilities. were properly specified (i.e. percentage of exposed sub- jects having the max P (Y =1| X =1, Z , Z ) was zero). k 1 2 Scenarios with a weaker association between exposure and Refer to the last three rows of Table 2 and Panel C of outcome (RR = 2) Fig. 1. Misspecifying the link function from log to probit To understand the impact of misspecified link functions in the presence of misspecified linear predictors had a and truncation when RR is different from 3.0, we also serious consequence. The relative bias was almost − 18% generated scenarios with parameters identical to those in when the percentage of exposed subjects at the max- III-1, III-2, III-3, III-4, VII-1, VII-2, VII-3 and VII-4, ex- imum P was only 2.8. cept that this time RR = 2.0 instead of 3.0. Standard error Results In all simulation scenarios, the SE of the two models Relative bias were comparable (Table 3). At the 2nd decimal point, The relative biases of the estimated RR in log scale from the SE derived from the log-binomial models were either the two models in each of the 28 scenarios when n = the same or slightly smaller compared to those of robust 1000 are shown in Table 2 and Fig. 1. Poisson models. The largest difference, 0.03 (=0.23– As expected, both models accurately estimated β or log 0.20), occurred when the data distribution was probit, (RR) when they were correctly specified, regardless of the maximum P (Y =1| X =1, Z , Z ) was 0.95, and the beta 1 2 value of the maximum P. When the models were misspe- of Z was 3 (scenario VII-1). When the models correctly cified (as shown in the shaded areas in Table 2), the rela- specified (in the unshaded area of Table 3), the SE of the tive biases of the robust Poisson models were negligible, two models were the same except for one scenario (V-1) while those of log-binomial models tended to negatively in which the SE from log-binomial models was smaller bias away from null. For the log-binomial models, the by 0.01. The SE of the log-binomial models were always magnitude of biases increased in all scenarios when the smaller compared to those of robust Poisson models at Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 7 of 12 Table 2 Relative bias (%) in log scale with and without model misspecification (n = 1000) Scenario 1 2 3 4 LB RP LB RP LB RP LB RP I- 0.4 0.6 1.0 1.5 -1.4 0.7 -4.1 1.1 II- 0.4 0.3 -0.5 0.8 -3.0 1.0 -7.1 0.9 III- 0.9 1.4 -2.3 1.0 -6.1 0.8 -11.7 0.5 IV- 0.2 0.6 -1.5 0.4 -3.8 0.2 -6.4 1.0 III- 0.9 1.4 -2.3 1.0 -6.1 0.8 -11.7 0.5 V- 0.1 0.7 -4.6 -0.2 -8.5 1.2 -15.4 1.2 III- 0.9 1.4 -2.3 1.0 -6.1 0.8 -11.7 0.5 VI- -1.7 0.6 -5.0 0.0 -6.8 1.2 -11.0 0.9 VII- -5.9 2.2 -12.8 0.9 -18.0 0.4 -21.9 1.0 Unshaded: Models were correctly specified Light shaded: Misspecified linear predictors or misspecified link function Dark shaded: Misspecified linear predictors and misspecified link function LB: log-binomial, RP: robust Poisson Change of scenarios: Increasing intercept: I→ II→ III; Increasing coefficient of β :IV→ III→ V; Change of link function: III (log), VI (logit), VII (probit) the 3rd decimal place, even if the models were correctly between the two models becoming larger from IV-3 to specified (data not shown). III-3, and to V-3. For scenarios III-4 to VII-4, the robust Poisson models consistently outperformed the Mean square error log-binomial models. For scenarios of VII (in which the As expected, when the models were correctly specified (in underlying data were generated using a probit link), the the unshaded area in Table 4), log-binomial models MSE of log-binomial models, compared to those of robust yielded the same or marginally smaller MSE compared to Poisson models, were slightly smaller when the linear pre- robust Poisson models. When the underlying distribution dictors were properly specified (such as in scenario VII-1) of data had a log or logit link (scenarios I-VI) and the per- and significantly larger when the linear predictors were centage of exposed subjects having the max P (Y =1| X = improperly specified, even when level of misspecification 1, Z , Z ) was 1.4% or 2.8% (in the columns labeled as ‘2’ of the linear predictors was small. For example, when the 1 2 and ‘3’ in Table 4), the MSE of the two models were still percentage of exposed subjects at the 0.95 (max P) was comparable, except for one scenario. The exception oc- only 1.4, the MSE were 0.051 and 0.038, from the curred in scenario V-3, where the superiority of the robust log-binomial model and the robust Poisson model, Poisson models was quite noticeable; the difference in respectively. MSE between the two models was 0.05. Recall that the average P(Y = 1) for V-3 was smallest among all three sce- Distribution of P(Y = 1) narios (III-3, IV-3 and V-3) and followed by that of III-3. The distributions of P(Y = 1) for all simulated data (one Thus, it is not surprising to observe the difference million data points for each scenario) are shown in Fig. 2. Fig. 1 Percentage bias in log(RR) scale. a From left to right: increasing intercept(scenario I→II →III); b From left to right: increasing coefficient of β (scenario IV→III→V); c From left to right: change of link function (scenarios III, VI and VII). Red lines: Robust Poison; Blue lines: Log-binomial 2 Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 8 of 12 Table 3 Standard error in log scale with and without model misspecification (n = 1000) Scenario 1 2 3 4 LB RP LB RP LB RP LB RP 0.20 0.20 0.18 0.18 0.16 0.17 0.14 0.15 I- II- 0.18 0.18 0.16 0.17 0.15 0.15 0.13 0.14 III- 0.17 0.17 0.15 0.16 0.14 0.15 0.12 0.13 IV- 0.13 0.13 0.13 0.13 0.12 0.13 0.11 0.12 III- 0.17 0.17 0.15 0.16 0.14 0.15 0.12 0.12 V- 0.18 0.19 0.15 0.16 0.15 0.16 0.12 0.13 III- 0.17 0.17 0.15 0.16 0.14 0.15 0.12 0.13 VI- 0.14 0.15 0.13 0.14 0.13 0.14 0.11 0.12 VII- 0.20 0.23 0.18 0.19 0.16 0.17 0.14 0.15 Unshaded: Models were correctly specified Light shaded: Misspecified linear predictors or misspecified link function Dark shaded: Misspecified linear predictors and misspecified link function LB: log-binomial, RP: robust Poisson Change of scenarios: Increasing intercept: I→ II→ III; Increasing coefficient of β :IV→ III→ V; Change of link function: III (log), VI (logit), VII (probit) Panel A shows the distribution of P(Y = 1) with varying and unexposed subjects increased in all scenarios when maximum P(Y = 1). When the link function and beta of the percentage of exposed subjects at the max P(Y = 1) Z were fixed at ‘log’ and 3, respectively, an increase of increased from 1.4, 2.8 to 5.8%, indicating a higher level max P(Y = 1) from 0.75 (scenario I) to 0.95 (scenario III) of violation of the linearity assumption. The distribu- stretched the spikes to the right (Fig. 2 Panel A). tions of P(Y = 1) for data with log and logit link func- Panel B shows the distribution with varying beta of Z . tions were similar; however, the distributions of P(Y = 1) When the beta of Z increased from 2 (scenario VI) to 4 for data with probit link function distinguished them- (scenario V) while the max P(Y=1) was fixed at 0.95 and selves significantly from those of log or logit. This obser- the linkfunctionwas set as ‘log,’ the distribution of P(Y = 1) vation explains the larger biases seen in Fig. 1 Panel C, became taller, thinner, and shifted towards zero. This when the underlying distribution of data were probit, provides the evidence that attributes the increase of biases compared to those of log and logit distributions, even to the downward shift of the distribution of P(Y=1) (i.e. when the predictors were perfectly linear. decrease in the prevalence of the outcome variable Y). Panel C shows the distribution with varying link func- Results of moderate sample size (n = 500) tion. Spikes (i.e. vertical thin lines in all figures except When the simulation was conducted based on samples those labeled with “0% at the max P”) for both exposed of moderate size (n = 500), the same pattern was observed Table 4 Mean square error (MSE) in log scale with and without model misspecification (n = 1000) Scenario 1 2 3 4 LB RP LB RP LB RP LB RP I- 0.040 0.040 0.032 0.033 0.026 0.028 0.022 0.022 II- 0.032 0.033 0.026 0.028 0.023 0.024 0.023 0.019 III- 0.028 0.031 0.022 0.025 0.023 0.021 0.031 0.016 IV- 0.017 0.018 0.016 0.018 0.016 0.016 0.017 0.014 III- 0.028 0.031 0.022 0.025 0.023 0.021 0.031 0.016 V- 0.032 0.036 0.026 0.027 0.031 0.026 0.044 0.018 III- 0.028 0.031 0.022 0.025 0.023 0.021 0.031 0.016 VI- 0.019 0.022 0.020 0.019 0.021 0.019 0.027 0.014 VII- 0.046 0.052 0.051 0.038 0.063 0.028 0.076 0.021 Unshaded: Models were correctly specified Light shaded: Misspecified linear predictors or misspecified link function Dark shaded: Misspecified linear predictors and misspecified link function LB: log-binomial, RP: robust Poisson Change of scenarios: Increasing intercept: I→ II→ III; Increasing coefficient of β :IV→ III→ V; Change of link function: III (log), VI (logit), VII (probit) 2 Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 9 of 12 Fig. 2 Distribution of P(Y = 1). Y-axis: Percent; X-axis: P(Y = 1). a From left to right: increasing intercept (scenario I→II→III); b From left of right: increasing coefficient of β (scenarion IV→III→V); c From left to right: change of link function (scenarios III, VI and VII) in terms of relative biases and SE (Additional file 6, Tables RR for common binary outcomes were examined when AF6.1-AF6.2). As expected, the SE based on the samples the link function was misspecified or when the probabil- of moderate size was larger compared to those derived ity of the response variables was truncated at the right from samples with n = 1000. The same pattern was also tail. Our findings suggest that point estimates from observed for MSE (Additional file 6, Table AF6.3); how- log-binomial models were biased when the link function ever, the differences between the two models were not as was misspecified or when the probability distribution of substantial as seen in the samples of sizes 1000. the response variable was truncated for even a small proportion of observations. For log-binomial models, the Results with a weaker association between exposure and percentage of truncated observations was positively asso- outcome (RR = 2) ciated with the presence of the bias. The bias was more When the simulation was conducted based on simula- significant if these observations came from a population tion datasets with RR = 2, the relative biases were similar in which the response rate was lower, given the other to those of the corresponding scenarios with RR = 3 parameters being examined were fixed. (data not shown). The standard errors derived from the For MLE based methods, misspecification can cause simulation datasets with RR = 2 were 12–36% smaller inconsistent estimators of parameters [20]. Lumley et al. compared to those of the corresponding scenarios with (2006) pointed out that compared to robust Poisson and RR = 3; however, the pattern remained the same. That is, other non-MLE based models, log-binomial models the standard errors of the two models were comparable (MLE based) have very large weights when p (referred with those from the robust Poisson models being only by authors as μ) is large ([26] Fig. 1). The same authors slightly larger than those from the log-binomial models. also pointed out that for log-binomial models, “a single The MSE yielded from the simulation datasets with point with μ close to 1 can have arbitrarily large influ- RR = 2 were 16–47% smaller compared to those of the ence despite having bounded covariate values”. Our ob- matching datasets with RR = 3. Nevertheless, similar to servation was consistent with that of Lumley et al. We the patterns observed for RR = 3, the robust Poisson had demonstrated that when the percentage of observations lower MSE compared to the log-binomial models, and the with large P increased, the magnitude of bias also in- differences were more dramatic when a probit link was creased. This may explain the less optimal performance used (versus a log link) and when the data had a higher of the model applied to data generated using a probit percentage of truncation. link compared to that of log or logit link when other pa- rameters were fixed (Panel C of Fig. 2). It is well known Discussion that log-binomial models may fail to convergence or In this study, the statistical performances of the two generate incorrect estimates when the covariate values most popular model-based approaches used to estimate in the data are not bounded by 1 [3, 5, 8]. However, we Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 10 of 12 believe that this is a different issue than what we have Although we did not evaluate data based on other link focused on, which has been the impact of large Ps. In functions that are also suitable for modeling binary out- Scenario VII-1, truncation was not applied and none of comes (e.g. complementary log-log or log-log), it is ex- the observations had predicted probabilities > 1, yet the pected that the results would have similar patterns. A point estimate was still biased. truncated distribution appears in many real-life datasets On the other hand, the point estimates from the robust where the collection of data is limited to those that are Poisson models were nearly unbiased in all the scenarios above or under a threshold. For example, a typical scale examined, including when they were applied to the data used in clinics or hospitals can measure height up to that were generated using a probit link, which yielded 200 cm and weight up to 250 kg. Subjects exceeding quite different probability distributions compared to those these values would be truncated to these limits. In the from a log link, and/or when the distribution of 5.8% of simulated datasets, the distributions of approximately the exposed subjects were altered. In Chen et al. [27], both 1.4, 2.8, and 5.8% of the exposed subjects were truncated the MLE generated by log-binomial models and the in that they no longer followed the distribution specified quasi-likelihood estimators produced by robust Poisson by the link function through a combination of linear models deteriorated when outliers were introduced [27]. predictors. The truncation rates (1.4, 2.8, and 5.8%) for However, in the current study, the biases in point esti- the exposed subjects were plausible values that can be mates based on robust Poisson models were negligible, related to real-life applications. even when both link functions and predictors were incor- In contrast to Chen et al. [27], in which no differences rectly specified. This interesting contrast can be explained were found at the second decimal point when the data by a major difference in the design of the two studies. In were not contaminated with outliers, we found small dif- the previous study [27], the association between the ex- ferences in the variances at the second decimal point be- posure and the outcome was weakened when the “out- tween the log-binomial and robust Poisson models under liers” were introduced, and thus the negative biases were some of the scenarios for both samples (n =1000 and n = observed for the robust Poisson models. Nevertheless, in 500) when the models were correctly specified. This find- the current study, the true RR was maintained at 3.0, (or ing is consistent with that of Petersen and Deddens [11], 2.0 for some scenarios), even when the link function was which was based on a sample with 100 observations and a misspecified and/or when the probabilities were trun- single independent variable with a uniform distribution. cated. Our simulations demonstrated that for robust Kauermann and Carroll [28] showed that variances of Poisson regression, the misspecification of the link func- sandwich estimators were generally less efficient than tion did not hinder its ability to find the true RR. This is variance estimates derived from parametric models. This likely due to the fact that the quasi-likelihood method en- weakness impacts the coverage probability, the probabil- ables regression coefficient estimation without fully spe- ity that a confidence interval covers the true RR, and cifying the distribution of the observed data. We thus the ability to reject a null when the alternative is examined exposure-outcome associations with RR 3.0 true. Hence, log-binomial models are preferred over the and 2.0. The magnitude of the observed bias in our simu- robust Poisson models when the log-binomial models lation results did not change much when the association are correctly specified. was reduced from 3.0 to 2.0; however, it is conceivable The COPY method was reported to have convergence that the bias could be reduced in scenarios when the as- issue when there are continuous covariates in the model sociation is smaller than 2.0. [11]. However, convergence was barely an issue in this Model misspecification does not always yield differences study as it converged completely (i.e. 1000 out of 1000 in point estimates between the two models. In fact, in a simulations) in 23 out of 28 scenarios when the sample previous examination (Additional file 3), we found when size was 1000, and 21 out of 28 scenarios when the sam- an important explanatory variable was omitted, a higher ple size was 500. In the 12 scenarios (five for sample size order term of non-linear explanatory variable was ignored, 1000 and seven for sample size 500) for which the COPY or an interaction term was overlooked, the two models method did not completely converge in all 1000 simula- produced comparable results regardless of the outcome tions, there was only one out of 1000 simulations that rate, risk ratio or the strength of association between the failed to converge for each scenario. The number of vir- exposure and the confounder or between the outcome tual copies used in the study, 10,000, was reported to be and the confounder. Only in the scenario where an inter- accurate to three decimal places [25]. action term was ignored did the models yield large biases. Misspecification tests were developed [29, 30]and This highlights the relative importance of observations proven to be able to maintain reasonable size across vari- with large weights, since in the previous examination, the ous settings in simulation when they were applied to logis- number of observations with large probabilities of having tic and beta-binomial regression models [30]. However, the response was small. the power to detect the types of alternatives commonly Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 11 of 12 observed in practice (e.g. alternative link functions) was Additional file 2: Comparison of robust Poisson and log-binomial low [30]. Blizzard and Hosmer [10] assessed model-fit of models in estimating risk ratio (RR) of ≥ 7 SABA canisters dispensed in the past year. (DOCX 12 kb) log-binomial models by applying the Hosmer-Lemeshow Additional file 3: Comparing robustness to model misspecification test (a commonly used goodness-of-fit test for logistic re- between robust Poisson and log-binomial models for estimating risk gression models), the Pearson chi-square test, and the un- ratios: Initial simulation study. (DOCX 45 kb) weighted sum of squares test, finding that all three tests Additional file 4: Using Fish scoring (iteration) to estimate β. Proof of exhibited acceptable Type I errors yet low-to-moderate the iteration equation in Methods section. (DOCX 45 kb) power. Due to the lack of powerful diagnostic tools to de- Additional file 5: SAS codes used to estimate β for each regression model. (DOCX 12 kb) tect any forms of model misspecification, the robust Pois- Additional file 6: Results on simulated data when n = 500. (DOCX 18 kb) son model may be considered a good choice because of its ability to produce unbiased risk ratios. Efforts to establish Abbreviations efficient and robust parameter estimators are ongoing. A ACT: Asthma control test; FeNO: Fractional exhaled nitric oxide; FEV: Forced recent publication summarized issues with the current ap- expiratory volume; GEE: Generalized estimation equation; GLM: Generalized linear proaches within the GLM family to estimate relative risks models; ICS: Inhaled corticosteroids; IRLS: Iterative reweighted least squares; MLE: Maximum likelihood estimator; MSE: Mean square error; QL: Quasi-likelihood; and risk differences, and provided a possible alternative to RR: Risk ratio; SABA: Short-acting beta-agonist; SE: Standard error estimate relative risks and risk differences using a non-GLM approach [31]. The authors proposed to model Acknowledgements The authors would like to thank Dr. Mark Krailo for his valuable input on the relative risks as functions of baseline covariates. Validation methodology. of this approach is needed to determine its applicability to studies such as those presented here. Availability of data and materials The program to generate the simulated data can be made available from the corresponding author on reasonable request. Conclusions Authors’ contributions Given the vulnerability of log-binomial models when they WC conceivedand carriedout thestudy, and drafted the manuscript. LQ are misspecified, a robust Poisson model should be con- participated in the design, data generation and interpretation of the analyses. JS participated in the design and interpretation of the analyses. MF participated in the sidered the preferred choice for estimating risk ratios. This design and provided guidance. All authors read and approved the final version. is especially the case when the prevalence of the outcome is low and the model contains continuous covariates. If Ethics approval and consent to participate The asthma study mentioned in the “A motivating example” Section was approved the result of a robust Poisson model approaches border- by the Institutional Review Board of Kaiser Permanente Southern California. line significance, consider performing a log-binomial re- gression as well, as the increased efficiency of the Consent for publication Not applicable. log-binomial model may increase the probability of detect- ing the effect with a given significance level. If the point Competing interests estimates of the two models are inconsistent and the The authors declare that they have no competing interests. log-binomial model is preferred, categorize continuous variables and re-fit the model. If the data contain trun- Publisher’sNote Springer Nature remains neutral with regard to jurisdictional claims in cated values, examine the distribution of the data carefully published maps and institutional affiliations. and consider converting them into categorical variables if such a conversion is clinically meaningful. The robust Author details Kaiser Permanente Southern California, Department of Research and Poisson model does not work well for samples that are Evaluation, 100 S. Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA. very small because the sample-based sandwich estimators Department of Preventive Medicine, Keck School of Medicine, University of tend to underestimate the true standard errors [32]. Southern California, 1975 Zonal Ave, Los Angeles, CA 90033, USA. In summary, we found evidence to favor the robust Received: 25 August 2017 Accepted: 8 June 2018 Poisson model under various scenarios when models were misspecified. Future studies to develop model misspecifi- cation and/or goodness-of-fit tests that are powerful and References 1. Greenland S. Interpretation and choice of effect measures in epidemiologic convenient to apply for log-binomial models are analyses. Am J Epidemiol. 1987;125(5):761–8. warranted. 2. Zhang J, Yu KF. What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA. 1998;280(19):1690–1. 3. McNutt LA, Wu C, Xue X, Hafner JP. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol. 2003; Additional files 157(10):940–3. 4. Schouten EG, Dekker JM, Kok FJ, Le Cessie S, Van Houwelingen HC, Pool J, Vanderbroucke JP. Risk ratio and rate ratio estimation in case-cohort Additional file 1: Popularity of log-binomial and robust Poisson designs: hypertension and cardiovascular mortality. Stat Med. 1993;12(18): regression models – A Medline search. (DOCX 18 kb) 1733–45. Chen et al. BMC Medical Research Methodology (2018) 18:63 Page 12 of 12 5. Skov T, Deddens J, Petersen MR, Endahl L. Prevalence proportion ratios: estimation and hypothesis testing. Int J Epidemiol. 1998;27(1):91–5. 6. Greenland S. Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case- control studies. Am J Epidemiol. 2004;160(4):301–5. 7. Barros AJ, Hirakata VN. Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Med Res Methodol. 2003;3:21. 8. Wacholder S. Binomial regression in GLIM: estimating risk ratios and risk differences. Am J Epidemiol. 1986;123(1):174–84. 9. Zou G. A modified poisson regression approach to prospective studies with binary data. Am J Epidemiol. 2004;159(7):702–6. 10. Blizzard L, Hosmer DW. Parameter estimation and goodness-of-fit in log binomial regression. Biom J. 2006;48(1):5–22. 11. Petersen MR, Deddens JA. A comparison of two methods for estimating prevalence ratios. BMC Med Res Methodol. 2008;8:9. 12. Knol MJ, Le Cessie S, Algra A, Vandenbroucke JP, Groenwold RH. Overestimation of risk ratios by odds ratios in trials and cohort studies: alternatives to logistic regression. CMAJ. 2012;184(8):895–9. 13. Schatz M, Nakahiro R, Crawford W, Mendoza G, Mosen D, Stibolt TB. Asthma quality-of-care markers using administrative data. Chest. 2005;128(4):1968–73. 14. McCullagh P, Nelder JA. Generalized linear models. Second ed. Boca Raton: Chapman and Hall/CRC; 1989. 15. Green PJ. Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives. J R Stat Soc Series B Stat Methodol. 1984;46(2):149–92. 16. Agresti A. Foundations of linear and generalized linear models. Hoboken: Wiley; 2015. 17. McCullagh P. Quasi-likelihood functions. Ann Stat. 1983;1:59–67. 18. Wedderburn RW. Quasi-likelihood functions, generalized linear models, and the gauss—Newton method. Biometrika. 1974;61(3):439–47. 19. Huber PJ. The behavior of maximum likelihood estimates under non- standard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. 1967;1:221–33. 20. White H. Maximum likelihood estimation of misspecified models. Econometrica. 1982:1–25. 21. Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42(1):121–30. 22. Carter RE, Lipsitz SR, Tilley BC. Quasi-likelihood estimation for relative risk regression models. Biostatistics (Oxford, England). 2005;6(1):39–44. 23. SAS. Software Version 9.3 of the SAS System for Unix. Cary: SAS Institute Inc; 24. Deddens JA, Petersen MR, Lei X. Estimation of prevalence ratios when PROC GENMOD does not converge. Proceedings of the 28th annual SAS users group international conference. 2003; Mar 30;30:270–28. 25. Petersen M, Deddens J. A revised SAS macro for maximum likelihood estimation of prevalence ratios using the COPY method. Occup Environ Med. 2009;66(9):639. 26. Lumley T, Kronmal R, Ma S. Relative risk regression in medical research: models, contrasts, estimators and algorithms. In: UW biostatistics working paper series 293. Seattle: University of Washington. http://www.bepress. com/uwbiostat/paper293. Accessed 13 June 2018. 27. Chen W, Shi J, Qian L, Azen SP. Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study. BMC Med Res Methodol. 2014;14:82. 28. Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J Am Stat Assoc. 2001;96(456):1387–96. 29. Presnell B, Boos DD. The IOS test for model misspecification. J Am Stat Assoc. 2004;99(465):216–27. 30. Capanu M, Presnell B. Misspecification tests for binomial and beta-binomial models. Stat Med. 2008;27(14):2536–54. 31. Richardson TS, Robins JM, Wang L. On modeling and estimation for the relative risk and risk difference. J Am Stat Assoc. 2017;112(519):1121–30. 32. Carroll R, Wang S, Simpson D, Stromberg A, Ruppert D. The sandwich (robust covariance matrix) estimator. Unpublished manuscript. 1998;

Journal

BMC Medical Research MethodologySpringer Journals

Published: Jun 22, 2018

There are no references for this article.