Comparing the performance of different multiple imputation strategies for missing binary outcomes in cluster randomized trials: a simulation study
Jinhui Ma,1–3 Parminder Raina,1,2 Joseph Beyene,1 Lehana Thabane1,3–5
1Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada; 2McMaster University Evidence-based Practice Center, Hamilton, ON, Canada; 3Biostatistics Unit, St Joseph's Healthcare Hamilton, Hamilton, ON, Canada; 4Centre for Evaluation of Medicines, St Joseph's Healthcare Hamilton, Hamilton, ON, Canada; 5Population Health Research Institute, Hamilton Health Sciences, Hamilton, ON, Canada
Introduction: Although researchers have proposed various strategies to handle missing outcomes in cluster randomized trials (CRTs), limited attention has been paid to the performance of these strategies. Under the assumption of covariate-dependent missingness, the objective of this simulation study is to compare the performance of various strategies in handling missing binary outcomes in CRTs under different design settings.
Methods: Six strategies for handling missing data are investigated: complete case analysis; standard multiple imputation (MI) using either logistic regression or the Markov chain Monte Carlo (MCMC) method; within-cluster MI using either logistic regression or the MCMC method; and MI using logistic regression with cluster as a fixed effect. Their performance is evaluated in terms of bias, empirical standard error, root mean squared error, and coverage probability.
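As an illustration of the four performance measures named above, the following is a minimal sketch assuming their usual definitions in simulation studies; the abstract does not state the exact formulas used. The function name `performance_measures`, its arguments, and the normal-approximation 95% interval (z = 1.96) are assumptions made for this example only.

```python
import math
import statistics


def performance_measures(estimates, std_errors, true_value, z=1.96):
    """Summarize simulated treatment-effect estimates (hypothetical helper).

    estimates  : one point estimate per simulated trial
    std_errors : the model-based standard error reported with each estimate
    true_value : the treatment effect used to generate the data
    z          : normal quantile for the interval (1.96 gives about 95%)
    """
    estimates = list(estimates)
    std_errors = list(std_errors)
    n = len(estimates)

    # Bias: average estimate minus the true value.
    bias = statistics.fmean(estimates) - true_value

    # Empirical standard error: sample SD of the estimates across replicates.
    empirical_se = statistics.stdev(estimates)

    # Root mean squared error around the true value.
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    rmse = math.sqrt(statistics.fmean(squared_errors))

    # Coverage: share of intervals estimate +/- z * SE that contain the truth.
    covered = sum(
        1
        for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )
    coverage = covered / n

    return {
        "bias": bias,
        "empirical_se": empirical_se,
        "rmse": rmse,
        "coverage": coverage,
    }


# Toy input, not data from the study.
summary = performance_measures(
    estimates=[0.9, 1.1, 1.0, 1.2],
    std_errors=[0.1, 0.1, 0.1, 0.05],
    true_value=1.0,
)
```

Comparing the empirical standard error with the average model-based standard error is one way to see whether a strategy under- or overestimates the standard error of the treatment effect.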
Results: Under covariate-dependent missingness, with logistic regression fitted by the generalized estimating equations approach, complete case analysis yields valid inferences when the percentage of missing outcomes is not large (<20%) for all the CRT designs considered in this paper. Standard MI strategies can be adopted when the design effect is small (variance inflation factor [VIF] ≤ 3); however, they tend to underestimate the standard error of the treatment effect when the design effect is large. The within-cluster MI strategy using logistic regression is valid for imputing missing data from CRTs, especially when the cluster size is large (>50) and the design effect is large (VIF > 3). In contrast, the within-cluster MI strategy using the MCMC method may yield biased estimates of the treatment effect for CRTs with small cluster size (≤50). MI using logistic regression with cluster as a fixed effect may substantially overestimate the standard error of the estimated treatment effect when the intracluster correlation coefficient is small, and may also yield a biased estimate of the treatment effect.
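To make the VIF thresholds above concrete, the sketch below assumes the standard design-effect formula for equal cluster sizes, VIF = 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intracluster correlation coefficient. The abstract does not state this formula, so treat it as an assumption; the function names and example values are hypothetical.

```python
def variance_inflation_factor(cluster_size, icc):
    """Design effect under equal cluster sizes: 1 + (m - 1) * ICC (assumed formula)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1) * icc


def is_large_design_effect(vif, threshold=3.0):
    """Apply the abstract's cut-off: VIF > 3 counts as a large design effect."""
    return vif > threshold


# Illustrative values only, not settings taken from the study.
vif_small = variance_inflation_factor(cluster_size=20, icc=0.05)  # 1 + 19 * 0.05
vif_large = variance_inflation_factor(cluster_size=50, icc=0.05)  # 1 + 49 * 0.05
```

Under this formula, a modest intracluster correlation can still produce a large design effect once clusters are big enough, which is why cluster size and VIF appear together in the findings above.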
Conclusion: Findings from this simulation study provide researchers with quantitative evidence to guide selection of an appropriate strategy to deal with missing binary outcomes.
Keywords: missing data, design effect, variance inflation factor
This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.