Handling Missing Values in Interrupted Time Series Analysis of Longitudinal Individual-Level Data
Received 6 June 2020
Accepted for publication 16 August 2020
Published 8 October 2020 Volume 2020:12 Pages 1045–1057
Checked for plagiarism Yes
Review by Single anonymous peer review
Peer reviewer comments 2
Editor who approved publication: Dr Eyal Cohen
Juan Carlos Bazo-Alvarez,1,2 Tim P Morris,3 Tra My Pham,3 James R Carpenter,3,4 Irene Petersen1,5
1Research Department of Primary Care and Population Health, University College London (UCL), London, UK; 2Instituto de Investigación, Universidad Católica Los Ángeles de Chimbote, Chimbote, Peru; 3MRC Clinical Trials Unit at UCL, London, UK; 4Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK; 5Department of Clinical Epidemiology, Aarhus University, Aarhus, Denmark
Correspondence: Juan Carlos Bazo-Alvarez
Research Department of Primary Care and Population Health, University College London (UCL), Rowland Hill Street, London NW3 2PF, UK
Tel +44 7376076260
Background: In the interrupted time series (ITS) approach, it is common to average the outcome of interest at each time point and then perform a segmented regression (SR) analysis. In this study, we illustrate that such ‘aggregate-level’ analysis is biased when data are missing at random (MAR) and provide alternative analysis methods.
Methods: Using electronic health records from the UK, we evaluated weight change over time induced by the initiation of antipsychotic treatment. We contrasted estimates from aggregate-level SR analysis against estimates from mixed models with and without multiple imputation of missing covariates, using individual-level data. We then conducted a simulation study to understand, in a controlled environment, why the results differed.
Results: Aggregate-level SR analysis suggested a substantial weight gain after initiation of treatment (average short-term weight change: 0.799 kg/week) compared to mixed models (0.412 kg/week). Simulation studies confirmed that aggregate-level SR analysis was biased when data were MAR. In simulations, mixed models gave less biased estimates than SR analysis and, in combination with multilevel multiple imputation, provided unbiased estimates. Mixed models with multiple imputation can be used with other types of ITS outcomes (eg, proportions). Other standard methods applied in ITS do not correct this bias.
Conclusion: Aggregate-level SR analysis can bias the ITS estimates when individual-level data are MAR, because taking averages of individual-level data before SR means that data at the cluster level are missing not at random. Avoiding the averaging step and using mixed models with or without multilevel multiple imputation of covariates is recommended.
Keywords: interrupted time series analysis, segmented regression, missing data, multiple imputation, mixed effects models, electronic health records, big data
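Illustrative sketch: the mechanism described in the Conclusion can be reproduced in a small toy simulation. The code below is not the authors' simulation study; every name and parameter value in it (`simulate`, `aggregate_slope`, `within_slope`, the covariate `x`, the missingness coefficients) is a hypothetical choice made for illustration. It assumes a single post-initiation segment with a true slope of 0.4 kg/week, a fully observed baseline covariate, and a probability of being recorded that depends on that covariate and on time (MAR). A simple within-individual slope estimator is used as a stand-in for an individual-level analysis; it is not a mixed model and involves no multiple imputation.

```python
import numpy as np


def simulate(n=5000, weeks=10, slope=0.4, seed=0):
    """Simulate individual weight trajectories with MAR missingness (toy setup)."""
    rng = np.random.default_rng(seed)
    t = np.arange(weeks)                      # weeks since treatment initiation
    x = rng.normal(size=n)                    # fully observed baseline covariate
    noise = rng.normal(scale=1.0, size=(n, weeks))
    y = 80 + 10 * x[:, None] + slope * t[None, :] + noise
    # MAR: the chance of a weight record depends only on observed x and on time,
    # so individuals with high x are increasingly over-represented later on.
    logit = -0.5 + 0.3 * x[:, None] * t[None, :]
    observed = rng.random((n, weeks)) < 1 / (1 + np.exp(-logit))
    return t, y, observed


def aggregate_slope(t, y, observed):
    """Average the observed outcomes at each time point, then fit a line."""
    means = np.array([y[observed[:, j], j].mean() for j in range(len(t))])
    return np.polyfit(t, means, 1)[0]


def within_slope(t, y, observed):
    """Pooled within-individual slope: centre time and outcome per person."""
    keep = observed.sum(axis=1) >= 2          # need two records to inform a slope
    tt = np.where(observed, t[None, :], np.nan)[keep]
    yy = np.where(observed, y, np.nan)[keep]
    tc = tt - np.nanmean(tt, axis=1, keepdims=True)
    yc = yy - np.nanmean(yy, axis=1, keepdims=True)
    return np.nansum(tc * yc) / np.nansum(tc ** 2)


if __name__ == "__main__":
    t, y, observed = simulate()
    print("true slope:           0.4")
    print("aggregate-level slope:", round(aggregate_slope(t, y, observed), 3))
    print("within-person slope:  ", round(within_slope(t, y, observed), 3))
```

In this toy setting the weekly averages are taken over a changing mix of individuals, so the aggregate-level slope drifts away from 0.4, while the individual-level estimator stays close to it. The size of the gap depends entirely on the assumed missingness mechanism.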
This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.