Missing Data in Cluster Randomised Trials
Hossain, A; (2017) Missing Data in Cluster Randomised Trials. PhD (research paper style) thesis, London School of Hygiene & Tropical Medicine. DOI: https://doi.org/10.17037/PUBS.04646133

 Accepted Version
Abstract
Missing outcomes are a commonly occurring problem in cluster randomised trials (CRTs), which can lead to biased and inefficient inference if ignored or handled inappropriately. Handling missing data in CRTs is complicated due to the hierarchical structure of the data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. An assumption regarding missing outcomes in CRTs that is sometimes plausible is that missingness depends on baseline covariates but, conditional on these baseline covariates, not on the outcome itself; this is known as a covariate dependent missingness (CDM) mechanism. The aim of my thesis was to investigate the validity of the approaches to the analysis of CRTs for the three most common outcome types (continuous, binary and time-to-event) when outcomes are missing under the assumption of CDM. Missing outcomes were handled using complete records analysis (CRA) and multilevel multiple imputation (MMI). We investigated analytically, and through simulations, the validity of the different combinations of the analysis model and missing data handling approach for each of the three outcome types. Simulation studies were performed considering scenarios depending on whether the missingness mechanism is the same between the intervention groups and whether the covariate effect is the same between the intervention groups in the outcome model. Based on our analytical and simulation results, we give recommendations for which methods to use when the CDM assumption is thought to be plausible for missing outcomes. The key findings of this thesis are as follows.

Continuous outcomes: Cluster-level analyses using CRA are in general biased unless the intervention groups have the same missingness mechanism and the same covariate effects on outcome in the data generating model. In the case of individual-level analysis, the linear mixed model (LMM) using CRA adjusted for covariates such that the CDM assumption holds gives unbiased estimates of intervention effect regardless of whether the missingness mechanism is the same or different between the intervention groups, and whether there is an interaction between intervention and baseline covariates in the data generating model for outcome, provided that such interaction is included in the model when required. There is no gain in terms of bias or efficiency of the estimates using MMI over CRA as long as both approaches use the same functional form of the same set of baseline covariates.

Binary outcomes: The adjusted cluster-level estimator for estimating risk ratio (RR) using full data is consistent if the true data generating model is a log link model, the functional form of the baseline covariates is the same between the intervention groups, and the random effects distribution is the same between the intervention groups. Cluster-level analyses using CRA for estimating risk difference (RD) are in general biased. For estimating RR, cluster-level analyses using CRA are valid if the true data generating model has a log link and the intervention groups have the same missingness mechanism and the same functional form of the covariates in the outcome model. In contrast, MMI followed by cluster-level analyses gives valid inferences for estimating RD and RR regardless of whether the missingness mechanism is the same or different between the intervention groups, and whether there is an interaction between intervention and baseline covariate in the outcome model, provided that such interaction is included in the imputation model when required. In the case of individual-level analysis, both random effects logistic regression (RELR) and generalised estimating equations (GEE) give valid inferences using both CRA (adjusted for covariates such that the CDM assumption holds) and MMI regardless of whether the missingness mechanism is the same or different between the intervention groups, and whether there is an interaction between intervention and baseline covariate in the outcome model, provided that such interaction is included in both the imputation model and the analysis model when required. As with continuous outcomes, in the absence of auxiliary variables, there is no benefit in performing MMI rather than doing CRA in terms of bias or efficiency of the estimates.

Time-to-event outcomes: In the case of censored data, the unadjusted cluster-level analysis for estimating rate ratio (RaR) is consistent when the event rates are small and the covariate effects are the same between the intervention groups. In contrast, the adjusted cluster-level analysis for estimating RaR is consistent for any event rates when the covariate effects are the same between the intervention groups. The gamma shared frailty model as an individual-level analysis underestimates the standard errors (SEs) of the estimates when each intervention group has a small number of clusters. The Williams approach performs better than the Greenwood approach for estimating the SEs of Kaplan-Meier (KM) estimates unless the event rate is low and the value of the intraclass correlation coefficient is very small.
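The continuous-outcome finding can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the simulation design used in the thesis: all parameter values, function names and the missingness model are hypothetical choices made for illustration. Outcomes are missing under CDM with a mechanism that differs between the intervention groups, and the covariate effect is the same in both groups. The sketch compares an unadjusted cluster-level CRA with a covariate-adjusted individual-level CRA. For simplicity the adjusted analysis uses ordinary least squares as a stand-in for the LMM; this is adequate for the point estimate only, and its standard errors would ignore clustering.

```python
import numpy as np


def simulate_crt(k_per_arm=500, m=50, theta=1.0, beta=1.0, sigma_u=0.3, seed=2017):
    """Simulate a two-arm CRT with a continuous outcome and CDM missingness.

    All defaults are illustrative assumptions, not values from the thesis.
    """
    rng = np.random.default_rng(seed)
    n_clusters = 2 * k_per_arm
    cluster = np.repeat(np.arange(n_clusters), m)       # cluster id per individual
    arm = (cluster >= k_per_arm).astype(float)          # 0 = control, 1 = intervention
    x = rng.normal(size=cluster.size)                   # baseline covariate
    u = rng.normal(scale=sigma_u, size=n_clusters)[cluster]  # cluster random effect
    y = theta * arm + beta * x + u + rng.normal(size=cluster.size)
    # CDM: the probability of being observed depends on x and arm only,
    # never on y itself. The mechanism differs between the arms.
    slope = np.where(arm == 1, 1.5, -1.5)
    p_obs = 1.0 / (1.0 + np.exp(-(1.0 + slope * x)))
    observed = rng.random(cluster.size) < p_obs
    return cluster, arm, x, y, observed


def cluster_level_cra(cluster, arm, y, observed):
    """Unadjusted cluster-level CRA: difference in means of observed cluster means."""
    n_clusters = cluster.max() + 1
    counts = np.bincount(cluster[observed], minlength=n_clusters)
    sums = np.bincount(cluster[observed], weights=y[observed], minlength=n_clusters)
    _, first = np.unique(cluster, return_index=True)
    cluster_arm = arm[first]                            # arm is constant within cluster
    keep = counts > 0                                   # drop clusters with no observed outcome
    means = sums[keep] / counts[keep]
    cluster_arm = cluster_arm[keep]
    return means[cluster_arm == 1].mean() - means[cluster_arm == 0].mean()


def adjusted_individual_cra(arm, x, y, observed):
    """Covariate-adjusted individual-level CRA (OLS point estimate of the arm effect)."""
    design = np.column_stack([np.ones(observed.sum()), arm[observed], x[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    return coef[1]


cluster, arm, x, y, observed = simulate_crt()
unadjusted = cluster_level_cra(cluster, arm, y, observed)
adjusted = adjusted_individual_cra(arm, x, y, observed)
print(f"true effect = 1.0, unadjusted cluster-level CRA = {unadjusted:.3f}, "
      f"adjusted individual-level CRA = {adjusted:.3f}")
```

Under these assumptions the adjusted estimate should lie close to the true intervention effect, while the unadjusted cluster-level estimate should be visibly shifted, because the complete records in the two groups have different covariate distributions.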
Item Type:  Thesis 

Thesis Type:  Doctoral 
Thesis Name:  PhD (research paper style) 
Contributors:  Bartlett, JW (Thesis advisor); Diaz-Ordaz, K (Thesis advisor); Allen, E (Thesis advisor)
Faculty and Department:  Faculty of Epidemiology and Population Health > Dept of Medical Statistics 
Funders:  Economic and Social Research Council, Charles Wallace Bangladesh Trust 
Copyright Holders:  Md Anower Hossain 
URI:  http://researchonline.lshtm.ac.uk/id/eprint/4646133 