Missing Data in Cluster Randomised Trials

Abstract

Missing outcomes are a common problem in cluster randomised trials (CRTs) and can lead to biased and inefficient inference if ignored or handled inappropriately. Handling missing data in CRTs is complicated by the hierarchical structure of the data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. An assumption regarding missing outcomes in CRTs that is sometimes plausible is that missingness depends on baseline covariates but, conditional on these baseline covariates, not on the outcome itself; this is known as a covariate dependent missingness (CDM) mechanism. The aim of this thesis was to investigate the validity of these approaches to the analysis of CRTs for the three most common outcome types (continuous, binary and time-to-event) when outcomes are missing under the CDM assumption. Missing outcomes were handled using complete records analysis (CRA) and multilevel multiple imputation (MMI). We investigated, analytically and through simulations, the validity of the different combinations of analysis model and missing data handling approach for each of the three outcome types. Simulation studies were performed under scenarios defined by whether the missingness mechanism is the same between the intervention groups and whether the covariate effect is the same between the intervention groups in the outcome model. Based on our analytical and simulation results, we give recommendations for which methods to use when the CDM assumption is thought to be plausible for missing outcomes. The key findings of this thesis are as follows.

Continuous outcomes: Cluster-level analyses using CRA are in general biased unless the intervention groups have the same missingness mechanism and the same covariate effects on the outcome in the data generating model.
In the case of individual-level analysis, the linear mixed model (LMM) using CRA, adjusted for covariates such that the CDM assumption holds, gives unbiased estimates of the intervention effect regardless of whether the missingness mechanism is the same or different between the intervention groups, and regardless of whether there is an interaction between intervention and baseline covariates in the data generating model for the outcome, provided that such an interaction is included in the model when required. There is no gain in terms of bias or efficiency from using MMI over CRA as long as both approaches use the same functional form of the same set of baseline covariates.

Binary outcomes: The adjusted cluster-level estimator of the risk ratio (RR) using full data is consistent if the true data generating model is a log link model, the functional form of the baseline covariates is the same between the intervention groups, and the random effects distribution is the same between the intervention groups. Cluster-level analyses using CRA for estimating the risk difference (RD) are in general biased. For estimating the RR, cluster-level analyses using CRA are valid if the true data generating model has a log link and the intervention groups have the same missingness mechanism and the same functional form of the covariates in the outcome model. In contrast, MMI followed by cluster-level analysis gives valid inferences for estimating the RD and RR regardless of whether the missingness mechanism is the same or different between the intervention groups, and regardless of whether there is an interaction between intervention and baseline covariate in the outcome model, provided that such an interaction is included in the imputation model when required.
In the case of individual-level analysis, both random effects logistic regression (RELR) and generalised estimating equations (GEE) give valid inferences using either CRA (adjusted for covariates such that the CDM assumption holds) or MMI, regardless of whether the missingness mechanism is the same or different between the intervention groups, and regardless of whether there is an interaction between intervention and baseline covariate in the outcome model, provided that such an interaction is included in both the imputation model and the analysis model when required. As with continuous outcomes, in the absence of auxiliary variables there is no benefit in performing MMI rather than CRA in terms of bias or efficiency of the estimates.

Time-to-event outcomes: In the case of censored data, the unadjusted cluster-level analysis for estimating the rate ratio (RaR) is consistent when the event rates are small and the covariate effects are the same between the intervention groups. In contrast, the adjusted cluster-level analysis for estimating the RaR is consistent for any event rates when the covariate effects are the same between the intervention groups. The gamma shared frailty model as an individual-level analysis underestimates the standard errors (SEs) of the estimates when each intervention group has a small number of clusters. The Williams approach performs better than the Greenwood approach for estimating the SEs of Kaplan-Meier (KM) estimates unless the event rate is low and the intraclass correlation coefficient is very small.
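
The contrast reported for continuous outcomes can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the simulation design used in the thesis: the function names, sample sizes, effect sizes and missingness model are all hypothetical, and ordinary least squares adjusted for the baseline covariate stands in for the LMM (this is enough to illustrate bias in the point estimate, but it would not give valid standard errors for clustered data). Outcomes are made missing under CDM with a different mechanism in each intervention group.

```python
import numpy as np

def simulate_crt(n_clusters=400, cluster_size=50, effect=1.0, beta_x=1.0,
                 sigma_u=0.3, seed=0):
    """Simulate a two-arm CRT with a continuous outcome and CDM missingness.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    arm = (np.arange(n_clusters) % 2)[cluster]           # alternate clusters between arms
    x = rng.normal(size=cluster.size)                    # individual-level baseline covariate
    u = rng.normal(scale=sigma_u, size=n_clusters)[cluster]  # cluster random effect
    y = effect * arm + beta_x * x + u + rng.normal(size=cluster.size)
    # CDM: the probability of observing y depends on x and arm only, never on y.
    # The mechanism differs between arms (opposite sign of the x coefficient).
    logit = np.where(arm == 1, 1.5 * x, -1.5 * x)
    observed = rng.random(cluster.size) < 1.0 / (1.0 + np.exp(-logit))
    return cluster, arm, x, y, observed

def cluster_level_cra(cluster, arm, y, observed):
    """Unadjusted cluster-level analysis: difference in mean of cluster means."""
    n_clusters = int(cluster.max()) + 1
    counts = np.bincount(cluster[observed], minlength=n_clusters)
    sums = np.bincount(cluster[observed], weights=y[observed], minlength=n_clusters)
    arm_by_cluster = arm[np.unique(cluster, return_index=True)[1]]
    keep = counts > 0                                    # drop clusters with no observed outcomes
    means = sums[keep] / counts[keep]
    arms = arm_by_cluster[keep]
    return means[arms == 1].mean() - means[arms == 0].mean()

def adjusted_individual_cra(arm, x, y, observed):
    """Individual-level CRA adjusted for x (OLS used here in place of an LMM)."""
    design = np.column_stack([np.ones(observed.sum()), arm[observed], x[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    return coef[1]                                       # coefficient on the arm indicator

if __name__ == "__main__":
    cluster, arm, x, y, observed = simulate_crt(seed=1)
    print("true effect:                1.0")
    print("cluster-level CRA:         ", round(cluster_level_cra(cluster, arm, y, observed), 3))
    print("adjusted individual CRA:   ", round(adjusted_individual_cra(arm, x, y, observed), 3))
```

In this set-up the unadjusted cluster-level estimate from complete records should sit well away from the true effect, because the observed individuals in the two arms differ systematically in the covariate, while the covariate-adjusted individual-level estimate should stay close to it. Making the missingness mechanism identical in both arms (for example, using the same sign for the covariate coefficient) is a simple way to explore the condition under which the cluster-level analysis is expected to be unbiased.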

Additional Information

Item Type	Thesis
Thesis Type	Doctoral
Thesis Name	PhD (research paper style)
Contributors	Bartlett, Jonathan; Diaz-Ordaz, Karla; Allen, Elizabeth
Faculty and Department	Faculty of Epidemiology and Population Health > Dept of Medical Statistics
Funder Name	Economic and Social Research Council, Charles Wallace Bangladesh Trust
Copyright Holders	Md Anower Hossain