Modelling relative survival in the presence of incomplete data: a tutorial.

Nur, U; Shack, LG; Rachet, B; Carpenter, JR; Coleman, MP; (2010) Modelling relative survival in the presence of incomplete data: a tutorial. International Journal of Epidemiology, 39 (1). pp. 118-28. ISSN 0300-5771 DOI:

Full text not available from this repository.


BACKGROUND: Missing data frequently create problems in the analysis of population-based data sets, such as those collected by cancer registries. Restricting the analysis to records with complete data may yield inferences that differ substantially from those that would have been obtained had no data been missing. 'Naive' methods for handling missing data, such as restricting the analysis to complete records or creating a 'missing' category, have drawbacks that can invalidate the conclusions of the analysis. We offer a tutorial on modern methods for handling missing data in relative survival analysis.

METHODS: We estimated relative survival for 29 563 colorectal cancer patients who were diagnosed between 1997 and 2004 and registered in the North West Cancer Intelligence Service. Multiple imputation (MI) was applied to account for the common example of incomplete stage at diagnosis, under the missing at random (MAR) assumption. Multivariable regression with a generalized linear model and Poisson error structure was then used to estimate the excess hazard of death of the colorectal cancer patients, over and above the background mortality, adjusting for significant predictors of mortality.

RESULTS: Incomplete information on stage, morphology and grade meant that only 55% of the data could be included in the 'complete-case' analysis. All cases could be included when the indicator method (IM) or MI was used. Compared with complete-case analysis, handling missing data by MI produced significantly lower estimates of the excess mortality for stage, morphology and grade, with the largest reductions occurring for late-stage and high-grade tumours.

CONCLUSION: In the complete-case analysis, almost 50% of the information could not be included, and with the IM, all records with missing values for stage were combined into a single 'missing' category. We show that MI methods greatly improved the results by exploiting all the information in the incomplete records. MI also helped to ensure that efficient inferences about survival were made from the multivariable regression analyses.
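As an illustration of how results from multiply imputed data sets are usually combined, the sketch below pools one regression coefficient (for example, a log excess hazard ratio for a stage category) across imputations using Rubin's rules. The abstract does not state how the authors pooled their estimates, so this is an assumed, conventional approach rather than their method. The function name `pool_rubin` and the numbers in the example are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: the coefficient from each imputed data set (e.g. a log
               excess hazard ratio); variances: its squared standard
               error in each data set. Both are hypothetical inputs here.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between

    return {
        "estimate": q_bar,
        "within": within,
        "between": between,
        "total_variance": total,
        "std_error": math.sqrt(total),
    }


if __name__ == "__main__":
    # Made-up log excess hazard ratios from five imputed data sets.
    log_ehr = [0.92, 0.88, 0.95, 0.90, 0.91]
    var_log_ehr = [0.010, 0.011, 0.009, 0.010, 0.010]
    pooled = pool_rubin(log_ehr, var_log_ehr)
    print("pooled excess hazard ratio:", math.exp(pooled["estimate"]))
    print("pooled standard error (log scale):", pooled["std_error"])
```

The between-imputation term is what separates MI from filling in a single value: it carries the uncertainty about the missing stage, morphology or grade through to the standard error of the pooled estimate.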

Item Type: Article
Faculty and Department: Faculty of Epidemiology and Population Health > Dept of Non-Communicable Disease Epidemiology
Faculty of Epidemiology and Population Health > Dept of Medical Statistics
Research Centre: Cancer Survival Group
Centre for Global Non-Communicable Diseases (NCDs)
PubMed ID: 19858106
Web of Science ID: 274491000021

