Missing values in longitudinal dietary data: a multiple imputation approach based on a fully conditional specification.
Nevalainen, Jaakko;
Kenward, Michael G;
Virtanen, Suvi M;
(2009)
Statistics in medicine, 28 (29).
pp. 3657-3669.
ISSN 0277-6715
DOI: https://doi.org/10.1002/sim.3731
Multiple imputation (MI) has increasingly received attention as a flexible tool to resolve missing data problems in both observational and controlled studies. Our goal has been to develop a valid and efficient MI procedure for the Diabetes Prediction and Prevention Nutrition Study, in which the diet of a cohort of newborn children with HLA-DQB1-conferred susceptibility to type 1 diabetes is repeatedly measured by 3-day food records over early childhood. The estimation of risk is based on a nested case-control design set up within the cohort. We have used an iterative procedure known as the fully conditional specification (FCS) to generate appropriate values for the missing dietary data, here playing the role of time-dependent covariates. Our method extends the standard FCS to repeated-measurement settings with the possibility of non-monotone missingness patterns by being doubly iterative over the follow-up time of the individuals. In addition, our proposed procedure is nonparametric in the sense that the variables can have distributions deviating strongly from normality: it makes use of quantile normal scores to transform to normality, performs imputations, and transforms back to the original scale. By the use of a moving time window and stepwise regression procedures, the two-fold FCS method operates well with a large number of variables, each measured repeatedly over time. Extensive simulation studies demonstrate that the procedure, together with the proposed transformations and variable selection methods, provides tools for valid and efficient statistical inference in the nested case-control setting, and its applications extend beyond that.
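The transform-impute-back-transform idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `to_normal_scores` and `from_normal_scores`, the rank offset of 0.5, and the linear interpolation between order statistics are all assumptions made for illustration, and ties are not handled specially.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def to_normal_scores(observed):
    """Map observed values to normal scores: z = Phi^{-1}((rank - 0.5) / n).

    Assumption: ranks are assigned in sort order; ties get distinct ranks.
    """
    n = len(observed)
    order = sorted(range(n), key=lambda i: observed[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = _STD_NORMAL.inv_cdf((rank - 0.5) / n)
    return scores


def from_normal_scores(z_values, observed):
    """Map values on the normal scale back to the original scale.

    Assumption: linear interpolation between the order statistics of the
    observed data; values beyond the observed range are clamped to it.
    """
    xs = sorted(observed)
    n = len(xs)
    out = []
    for z in z_values:
        # Invert p = (rank - 0.5) / n to a fractional 0-based index.
        pos = _STD_NORMAL.cdf(z) * n - 0.5
        pos = min(max(pos, 0.0), n - 1.0)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        out.append(xs[lo] + (pos - lo) * (xs[hi] - xs[lo]))
    return out


# Hypothetical skewed intake values, for illustration only.
skewed = [0.1, 0.4, 0.2, 5.0, 1.3]
z = to_normal_scores(skewed)
back = from_normal_scores(z, skewed)
```

In a full procedure the imputation model would be fitted on the normal scale, and only the imputed values would be passed through the back-transform; here the round trip on observed values simply shows that the two mappings are inverses of each other up to floating-point error.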