Analyzing Coarsened and Missing Data by Imputation Methods.

Lars LJ van der Burg; Stefan Böhringer; Jonathan W Bartlett; Tjalling Bosse; Nanda Horeweg; Liesbeth C de Wreede; Hein Putter (2025) Analyzing Coarsened and Missing Data by Imputation Methods. Statistics in Medicine, 44 (6). e70032. ISSN 0277-6715 DOI: 10.1002/sim.70032

In various missing data problems, values are not entirely missing, but are coarsened. For coarsened observations, instead of observing the true value, a subset of values - strictly smaller than the full sample space of the variable - is observed to which the true value belongs. In our motivating example for patients with endometrial carcinoma, the degree of lymphovascular space invasion (LVSI) can be either absent, focally present, or substantially present. For a subset of individuals, however, LVSI is reported as being present, which includes both non-absent options. In the analysis of such a dataset, difficulties arise when coarsened observations are to be used in an imputation procedure. To our knowledge, no clear-cut method has been described in the literature on how to handle an observed subset of values, and treating them as entirely missing could lead to biased estimates. Therefore, in this paper, we evaluated the best strategy to deal with coarsened and missing data in multiple imputation. We tested a number of plausible ad hoc approaches, possibly already in use by statisticians. Additionally, we propose a principled approach to this problem, consisting of an adaptation of the SMC-FCS algorithm (SMC-FCS$_{\mathrm{CoCo}}$: Coarsening compatible), that ensures that imputed values adhere to the coarsening information. These methods were compared in a simulation study. This comparison shows that methods that prevent imputations of incompatible values, like the SMC-FCS$_{\mathrm{CoCo}}$ method, perform consistently better in terms of a lower bias and RMSE, and achieve better coverage than methods that ignore coarsening or handle it in a more naïve way. The analysis of the motivating example shows that the way the coarsening information is handled can matter substantially, leading to different conclusions across methods.
Overall, our proposed SMC-FCS$_{\mathrm{CoCo}}$ method outperforms other methods in handling coarsened data, requires limited additional computation cost and is easily extendable to other scenarios.
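
To make the idea of coarsening-compatible imputation concrete, the following is a minimal sketch in Python. It is not the authors' SMC-FCS adaptation and omits the imputation model entirely. It only illustrates the constraint described in the abstract: an imputed value is drawn from the categories that the coarsened observation allows. The function name `draw_compatible`, the level labels, and the probabilities are all hypothetical and chosen for illustration.

```python
import random

# LVSI categories from the motivating example (labels are illustrative).
LEVELS = ("absent", "focal", "substantial")


def draw_compatible(probs, allowed, rng):
    """Draw one category from `probs`, restricted to the `allowed` subset.

    Hypothetical helper: `probs` stands in for the predicted category
    probabilities of some imputation model, which is not shown here.
    """
    # Give zero weight to categories ruled out by the coarsened observation.
    weights = [probs[level] if level in allowed else 0.0 for level in LEVELS]
    if sum(weights) <= 0:
        raise ValueError("no allowed category has positive probability")
    # random.choices treats the weights as relative, so no explicit
    # renormalisation is needed.
    return rng.choices(LEVELS, weights=weights, k=1)[0]


rng = random.Random(0)

# Made-up predicted probabilities for one patient (assumption, not data).
probs = {"absent": 0.6, "focal": 0.3, "substantial": 0.1}

# LVSI reported only as "present": the true value is focal or substantial.
present_draws = [
    draw_compatible(probs, {"focal", "substantial"}, rng) for _ in range(1000)
]

# LVSI fully missing: every category is allowed.
missing_draws = [draw_compatible(probs, set(LEVELS), rng) for _ in range(1000)]
```

Treating the "present" record as fully missing would correspond to the second call, where "absent" can be imputed even though the observation rules it out. Restricting the draw, as in the first call, prevents such incompatible imputations.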

van-der-Burg-etal-2025-Analyzing-Coarsened-and-Missing-Data-by-Imputation-Methods.pdf
Published Version
Available under Creative Commons: Attribution 4.0