Analyzing Coarsened and Missing Data by Imputation Methods.

van der Burg, Lars LJ; Böhringer, Stefan; Bartlett, Jonathan W; Bosse, Tjalling; Horeweg, Nanda; de Wreede, Liesbeth C; and Putter, Hein (2025) Analyzing Coarsened and Missing Data by Imputation Methods. Statistics in medicine, 44 (6). e70032-. ISSN 0277-6715 DOI: 10.1002/sim.70032


In various missing data problems, values are not entirely missing, but are coarsened. For coarsened observations, instead of observing the true value, a subset of values - strictly smaller than the full sample space of the variable - is observed to which the true value belongs. In our motivating example for patients with endometrial carcinoma, the degree of lymphovascular space invasion (LVSI) can be either absent, focally present, or substantially present. For a subset of individuals, however, LVSI is reported as being present, which includes both non-absent options. In the analysis of such a dataset, difficulties arise when coarsened observations are to be used in an imputation procedure. To our knowledge, no clear-cut method has been described in the literature on how to handle an observed subset of values, and treating them as entirely missing could lead to biased estimates. Therefore, in this paper, we evaluated the best strategy to deal with coarsened and missing data in multiple imputation. We tested a number of plausible ad hoc approaches, possibly already in use by statisticians. Additionally, we propose a principled approach to this problem, consisting of an adaptation of the SMC-FCS algorithm (SMC-FCS$_{\mathrm{CoCo}}$: Coarsening compatible), that ensures that imputed values adhere to the coarsening information. These methods were compared in a simulation study. This comparison shows that methods that prevent imputations of incompatible values, like the SMC-FCS$_{\mathrm{CoCo}}$ method, perform consistently better in terms of a lower bias and RMSE, and achieve better coverage than methods that ignore coarsening or handle it in a more naïve way. The analysis of the motivating example shows that the way the coarsening information is handled can matter substantially, leading to different conclusions across methods.
Overall, our proposed SMC-FCS$_{\mathrm{CoCo}}$ method outperforms other methods in handling coarsened data, requires limited additional computation cost and is easily extendable to other scenarios.
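
The core constraint described in the abstract, that an imputed value must lie within the observed subset, can be illustrated with a minimal Python sketch. This is not the authors' SMC-FCS$_{\mathrm{CoCo}}$ algorithm: it only shows the idea of renormalising a predicted category distribution over the compatible values before drawing an imputation. The function names, the LVSI level labels, and the example probabilities are all hypothetical.

```python
import random

def restrict_to_compatible(probs, compatible):
    """Renormalise category probabilities over the compatible subset.

    probs: dict mapping each category to a predicted probability (assumed given
           by some imputation model; the values used below are made up).
    compatible: set of categories allowed by the coarsened observation.
    """
    total = sum(probs[level] for level in compatible)
    if total <= 0:
        raise ValueError("no probability mass on compatible values")
    # Incompatible categories get probability zero; the rest are rescaled.
    return {
        level: (p / total if level in compatible else 0.0)
        for level, p in probs.items()
    }

def impute_compatible(probs, compatible, rng):
    """Draw one imputed category that respects the coarsening information."""
    restricted = restrict_to_compatible(probs, compatible)
    levels = list(restricted)
    weights = [restricted[level] for level in levels]
    return rng.choices(levels, weights=weights, k=1)[0]

# Hypothetical predicted distribution for one patient's LVSI status.
probs = {"absent": 0.6, "focal": 0.3, "substantial": 0.1}
# Coarsened observation: LVSI reported only as "present".
present = {"focal", "substantial"}

restricted = restrict_to_compatible(probs, present)
draws = [impute_compatible(probs, present, random.Random(seed)) for seed in range(200)]
```

In this sketch an entirely missing value corresponds to passing the full set of categories as `compatible`, in which case the distribution is left unchanged.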

Item Type	Article
Elements ID	237250
Date Deposited	11 Mar 2025 13:20


van-der-Burg-etal-2025-Analyzing-Coarsened-and-Missing-Data-by-Imputation-Methods.pdf
Published Version. Available under Creative Commons: Attribution 4.0
