What are the implications of using individual and combined sources of routinely collected data to identify and characterise incident site-specific cancers? A concordance and validation study using linked English electronic health records data.

Helen Strongman; Rachael Williams; Krishnan Bhaskaran (2020) What are the implications of using individual and combined sources of routinely collected data to identify and characterise incident site-specific cancers? A concordance and validation study using linked English electronic health records data. BMJ OPEN, 10 (8). e037719. ISSN 2044-6055 DOI: 10.1136/bmjopen-2020-037719


OBJECTIVES: To describe the benefits and limitations of using individual and combined sources of linked English electronic health data to identify incident cancers.

DESIGN AND SETTING: Our descriptive study uses linked English Clinical Practice Research Datalink primary care, cancer registration, hospitalisation and death registration data.

PARTICIPANTS AND MEASURES: We implemented case definitions to identify first site-specific cancers at the 20 most common sites, based on the first ever cancer diagnosis recorded in each individual data source or commonly used combination of data sources between 2000 and 2014. We calculated the positive predictive value and sensitivity of each definition, compared with a gold standard algorithm that used information from all linked data sets to identify first cancers. We described completeness of grade and stage information in the cancer registration data set.

RESULTS: 165 953 gold standard cancers were identified. Positive predictive values of all case definitions were ≥80% across cancer sites and ≥94% for the four most common cancers (breast, lung, colorectal and prostate). Sensitivity for case definitions that used cancer registration alone or in combination was ≥92% for the four most common cancers and ≥80% across all cancer sites except bladder cancer (65% using cancer registration alone). For case definitions using linked primary care, hospitalisation and death registration data, sensitivity was ≥89% for the four most common cancers, and ≥80% for all cancer sites except kidney (69%), oral cavity (76%) and ovarian cancer (78%). When primary care or hospitalisation data were used alone, sensitivities were generally lower and diagnosis dates were delayed. Completeness of staging data in cancer registration data was high from 2012 (minimum 76.0% in 2012 and 86.4% in 2014 for the four most common cancers).

CONCLUSIONS: Ascertainment of incident cancers was good when using cancer registration data alone or in combination with other data sets, and for the majority of cancers when using a combination of primary care, hospitalisation and death registration data.
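The two measures reported above can be illustrated with a minimal sketch. It assumes each identified cancer is represented as a (patient ID, cancer site) pair and that a case definition "agrees" with the gold standard when the same pair appears in both sets. The function name, the record layout and the toy data are all hypothetical and are not taken from the study, whose matching rules may differ (for example in how diagnosis dates are handled).

```python
from typing import Hashable, Set, Tuple


def ppv_and_sensitivity(
    case_def: Set[Hashable], gold: Set[Hashable]
) -> Tuple[float, float]:
    """Return (PPV, sensitivity) of a case definition against a gold standard.

    PPV         = true positives / all cancers flagged by the case definition
    sensitivity = true positives / all gold standard cancers
    An empty denominator yields NaN rather than raising an error.
    """
    true_positives = len(case_def & gold)
    ppv = true_positives / len(case_def) if case_def else float("nan")
    sensitivity = true_positives / len(gold) if gold else float("nan")
    return ppv, sensitivity


# Toy, made-up records: (patient ID, cancer site). Not study data.
gold_standard = {
    ("p1", "breast"),
    ("p2", "lung"),
    ("p3", "bladder"),
    ("p4", "prostate"),
}
registration_only = {
    ("p1", "breast"),
    ("p2", "lung"),
    ("p4", "prostate"),
    ("p5", "lung"),  # flagged by the case definition but not in the gold standard
}

ppv, sensitivity = ppv_and_sensitivity(registration_only, gold_standard)
print(f"PPV = {ppv:.2f}, sensitivity = {sensitivity:.2f}")
```

In this toy example three of the four flagged cancers are in the gold standard and three of the four gold standard cancers are flagged, so both measures equal 0.75.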

Item Type	Article
Elements ID	150465
Date Deposited	14 Jan 2021 12:23

Explore Further

Dept of Non-Communicable Disease Epidemiology

EHR Research Group

BMJ OPEN

File	e037719.full.pdf
Version	Published Version
Licence	Available under Creative Commons: Attribution-NonCommercial-No Derivative Works 3.0
