Making metadata machine-readable as the first step to FAIR population health data
Background: Metadata describes and provides context for other data and plays a pivotal role in enabling the FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles. By providing comprehensive and machine-readable descriptions of digital resources, metadata empowers both machines and human users to seamlessly discover, access, integrate, and reuse data or content across diverse platforms and applications. However, the limited accessibility and machine-interpretability of existing metadata for population health data hinder effective data discovery and reuse.

Objective: To address these challenges, we propose a comprehensive framework utilizing standardized formats, vocabularies, and protocols to render population health data machine-readable, significantly enhancing its FAIRness and enabling seamless discovery, access, and integration across diverse platforms and research applications.

Methods: The framework implements a three-stage approach:

1. DDI (Data Documentation Initiative) Integration: Leveraging the DDI Codebook metadata, detailed information for data and associated assets is documented, ensuring transparency and comprehensiveness.
2. OMOP CDM (Observational Medical Outcomes Partnership Common Data Model) Standardization: Data is harmonized and standardized into the OMOP CDM, facilitating unified analysis across heterogeneous datasets.
3. Schema.org and JSON-LD (JavaScript Object Notation for Linked Data) Integration: Machine-readable metadata is generated using Schema.org entities and embedded within the data using JSON-LD, boosting discoverability and comprehension for both machines and human users.

We demonstrated the implementation of these three stages using the infectious disease surveillance and response (IDSR) data from Malawi and Kenya.
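The third stage can be illustrated with a minimal Python sketch. It is not the project's published code: the function names `build_dataset_jsonld` and `to_html_script`, the `example.org` URL, and all field values are hypothetical placeholders chosen for illustration. The sketch builds a Schema.org `Dataset` description as JSON-LD and wraps it in the `script` element that dataset search crawlers typically read from a landing page.

```python
import json


def build_dataset_jsonld(name, description, url, keywords, countries):
    """Return a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        # One Place entry per country covered by the dataset.
        "spatialCoverage": [{"@type": "Place", "name": c} for c in countries],
    }


def to_html_script(jsonld):
    """Wrap a JSON-LD dictionary in a script element for an HTML page."""
    payload = json.dumps(jsonld, indent=2, ensure_ascii=False)
    # "\/" is a valid JSON escape; using it keeps the payload from
    # closing the script element early if a value contains "</".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Placeholder values only; not the actual INSPIRE metadata.
    record = build_dataset_jsonld(
        name="IDSR surveillance data (illustrative record)",
        description="Hypothetical description of disease surveillance data.",
        url="https://example.org/datasets/idsr",
        keywords=["population health", "surveillance", "FAIR"],
        countries=["Malawi", "Kenya"],
    )
    print(to_html_script(record))
```

Pasting the printed element into the `head` of a dataset landing page is the usual way to expose such metadata to crawlers; a real record would normally carry further Schema.org properties such as `license`, `creator`, and `distribution`.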
Results: The implementation of our framework significantly enhanced the FAIRness of population health data, resulting in improved discoverability through seamless integration with platforms like Google Dataset Search. The adoption of standardized formats and protocols streamlined data accessibility and integration across various research environments, fostering collaboration and knowledge sharing. Additionally, the utilization of machine-interpretable metadata empowered researchers to efficiently reuse data for targeted analyses and insights, thereby maximizing the overall value of population health resources. The JSON-LD code is accessible via a GitHub repository, and the HTML code integrated with JSON-LD is available on the Implementation Network for Sharing Population Information from Research Entities (INSPIRE) website.

Conclusion: The adoption of machine-readable metadata standards is essential for ensuring the FAIRness of population health data. By embracing these standards, organizations can enhance the visibility, accessibility, and utility of diverse resources, leading to a broader impact, particularly in low- and middle-income countries (LMICs). Machine-readable metadata can accelerate research, improve healthcare decision-making, and ultimately promote better health outcomes for populations worldwide.
Item Type | Article |
---|---|
Elements ID | 224581 |
Official URL | https://ojphi.jmir.org/ |
Date Deposited | 30 May 2024 12:53 |