Pf8: an open dataset of Plasmodium falciparum genome variation in 33,325 worldwide samples [version 1; peer review: awaiting peer review]

Muzamil Mahdi Abdel Hamid ; Mohamed Hassan Abdelraheem ; Desmond Omane Acheampong ; Ishag Adam ; Pedro Aide ; Olusola Ajibaye ; Mozam Ali ; Jacob Almagro-Garcia ; Alfred Amambua-Ngwa ; Lucas Amenga-Etego ; Ifeyinwa Aniebo ; Enoch Aninagyei ; Felix Ansah ; Tobias O Apinjoh ; Cristina V Ariani ; Sarah Auburn ; Gordon A Awandare ; Andrew Balmer ; Philip Bejon ; Simone Boene ; George Bwire ; Baltazar Candrinho ; Arlindo Chidimatembue ; Keobouphaphone Chindavongsa ; Kiba Comiche ; David Conway ; Antoine Dara ; Mahamadou Diakite ; Abdoulaye Djimde ; Arjen Dondorp ; Seydou Doumbia ; Eleanor Drury ; Caterina A Fanello ; Mike Ferdig ; Katherine Figueroa ; Dionicia Gamboa ; Lemu Golassa ; Sónia Gonçalves ; Merepen dite Agnes Guindo ; Mainga Hamaluba ; Borimas Hanboonkunupakarn ; Kevin Howe ; Maazza Hussien ; Mallika Imwong ; Deus Ishengoma ; Julia Jeans ; Alinune Kabaghe ; Appolinary Kamuhabwa ; Jean-Marie Kindermans ; Drissa S Konate ; Dominic P Kwiatkowski ; Chiyun Lee ; Samuel K Lee ; Sue J Lee ; Benedikt Ley ; Alejandro Llanos-Cuentas ; Jutta Marfurt ; Glória Matambisso ; Rapeephan Rattanawongnara Maude ; Richard James Maude ; Alfredo Mayor ; Mayfong Mayxay ; Oumou Maïga-Ascofaré ; Robert S McCann ; Alistair Miles ; Olivo Miotto ; Abdelrahim Osman Mohamed ; Collins Misita Morang’a ; Kathryn Murie ; Billy Ephraim Ngasala ; Thuy-Nhien Nguyen ; Oscar Nolasco ; Francois Nosten ; Rintis Noviyanti ; Ísla O'Connor ; Mary Oboh ; Lynette Isabella Ochola-Oyier ; Catherine Olufunke Falade ; Adeola Olukosi ; Ajibola Olumide ; Fiyinfoluwa I Olusola ; Marie A Onyamboko ; Eniyou Cheryll Oriero ; Wellington Aghoghovwia Oyibo ; Danielle Pannebaker ; Richard D Pearson ; Kamija Phiri ; Rob W van der Pluijm ; Ric N Price ; Huynh Hong Quang ; Vinoth Rajkumar Devaraju ; Milijaona Randrianarivelojosia ; Lisa Ranford-Cartwright ; Julian C Rayner ; Eduard Rovira-Vallbona ; Katherine Rowlands ; Valentin Ruano-Rubio ; Juan F Sanchez ; Francisco Saúte ; Shuwaram Shettima ; Clemente da Silva ; Victoria J Simpson ; Simon Suddaby ; Willem Takken ; Aung Myint Thu ; Mahamoudou Toure ; Eyyub Unlu ; Hugo O Valdivia ; Michele van Vugt ; Naomi Waithira ; Thomas Wellems ; Jason Wendler ; Nina White ; Rachel Wuendrich Ogidan (2025) Pf8: an open dataset of Plasmodium falciparum genome variation in 33,325 worldwide samples [version 1; peer review: awaiting peer review]. Wellcome Open Research, 10. p. 325. ISSN 2398-502X DOI: 10.12688/wellcomeopenres.24031.1

We describe the Pf8 data resource, the latest MalariaGEN release of curated genome variation data on over 33,000 Plasmodium falciparum samples from 99 partner studies and 122 locations over more than 50 years. This release provides open access to raw sequencing data and genotypes at over 12 million genomic positions. For the first time, it includes copy-number variation (CNV) calls in the drug resistance-associated genes gch1 and crt. As in Pf7, CNV calls are provided for mdr1 and plasmepsin2/3, along with calls for deletions in hrp2 and hrp3, genes associated with rapid diagnostic test failures. This data resource additionally features derived datasets, interactive web applications for exploring patterns of drug resistance and variation in over 5,000 genes, an updated Python package providing methods for accessing and analysing the data, and open-access analysis notebooks that can be used as starting points for further analyses. In addition, informative example analyses show contrasting profiles of the decline of chloroquine resistance-associated mutations in Africa, and differences in copy-number variation across 10 distinct sub-populations. To the best of our knowledge, Pf8 is the largest open dataset of genome variation in any eukaryotic species, making it an invaluable foundational resource for understanding evolution, including that of pathogens.


Hamid-etal-2025-Pf8-an-open-dataset-of-plasmodium.pdf
Published Version
Available under Creative Commons: Attribution 4.0
