Seamless, rapid, and accurate analyses of outbreak genomic data using split k-mer analysis.

Romain Derelle; Johanna von Wachsmann; Tommi Mäklin; Joel Hellewell; Timothy Russell; Ajit Lalvani; Leonid Chindelevitch; Nicholas J Croucher; Simon R Harris; John A Lees (2024) Seamless, rapid, and accurate analyses of outbreak genomic data using split k-mer analysis. Genome Research, 34 (10). pp. 1661-1673. ISSN 1088-9051. DOI: 10.1101/gr.279449.124

Sequence variation observed in populations of pathogens can be used for important public health and evolutionary genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. There is a need for tools that can process this growing volume of bacterial genome data, providing rapid results, but that remain simple so they can be used without highly trained bioinformaticians, expensive data analysis, and long-term storage and processing of large files. Here we describe split k-mer analysis (SKA2), a method that supports both reference-free and reference-based mapping to quickly and accurately genotype populations of bacteria using sequencing reads or genome assemblies. SKA2 is highly accurate for closely related samples, and in outbreak simulations, we show superior variant recall compared with reference-based methods, with no false positives. SKA2 can also accurately map variants to a reference and be used with recombination detection methods to rapidly reconstruct vertical evolutionary history. SKA2 is many times faster than comparable methods and can be used to add new genomes to an existing call set, allowing sequential use without the need to reanalyze entire collections. With an inherent absence of reference bias, high accuracy, and a robust implementation, SKA2 has the potential to become the tool of choice for genotyping bacteria. SKA2 is implemented in Rust and is freely available as open-source software.


Derelle-etal-2024-Seamless-rapid-and-accurate-analyses.pdf
Published Version
Available under Creative Commons: Attribution 4.0
