Abstract
Background
Klebsiella pneumoniae and close relatives are a growing cause of healthcare-associated infections for which increasing rates of multi-drug resistance are a major concern. The Klebsiella polysaccharide capsule is a major virulence determinant and epidemiological marker. However, little is known about capsule epidemiology since serological typing is not widely accessible, and many isolates are serologically non-typeable. Molecular methods for capsular typing are needed, but existing methods lack sensitivity and specificity and fail to take advantage of the information available in whole-genome sequence data, which is increasingly being generated for surveillance and investigation of Klebsiella.
Methods
We investigated the diversity of capsule synthesis loci (K loci) among a large, diverse collection of 2503 genome sequences of K. pneumoniae and closely related species. We incorporated analyses of both full-length K locus DNA sequences and clustered protein coding sequences to identify, annotate and compare K locus structures, and we propose a novel method for identifying K loci based on full locus information extracted from whole genome sequences.
Results
A total of 134 distinct K loci were identified, including 31 novel types. Comparative analysis of K locus gene content detected 508 unique protein coding gene clusters that appear to reassort via homologous recombination, generating novel K locus types. Extensive nucleotide diversity was detected among the wzi and wzc genes, both within and between K loci, indicating that current typing schemes based on these genes are inadequate. As a solution, we introduce Kaptive, a novel software tool that automates the process of identifying K loci from large sets of Klebsiella genomes based on full locus information.
Conclusions
This work highlights the extensive diversity of Klebsiella K loci and the proteins that they encode. We propose a standardised K locus nomenclature for Klebsiella, present a curated reference database of all known K loci, and introduce a tool for identifying K loci from genome data (https://github.com/katholt/Kaptive). These developments constitute important new resources for the Klebsiella community for use in genomic surveillance and epidemiology.