Horesh, Gal; Blackwell, Grace A; Tonkin-Hill, Gerry; Corander, Jukka; Heinz, Eva; Thomson, Nicholas R; (2021) A comprehensive and high-quality collection of Escherichia coli genomes and their genes. Microbial genomics. ISSN 2057-5858 DOI: https://doi.org/10.1099/mgen.0.000499
Abstract
Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found across a range of niches and worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug resistance. The diversity in the E. coli population is driven by high genome plasticity and a very large gene pool. All these have made E. coli one of the most well-studied organisms, as well as a commonly used laboratory strain. Today, there are thousands of sequenced E. coli genomes stored in public databases. While data is widely available, accessing the information in order to perform analyses can still be a challenge. Collecting relevant available data requires accessing different sources, where data may be stored in a range of formats, and often requires further manipulation and processing to apply various analyses and extract useful information. In this study, we collated and intensely curated a collection of over 10 000 E. coli and Shigella genomes to provide a single, uniform, high-quality dataset. Shigella were included as they are considered specialized pathovars of E. coli. We provide these data in a number of easily accessible formats that can be used as the foundation for future studies addressing the biological differences between E. coli lineages and the distribution and flow of genes in the E. coli population at a high resolution. The analysis we present emphasizes our lack of understanding of the true diversity of the E. coli species, and the biased nature of our current understanding of the genetic diversity of such a key pathogen.
Item Type | Article |
---|---|
Faculty and Department | Faculty of Infectious and Tropical Diseases > Department of Infection Biology |
PubMed ID | 33417534 |
Elements ID | 155179 |
Download
Filename: mgen000499.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 3.0