Napier, G; (2023) Using whole genome sequencing data to identify strain-types, transmission enhancers and novel drug resistance mutations of Mycobacterium tuberculosis. PhD thesis, London School of Hygiene & Tropical Medicine. DOI: https://doi.org/10.17037/PUBS.04670881
Abstract
Tuberculosis (TB), caused by bacteria of the Mycobacterium tuberculosis complex (MTBC), including M. tuberculosis (Mtb), is a leading cause of global morbidity and mortality. Drug resistance, especially to the first-line drugs rifampicin (RIF) and isoniazid (INH), is making the disease difficult to control. To understand aspects of the genomic epidemiology of TB, with insights for disease control, this thesis analyses whole genome sequences from a large global dataset (n ≈ 32,000). Characterising genetic variation in the MTBC genome informs strain-typing, phylogenetic clustering and transmission patterns, and enables prediction of genotypic drug resistance. In turn, these analyses can assist diagnostic design, provide epidemiological insights, and improve clinical and surveillance decision-making, leading to better TB control. To enhance strain-typing, the 32k dataset was used to infer synonymous single nucleotide polymorphisms (SNPs) that uniquely identify 90 MTBC clades (lineages and sub-lineages). By selecting SNPs with perfect sub-population differentiation (fixation index, Fst = 1), a new barcode was inferred that resolves the MTBC phylogenetic tree in greater detail than previous work, including 30 newly identified sub-lineages with associated barcoding SNPs. Spoligotyping is a strain-typing method in which a 42-place binary barcode is generated from the presence or absence of so-called 'spacers' in repeat regions of the MTBC genome. Spoligotypes can be inferred from whole genome sequencing data, and software (Spolpred2) was developed to do this rapidly. Spoligotyping places a sample on the MTBC phylogenetic tree with lower resolution and precision than the SNP-based barcode outlined above, but it is nevertheless widely used.
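The Fst = 1 criterion for barcode SNPs can be illustrated with a minimal sketch. It assumes genotypes are already restricted to synonymous SNPs and coded 0 (reference) or 1 (alternative) per sample, and it uses Nei's G_ST as the Fst estimator with the clade and the rest of the dataset weighted equally; the estimator, data layout and function names (`nei_fst`, `barcode_snps`) are illustrative assumptions, not the thesis's implementation.

```python
from typing import Dict, List


def nei_fst(p_in: float, p_out: float) -> float:
    """Nei's G_ST (an Fst analogue) for one biallelic site and two groups.

    p_in / p_out are alternative-allele frequencies inside and outside the
    clade; the two groups are weighted equally (an assumption).
    """
    # Mean within-group gene diversity (H_S)
    h_s = (2 * p_in * (1 - p_in) + 2 * p_out * (1 - p_out)) / 2
    # Gene diversity at the pooled frequency (H_T)
    p_bar = (p_in + p_out) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def barcode_snps(
    genotypes: Dict[str, Dict[int, int]],  # sample -> {position: 0 (ref) or 1 (alt)}
    clade_of: Dict[str, str],              # sample -> clade label
    clade: str,
) -> List[int]:
    """Return positions where Fst between `clade` and all other samples is 1."""
    inside = [s for s in genotypes if clade_of[s] == clade]
    outside = [s for s in genotypes if clade_of[s] != clade]
    positions = sorted(next(iter(genotypes.values())))
    hits = []
    for pos in positions:
        p_in = sum(genotypes[s][pos] for s in inside) / len(inside)
        p_out = sum(genotypes[s][pos] for s in outside) / len(outside)
        # Fst is exactly 1 only when an allele is fixed in one group and absent
        # in the other (H_S = 0, H_T > 0)
        if nei_fst(p_in, p_out) == 1.0:
            hits.append(pos)
    return hits


# Hypothetical toy data: four samples, two clades, three SNP positions.
genotypes = {
    "s1": {100: 1, 200: 1, 300: 0},
    "s2": {100: 1, 200: 0, 300: 0},
    "s3": {100: 0, 200: 0, 300: 1},
    "s4": {100: 0, 200: 1, 300: 1},
}
clade_of = {"s1": "L4.1", "s2": "L4.1", "s3": "L2", "s4": "L2"}
print(barcode_snps(genotypes, clade_of, "L4.1"))  # [100, 300]
```

Position 200 is shared by both clades and is rejected; positions 100 and 300 each perfectly separate the clade from the rest, in either allele direction.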
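Inferring a spoligotype from sequencing reads amounts to checking, for each spacer, whether enough reads contain it. The sketch below assumes simple exact substring matching on both strands and a hypothetical minimum read count (`min_reads`); the spacer and read sequences are invented toy 8-mers, and Spolpred2's actual matching rules and thresholds may differ.

```python
from typing import List

# Complement table for A/C/G/T; other characters (e.g. N) are left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_spacer_hits(reads: List[str], spacers: List[str]) -> List[int]:
    """For each spacer, count reads containing it on either strand (exact match)."""
    targets = [(sp, revcomp(sp)) for sp in spacers]
    counts = [0] * len(spacers)
    for read in reads:
        for i, (fwd, rev) in enumerate(targets):
            if fwd in read or rev in read:
                counts[i] += 1
    return counts


def spoligotype(reads: List[str], spacers: List[str], min_reads: int = 2) -> str:
    """Binary spoligotype: '1' where a spacer is seen in >= min_reads reads."""
    return "".join("1" if c >= min_reads else "0"
                   for c in count_spacer_hits(reads, spacers))


# Hypothetical toy spacers and reads (real spacers are longer).
spacers = ["ACGTACGA", "TTGCAAGC", "CCATGGTA"]
reads = ["NNACGTACGANN", "GGTCGTACGTAA", "GCTTGCAAT"]
print(count_spacer_hits(reads, spacers))  # [2, 1, 0]
print(spoligotype(reads, spacers))        # 100
```

The second read matches the first spacer only on the reverse strand, which is why both orientations are searched.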
Correlations between spoligotypes and phylogenetic lineages at various levels were therefore investigated, revealing high concordance between the two systems at the highest lineage levels. Pakistan is a high-burden country for TB, and Mtb drug resistance and transmission were profiled in 535 samples from across the country. High relatedness of samples, based on genome-wide SNP differences, was used to infer transmission clusters, and cluster membership provided a proxy phenotype for increased transmissibility. Using these clustered (putatively transmitted) samples and other, potentially non-transmitted samples, a genome-wide association study (GWAS) was conducted to find SNPs associated with increased transmission; after adjustment for population structure, the nusG gene was the most significant (P = 5.8×10⁻¹⁰). For drug resistance, mismatches between phenotypic drug susceptibility testing (DST) results and genotypically predicted resistance revealed putative SNPs conferring drug resistance in Pakistan. Mutations that cause drug resistance in MTBC bacteria often come with a fitness cost. To compensate for this cost, the bacteria can acquire changes in genes with roles similar to those of the (pro-)drug targets. To improve genotypic prediction of resistance to RIF and INH, samples in the 32k dataset carrying compensatory mutations but no known drug resistance mutations were identified, leading to the discovery of novel putative drug resistance mutations in the relevant genes (rpoB for RIF and katG for INH). As expected, no new rpoB mutations were found, but 31 novel putative katG resistance mutations were identified. Additional analyses, including in silico modeling of the katG gene product, were undertaken to provide evidence that the putative INH resistance mutations may be causally relevant. Overall, this thesis has reinforced the benefits of using whole genome sequencing data to provide insights into TB control.
Such insights are needed to meet international targets for disease eradication.
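Concordance between spoligotypes and lineages at different levels of the hierarchy can be measured in several ways; the abstract does not say which was used. One simple, hypothetical measure is the fraction of samples whose lineage, truncated to a given level, matches the most common lineage among samples sharing their spoligotype. The sketch assumes dotted lineage labels (e.g. 4.1.2) and uses invented data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def truncate_lineage(lineage: str, level: int) -> str:
    """Keep the first `level` components of a dotted label, e.g. 4.1.2 -> 4.1."""
    return ".".join(lineage.split(".")[:level])


def concordance(records: List[Tuple[str, str]], level: int) -> float:
    """Fraction of samples whose lineage (at `level`) matches the most common
    lineage among samples sharing their spoligotype."""
    by_spoligo: Dict[str, Counter] = defaultdict(Counter)
    for spoligo, lineage in records:
        by_spoligo[spoligo][truncate_lineage(lineage, level)] += 1
    agree = sum(c.most_common(1)[0][1] for c in by_spoligo.values())
    return agree / len(records)


# Hypothetical (spoligotype, lineage) pairs.
records = [("A", "2.2.1"), ("A", "2.2.1.1"),
           ("B", "4.1.2"), ("B", "4.3.3"), ("B", "4.1.2")]
print(concordance(records, level=1))  # 1.0
print(concordance(records, level=3))  # 0.8
```

In this toy example agreement is perfect at the top lineage level and drops at finer levels, the same qualitative pattern the abstract reports.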
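Transmission clusters from genome-wide SNP differences are often built by linking any two samples within a fixed SNP threshold and taking connected groups (single linkage). The sketch below assumes aligned consensus sequences, ignores positions called as N, and uses a hypothetical `threshold`; the abstract does not state the cut-off or linkage rule used for the Pakistan samples. Cluster membership is then turned into the binary proxy phenotype.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Number of differing aligned positions, skipping missing calls ('N')."""
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def transmission_clusters(seqs: Dict[str, str], threshold: int) -> List[List[str]]:
    """Single-linkage clusters: samples joined if within `threshold` SNPs."""
    parent = {s: s for s in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for s in seqs:
        groups.setdefault(find(s), []).append(s)
    # Singletons are treated as unclustered (not transmitted)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy alignment.
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGTACCA", "D": "TTTTACGT"}
clusters = transmission_clusters(seqs, threshold=1)
clustered = {s for c in clusters for s in c}
phenotype = {s: int(s in clustered) for s in seqs}  # 1 = in a cluster
print(clusters)   # [['A', 'B', 'C']]
print(phenotype)  # {'A': 1, 'B': 1, 'C': 1, 'D': 0}
```

A and C differ by 2 SNPs but still share a cluster because both are within 1 SNP of B; this chaining is a known property of single linkage.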
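The abstract does not specify the association model behind the adjustment for population structure. One common approach in bacterial GWAS is a linear mixed model of the form used by tools such as GEMMA, shown here for illustration only. Here y is the n-vector of phenotypes (for example, 1 for samples in a transmission cluster and 0 otherwise), x is the genotype vector at the tested SNP with effect β, W holds covariates with coefficients α, K is an n×n genetic relatedness matrix that captures population structure, and u and ε are the random genetic and residual effects.

```latex
\mathbf{y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{x}\beta + \mathbf{u} + \boldsymbol{\epsilon},
\qquad
\mathbf{u} \sim \mathrm{MVN}_n\!\left(\mathbf{0},\, \lambda\tau^{-1}\mathbf{K}\right),
\qquad
\boldsymbol{\epsilon} \sim \mathrm{MVN}_n\!\left(\mathbf{0},\, \tau^{-1}\mathbf{I}_n\right),
\qquad
H_0:\ \beta = 0
```

Each SNP is tested against H_0, and because u absorbs genome-wide relatedness, a SNP is only flagged if it explains transmission beyond what lineage membership already does.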
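Finding DST/genotype mismatches is essentially a cross-tabulation per drug. In this minimal sketch, phenotypes and predictions are assumed to be coded "R"/"S" per sample, and the function name `compare_dst` and the data are hypothetical. Samples that are phenotypically resistant but predicted susceptible are the ones most likely to carry uncatalogued resistance mutations.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compare_dst(phenotype: Dict[str, str],
                genotype: Dict[str, str]) -> Tuple[Counter, List[str]]:
    """Cross-tabulate DST phenotype against genotypic prediction ("R"/"S").

    Returns the (phenotype, genotype) counts and the samples that are
    phenotypically resistant but genotypically predicted susceptible.
    """
    shared = sorted(set(phenotype) & set(genotype))
    table = Counter((phenotype[s], genotype[s]) for s in shared)
    missed = [s for s in shared if phenotype[s] == "R" and genotype[s] == "S"]
    return table, missed


# Hypothetical results for one drug.
pheno = {"p1": "R", "p2": "R", "p3": "S", "p4": "S"}
geno = {"p1": "R", "p2": "S", "p3": "S", "p4": "R"}
table, missed = compare_dst(pheno, geno)
print(missed)  # ['p2']
```

The genomes of samples in `missed` can then be searched for variants in the relevant resistance genes.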
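The compensatory-mutation screen can be sketched as a filter followed by a count. This is a simplified illustration under stated assumptions: each sample is a set of (gene, change) calls, compensatory and known resistance mutations come from catalogues supplied by the user, and the function name `candidate_mutations` is hypothetical. In the toy data, katG S315T is the well-known INH resistance mutation and ahpC is used as the compensatory locus; the labels "promoter_variant" and "G123D" are invented placeholders, not real calls.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Mutation = Tuple[str, str]  # (gene, change)


def candidate_mutations(samples: Dict[str, Set[Mutation]],
                        compensatory: Set[Mutation],
                        known_resistance: Set[Mutation],
                        target_gene: str) -> Counter:
    """Count target-gene mutations among samples that carry a compensatory
    mutation but no catalogued resistance mutation."""
    counts: Counter = Counter()
    for muts in samples.values():
        if (muts & compensatory) and not (muts & known_resistance):
            counts.update(m for m in muts if m[0] == target_gene)
    return counts


# Toy data; only katG S315T is a real, catalogued mutation.
known = {("katG", "S315T")}
comp = {("ahpC", "promoter_variant")}  # hypothetical label
samples = {
    "s1": {("katG", "S315T"), ("ahpC", "promoter_variant")},  # known: skipped
    "s2": {("katG", "G123D"), ("ahpC", "promoter_variant")},  # hypothetical
    "s3": {("katG", "G123D"), ("ahpC", "promoter_variant")},
    "s4": {("katG", "G123D")},                                 # no compensation
}
print(candidate_mutations(samples, comp, known, "katG"))
# Counter({('katG', 'G123D'): 2})
```

Samples already explained by a known mutation are excluded, so any remaining target-gene variants become candidates for follow-up, such as the structural modeling described above.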
Item Type | Thesis |
---|---|
Thesis Type | Doctoral |
Thesis Name | PhD |
Contributors | Clark, TG; Hibberd, M and Phelan, J |
Faculty and Department | Faculty of Infectious and Tropical Diseases > Department of Infection Biology |
Funder Name | Biotechnology and Biological Sciences Research Council |
Copyright Holders | Gary Napier |
Download
Filename: 2022_ITD_PhD_Napier_G-SR.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0