A bioinformatic analysis of Mycobacterium tuberculosis and host genomic data

Phelan, J; (2018) A bioinformatic analysis of Mycobacterium tuberculosis and host genomic data. PhD (research paper style) thesis, London School of Hygiene & Tropical Medicine. DOI: https://doi.org/10.17037/PUBS.04646310

Text - Accepted Version



Human tuberculosis disease (TB) is caused by bacteria within the Mycobacterium tuberculosis complex, including M. tuberculosis (Mtb). Genetic variation within the pathogen can lead to drug resistance and can affect virulence and transmissibility. I have analysed Mtb whole genome sequence data to improve the understanding of global genetic variation, and the resulting insights could ultimately assist the development of TB control measures. Whole genome sequencing platforms are being used to infer drug resistance profiles, and thereby could assist clinical management. I investigated the reproducibility of sequence data from two platforms (Illumina MiSeq, Ion Torrent PGM™) and two rapid analytic pipelines (TBProfiler, Mykrobe predictor). DNA replicates from the reference strain (H37Rv) and 10 drug-resistant strains were sequenced, and inferred drug resistance genotypes were compared to drug susceptibility testing phenotypes. Genome-wide association studies (GWAS) can be used to detect mutations associated with Mtb drug resistance. A first GWAS (n=127) attempted to identify mutations associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. A second GWAS was applied to a large global set (n>6400) to identify mutations associated with first- and second-line drug resistance. M. aurum is an environmental mycobacterium that has been proposed as a model for the development of anti-TB drugs. I have assembled and annotated its draft genome, and identified copy number variants in known drug resistance targets. Approximately 10% of the Mtb genome consists of two gene families (pe/ppe) that are poorly characterised and are hypothesised to be important virulence factors. Using a de novo assembly approach, I characterised these genes and their diversity across a global collection of clinical isolates with high-depth short-read sequence data (n=518). A follow-up study using a long-read sequencing technology (n=18, diverse strain types) confirmed the findings. This work also generated new annotated reference genomes and characterised methylation sites, which may affect transmissibility, pathogenicity and virulence. A future direction of the TB genomics field is to identify genetic checkpoints in host-pathogen interactions using both human and Mtb genotypes. I analysed the genomes of ~720 TB case–Mtb pairs and identified susceptibility markers, which are promising targets for future control measures.

Item Type: Thesis
Thesis Type: Doctoral
Thesis Name: PhD (research paper style)
Contributors: Clark, TG (Thesis advisor); Hibberd, ML (Thesis advisor); Bhakta, S (Thesis advisor)
Faculty and Department: Faculty of Infectious and Tropical Diseases > ITD Distance Learning
Research Group: Taane G Clark, Martin L Hibberd, Sanjib Bhakta
Funders: Biotechnology and Biological Sciences Research Council
Copyright Holders: Jody Emile Phelan
URI: http://researchonline.lshtm.ac.uk/id/eprint/4646310

