Tunstall, T; (2023) Using Machine Learning to Anticipate Antimicrobial Resistance in Mycobacterium Tuberculosis. PhD thesis, London School of Hygiene & Tropical Medicine. DOI: https://doi.org/10.17037/PUBS.04670981
Abstract
Antimicrobial resistance (AMR) continues to threaten public healthcare worldwide. Drug-resistant tuberculosis (DR-TB) is a major example of AMR, with resistance developing to multiple drugs, impeding treatment. Resistance in Mycobacterium tuberculosis (the bacterium causing human TB) is primarily mediated via mutations causing a single amino acid change at a specific position in a given protein, termed single amino acid variation (SAV). This thesis focusses on using computational methods to investigate the molecular consequences of SAVs on resistance development in the six main M. tuberculosis gene-drug targets: alr-cycloserine (DCS), embB-ethambutol (EMB), gidB-streptomycin (STR), katG-isoniazid (INH), pncA-pyrazinamide (PZA), and rpoB-rifampicin (RFP). Mutation data was sourced from a genome-wide association study of over 35,000 clinical isolates. An analysis pipeline extracted over 4000 SAVs across all targets and calculated minor allele frequency, odds ratio and lineage contributions. Protein structure modelling and docking were performed to obtain the gene-drug complex in the absence of an experimentally determined structure. Multiple in silico estimators of mutational effects on protomer stability, molecular affinities, evolutionary conservation, and residue-level properties were calculated. Initial analysis explored interrelationships between estimators for gene-targets. Visualisation tools, built to interactively inspect these relationships, aided the interpretation. Lineage effects on resistance were examined to understand the influence of epistasis. Together, these were used to build a supervised machine learning (ML) classification pipeline using multiple classifiers to predict resistance. ML models were built for individual and combined gene-drug targets, with the latter showing supervised ML classification could be used in a gene-agnostic manner to predict resistance. 
Model performance was assessed using the Matthews Correlation Coefficient (MCC), with performance improving upon feature selection for most models. For individual gene-drug targets, PZA resistance prediction performed best, with an MCC of 0.52 achieved using the Multilayer Perceptron (MLP) classifier. This was followed by an MCC of 0.49 for RFP resistance prediction using the XGBoost model. EMB and INH resistance predictions followed, each with an MCC of 0.42: XGBoost for EMB, and both the Linear Discriminant Analysis and Ridge classifiers for INH. For the combined model, PZA prediction was the highest with an MCC of 0.46 based on the Extra Tree classifier, followed by RFP prediction with an MCC of 0.39 using MLP. EMB resistance prediction achieved an MCC of 0.34 using the Random Forest classifier, and INH resistance prediction an MCC of 0.31 using Stochastic Descent. INH resistance prediction was the lowest compared with other targets both in the individual and combined ML approaches, while DCS and STR resistance prediction results were inconclusive. Exploiting a combined genomic and structural approach to understand the mutational effects underlying resistance, and to anticipate resistance in a gene-agnostic manner, would benefit clinical decision making and drug stewardship efforts. Future work could extend these methods to develop epistasis-informed ML models and apply transfer and unsupervised learning to other gene-targets in M. tuberculosis. The methods and pipelines developed can also be applied to other AMR pathogens.
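For readers unfamiliar with the metric reported above, the following is a minimal sketch of how the MCC is computed for a binary resistant/sensitive classifier from the four cells of a confusion matrix. The function name `mcc_from_counts` and the example counts are hypothetical and are not taken from the thesis; the convention of returning 0.0 when the denominator is zero is a common choice, assumed here rather than confirmed by the source.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    tp/tn: correctly predicted resistant/sensitive mutations
    fp/fn: sensitive predicted resistant / resistant predicted sensitive
    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column in the confusion
        # matrix makes the MCC undefined, so report 0.0.
        return 0.0
    return numerator / denominator


# Hypothetical counts, for illustration only (not thesis data).
example_score = mcc_from_counts(tp=40, tn=40, fp=10, fn=10)
print(f"MCC = {example_score:.2f}")
```

Because the MCC uses all four cells of the confusion matrix, it stays informative when resistant and sensitive classes are imbalanced, which is one common reason for preferring it over plain accuracy.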
Item Type | Thesis |
---|---|
Thesis Type | Doctoral |
Thesis Name | PhD |
Contributors | Furnham, N and Clark, T |
Faculty and Department | Faculty of Infectious and Tropical Diseases > Department of Infection Biology |
Funder Name | Biotechnology and Biological Sciences Research Council, NPIF, LIDo |
Copyright Holders | Tanushree Tunstall |
Download
Filename: 2023_ITD_PhD_Tunstall_T.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0