Bayesian Feature Selection via Variational Inference in Omics Data

DAV Scott (2022) Bayesian Feature Selection via Variational Inference in Omics Data. PhD thesis, London School of Hygiene & Tropical Medicine. DOI: 10.17037/PUBS.04668862

The advent of genome sequencing has led to a dramatic change in the scale and breadth of information within biology. Omics technologies have enabled a single experiment to generate a very large amount of raw data on increasingly complex phenomena. These data are often high-dimensional: the number of attributes often exceeds the number of observations, and the sheer size of the data raises questions about the efficiency of the computational approach used to estimate the model. The focus of this thesis is Bayesian feature selection in high-dimensional omics data via variational inference. Our objective is to develop and implement reliable inferential tools that scale efficiently with dimensionality. Our first algorithm uses auxiliary indicator variables to identify the compositional covariates associated with a response of interest, together with their effect sizes. This is particularly useful for data sets generated by genome sequencing technology, such as human microbiome data, as these contain information only on the relative magnitudes of the compositional components. Novel priors account for model constraints, and a Monte Carlo step, guided by the data, is introduced to estimate intractable marginal expectations. We extend the methodology to a multidimensional response, where different compositional covariates are free to be associated with different responses. This allows the relationship between the microbiome and complex phenotypes such as lipids or metabolites to be explored in one model, facilitating a systems genetics approach to understanding the flow of biological information. By reparameterising the likelihood, we are able to perform fast covariance and covariate selection despite the vast model space. A hierarchical Bayesian model is then developed for clusters of individuals who exhibit different causal pathways to the same multidimensional endpoint. Again, we reparameterise the likelihood to incorporate fast predictor and covariance selection within a large model space. We capture the different latent structures across the clusters to aid model fitting and understanding. Sparse feature selection is performed both within each expert and in the unsupervised learning of cluster detection. We hope that the software implementing the methods outlined here will be used by practitioners to develop biological understanding and insight.
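
The abstract notes that microbiome data carry information only on the relative magnitudes of the compositional components. As a minimal illustration of that point, and not a reproduction of the thesis's model, the sketch below applies the centred log-ratio (CLR) transform, a standard device from compositional data analysis. The function name `clr`, the `pseudocount` parameter, and the toy counts are assumptions introduced here for illustration only.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of a samples-by-components matrix.

    Illustrative helper (not from the thesis). The pseudocount is one
    common, assumed way to avoid taking log(0) for unobserved components.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log value, so every row sums to zero.
    return log_x - log_x.mean(axis=1, keepdims=True)

# Toy data: two samples, three components (made-up numbers).
counts = np.array([[10.0, 30.0, 60.0],
                   [5.0, 5.0, 90.0]])

z = clr(counts, pseudocount=0.0)

# Multiplying every count by a constant (for example, deeper sequencing)
# leaves the CLR values unchanged: only relative magnitudes matter.
z_rescaled = clr(counts * 7.0, pseudocount=0.0)
```

Because each transformed row sums to zero, a regression on such covariates inherits a linear constraint, which is one reason priors for compositional models may need to account for model constraints.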

Item Type	Thesis (Doctoral)
Thesis Type	Doctoral
Thesis Name	PhD
Contributors	Lewin, A
Grant number	MR/N013638/1
Copyright Holders	Darren Andrew Vincent Scott
Date Deposited	28 Mar 2023 14:27

Explore Further

Scott, DAV

Medical Research Council

Dept of Medical Statistics

2022_EPH_PhD_Scott_D.pdf
Accepted Version
Available under Creative Commons: Attribution-NonCommercial-No Derivative Works 4.0
