Differential networks (and other statistical issues) for the analysis of metabolomic data

Macleod, D; (2017) Differential networks (and other statistical issues) for the analysis of metabolomic data. PhD thesis, London School of Hygiene & Tropical Medicine. DOI: https://doi.org/10.17037/PUBS.03817570

Text - Accepted Version



Coronary heart disease (CHD) is the leading cause of death in the UK. Recent technological advances in metabolomics have the potential to further the understanding of CHD, especially because they are facilitating the collection of metabolomics data in large observational studies. However, the high dimensionality of this type of information and its strong interdependencies raise several analytical difficulties. These difficulties were investigated, motivated by the study of 228 metabolites acquired from blood samples as part of the British Women's Heart and Health Study (BWHHS). Issues regarding transformations of the metabolomics data and their reliability were examined. Analytical methods typically adopted with high-dimensional data were reviewed, and then a more recently developed method, differential networks, was examined in detail. When investigating differential networks using simulations of three alternative data-generating scenarios, it was found that an edge between two nodes can be induced if the effect of one node on disease is modified by another node, or if the disease causes (or is associated with) a "breaking down" in the relationship between the two nodes. The simulations focused on simplified settings but exemplify the difficulties in interpreting differential networks and helped elucidate the sample sizes required. Further algebraic examination of likely data-generating mechanisms identified the potential pitfalls of relying on partial correlations in building differential networks. This examination showed that, when important nodes influencing the correlation structure are not measured, irrelevant edges may be selected, while relevant ones may be missed. Analysis of the BWHHS metabolite data flagged a small number of metabolites that could potentially be associated with CHD, with small VLDL triglycerides being the strongest candidate.
Comparisons were made with the results obtained using regression-based methods, because these are more accessible to epidemiologists. The limited overlap between the biomarkers identified by the two approaches is an indication of the complexity of this field of research.
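
As an illustration only, the idea of a differential network built from partial correlations can be sketched in a few lines of Python. This is not the procedure used in the thesis: the function names, the three-node simulation, and the fixed threshold of 0.2 are all assumptions made for the example, and a real analysis would use a formal test (for instance a permutation test) rather than an arbitrary cut-off. The simulated scenario mimics a "breaking down" of the relationship between two nodes among cases.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation matrix derived from the inverse covariance matrix."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def differential_network(cases, controls, threshold=0.2):
    """Difference in partial correlations (cases minus controls).

    An edge is flagged when the absolute difference exceeds `threshold`,
    a hypothetical cut-off chosen only for this illustration.
    """
    diff = partial_correlations(cases) - partial_correlations(controls)
    edges = np.abs(diff) > threshold
    np.fill_diagonal(edges, False)
    return diff, edges


def simulate(n, coupling, rng):
    """Three nodes: node 1 depends on node 0 via `coupling`; node 2 is independent."""
    x0 = rng.normal(size=n)
    x1 = coupling * x0 + rng.normal(size=n)
    x2 = rng.normal(size=n)
    return np.column_stack([x0, x1, x2])


rng = np.random.default_rng(0)
controls = simulate(5000, 1.0, rng)  # nodes 0 and 1 are related
cases = simulate(5000, 0.0, rng)     # the relationship has "broken down"

diff, edges = differential_network(cases, controls)
print(np.round(diff, 2))
print(edges)
```

In this simplified setting only the pair of nodes whose relationship differs between cases and controls should be flagged. If a node that drives the correlation structure were left out of `data`, the partial correlations would be conditioned on the wrong set of variables, which is the kind of pitfall the abstract describes.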

Item Type: Thesis
Thesis Type: Doctoral
Thesis Name: PhD
Contributors: De Stavola, BL (Thesis advisor)
Faculty and Department: Faculty of Epidemiology and Population Health > Dept of Medical Statistics
Funders: Economic and Social Research Council
URI: http://researchonline.lshtm.ac.uk/id/eprint/3817570

