Computational metabolomics

Metabolomics provides a snapshot of the molecular phenotype through comprehensive measurement of the small organic molecules in biological samples. Computational techniques are then used to study the metabolome in relation to life science research questions, e.g. in medicine or nutrition. We develop algorithms and pipelines for metabolomics data generation and omics data analysis. We also apply these procedures to investigate associations between exposures, omics and health.

Liquid chromatography-mass spectrometry (LC-MS) is the most widely used technique in our metabolomics studies, since it provides wide coverage of the measurable metabolome. However, the technique has inherent issues with instrument drift and signal stability. In addition, preanalytical sample management (e.g. temperature and time until centrifugation) has a strong impact on the metabolome.

We have developed several algorithms to address these issues. Current efforts go into automating procedures that continuously monitor data quality as samples are being analysed. Another challenge we are addressing is that metabolomics data from nontarget analysis normally contain thousands of variables but comparatively few samples.

Data analyses are therefore prone to overfitting and false positive discoveries. In keeping with the exploratory nature of nontarget analysis, we developed the MUVR algorithm, which performs machine learning (PLS and random forest) within repeated double cross-validation to reduce bias and false discovery. In addition, the unbiased variable selection from MUVR can be used in downstream analyses, e.g. for epidemiological purposes. We are currently developing an updated version of MUVR.
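
To illustrate the general idea of repeated double cross-validation, here is a minimal Python sketch. It is not the MUVR implementation and does not reproduce its variable selection procedure: a simple nearest-centroid classifier stands in for PLS or random forest, features are ranked by the difference in class means, and all function names, parameters and the synthetic data are assumptions made for illustration only. The point it shows is that both feature ranking and the choice of how many features to keep happen inside the inner loop, so the outer held-out samples are never used for selection or tuning.

```python
import random
from statistics import mean

def make_toy_data(n_per_class=20, n_feat=30, shift=3.0, seed=1):
    """Synthetic two-class data (hypothetical); only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_feat)]
            row[0] += shift * label
            row[1] += shift * label
            X.append(row)
            y.append(label)
    return X, y

def stratified_folds(indices, y, k, rng):
    """Split the given sample indices into k folds, balanced by class."""
    folds = [[] for _ in range(k)]
    for c in (0, 1):
        members = [i for i in indices if y[i] == c]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def rank_features(X, y, train):
    """Rank features by absolute class-mean difference, using training rows only."""
    scores = []
    for j in range(len(X[0])):
        m0 = mean([X[i][j] for i in train if y[i] == 0])
        m1 = mean([X[i][j] for i in train if y[i] == 1])
        scores.append(abs(m0 - m1))
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def fit_and_score(X, y, train, test, n_keep):
    """Select n_keep features on train, fit nearest centroids, return test error."""
    feats = rank_features(X, y, train)[:n_keep]
    centroids = {c: [mean([X[i][j] for i in train if y[i] == c]) for j in feats]
                 for c in (0, 1)}

    def predict(row):
        dist = {c: sum((row[j] - m) ** 2 for j, m in zip(feats, centroids[c]))
                for c in (0, 1)}
        return min(dist, key=dist.get)

    errors = sum(predict(X[i]) != y[i] for i in test)
    return errors / len(test), feats

def repeated_double_cv(X, y, sizes=(1, 2, 5, 10), n_rep=3, k_outer=4, k_inner=3, seed=0):
    """Return the mean outer-loop error and how often each feature was selected."""
    rng = random.Random(seed)
    outer_errors, counts = [], {}
    everyone = list(range(len(y)))
    for _ in range(n_rep):                      # repetitions with new random splits
        for held_out in stratified_folds(everyone, y, k_outer, rng):
            held = set(held_out)
            calib = [i for i in everyone if i not in held]
            inner = stratified_folds(calib, y, k_inner, rng)

            def inner_error(n_keep):
                # Tuning uses calibration samples only, never the outer held-out fold.
                errs = []
                for val in inner:
                    val_set = set(val)
                    train = [i for i in calib if i not in val_set]
                    errs.append(fit_and_score(X, y, train, val, n_keep)[0])
                return mean(errs)

            best = min(sizes, key=inner_error)  # ties favour the smaller feature set
            err, feats = fit_and_score(X, y, calib, held_out, best)
            outer_errors.append(err)
            for j in feats:
                counts[j] = counts.get(j, 0) + 1
    return mean(outer_errors), counts

X, y = make_toy_data()
error, selection_counts = repeated_double_cv(X, y)
print("mean outer error:", error)
print("selection counts:", dict(sorted(selection_counts.items())))
```

Because the outer error is computed on samples that played no part in ranking or tuning, it gives a less optimistic estimate than a single cross-validation loop would, which matters most when variables far outnumber samples. The selection counts indicate how consistently a feature is chosen across repetitions and folds.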