Building and analysing the tree of GEMs

Project proposal for a bachelor's thesis at the departments of Chemistry and Chemical Engineering and Biology and Biological Engineering
Project course code: BBTX01-21-04

Division of Infrastructures
Department of Biology and Biological Engineering, Chalmers University of Technology

In the field of systems biology, mathematical models of biological processes are often used for simulations and predictions. Models can be of different types, depending on the processes they aim to model. Genome-scale metabolic models (GEMs) are used to study metabolism at genome scale and to understand how it changes under different conditions, e.g. in different media or in a disease context.
Standardisation efforts have improved the dissemination of models. Firstly, file formats for different model types have become established. Secondly, deposition databases, e.g. BioModels, are consistently used in conjunction with publications that include models. Moreover, other databases, e.g. BiGG, apply further curation to models before making them publicly available. More recently, GEMs have adopted versioning principles common in software development. While this brings many advantages, e.g. newer versions of the same model with transparent curation, it is also a fundamentally different approach from that of the deposition databases.
Through various dissemination approaches, the publication of models has facilitated model reuse – old models are incorporated into new models. While information describing the reuse is typically included in the publication accompanying the model, it is not stored in any database. Therefore, the provenance of each model has to be worked out anew by every reader.

Model reuse should be transparent, at a large scale. To this end, we propose to identify annotation parameters for model reuse, and enrich a large proportion of published models with this annotation.
In addition to existing annotation parameters, e.g. species, year and version, other parameters should be proposed, such as model inheritance, file formats, databases used for external identifiers, test scores and affiliations. An example schematic of model inheritance was included in the publication of the most recent human model, as Fig. S1 (also shown here), illustrating the year in which different generic human GEMs were published, with arrows indicating which models were incorporated, to some extent, into other GEMs (Robinson, J., et al., 2020. An atlas of human metabolism. Sci. Signal. 13, eaaz1482). This model was developed at Systems and Synthetic Biology at the BIO department.
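One way to make such inheritance machine-readable is to store, for each model, the list of models it incorporates, and then to trace provenance by following those links. The following is a minimal sketch in Python, not a proposed standard: the class `ModelAnnotation`, its field names (`model_id`, `derived_from`, and so on) and the three example models are hypothetical placeholders invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ModelAnnotation:
    """Hypothetical annotation record for one published GEM."""
    model_id: str
    species: str
    year: int
    version: str
    # Ids of the models that were incorporated into this one.
    derived_from: List[str] = field(default_factory=list)


def ancestors(model_id: str, records: Dict[str, ModelAnnotation]) -> Set[str]:
    """Return every model reused, directly or indirectly, by model_id."""
    seen: Set[str] = set()
    stack = list(records[model_id].derived_from)
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue
        seen.add(parent)
        # Parents missing from the collection are reported but not expanded.
        if parent in records:
            stack.extend(records[parent].derived_from)
    return seen


# Placeholder records, invented purely for illustration.
records = {
    r.model_id: r
    for r in [
        ModelAnnotation("model_a", "Homo sapiens", 2007, "1.0"),
        ModelAnnotation("model_b", "Homo sapiens", 2013, "1.0", ["model_a"]),
        ModelAnnotation("model_c", "Homo sapiens", 2020, "1.0", ["model_b"]),
    ]
}

print(sorted(ancestors("model_c", records)))
```

Storing only the direct parents keeps each record small; the full provenance of a model is then derived on demand rather than curated by hand.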
Through this enriched annotation, questions such as the following may be answered:
- How has the rate of publishing models changed in the last decade?
- How has the rate of model reuse changed in the past years?
- What are the publication trends for different genera?
- How has the use of databases changed in the last decade?
- How often are models updated?
- How often are models reused?
- Which centres around the world publish models?
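Once models carry such annotation, several of these questions reduce to simple counts. The sketch below assumes each model is described by a hypothetical `(model_id, year, derived_from)` tuple; the four records are invented placeholders, not real publication data.

```python
from collections import Counter

# (model_id, publication year, ids of models it reuses) -- placeholder data.
records = [
    ("model_a", 2010, []),
    ("model_b", 2012, ["model_a"]),
    ("model_c", 2012, ["model_a"]),
    ("model_d", 2015, ["model_b", "model_c"]),
]

# How many models were published each year?
published_per_year = Counter(year for _, year, _ in records)

# How often is each model reused by another model?
times_reused = Counter(parent for _, _, parents in records for parent in parents)

# What fraction of the models published in a given year build on another model?
reusing_per_year = Counter(year for _, year, parents in records if parents)
reuse_fraction = {
    year: reusing_per_year[year] / published_per_year[year]
    for year in sorted(published_per_year)
}

print(dict(published_per_year))
print(dict(times_reused))
print(reuse_fraction)
```

The same counting pattern would extend to other annotation fields, such as genus or database, by grouping on a different element of each record.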

Additional questions should be identified through discussions with researchers in the field.
The newly obtained annotation is planned to be displayed on Metabolic Atlas, a website developed jointly by the BIO department at Chalmers and the National Bioinformatics Infrastructure Sweden, part of SciLifeLab and Elixir.
Implementation / Key elements / Technical content
The project is structured as follows:
1.    Understand the existing annotation models present in different databases and/or described in the literature.
2.    Define a new annotation set, together with a convenient file format.
3.    Enrich existing models (about 1000) with the newly defined annotation in accordance with FAIR principles.
4.    Reach out to researchers to compile a broad set of questions.
5.    Analyse the annotation data in order to answer the questions, with visualisations.
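Steps 2 and 3 could, for instance, be prototyped with one JSON record per model and a check that the required fields are present. This is only a sketch under stated assumptions: JSON is one possible choice of format, and the field names in `REQUIRED_FIELDS` as well as the example record are hypothetical, since the actual annotation set is an outcome of the project.

```python
import json

# Hypothetical minimal annotation set; the real set is defined in step 2.
REQUIRED_FIELDS = {"model_id", "species", "year", "version", "derived_from"}


def missing_fields(annotation: dict) -> set:
    """Return the required fields that are absent from an annotation record."""
    return REQUIRED_FIELDS - set(annotation)


# Placeholder record, invented for illustration.
annotation = {
    "model_id": "model_b",
    "species": "Homo sapiens",
    "year": 2013,
    "version": "1.0",
    "derived_from": ["model_a"],
}

# Serialise to JSON text and read it back, as one would with a file on disk.
text = json.dumps(annotation, indent=2, sort_keys=True)
restored = json.loads(text)

print(missing_fields(restored))              # complete record
print(sorted(missing_fields({"model_id": "x"})))  # incomplete record
```

A plain-text format of this kind is easy to keep under version control alongside the models, which fits the versioning practices GEMs have already adopted.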

Special prerequisites: Basic knowledge of biology and metabolism is required. Basic knowledge of Python/R/Matlab is a plus.

Possible target group: For this project, an interest in the field of modelling is expected. By the end of the project, students will have gained a deep understanding of model types (especially GEMs), the process of creating them and the tools used. Students will also gain experience in enriching research databases, defining research questions, and creating visualisations.

Group size: 4–6 students

Proposer/contact person: Mihail Anton (
Main supervisor: Jonathan Robinson (
Other supervisors: Hao Wang (

Published: Fri 30 Oct 2020.