Statistical Significance of Biological Sequences

Life originated in the sea, and the species variety that we observe today has its roots in marine biology. Besides possessing the most diverse and unique genomes among all living things, marine organisms serve as illustrative indicators of climate change. This project proposal focuses on the statistical treatment of biological sequence data in genomic pipelines. The foundation of the proposal is a larger project that involves building an infrastructure for marine genomics research, in which our role is the development and management of a bioinformatics pipeline for high-throughput genome sequence analysis. The statistical modeling of biological sequence motifs, or sequence words, runs as a common thread through such a pipeline: in sequence assembly, it is used to separate sequences of different origin; in genome characterization, to identify functional elements in the sequence; in comparative analyses, to characterize the evolutionary relationships between species; in functional analyses, to compare gene predictions to known gene and protein families; and in gene expression analysis, to compute the relative abundance of various transcripts in various situations. All these analyses rest upon the robust characterization and statistical modeling of biological sequence "words". Such word models will then be used for clustering genomic signatures, hypothesis testing between different gene sets, and comparative gene finding over large evolutionary distances.

Partner organizations

  • University of Gothenburg (Academic, Sweden)
  • University of Gothenburg (Publisher, Sweden)
Start date 01/01/2012
End date 31/12/2015 (the project is closed)

Life originated in the sea, and the species variety that we observe today has its roots in marine biology. Besides possessing the most diverse and unique genomes among all living things, marine organisms serve as illustrative indicators of climate change, ocean acidification, habitat exploitation and environmental contamination.
This project focuses on the statistical treatment of biological sequence data in genomic analysis pipelines, as part of a larger project that involves building an infrastructure for marine genomics research. The statistical modeling of biological sequence motifs is vital at every stage of such a pipeline: in sequence assembly, it is used to separate sequences of different origin; in genome characterization, to identify functional elements in the sequence; in comparative analyses, to characterize the evolutionary relationships between species; in functional analyses, to compare gene predictions to known gene and protein families; and in gene expression analysis, to compute the relative abundance of various transcripts in various situations. All these analyses rest upon the robust characterization and statistical modeling of biological sequence "words" and their treatment in the downstream statistical analyses.
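As a rough illustration of what a statistical treatment of sequence "words" can look like, the sketch below counts overlapping words of length k and scores each one against a simple background model. It is not the project's method: the function names `word_counts` and `word_z_scores` are hypothetical, the background assumes independent bases with frequencies estimated from the sequence itself, and the variance uses a binomial approximation that ignores the dependence between overlapping word occurrences.

```python
from collections import Counter
from math import prod, sqrt


def word_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_z_scores(seq, k):
    """Score each observed word against an independent-bases background.

    Assumption: bases are i.i.d. with frequencies estimated from seq, and
    the count of a word is treated as binomial (overlap effects ignored).
    """
    n = len(seq) - k + 1  # number of word positions
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}
    scores = {}
    for word, observed in word_counts(seq, k).items():
        p = prod(base_freq[b] for b in word)  # word probability under the model
        expected = n * p
        variance = n * p * (1 - p)
        # A degenerate background (single-letter sequence) has zero variance.
        scores[word] = (observed - expected) / sqrt(variance) if variance > 0 else 0.0
    return scores


if __name__ == "__main__":
    scores = word_z_scores("ACGTACGTTTGACA", 2)
    for word, z in sorted(scores.items(), key=lambda item: -item[1]):
        print(word, round(z, 2))
```

Words with large positive scores occur more often than the background predicts. A vector of such scores over all words of a given length is one simple form of genomic signature that could be compared or clustered across sequences, although a realistic analysis would use a higher-order Markov background and an overlap-aware variance.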

Keywords: Statistical significance, bioinformatics, biological sequence analysis, genomic signatures

Project leader and contact for communications
Marina Axelson-Fisk, e-mail marinaa@chalmers.se
External partners
Magnus Alm Rosenblad
Related projects
IMAGO – Infrastructure for MArine Genetic model Organisms, led by Anders Blomberg and Kerstin Johannesson

Funded by

  • Swedish Research Council (VR) (Public, Sweden)
Life Science

Published: Thu 31 May 2018.