Life originated in the sea, and the variety of species that we observe today has its roots in marine biology. Besides possessing the most diverse and unique genomes among all living things, marine organisms serve as illustrative indicators of climate change, ocean acidification, habitat exploitation and environmental contamination.
This project focuses on the statistical treatment of biological sequence data in genomic analysis pipelines, as part of a larger project that involves building an infrastructure for marine genomics research. The statistical modeling of biological sequence motifs is vital at every stage of such a pipeline: in sequence assembly, to separate sequences of different origin; in genome characterization, to identify functional elements in the sequence; in comparative analyses, to characterize the evolutionary relationships between species; in functional analyses, to compare gene predictions to known gene and protein families; and in gene expression analysis, to compute the relative abundance of various transcripts under different conditions. All these analyses rest upon the robust characterization and statistical modeling of biological sequence "words" and their treatment in the downstream statistical analyses.
Keywords: Statistical significance, bioinformatics, biological sequence analysis, genomic signatures
IMAGO – Infrastructure for MArine Genetic model Organisms, led by Anders Blomberg and Kerstin Johannesson