When dealing with biological phenomena that are random rather than deterministic, the best way to describe them is through probabilistic models. Such models give us a better understanding of the phenomena, so that we can make predictions on new data. Mariana studies DNA sequencing data. The Human Genome Project, in which the genes of the human genome were identified and mapped, was completed in 2003. Since then, many improvements have been made in DNA sequencing techniques, which are now known as next-generation sequencing (NGS). These technologies generate large amounts of data, and careful work is needed to make sense of them, not least to determine what is signal and what is noise. Because of the high variability involved in the techniques, and intrinsic to the biological process of interest, statistics is the best way to do this.
Bacteria exchange genes between cells to adapt to the environment
Mariana’s research involves bacterial communities from the environment, from which samples have been taken and all the DNA of the microorganisms in them has been sequenced. In these communities, she has studied horizontal gene transfer, the ability of bacteria to exchange genes between cells. She focuses on one genetic mechanism that enables horizontal gene transfer, called integrons. Genes that have been transferred via integrons carry a marker, and a model of this marker is built to see what the bacteria are transferring and how they evolve to better adapt to their environment. In particular, this is a mechanism bacteria can use to become resistant to antibiotics. The 13 000 genes that have been found come from all sorts of environments: from oceanic samples, from the Amazon, from geysers and from the guts of humans and elephants. They carry all sorts of different functions, although the majority do not correspond to any known function, indicating that further studies on this topic are required.
Metagenomics data can also explain how communities differ at the genetic level. For example, we can investigate whether more antibiotic resistance genes are found in a polluted environment than in a pristine one, or which bacterial genes are found in the gut of a patient with a disease compared to the gut of a healthy person. Comparisons between communities like these usually involve a great deal of noise in the data. The second part of the thesis deals with the removal of this noise. Mariana has compared nine normalization methods for removing systematic noise. The results show that some methods can produce high levels of false positives, which highlights the importance of using a suitable method. The thesis can be used as guidance on how to analyse metagenomic data to better understand microbial communities. In addition, the data on the 13 000 genes that have been found can easily be downloaded and used in other studies.
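To give a feel for what normalization means here, the following is a minimal sketch of one simple and widely known approach, total-sum scaling, in which each sample's gene counts are divided by the total number of reads in that sample. The article does not say which nine methods were compared, so this is an illustration of the general idea rather than a method taken from the thesis. The function name, gene names and counts are all hypothetical.

```python
def total_sum_scale(counts, scale=1_000_000):
    """Rescale raw gene counts from one sample to counts per `scale` reads.

    `counts` maps a gene name to the number of reads assigned to that gene.
    This is an illustrative sketch, not a method taken from the thesis.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {gene: scale * n / total for gene, n in counts.items()}


# Hypothetical samples: the second was sequenced four times as deeply,
# so its raw counts are higher even though the composition is the same.
pristine = {"geneA": 10, "geneB": 90}
polluted = {"geneA": 40, "geneB": 360}

pristine_norm = total_sum_scale(pristine)
polluted_norm = total_sum_scale(polluted)

for gene in pristine:
    print(gene, pristine_norm[gene], polluted_norm[gene])
```

In this toy case the raw counts differ by a factor of four purely because of sequencing depth, and the scaled values coincide. Real metagenomic data contain other sources of systematic noise that such a simple rescaling may not handle, which is why the choice of method matters.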
Master's programme in Bioinformatics led to PhD position
When Mariana began her studies at the University of São Paulo, she enrolled in the new programme in Medical Physics. She liked physics and mathematics and was very curious about biology, but she did not want to study only one of the subjects. Medical Physics, however, did not turn out to be exactly her thing. She finished her bachelor's degree, moved to Sweden, and found the master's programme in Bioinformatics. In one of the courses there she met her current supervisor, Erik Kristiansson, who recommended that she apply for the doctoral position she has held.
– I would say about everything in Sweden is different from Brazil. What I really like in the academic life is the low hierarchical levels, that the professors and supervisors talk to you as to an equal and listen to your ideas.
The next stop for Mariana is London, where she will hold a three-year postdoctoral position at the Institute of Cancer Research. She has worked a little on cancer as well, although the paper has not been published yet. Still, the techniques she will use in her postdoc are the same as in her doctoral work. More specifically, she will work on identifying driver mutations in prostate cancer. Mariana finds cancer problems interesting and challenging, and she wants to see results of the research in the relatively near future. She also looks forward to moving to London, even if she would be glad to return to Sweden in the future.
Mariana Buongermino Pereira will defend her PhD thesis “Statistical modelling and analyses of DNA sequence data with applications to metagenomics” on September 22 at 10.15 in the room Pascal, Hörsalsvägen 1. Supervisor is Erik Kristiansson, co-supervisor is Marija Cvijovic.
Text and photo: Setta Aspström