Abstracts, see below.
17/1 - Magnus Röding: Imaging, characterization, and in silico design of heterogeneous porous materials
24/1 - Mats Gyllenberg, Helsingfors Universitet: On models of physiologically structured populations and their reduction to ordinary differential equations.
31/1 - Christian A. Naesseth, Automatic Control, Linköping: Variational and Monte Carlo methods - Bridging the Gap.
7/2 - Jonas Wallin, Lund University: Multivariate Type-G Matérn fields.
14/2 - Jes Frellsen, IT University of Copenhagen: Deep latent variable models: estimation and missing data imputation.
21/2 - Riccardo De Bin, University of Oslo: Detection of influential points as a byproduct of resampling-based variable selection procedures.
28/2 - Johan Henriksson: Single-cell perturbation analysis – the solution to systems biology?
7/3 - Larisa Beilina: Time-adaptive parameter identification in mathematical model of HIV infection with drug therapy.
14/3 - Umberto Picchini: Accelerating MCMC sampling via approximate delayed-acceptance.
21/3 - Samuel Wiqvist, Lund University: Automatic learning of summary statistics for Approximate Bayesian Computation using Partially Exchangeable Networks.
28/3 - Hans Fahlin (Chief Investment Officer, AP2, Andra AP-fonden) and Tomas Morsing (Head of Quantitative Strategies, AP2, Andra AP-fonden): A scientific approach to financial decision making in the context of managing Swedish pension assets.
11/4 - Daniele Silvestro: Birth-death models to understand the evolution of (bio)diversity.
12/4 - Erika B. Roldan Roa, Department of Mathematics, The Ohio State University: Evolution of the homology and related geometric properties of the Eden Growth Model.
16/5 - Susanne Ditlevsen, University of Copenhagen: Inferring network structure from oscillating systems with cointegrated phase processes.
23/5 - Chun-Biu Li, Stockholms Universitet: Information Theoretic Approaches to Statistical Learning.
13/6 - Sara Hamis, Swansea University: DNA Damage Response Inhibition: Predicting in vivo treatment responses using an in vitro-calibrated mathematical model.
19/9 - Ronald Meester, Vrije University, Amsterdam: The DNA Database Controversy 2.0.
26/9 - Valerie Monbet, Université de Rennes: Time-change models for asymmetric processes.
3/10 - Peter Jagers, Chalmers: Populations - from few independently reproducing individuals to continuous and deterministic flows. Or: From branching processes to adaptive population dynamics.
15/10 - Mats Gyllenberg, Helsingfors Universitet: Difference and differential equations in population biology: History and modelling.
17/10 - Richard Davis, Columbia University and Chalmers Jubileum Professor 2019: Extreme Value Theory Without the Largest Values: What Can Be Done?
24/10 - Erica Metheney, Department of Political Sciences, University of Gothenburg: Modifying Non-Graphic Sequences to be Graphic.
31/10 - Sofia Tapani, AstraZeneca: Early clinical trial design - Platform designs with the patient at its center.
6/11 - Richard Torkar, Software Engineering, Chalmers: Why do we encourage even more missingness when dealing with missing data?
7/11 - Krzysztof Bartoszek, Linköping University: Formulating adaptive hypotheses in multivariate phylogenetic comparative methods.
20/11 - Paul-Christian Bürkner, Aalto University: Bayesflow: Software assisted Bayesian workflow.
27/11 - Geir Storvik, Oslo University: Flexible Bayesian Nonlinear Model Configuration.
4/12 - Moritz Schauer, Chalmers/GU: Smoothing and inference for high dimensional diffusions.
11/12 - Johannes Borgqvist, Chalmers/GU: The polarising world of Cdc42: the derivation and analysis of a quantitative reaction diffusion model of cell polarisation.
24/1 - Mats Gyllenberg, Helsingfors Universitet: On models of physiologically structured populations and their reduction to ordinary differential equations
Abstract: Considering the environmental condition as a given function of time, we formulate a physiologically structured population model as a linear non-autonomous integral equation for the, in general distributed, population level birth rate. We take this renewal equation as the starting point for addressing the following question: When does a physiologically structured population model allow reduction to an ODE without loss of relevant information? We formulate a precise condition for models in which the state of individuals changes deterministically, that is, according to an ODE. Specialising to a one-dimensional individual state, like size, we present various sufficient conditions in terms of individual growth, death, and reproduction rates, giving special attention to cell fission into two equal parts and to the catalogue derived in another paper of ours (submitted). We also show how to derive an ODE system describing the asymptotic large time behaviour of the population when growth, death and reproduction all depend on the environmental condition through a common factor (so for a very strict form of physiological age).
31/1 - Christian A. Naesseth, Automatic Control, Linköping: Variational and Monte Carlo methods - Bridging the Gap
Abstract: Many recent advances in large scale probabilistic inference rely on the combination of variational and Monte Carlo (MC) methods. The success of these approaches depends on (i) formulating a flexible parametric family of distributions, and (ii) optimizing the parameters to find the member of this family that most closely approximates the exact posterior. My aim is to show how MC methods can be used not only for stochastic optimization of the variational parameters, but also for defining a more flexible parametric approximation in the first place. First, I will review variational inference (VI). Second, I describe some of the pivotal tools for VI, based on MC methods and stochastic optimization, that have been developed in the last few years. Finally, I will show how we can synthesize sequential Monte Carlo methods and VI to learn more accurate posterior approximations with theoretical guarantees.
7/2 - Jonas Wallin, Lund University: Multivariate Type-G Matérn fields
Abstract: I will present a class of non-Gaussian multivariate random fields formulated using systems of stochastic partial differential equations (SPDEs) with additive non-Gaussian noise. To facilitate computationally efficient likelihood-based inference, the noise is constructed using normal-variance mixtures (type-G noise). Similar, but simpler, constructions have been proposed earlier in the literature; however, they lack important properties such as ergodicity and flexibility of predictive distributions. I will show that for a specific system of SPDEs the marginals of the fields have Matérn covariance functions.
Further, I will present a parametrization of the system that one can use to separate the cross-covariance and the extra dependence coming from the non-Gaussian noise in the proposed model.
If time permits, I will discuss some recent results on proper scoring rules (PS). PS are the standard tool for evaluating which model fits data best in spatial statistics (like Gaussian vs non-Gaussian models).
We have developed a new class of PS that I argue is better suited for evaluating models if one has observations at irregular locations.
14/2 - Jes Frellsen, IT University of Copenhagen: Deep latent variable models: estimation and missing data imputation
Abstract: Deep latent variable models (DLVMs) combine the approximation abilities of deep neural networks and the statistical foundations of generative models. In this talk, we first give a brief introduction to deep learning. Then we discuss how DLVMs are estimated: variational methods are commonly used for inference; however, the exact likelihood of these models has been largely overlooked. We show that most unconstrained models used for continuous data have an unbounded likelihood function and discuss how to ensure the existence of maximum likelihood estimates. Then we present a simple variational method, called MIWAE, for training DLVMs when the training set contains missing-at-random data. Finally, we present Monte Carlo algorithms for missing data imputation using the exact conditional likelihood of DLVMs: a Metropolis-within-Gibbs sampler for DLVMs trained on complete data sets and an importance sampler for DLVMs trained on incomplete data sets. For complete training sets, our algorithm consistently and significantly outperforms the usual imputation scheme used for DLVMs. For incomplete training sets, we show that MIWAE-trained models provide accurate single and multiple imputations, and are highly competitive with state-of-the-art methods.
This is joint work with Pierre-Alexandre Mattei.
21/2 - Riccardo De Bin, University of Oslo: Detection of influential points as a byproduct of resampling-based variable selection procedures
Abstract: Influential points can cause severe problems when deriving a multivariable regression model. A novel approach to check for such points is proposed, based on the variable inclusion matrix, a simple way to summarize results from resampling-based variable selection procedures. These procedures rely on the variable inclusion matrix, which reports whether a variable (column) is included in a regression model fitted on a pseudo-sample (row) generated from the original data (e.g., bootstrap sample or subsample). The variable inclusion matrix is used to study the variable selection stability, to derive weights for model averaged predictors and in other investigations. Concentrating on variable selection, it also allows understanding whether the presence of a specific observation has an influence on the selection of a variable.
From the variable inclusion matrix, indeed, the inclusion frequency (I-frequency) of each variable can be computed only in the pseudo-samples (i.e., rows) which contain the specific observation. When the procedure is repeated for each observation, it is possible to check for influential points through the distribution of the I-frequencies, visualized in a boxplot, or through a Grubbs’ test. Outlying values in the former case and significant results in the latter point to observations having an influence on the selection of a specific variable and therefore on the finally selected model. This novel approach is illustrated in two real data examples.
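The conditional inclusion frequencies described above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the function name and the two input structures (`inclusion` as a list of 0/1 rows, `membership` as a list of sets of observation indices) are assumptions made for the example.

```python
def conditional_inclusion_frequencies(inclusion, membership):
    """Inclusion frequency of each variable, restricted to the
    pseudo-samples that contain a given observation.

    inclusion[b][j] is 1 if variable j was selected in the model fitted
    on pseudo-sample b, else 0 (the variable inclusion matrix).
    membership[b] is the set of observation indices in pseudo-sample b.
    Returns a dict: observation index -> list of I-frequencies per variable.
    """
    n_vars = len(inclusion[0])
    observations = sorted(set().union(*membership))
    freq = {}
    for i in observations:
        # Keep only the rows (pseudo-samples) that contain observation i.
        rows = [inclusion[b] for b in range(len(inclusion)) if i in membership[b]]
        freq[i] = [sum(row[j] for row in rows) / len(rows) for j in range(n_vars)]
    return freq
```

For each variable, the values `freq[i][j]` across observations `i` are what one would inspect in a boxplot or pass to an outlier test; an observation whose value lies far from the rest is a candidate influential point for that variable.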
28/2 - Johan Henriksson: Single-cell perturbation analysis – the solution to systems biology?
Abstract: The ideas behind systems biology have been around for ages. However, the field has been held back by the lack of data. In this talk I will cover new methods, by me and others, for generating the large amounts of data needed to fit realistic regulatory models. The focus will be on wet lab methods as well as equations, and how we can solve them in practice. I will try to cover, in particular, CRISPR, RNA-seq, ATAC-seq, STARR-seq, Bayesian models, ODEs and a bit of physics.
7/3 - Larisa Beilina: Time-adaptive parameter identification in mathematical model of HIV infection with drug therapy
Abstract: Parameter identification problems occur frequently within biomedical applications. These problems are often ill-posed, and thus challenging to solve numerically. In this talk, a time-adaptive optimization method will be presented for determining drug efficacy in a mathematical model of HIV infection. The time-adaptive method means that we first determine drug efficacy on a known coarse time partition using known values of observed functions. Then we locally refine the time mesh at points where an a posteriori error indicator is large and compute drug efficacy on the new refined mesh until the error is reduced to the desired accuracy. The time-adaptive method can eventually be used by clinicians to determine the drug response for each treated individual. Exact knowledge of the personal drug efficacy can aid in determining the most suitable drug as well as the optimal dose for each person, in the long run resulting in a personalized treatment with maximum efficacy and minimum adverse drug reactions.
14/3 - Umberto Picchini: Accelerating MCMC sampling via approximate delayed-acceptance
Abstract: While Markov chain Monte Carlo (MCMC) is the ubiquitous tool for sampling from complex probability distributions, it does not scale well with increasing datasets. Also, its structure is not naturally suited for parallelization.
When pursuing Bayesian inference for model parameters, MCMC can be computationally very expensive, either when the dataset is large, or when the likelihood function is unavailable in closed form and itself requires Monte Carlo approximations. In these cases each iteration of Metropolis-Hastings may be intolerably slow. The so-called "delayed acceptance" MCMC (DA-MCMC) was suggested by Christen and Fox in 2005 and allows the use of a computationally cheap surrogate of the likelihood function to rapidly screen (and possibly reject) parameter proposals, while using the expensive likelihood only when the proposal has survived the "scrutiny" of the cheap surrogate. They show that DA-MCMC samples from the exact posterior distribution and returns results much more rapidly than standard Metropolis-Hastings. Here we design a novel delayed-acceptance algorithm, which is between 2 and 4 times faster than the original DA-MCMC, though ours results in approximate inference. Despite this, we show empirically that our algorithm returns accurate inference. A computationally intensive case study is discussed, involving ~25,000 observations from a protein folding reaction coordinate, fit by an SDE model with an intractable likelihood approximated using sequential Monte Carlo (that is, particle MCMC).
This is joint work with Samuel Wiqvist, Julie Lyng Forman, Kresten Lindorff-Larsen and Wouter Boomsma.
Keywords: Bayesian inference; Gaussian process; intractable likelihood; particle MCMC; protein folding; SDEs
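To make the two-stage screening idea concrete, here is a minimal sketch of the original (exact) delayed-acceptance scheme with a symmetric random-walk proposal on a scalar parameter. It is not the approximate algorithm of the talk. The function name, its arguments and the counter `n_expensive` are assumptions made for this illustration; `log_target` stands for the expensive log-posterior and `log_surrogate` for the cheap approximation.

```python
import math
import random


def delayed_acceptance_mh(log_target, log_surrogate, theta0, n_iter, step=1.0, rng=None):
    """Two-stage (delayed-acceptance) Metropolis-Hastings for a scalar parameter.

    Stage 1 screens the proposal with the cheap surrogate only.
    Stage 2 evaluates the expensive target and corrects for the surrogate,
    so the chain still targets exp(log_target).
    Returns the chain and the number of expensive stage-2 evaluations.
    """
    rng = rng or random.Random(0)
    theta = theta0
    lt, ls = log_target(theta), log_surrogate(theta)
    chain, n_expensive = [], 0
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        ls_prop = log_surrogate(prop)
        # Stage 1: accept/reject using the surrogate ratio (1 - U avoids log(0)).
        if math.log(1.0 - rng.random()) < ls_prop - ls:
            lt_prop = log_target(prop)  # expensive call, only for survivors
            n_expensive += 1
            # Stage 2: target ratio divided by the surrogate ratio used in stage 1.
            if math.log(1.0 - rng.random()) < (lt_prop - lt) - (ls_prop - ls):
                theta, lt, ls = prop, lt_prop, ls_prop
        chain.append(theta)
    return chain, n_expensive
```

The saving comes from proposals rejected at stage 1, which never trigger a call to `log_target`; the closer the surrogate is to the target, the fewer stage-2 rejections occur.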
21/3 - Samuel Wiqvist, Lund University: Automatic learning of summary statistics for Approximate Bayesian Computation using Partially Exchangeable Networks
Abstract: Likelihood-free methods enable statistical inference for the parameters of complex models when the likelihood function is analytically intractable. For these models, several tools are available that only require the ability to run a computer simulator of the mathematical model, and use the output to replace the unavailable likelihood function. The most famous of these methodologies is Approximate Bayesian Computation (ABC), which relies on access to low-dimensional summary statistics of the data. Learning these summary statistics is a fundamental problem in ABC, and selecting them is not trivial. It is in fact the main challenge when applying ABC in practice, and it affects the resulting inference considerably. Deep learning methods have previously been used to learn summary statistics for ABC.
Here we introduce a novel deep learning architecture (Partially Exchangeable Networks, PENs), with the purpose of automating the summary selection task. We only need to provide our network with samples from the prior predictive distribution, and it returns summary statistics for ABC use. PENs are designed to have the correct invariance property for Markovian data, and are therefore particularly useful when learning summary statistics for such data.
Case studies show that our methodology outperforms other popular methods, resulting in more accurate ABC inference for models with intractable likelihoods. Empirically, we show that for some case studies our approach seems to work well also with non-Markovian and non-exchangeable data.
28/3 - Hans Fahlin (Chief Investment Officer, AP2, Andra AP-fonden) and Tomas Morsing (Head of Quantitative Strategies, AP2, Andra AP-fonden): A scientific approach to financial decision making in the context of managing Swedish pension assets
Abstract: The Second Swedish Pension Fund AP2 is one of the four large Swedish pension buffer funds. In this presentation we will give examples of our scientific approach to financial decision making in the area of strategic asset allocation and, in greater depth, model based portfolio management. Model based portfolio management, the management of portfolios of financial assets with mathematical and statistical models, involves many interesting and challenging problems. We will in this presentation give an overview of the area and indicate areas for future research.
11/4 - Daniele Silvestro: Birth-death models to understand the evolution of (bio)diversity
Abstract: Our planet and its long history are characterized by a stunning diversity of organisms, environments and, more recently, cultures and technologies. To understand what factors contribute to generating diversity and shaping its evolution we have to look beyond diversity patterns. Here I present a suite of Bayesian models to infer the dynamics of origination and extinction processes using fossil occurrence data and show how the models can be adapted to the study of cultural evolution. Through empirical examples, I will demonstrate the use of this probabilistic framework to test specific hypotheses and quantify the processes underlying (bio)diversity patterns and their evolution.
12/4 - Erika B. Roldan Roa, Department of Mathematics, The Ohio State University: Evolution of the homology and related geometric properties of the Eden Growth Model
Abstract: In this talk, we study the persistent homology and related geometric properties of the evolution in time of a discrete-time stochastic process defined on the 2-dimensional regular square lattice. This process corresponds to a cell growth model called the Eden Growth Model (EGM). It can be described as follows: start with the cell square of the 2-dimensional regular square lattice of the plane that contains the origin; then make the cell structure grow by adding one cell at each time, chosen uniformly at random from the perimeter. We give a characterization of the possible change in the rank of the first homology group of this process (the "number of holes"). Based on this result we have designed and implemented a new algorithm that computes the persistent homology associated to this stochastic process and that also keeps track of geometric features related to the homology. We also present results of computational experiments performed with this algorithm, and we establish conjectures about the asymptotic behaviour of the homology and other related geometric random variables. The EGM can be seen as a First Passage Percolation (FPP) model after a proper time-scaling. This is the first time that tools and techniques from stochastic topology and topological data analysis are used to measure the evolution of the topology of the EGM and, more generally, of FPP models.
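The growth rule described above can be sketched in a few lines. This is an illustration only and not the persistent homology algorithm of the talk: it assumes the variant in which the new cell is drawn uniformly from the empty cells sharing an edge with the cluster, and it counts holes by brute-force flood fill of the complement. The names `eden_growth` and `count_holes` are hypothetical.

```python
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def eden_growth(n_cells, rng=None):
    """Grow a cluster of n_cells lattice cells, starting from the origin cell."""
    rng = rng or random.Random(0)
    occupied = {(0, 0)}
    perimeter = set(NEIGHBOURS)  # empty cells edge-adjacent to the cluster
    while len(occupied) < n_cells:
        cell = rng.choice(sorted(perimeter))  # uniform over perimeter cells
        perimeter.remove(cell)
        occupied.add(cell)
        for dx, dy in NEIGHBOURS:
            nb = (cell[0] + dx, cell[1] + dy)
            if nb not in occupied:
                perimeter.add(nb)
    return occupied


def count_holes(occupied):
    """Number of bounded components of the complement (rank of first homology).

    Empty cells are joined through shared edges only, because corners of
    occupied (closed) squares belong to the cluster.
    """
    xs = [x for x, _ in occupied]
    ys = [y for _, y in occupied]
    x0, x1, y0, y1 = min(xs) - 1, max(xs) + 1, min(ys) - 1, max(ys) + 1
    seen = set()

    def flood(start):
        stack = [start]
        seen.add(start)
        while stack:
            x, y = stack.pop()
            for dx, dy in NEIGHBOURS:
                nb = (x + dx, y + dy)
                inside = x0 <= nb[0] <= x1 and y0 <= nb[1] <= y1
                if inside and nb not in occupied and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)

    flood((x0, y0))  # the padded corner is empty and belongs to the outside
    holes = 0
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            if (x, y) not in occupied and (x, y) not in seen:
                holes += 1
                flood((x, y))
    return holes
```

Calling `count_holes(eden_growth(n))` for increasing `n` gives a crude view of how the number of holes evolves; recomputing from scratch at each step is far less efficient than tracking the change caused by each added cell.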
16/5 - Susanne Ditlevsen, University of Copenhagen: Inferring network structure from oscillating systems with cointegrated phase processes
Abstract: We present cointegration analysis as a method to infer the network structure of a linearly phase coupled oscillating system. By defining a class of oscillating systems with interacting phases, we derive a data generating process where we can specify the coupling structure of a network that resembles biological processes. In particular we study a network of Winfree oscillators, for which we present a statistical analysis of various simulated networks, where we conclude on the coupling structure: the direction of feedback in the phase processes and proportional coupling strength between individual components of the system. We show that we can correctly classify the network structure for such a system by cointegration analysis, for various types of coupling, including uni-/bi-directional and all-to-all coupling. Finally, we analyze a set of EEG recordings and discuss the current applicability of cointegration analysis in the field of neuroscience.
Ref: J. Østergaard, A. Rahbek and S. Ditlevsen. Oscillating systems with cointegrated phase processes. Journal of Mathematical Biology, 75(4), 845--883, 2017.
23/5 - Chun-Biu Li, Stockholms Universitet: Information Theoretic Approaches to Statistical Learning
Abstract: Since its introduction in the context of communication theory, information theory has extended to a wide range of disciplines in both natural and social sciences. In this talk, I will explore information theory as a nonparametric probabilistic framework for unsupervised and supervised learning free from a priori assumptions on the underlying statistical model. In particular, the soft (fuzzy) clustering problem in unsupervised learning can be viewed as a tradeoff between data compression and minimizing the distortion of the data. Similarly, modeling in supervised learning can be treated as a tradeoff between compression of the predictor variables and retaining the relevant information about the response variable. To illustrate the usage of these methods, some applications in biophysical problems and time series analysis will be briefly addressed in the talk.
13/6 - Sara Hamis, Swansea University: DNA Damage Response Inhibition: Predicting in vivo treatment responses using an in vitro-calibrated mathematical model
Abstract: Mathematical models, and their corresponding in silico experiments, can be used to simulate both in vitro and in vivo tumour scenarios. However, the microenvironment in an in vitro cell culture is significantly different from the microenvironment in a solid tumour and many details that influence tumour dynamics differ between in vitro and in vivo settings. These details include cell proliferation, oxygen distribution and drug delivery. It follows that translating quantitative in vitro findings to in vivo predictions is not straightforward.
In this talk I will present an individual based mathematical cancer model in which one individual corresponds to one cancer cell. This model is governed by a few observable and well documented principles, or rules. To account for differences between the in vitro and in vivo scenarios, these rules can be appropriately adjusted. By only adjusting the rules (whilst keeping the fundamental framework intact), the mathematical model can first be calibrated by in vitro data and thereafter be used to successfully predict treatment responses in mouse xenografts in vivo. The model is used to investigate treatment responses to a drug that hinders tumour proliferation by targeting the cellular DNA damage response process.
19/9 - Ronald Meester, Vrije University, Amsterdam: The DNA Database Controversy 2.0
Abstract: What is the evidential value of a unique match of a DNA profile in a database? Although the probabilistic analysis of this problem is in principle not difficult, it was the subject of a heated debate in the literature around 15 years ago, to which I also contributed. Very recently, to my surprise, the debate was re-opened by the publication of a paper by Wixted, Christenfeld and Rouder, in which a new element to the discussion was introduced. They claimed that the size of the criminal population (however defined) was important. In this lecture I will first review the database problem, together with the principal solution. Then I explain why this new ingredient does not add anything, and only obscures the picture. The fact that not everybody agrees with us will be illustrated by some interesting quotes from the recent literature. If you thought that mathematics could not be polemic you should certainly come and listen. (Joint work with Klaas Slooten.)
26/9 - Valerie Monbet, Université de Rennes: Time-change models for asymmetric processes
Abstract: Many records in environmental sciences exhibit asymmetric trajectories. The physical mechanisms behind these records may lead for example to sample paths with different characteristics at high and low levels (up-down asymmetries) or in the ascending and descending phases leading to time irreversibility (front-back asymmetries). Such features are important for many applications and there is a need for simple and tractable models which can reproduce them. We explore original time-change models where the clock is a stochastic process which depends on the observed trajectory. The ergodicity of the proposed model is established under general conditions and this result is used to develop non-parametric estimation procedures based on the joint distribution of the process and its derivative. The methodology is illustrated on meteorological and oceanographic datasets. We show that, combined with a marginal transformation, the proposed methodology is able to reproduce important characteristics of the dataset such as marginal distributions, up-crossing intensity, up-down and front-back asymmetries.
3/10 - Peter Jagers, Chalmers: Populations - from few independently reproducing individuals to continuous and deterministic flows. Or: From branching processes to adaptive population dynamics
Abstract: When the density of populations grows, in pace with an environmental carrying capacity growth, general branching populations with interacting individuals and also in interplay with the environment, will stabilise towards a deterministic population flow, determined by an integral equation. The deviation between the original density and the limiting one, as the carrying capacity grows beyond all limits, will also converge to a diffusion process. This provides a firm basis in individual behaviour for ad hoc deterministic population models.
17/10 - Richard Davis, Columbia University and Chalmers Jubileum Professor 2019: Extreme Value Theory Without the Largest Values: What Can Be Done?
Abstract: During the last five years, there has been growing interest in inference related problems in the traditional extreme value theory setup in which the data has been truncated above some large value. The principal objectives have been to estimate the parameters of the model, usually in a Pareto or a generalized Pareto distribution (GPD) formulation, together with the truncated value. Ultimately, the Hill estimator plays a starring role in this work. In this paper we take a different perspective. Motivated by data coming from a large network, the Hill estimator appeared to exhibit smooth “sample path” behavior as a function of the number of upper order statistics used in constructing the estimator. This became more apparent as we artificially censored more of the upper order statistics. Building on this observation, we introduce a new parameterization into the Hill estimator that is a function of δ and θ, which correspond, respectively, to the proportion of extreme values that have been censored and the path behavior of the “Hill estimator”. As a function of (δ,θ), we establish functional convergence of the renormalized Hill estimator to a Gaussian process. Based on this limit theory, an estimation procedure is developed to estimate the number of censored observations and other extreme value parameters including $\alpha$, the index of regular variation, and the bias of Hill’s estimate. We illustrate this approach both in simulations and with real data. (This is joint work with Jingjing Zou and Gennady Samorodnitsky.)
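For readers unfamiliar with the estimator at the centre of this abstract, the following is a minimal sketch of the classical Hill estimator as a function of the number k of upper order statistics. It does not include the censoring parameterization in (δ,θ) developed in the talk; the function names are chosen for this illustration.

```python
import math


def hill_estimator(data, k):
    """Classical Hill estimate of 1/alpha from the k largest observations.

    Uses the (k+1)-th largest value as threshold; data must be positive.
    """
    xs = sorted(data, reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    threshold = xs[k]  # (k+1)-th largest observation
    return sum(math.log(x / threshold) for x in xs[:k]) / k


def hill_path(data, ks):
    """Hill estimates over a range of k: the 'sample path' in k."""
    return [hill_estimator(data, k) for k in ks]
```

Plotting `hill_path(data, range(10, 500))` against k gives the usual Hill plot, whose shape as a function of k is the object the abstract refers to as path behavior.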
24/10 - Erica Metheney, Department of Political Sciences, University of Gothenburg: Modifying Non-Graphic Sequences to be Graphic
Abstract: The field of network science has expanded greatly in recent years with applications in fields such as computer science, biology, chemistry, and political science. Over time, the networks in which we are interested have become larger and more interconnected, posing new computational challenges. We study the generation of graphic degree sequences in order to improve the overall efficiency of simulating networks with power-law degree distributions. We explain the challenges associated with this class of networks and present an algorithm to modify non-graphic degree sequences to be graphic. Lastly we show that this algorithm preserves the original degree distribution and satisfies certain optimality properties.
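A degree sequence is graphic if some simple graph realizes it. As background, the standard Erdős–Gallai test for this property can be sketched as below. This is only the check, not the speaker's modification algorithm, and the function name is chosen for this illustration.

```python
def is_graphic(degrees):
    """Erdős–Gallai test: can `degrees` be realized by a simple graph?"""
    d = sorted(degrees, reverse=True)
    n = len(d)
    # Degrees must be non-negative and sum to an even number.
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        # The k largest degrees can be absorbed by edges among themselves
        # (at most k(k-1)) plus at most min(d_i, k) from each remaining vertex.
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True
```

For example, `is_graphic([3, 3, 3, 3])` holds (the complete graph on four vertices), while `[3, 3, 1, 1]` fails the inequality at k = 2 and is therefore non-graphic.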
31/10 - Sofia Tapani, AstraZeneca: Early clinical trial design - Platform designs with the patient at its center
Abstract: Adapting a portfolio approach to the implementation of clinical trials at the early stage has been evaluated within the oncology therapy area.
This feature of clinical trial design can also add value to other therapy areas due to its potential exploratory nature. The platform design allows for multi-arm clinical trials to evaluate several experimental treatments, perhaps not all available at the same point in time. At the early clinical development stage, new drugs are rarely at the same stage of development. The alternative, running several separate two-arm studies, is time consuming and can be a bottleneck in development due to budget limitations in comparison to the more efficient platform study where arms are added at several different time points after start of enrolment.
Platform designs within the heart failure therapy area in early clinical development are exploratory in nature. Clear prognostic and predictive biomarker profiles for disease are not available and need to be explored to be identified for each patient population. As an example, we’ll have a look at the HIDMASTER trial design for biomarker identification and compound graduation throughout the platform.
All platform trials need to be thoroughly simulated, and simulations should be used as a tool to decide among design options. Simulations of platform trials give the opportunity to investigate many scenarios, including the null scenario to establish overall type I error. We can evaluate bias estimation and sensitivity to patient withdrawals, missing data, enrolment rates/patterns, interim analysis timings, data access delays, data cleanliness, analysis delays, etc.
Simulations should also comprise decision operating characteristics to be able to make decisions on the design based on the objective of the trial: early stops of underperforming arms, early go for active arms, prioritise arms on emerging data or drawing insights from whole study data analysis.
Over time the trial learns about the disease, new endpoints, stratification biomarkers and prognostic vs predictive effects.
6/11 - Richard Torkar, Software Engineering, Chalmers: Why do we encourage even more missingness when dealing with missing data?
Abstract: In this presentation, we first introduce the reader to Bayesian data analysis (BDA) and missing data, and, in particular, how this is handled in empirical software engineering (ESE) research today. The example we make use of presents the steps done when conducting state of the art statistical analysis in our field. First, we need to understand the problem we want to solve. Second, we conduct causal analysis. Third, we analyze non-identifiability. Fourth, we conduct missing data analysis. Finally, we do a sensitivity analysis of priors. All this before we design our statistical model. Once we have a model, we present several diagnostics one can use to conduct sanity checks. We hope that through these examples, empirical software engineering will see the advantages of using BDA. This way, we hope Bayesian statistics will become more prevalent in our field, thus partly avoiding the reproducibility crisis we have seen in other disciplines. Our hope is that in this seminar statisticians will provide (valuable!) feedback on what is proposed, and hence provide empirical software engineering with a good first step in using BDA for the type of analyses we conduct.
7/11 - Krzysztof Bartoszek, Linköping University: Formulating adaptive hypotheses in multivariate phylogenetic comparative methods
Abstract: (joint work with G. Asimomitis, V. Mitov, M. Piwczyński, T. Stadler) Co-adaptation is key to understanding species evolution. Different traits have to function together so that the organism can work as a whole. Hence, all changes to environmental pressures have to be coordinated. Recently, we have developed R packages that are able to handle general, multivariate Gaussian processes realized over a phylogenetic tree. At the heart of the modelling framework is the so-called GLInv (Gaussian, mean depending linearly on the ancestral value and variance Invariant with respect to ancestral value) family of models. More formally a stochastic process evolving on a tree belongs to this family if
* after branching the traits evolve independently
* the distribution of the trait at time t, X(t), conditional on the ancestral value, X(s), at time s<t, is Gaussian with
** E[X(t) | X(s)] = w(s,t) + F(s,t)X(s)
** Var[X(t) | X(s)] = V(s,t),
where neither w(s,t), F(s,t), nor V(s,t) can depend on X(.) but may be further parametrized. Using the likelihood computational engine PCMBase [2, available on CRAN], the PCMFit [3, publicly available on GitHub] package allows for inference of models belonging to the GLInv family and furthermore allows for finding points of shifts between evolutionary regimes on the tree. What is particularly novel is that it allows not only for shifts between a model's parameters but also for switches between different types of models within the GLInv family (e.g. a shift from a Brownian motion (BM) to an Ornstein-Uhlenbeck (OU) process and vice versa). Interactions between traits can be understood as magnitudes and signs of off-diagonal entries of F(s,t) or V(s,t). What is particularly interesting is that in this family of models one may obtain changes in the direction of the relationship, i.e. the long and short term joint dynamics can be of a different nature. This is possible even if one simplifies the process to an OU one. Here, one is able to understand the dynamics of the process in fine detail and propose specific model parameterizations [PCMFit and current CRAN version of mvSLOUCH, 1, which is based on PCMBase]. In the talk I will discuss how one can set up different hypotheses concerning relationships between the traits in terms of model parameters and how one can view the long and short term evolutionary dynamics. The software's possibilities will be illustrated by considering the evolution of fruit in the Ferula genus. I will also discuss some limit results that are, among other things, useful for setting initial seeds of the numerical estimation procedures.
[1] K. Bartoszek, J. Pienaar, P. Mostad, S. Andersson, and T. F. Hansen. A phylogenetic comparative method for studying multivariate adaptation. J. Theor. Biol. 314:204-215, 2012.
[2] V. Mitov, K. Bartoszek, G. Asimomitis, T. Stadler. Fast likelihood calculation for multivariate phylogenetic comparative methods: The PCMBase R package. arXiv:1809.09014, 2018.
[3] V. Mitov, K. Bartoszek, T. Stadler. Automatic generation of evolutionary hypotheses using mixed Gaussian phylogenetic models. PNAS, 201813823, 2019.
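As a small numerical illustration of the GLInv conditions in the abstract above, the following Python sketch writes out w(s,t), F(s,t) and V(s,t) for a univariate BM and a univariate OU process and draws a descendant value along one branch. The function names and the OU parameterisation dX = alpha (theta - X) dt + sigma dW are assumptions made for this illustration only; this is not the API of PCMBase, PCMFit or mvSLOUCH, which are R packages.

```python
import numpy as np


def bm_glinv_terms(s, t, sigma):
    """(w, F, V) for Brownian motion dX = sigma dW on a branch from time s to t."""
    return 0.0, 1.0, sigma**2 * (t - s)


def ou_glinv_terms(s, t, alpha, theta, sigma):
    """(w, F, V) for the OU process dX = alpha*(theta - X) dt + sigma dW (assumed form)."""
    dt = t - s
    F = np.exp(-alpha * dt)            # weight on the ancestral value X(s)
    w = (1.0 - F) * theta              # pull towards the optimum theta
    # -expm1(-x) = 1 - exp(-x), computed stably for small x
    V = sigma**2 * (-np.expm1(-2.0 * alpha * dt)) / (2.0 * alpha)
    return w, F, V


def sample_descendant(x_s, w, F, V, rng):
    """Draw X(t) | X(s) = x_s ~ N(w + F * x_s, V), the GLInv transition."""
    return rng.normal(loc=w + F * x_s, scale=np.sqrt(V))


rng = np.random.default_rng(0)
w, F, V = ou_glinv_terms(0.0, 1.0, alpha=2.0, theta=1.0, sigma=0.5)
x_t = sample_descendant(0.3, w, F, V, rng)
```

In both cases none of w, F, V depends on the trait value itself, which is the defining restriction of the family. In a multivariate version F and V would be matrices, and their off-diagonal entries would carry the trait interactions mentioned in the abstract.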
20/11 - Paul-Christian Bürkner, Aalto University: Bayesflow: Software assisted Bayesian workflow
Abstract: Probabilistic programming languages such as Stan, which can be used to specify and fit Bayesian models, have revolutionized the practical application of Bayesian statistics. They are an integral part of Bayesian data analysis and as such, a necessity to obtain reliable and valid inference. However, they are not sufficient by themselves. Instead, they have to be combined with substantive statistical and subject matter knowledge, expertise in programming and data analysis, as well as critical thinking about the decisions made in the process.
A principled Bayesian workflow consists of several steps from the design of the study, gathering of the data, model building, estimation, and validation, to the final conclusions about the effects under study. I want to present a concept for a software package that assists users in following a principled Bayesian workflow for their data analysis by diagnosing problems and giving recommendations for sensible next steps. This concept gives rise to a lot of interesting research questions we want to investigate in the upcoming years.
27/11 - Geir Storvik, Oslo University: Flexible Bayesian Nonlinear Model Configuration
Abstract: Deep learning models have been extremely successful in terms of prediction, although they are often difficult to specify and potentially suffer from overfitting. In this talk, we introduce the class of Bayesian generalized nonlinear regression models (BGNLM) with a comprehensive non-linear feature space. Non-linear features are generated hierarchically, similarly to deep learning, but with extended flexibility on the possible types of features to be considered. This extended flexibility, combined with variable selection, makes it possible to find a small set of important features and thereby more interpretable models. A mode jumping MCMC algorithm is presented to perform inference on BGNLMs. Model averaging is also possible within our framework. In various applications, we illustrate how BGNLM is used to obtain meaningful non-linear models. Additionally, we compare its predictive performance with a number of machine learning algorithms.
This is joint work with Aliaksandr Hubin (Norwegian Computing Center) and Florian Frommlet (CEMSIIS, Medical University of Vienna)
4/12 - Moritz Schauer, Chalmers/GU: Smoothing and inference for high dimensional diffusions
Abstract: Suppose we discretely observe a diffusion process and we wish to estimate parameters appearing in either the drift coefficient or the diffusion coefficient. We derive a representation of the conditional distribution given the observations as a change of measure, to be embedded as a step in a Monte Carlo procedure to estimate those parameters. The technique is based on solving the reverse time filtering problem for a linear approximation of the diffusion and a change of measure to correct for the difference between the linear approximation and the true smoothed process.
We apply this to the problem of tracking convective cloud systems from satellite data with low time resolution.
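To make the observation setting concrete, here is a minimal Python sketch of a diffusion that is simulated on a fine time grid but only recorded at a few discrete times. It illustrates the data situation only, not the smoothing or change-of-measure technique of the talk. The Euler-Maruyama scheme, the function name `simulate_and_observe`, and the example drift -theta * x with constant diffusion coefficient sigma are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_and_observe(drift, diffusion, x0, t_end, n_fine, obs_every, rng):
    """Euler-Maruyama path on a fine grid, returned only at every obs_every-th step."""
    dt = t_end / n_fine
    x = np.empty(n_fine + 1)
    x[0] = x0
    for k in range(n_fine):
        dw = rng.normal(scale=np.sqrt(dt))   # Brownian increment over one fine step
        x[k + 1] = x[k] + drift(x[k]) * dt + diffusion(x[k]) * dw
    times = np.linspace(0.0, t_end, n_fine + 1)
    # The path between the retained points is what a smoother has to reconstruct.
    return times[::obs_every], x[::obs_every]


theta, sigma = 1.5, 0.3                      # hypothetical parameter values
rng = np.random.default_rng(1)
obs_times, obs_values = simulate_and_observe(
    drift=lambda x: -theta * x,
    diffusion=lambda x: sigma,
    x0=1.0, t_end=2.0, n_fine=2000, obs_every=200, rng=rng,
)
```

With these settings only 11 of the 2001 simulated points are kept, so the parameters theta and sigma would have to be estimated without seeing the path between observation times.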
11/12 - Johannes Borgqvist, Chalmers/GU: The polarising world of Cdc42: the derivation and analysis of a quantitative reaction diffusion model of cell polarisation
Abstract: A key regulator of cell polarisation in organisms ranging from yeast to higher mammals is the Cell division control protein 42 homolog, Cdc42. It is a GTPase of the Rho family which determines the site of the pole by a combination of reactions (i.e. activation and deactivation) and diffusion in the cell. A study in yeast showed that with high age, the Cdc42 pathway loses its function which prevents replicative ageing. Moreover, Cdc42 activity is involved in both ageing and rejuvenation of hematopoietic stem cells which illustrates the importance of Cdc42 in the ageing process of numerous organisms. Experimentally, the challenge is that the concentration profile of Cdc42 is not uniform. Thus, accounting for spatial inhomogeneities is crucial when data is collected, but these experiments are hard to conduct. Similarly, the problem with the numerous mathematical models is that they do not account for cell geometry. Consequently, they do not provide a realistic description of the polarisation process mediated by Cdc42.
In this project, we develop a quantifiable model of cell polarisation accounting for the morphology of the cell. The model consists of a coupled system of PDEs, more specifically reaction-diffusion equations, with two spatial domains: the cytosol and the cell membrane. In this setting, we prove sufficient conditions for pattern formation. Using a finite element-based numerical scheme, we simulate cell polarisation for these two domains. Further, we illustrate the impact of the parameters on the patterns that emerge and we estimate the time until polarisation. Using this work as a starting point, it is possible to integrate data into the theoretical description of the process to gain a deeper mechanistic understanding of cell polarisation.