## 2021

**19/1, Anna Dreber Almenberg, Stockholm School of Economics: (Predicting) replication outcome**

Abstract: Why are there so many false positive results in the published scientific literature? And what is the actual share of results that do not replicate in different literatures in the experimental social sciences? I will discuss several large replication projects on direct and conceptual replications, as well as our studies on "wisdom-of-crowds" mechanisms like prediction markets and forecasting surveys where researchers attempt to predict replication outcomes as well as new outcomes.

2/2, Claudia Redenbach, Technische Universität Kaiserslautern: Using stochastic models for segmentation and characterization of spatial microstructures

Abstract: The performance of engineering materials such as foams, fibre composites or concrete is heavily influenced by the microstructure geometry. Quantitative analysis of 3D images, provided for instance by micro computed tomography (µCT), allows for a characterization of material samples. In this talk, we will illustrate how models from stochastic geometry may support the segmentation of image data and the statistical analysis of the microstructures. Our first example deals with the estimation of the fibre length distribution from µCT images of glass fibre reinforced composites. As examples of segmentation tasks we present the reconstruction of the solid component of a porous medium from focused ion beam scanning electron microscopy (FIB-SEM) image data and the segmentation of cracks in µCT images of concrete.

16/2, Fredrik Johansson, Chalmers: Making the most of observational data in causal estimation with machine learning

Abstract: Decision making is central to all aspects of society, private and public. Consequently, using data and statistics to improve decision-making has a rich history, perhaps best exemplified by the randomized experiment. In practice, however, experiments carry significant risk. For example, making an online recommendation system worse could result in millions of lost profits; selecting an inappropriate treatment for a patient could have devastating consequences. Luckily, organizations like hospitals and companies who serve recommendations routinely collect vast troves of observational data on decisions and outcomes. In this talk, I discuss how to make the best use of such data to improve policy, starting with an example of what can go wrong if we’re not careful. Then, I present two pieces of research on how to avoid such perils if we are willing to say more about less.

2/3, Andrea De Gaetano, IRIB CNR: Modelling haemorrhagic shock and statistical challenges for parameter estimation

Abstract: In the ongoing development of ways to mitigate the consequences of penetrating trauma in humans, particularly in the area of civil defence and military operations, possible strategies aimed at identifying the victim's physiological state and its likely evolution depend on mechanistic, quantitative understanding of the compensation mechanisms at play. In this presentation, time-honored and recent mathematical models of the dynamical response to hemorrhage are briefly discussed and their applicability to real-life situations is examined. Conclusions are drawn as to the necessary formalization of this problem, which however poses methodological challenges for parameter estimation.

16/3, Fredrik Lindsten, Linköping University: Monte Carlo for Approximate Bayesian Inference

Abstract: Sequential Monte Carlo (SMC) is a powerful class of methods for approximate Bayesian inference. While originally used mainly for signal processing and inference in dynamical systems, these methods are in fact much more general and can be used to solve many challenging problems in Bayesian statistics and machine learning, even if they lack apparent sequential structure. In this talk I will first discuss the foundations of SMC from a machine learning perspective. We will see that there are two main design choices of SMC: the proposal distribution and the so-called intermediate target distributions, where the latter is often overlooked in practice. Focusing on graphical model inference, I will then show how deterministic approximations, such as variational inference and expectation propagation, can be used to approximate the optimal intermediate target distributions. The resulting algorithm can be viewed as a post-correction of the biases associated with these deterministic approximations. Numerical results show improvements over the baseline deterministic methods as well as over "plain" SMC.

The first part of the talk is an introduction to SMC inspired by our recent Foundations and Trends tutorial.

30/3, Manuela Zucknick, University of Oslo: Bayesian modelling of treatment response in ex vivo drug screens for precision cancer medicine

Abstract: Large-scale cancer pharmacogenomic screening experiments profile cancer cell lines or patient-derived cells versus hundreds of drug compounds. The aim of these in vitro studies is to use the genomic profiles of the cell lines together with information about the drugs to predict the response to a particular combination therapy, in particular to identify combinations of drugs that act synergistically. The field is hyped with rapid development of sophisticated high-throughput miniaturised platforms for rapid large-scale screens, but development of statistical methods for the analysis of resulting data is lagging behind. I will discuss typical challenges for estimation and prediction of response to combination therapies, from large technical variation and experimental biases to modelling challenges for prediction of drug response using genomic data. I will present two Bayesian models that we have recently developed to address diverse problems relating to the estimation and prediction tasks, and show how they can improve the identification of promising drug combinations over standard non-statistical approaches.

6/4, Prashant Singh, Uppsala University: Likelihood-free parameter inference of stochastic time series models: exploring neural networks to enhance scalability, efficiency and performance

Abstract: Parameter inference for stochastic time series models, such as gene regulatory networks, in the likelihood-free setting is a challenging task, particularly when the number of parameters to be inferred is large. Recently, data-driven machine learning models (neural networks in particular) have delivered encouraging results towards addressing the scalability, efficiency and parameter inference quality of the likelihood-free parameter inference pipeline. In particular, this talk will present a detailed discussion on neural networks as trainable, expressive and scalable summary statistics of high-dimensional time series for parameter inference tasks.

Preprint reference: Åkesson, M., Singh, P., Wrede, F., & Hellander, A. (2020). Convolutional neural networks as summary statistics for approximate Bayesian computation. arXiv preprint arXiv:2001.11760.

11/5, Ilaria Prosdocimi, University of Venice: Statistical models for the detection of changes in peak river flow in the UK

Abstract: Several parts of the United Kingdom have experienced highly damaging flooding events in recent decades, raising doubts about whether methods used to assess flood risk, and therefore design flood defences, are "fit for purpose". It has also been hypothesized that the high number of recent extreme events might be one of the impacts of the (anthropogenic) changes in the climate. Indeed, with the increasing evidence of a changing climate, there is much interest in investigating the potential impacts of these changes on the risks linked to natural hazards such as intense rainfall, extreme waves and flooding. This has resulted in several studies investigating changes in natural hazard extremes, including peak river flow extremes in the UK. This talk will review a selection of these studies, discussing some of the pitfalls of statistical models typically employed to assess whether any change can be detected in peak river flow extremes. Solutions to these pitfalls are outlined and discussed. In particular, the consequences of the functional forms assumed to describe change in extremes on the ability to describe changes in the risk profiles of natural hazards are discussed.

25/5, Matteo Fasiolo, University of Bristol: Generalized additive models for ensemble electricity demand forecasting

Abstract: Future grid management systems will coordinate distributed production and storage resources to manage, in a cost-effective fashion, the increased load and variability brought by the electrification of transportation and by a higher share of weather-dependent production.

Electricity demand forecasts at a low level of aggregation will be key inputs for such systems. In this talk, I'll focus on forecasting demand at the individual household level, which is more challenging than forecasting aggregate demand, due to the lower signal-to-noise ratio and to the heterogeneity of consumption patterns across households.

I'll describe a new ensemble method for probabilistic forecasting, which borrows strength across the households while accommodating their individual idiosyncrasies.

The first step consists of designing a set of models or 'experts' which capture different demand dynamics and fitting each of them to the data from each household.

Then the idea is to construct an aggregation of experts where the ensemble weights are estimated on the whole data set, the main innovation being that we let the weights vary with the covariates by adopting an additive model structure. In particular, the proposed aggregation method is an extension of regression stacking (Breiman, 1996) where the mixture weights are modelled using linear combinations of parametric, smooth or random effects.

The methods for building and fitting additive stacking models are implemented by the gamFactory R package, available at https://github.com/mfasiolo/gamFactory

8/6, Seyed Morteza Najibi, Lund University: Functional Singular Spectrum Analysis with application to remote sensing data

One popular approach to the decomposition of time series is based on rates of change. In this approach, the observed time series is partitioned (decomposed) into informative trends plus potential seasonal (cyclical) and noise (irregular) components. Aligned with this principle, Singular Spectrum Analysis (SSA) is a model-free procedure that is commonly used as a nonparametric technique for analysing time series. SSA does not require restrictive assumptions such as stationarity, linearity, and normality. It can be used for a wide range of purposes such as trend and periodic component detection and extraction, smoothing, forecasting, change-point detection, gap filling, causality, and so on.

In this talk, I will briefly overview SSA methodology and introduce a new extension called functional SSA to analyze functional time series. This is developed by integrating ideas from functional data analysis and univariate SSA. I will demonstrate this approach for tracking changes in vegetation over time by analysing the kernel density functions of Normalized Difference Vegetation Index (NDVI) images. At the end of the talk, I will also illustrate a simulated example in the interactive Shiny web application implemented in the Rfssa package.

25/8, Jonas Wallin, Lund University: Locally scale invariant proper scoring rules

Abstract: Averages of proper scoring rules are often used to rank probabilistic forecasts. In many cases, the variance of the individual observations and their predictive distributions vary in these averages. We show that some of the most popular proper scoring rules, such as the continuous ranked probability score (CRPS) which is the go-to score for continuous observation ensemble forecasts, up-weight observations with large uncertainty which can lead to unintuitive rankings.

To describe this issue, we define the concept of local scale invariance for scoring rules. A new class of generalized proper kernel scoring rules is derived, and as a member of this class, we propose the scaled CRPS (SCRPS). This new proper scoring rule is locally scale-invariant and therefore works in the case of varying uncertainty. Like CRPS it is computationally available for output from ensemble forecasts and does not require the ability to evaluate the density of the forecast. The theoretical findings are illustrated in a few different applications, where we in particular focus on models in spatial statistics.
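As an illustration of the scale dependence mentioned in the abstract, the following minimal sketch computes the standard ensemble estimate of the CRPS, E|X - y| - 0.5 E|X - X'|, and shows that rescaling both forecast and observation rescales the score by the same factor. The function name `ensemble_crps` and the toy numbers are assumptions made for this example; the SCRPS proposed in the talk is not implemented here.

```python
from itertools import product


def ensemble_crps(ensemble, y):
    """Ensemble estimate of the CRPS: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    m = len(ensemble)
    # Mean absolute error of the ensemble members against the observation.
    mae = sum(abs(x - y) for x in ensemble) / m
    # Mean absolute difference over all ordered pairs of ensemble members.
    spread = sum(abs(a - b) for a, b in product(ensemble, repeat=2)) / (m * m)
    return mae - 0.5 * spread


# Toy forecast and observation (illustrative values only).
ensemble = [-1.0, 0.0, 1.0]
obs = 0.5
scale = 10.0

score = ensemble_crps(ensemble, obs)
# Same forecast situation, but with ten times the uncertainty.
scaled_score = ensemble_crps([scale * x for x in ensemble], scale * obs)
print(score, scaled_score)
```

Because the score grows linearly with the scale, an average of CRPS values over observations with very different predictive uncertainty is dominated by the high-uncertainty cases, which is the behaviour the abstract describes.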

14/9, Moritz Schauer, Chalmers/GU: The sticky Zig-Zag sampler: an event chain Monte Carlo (PDMP-) sampler for Bayesian variable selection

Abstract: During the talk, I will present the sticky event chain Monte Carlo (piecewise deterministic Monte Carlo) samplers [1]. This is a new class of efficient Monte Carlo methods based on continuous-time piecewise deterministic Markov processes (PDMPs) suitable for inference in high dimensional sparse models, i.e. models for which there is prior knowledge that many coordinates are likely to be exactly 0. This is achieved with the fairly simple idea of endowing existing PDMP samplers with sticky coordinate axes, coordinate planes etc. Upon hitting those subspaces, an event is triggered, during which the process sticks to the subspace, this way spending some time in a sub-model. That introduces non-reversible jumps between different (sub-)models. During the talk, I will touch upon computational aspects of the algorithm and illustrate the method for a number of statistical models where both the sample size N and the dimensionality d of the parameter space are large.

[1] J. Bierkens, S. Grazzi, F. van der Meulen, and M. Schauer. Sticky PDMP samplers for sparse and local inference problems. arXiv: 2103.08478, 2021.

Joris Bierkens, Delft University of Technology, joris.bierkens@tudelft.nl

Sebastiano Grazzi, Delft University of Technology, s.grazzi@tudelft.nl

Frank van der Meulen, Delft University of Technology, f.h.vandermeulen@tudelft.nl

Moritz Schauer, Chalmers University of Technology, University of Gothenburg, smoritz@chalmers.se

21/9, Johan Larsson, Lund University: The Hessian Screening Rule and Adaptive Paths for the Lasso

Abstract: Predictor screening rules, which discard predictors from the design matrix before fitting the model, have had sizable impacts on the speed at which sparse regression models, such as the lasso, can be solved in the high-dimensional regime. Current state-of-the-art methods, however, face difficulties when dealing with highly-correlated predictors, often becoming too conservative.

In this talk we introduce a new screening rule that deals with this issue: The Hessian Screening Rule, which offers considerable improvements in computational performance when fitting the lasso. These benefits result both from the screening rule itself and from much-improved warm starts.

The Hessian Screening Rule also presents a welcome improvement to the construction of the lasso path: the set of lasso models produced by varying the strength of the penalization. The default approach, to a priori construct a log-spaced penalty grid, often fails in approximating the true (exact) lasso path. Leaning on the information already used when computing the Hessian Screening Rule, however, we can improve upon the construction of this grid by adaptively picking penalty parameters along the path.
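The default log-spaced penalty grid mentioned above can be sketched as follows. This is a minimal illustration, assuming the lasso objective (1/(2n))·||y − Xb||² + λ·||b||₁ without an intercept; the helper names, the grid length and the ratio between the smallest and largest penalty are choices made for this example, and the adaptive path construction from the talk is not shown.

```python
import math


def lambda_max(X, y):
    """Smallest penalty at which every lasso coefficient is zero.

    Assumes the objective (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1
    with no intercept: lam_max = max_j |x_j^T y| / n.
    """
    n = len(y)
    p = len(X[0])
    return max(abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(p)) / n


def log_spaced_grid(lam_max, n_lambdas=5, ratio=1e-2):
    """Penalties from lam_max down to ratio * lam_max, evenly spaced on the log scale."""
    step = math.log(ratio) / (n_lambdas - 1)
    return [lam_max * math.exp(k * step) for k in range(n_lambdas)]


# Toy design matrix and response (illustrative values only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]
y = [1.0, 2.0, 3.0, 0.0]

lam_max = lambda_max(X, y)
grid = log_spaced_grid(lam_max)
print(grid)
```

The grid is fixed before any model is fitted, so it cannot react to where the active set actually changes along the path; this is the limitation that adaptive selection of penalty parameters is meant to address.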

12/10, Konstantinos Konstantinou, Chalmers/GU: Spatial modeling of epidermal nerve fiber patterns

Abstract: Peripheral neuropathy is a condition associated with poor nerve functionality. Epidermal nerve fiber (ENF) counts per epidermal surface are dramatically reduced and the two dimensional spatial structure of ENFs tends to become more clustered as neuropathy progresses. Therefore, studying the spatial structure of ENFs is essential to fully understand the mechanisms that guide those morphological changes. In this paper, we compare ENF patterns of healthy controls and subjects suffering from mild diabetic neuropathy by using suction skin blister specimens obtained from the right foot. Previous analysis of these data has focused on the analysis and modelling of the spatial ENF patterns consisting of the points where the nerves enter the epidermis, base points, and the points where the nerve fibers terminate, end points, projected on a two dimensional plane, regarding the patterns as realisations of spatial point processes. Here, we include the first branching points, the points where the nerve trees branch for the first time, and model the three dimensional patterns consisting of these three types of points. To analyze the patterns, spatial summary statistics are used and a new epidermal active territory (EAT) that measures the volume in the epidermis that is covered by the individual nerve fibers is constructed. We developed a model for both the two dimensional and the three dimensional patterns including the branching points. Also, possible competitive behavior between individual nerves is examined. Our results indicate that changes in the spatial structure of ENFs can more easily be detected in the later parts of the ENFs.

See Konstantinou, K., & Särkkä, A. (2021). Spatial modeling of epidermal nerve fiber patterns. *Statistics in Medicine*. https://doi.org/10.1002/sim.9194

19/10, Alice Corbella, Warwick University: Introducing Zig-Zag Sampling and making it applicable

Abstract: Recent research showed that Piecewise Deterministic Markov Processes (PDMP) may be exploited to design efficient MCMC algorithms [1]. The Zig-Zag sampler is an example of this: it is based on the simulation of a PDMP whose switching rate λ(t) is governed by the derivative of a (minus log) target density.

While many theoretical properties of this sampler have been derived, less has been done to explore the applicability of the Zig-Zag sampler to solve Bayesian inference problems. In particular, the computation of the derivative of the log-density in the rate λ(t) might be challenging. To expand the applicability of the Zig-Zag sampler, we incorporate Automatic Differentiation tools in the Zig-Zag algorithm, to evaluate λ(t) from the functional form of the log-target density. Moreover, to allow the simulation of a PDMP via Poisson thinning, we use univariate optimization routines to find local upper bounds.

In this talk we introduce PDMPs and the Zig-Zag sampler; we present our Automatic Zig-Zag sampler; we discuss the challenges that arise with the simulation via thinning and the need for a new tuning parameter; and we comment on efficiencies and bottlenecks of AD for Zig-Zag. We present many examples to compare our method to HMC, another widely used gradient-based method.

This is joint work with Simon Spencer and Gareth Roberts.

[1] Fearnhead, P., Bierkens, J., Pollock, M., and Roberts, G.O., 2018. Piecewise deterministic Markov processes for continuous-time Monte Carlo. Statistical Science, 33(3), pp.386-412.
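To make the role of the switching rate concrete, here is a minimal one-dimensional sketch of a Zig-Zag sampler for a standard normal target. In this special case the minus log-density is x²/2, the rate along a path is max(0, v·x + s), and the next event time can be solved in closed form, so neither automatic differentiation nor Poisson thinning is needed. This is a simplification chosen for the example, not the Automatic Zig-Zag sampler from the talk; the function name and the number of events are assumptions.

```python
import math
import random


def zigzag_standard_normal(n_events, seed=0):
    """1D Zig-Zag sampler for N(0, 1); returns time averages of x and x**2."""
    rng = random.Random(seed)
    x, v = 0.0, 1.0
    total_time = sum_x = sum_x2 = 0.0
    for _ in range(n_events):
        a = v * x  # the switching rate s time units ahead is max(0, a + s)
        e = rng.expovariate(1.0)
        # Solve integral_0^tau max(0, a + s) ds = e for the next event time.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x_new = x + v * tau
        # Exact integrals of x and x**2 along the linear segment.
        sum_x += (x_new ** 2 - x ** 2) / (2.0 * v)
        sum_x2 += (x_new ** 3 - x ** 3) / (3.0 * v)
        total_time += tau
        x, v = x_new, -v  # flip the velocity at the event
    return sum_x / total_time, sum_x2 / total_time


mean_est, second_moment_est = zigzag_standard_normal(50_000)
print(mean_est, second_moment_est)
```

Expectations are estimated by integrating along the continuous trajectory rather than averaging discrete samples. For targets without a closed-form event time, the event times have to be simulated by thinning against an upper bound on the rate, which is where the optimization step described in the abstract comes in.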

9/11, Dootika Vats, Indian Institute of Technology, Kanpur: Revisiting the Gelman-Rubin Diagnostic

Abstract: Gelman and Rubin's (1992) convergence diagnostic is one of the most popular methods for terminating a Markov chain Monte Carlo (MCMC) sampler. Since the seminal paper, researchers have developed sophisticated methods of variance estimation for Monte Carlo averages. We show that this class of estimators finds immediate use in the Gelman-Rubin statistic, a connection not established in the literature before. We incorporate these estimators to upgrade both the univariate and multivariate Gelman-Rubin statistics, leading to increased stability in MCMC termination time. An immediate advantage is that our new Gelman-Rubin statistic can also be calculated for a single chain. In addition, we establish a relationship between the Gelman-Rubin statistic and effective sample size. Leveraging this relationship, we develop a principled cut-off criterion for the Gelman-Rubin statistic. Finally, we demonstrate the utility of our improved diagnostic via an example. This work is joint with Christina Knudson, University of St. Thomas, Minnesota.
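For reference, the classic univariate Gelman-Rubin statistic that the talk revisits can be sketched as below: it compares a pooled variance estimate with the average within-chain variance across several chains. This is the textbook form without degrees-of-freedom corrections, not the improved version proposed in the talk; the function name and the toy chains are assumptions for this example.

```python
import math
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    # W: average of the within-chain sample variances.
    within = statistics.mean(statistics.variance(c) for c in chains)
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance([statistics.mean(c) for c in chains])
    # Pooled estimate of the target variance.
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


# Two chains exploring the same region versus two chains stuck far apart.
chains_mixed = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 3.0, 2.0]]
chains_apart = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]

r_mixed = gelman_rubin(chains_mixed)
r_apart = gelman_rubin(chains_apart)
print(r_mixed, r_apart)
```

Values well above 1 indicate that the chains disagree. Note that this classic form needs at least two chains, since the between-chain term is a variance of chain means.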

23/11, Edward Ionides, University of Michigan: Bagging and blocking: Inference via particle filters for interacting dynamic systems

Abstract: Infectious disease transmission is a nonlinear partially observed stochastic dynamic system of topical interest. For low-dimensional systems, models can be fitted to time series data using Monte Carlo particle filter methods. As dimension increases, for example when analysing epidemics among multiple spatially coupled populations, basic particle filter methods rapidly degenerate. A collection of independent Monte Carlo calculations can be combined to give a global filtering solution with favourable theoretical scaling properties. The independent Monte Carlo calculations are called bootstrap replicates, and their aggregation is called a bagged filter. Bagged filtering is effective at likelihood evaluation for a model of measles transmission within and between cities. A blocked particle filter also works well at this task. Bagged and blocked particle filters can both be coerced into carrying out likelihood maximization by iterative application to an extension of the model that has stochastically perturbed parameters. Numerical results are obtained using the R package spatPomp.

30/11, Sebastian Persson, Chalmers/GU: Scalable Bayesian inference for dynamic state-space mixed-effects models

Abstract:

Parameter inference is an important step when constructing a dynamic model in many fields ranging from biology and medicine (PK/PD) to finance. In many scenarios, such as when modelling biological single-cell behaviour, we are interested in inference for an entire population by simultaneously fitting observations from multiple individuals. However, inference from multi-individual longitudinal data is often non-trivial due to the presence of intrinsic and extrinsic sources of variability. To address this, we consider inference for the challenging case when the dynamics for a state-space mixed-effects model (SSMEM) are driven by stochastic processes such as a Markov-jump process or a stochastic differential equation. We present an efficient Gibbs-sampler for fully Bayesian inference for SSMEMs, which compared to previous samplers can be more than 30 times faster for many (>100) individuals. The individual parameters in the Gibbs-sampler, which have an intractable likelihood, are efficiently sampled via correlated-particle pseudo-marginal Metropolis-Hastings steps. The population parameters of the random effects, which have a tractable likelihood, are updated using an HMC sampler to allow for a realistic parameterization of the individual parameters. The performance of our Gibbs sampler is investigated on challenging simulated datasets (e.g., a stochastic bi-stable model) and on a real-life dataset. Furthermore, we investigate the performance of different adaptive MCMC algorithms for the pseudo-marginal steps.

10/12, Thordis L. Thorarinsdottir, Norwegian Computing Centre: Validation of point process predictions with proper scoring rules

Abstract: In prediction settings, model validation methods are needed to rank competing models according to their predictive performance. Such rankings are typically obtained by proper scoring rules. A challenge for applying known scoring rules to point process predictions is that mathematical properties, such as densities or moment measures, are intractable for many point process models. We introduce a class of proper scoring rules for evaluating point process predictions based on summary statistics. These scoring rules rely on Monte-Carlo approximations of expectations and can therefore easily be evaluated for any point process model that can be simulated. The scoring rules allow for evaluating the calibration of a model to specific aspects of a point process, such as its spatial distribution or tendency towards clustering.

14/12, Serik Sagitov, Chalmers/GU: Critical branching as a pure death process coming down from infinity

Abstract:

We consider a critical Galton-Watson process with overlapping generations stemming from a single founder.

Assuming that both the variance of the offspring number and the average generation length are finite, we establish the convergence of the finite-dimensional distributions of this branching process, conditioned on non-extinction at a remote time of observation, to those of a pure death process.

This result brings a new perspective on Vatutin's dichotomy claiming that in the critical regime of age-dependent reproduction, an extant population either contains a large number of short-living individuals or consists of few long-living individuals.

## 2020

**4/2, Aleksej Zelezniak, Chalmers: Uncovering genotype-phenotype relationships using artificial intelligence**

Understanding the genetic regulatory code that governs gene expression is a primary, yet challenging aspiration in molecular biology that opens up possibilities to cure human diseases and solve biotechnology problems. I will present two of our recent works (1,2). First, I will demonstrate how we applied deep learning on over 20,000 mRNA datasets to learn the genetic regulatory code controlling mRNA expression in 7 model organisms ranging from bacteria to human. There, we show that in all organisms, mRNA abundance can be predicted directly from the DNA sequence with high accuracy, demonstrating that up to 82% of the variation of gene expression levels is encoded in the gene regulatory structure. In the second study, I will present ProteinGAN, a specialised variant of the generative adversarial network that is able to ‘learn’ natural protein sequence diversity and enables the generation of functional protein sequences. We tested ProteinGAN experimentally, showing that it learns the evolutionary relationships of protein sequences directly from the complex multidimensional amino acid sequence space and creates new, highly diverse functional sequence variants with natural-like physical properties. ProteinGAN therefore demonstrates the potential of artificial intelligence to rapidly generate highly diverse novel functional proteins within the allowed biological constraints of the sequence space.

1) Zrimec J et al “Gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure”, biorxiv, https://doi.org/10.1101/792531

2) Repecka D et al “Expanding functional protein sequence space using generative adversarial networks”, biorxiv, https://doi.org/10.1101/789719

**11/2, Henrik Imberg, Chalmers: Optimal sampling in unbiased active learning**

Considering a general family of parametric prediction models, we derive asymptotic expansions for the mean squared prediction error and for the variance of the total loss, and consequently present sampling schemes that minimise these quantities. We show that the resulting sampling schemes depend both on label uncertainty and on the influence on model fitting through the location of data points in the feature space, and have a close connection to statistical leverage.

The proposed active learning algorithm is evaluated on a number of datasets, and we demonstrate better predictive performance than competing methods on all benchmark datasets. In contrast, deterministic uncertainty sampling always performed worse than simple random sampling, as did probabilistic uncertainty sampling in one of the examples.

**18/2, Ottmar Cronie, Department of Public Health and Community Medicine, University of Gothenburg, and Department of Mathematics and Mathematical Statistics, Umeå University: Resample-smoothing and its application to Voronoi estimators**

**10/3, Nikolaos Kourentzes, University of Skövde: Predicting with hierarchies**

**17/3, Mike Pereira, Chalmers: A matrix-free approach to deal with non-stationary Gaussian random fields in geostatistical applications**

**24/3, Valeria Vitelli, Department of Biostatistics, University of Oslo: A novel Variational Bayes approach to Preference Learning with the Mallows rank model**

**21/4, Rasmus Pedersen, Roskilde Universitet: Modelling Hematopoietic Stem Cells and their Interaction with the Bone Marrow Micro-Environment**

Blood cell formation (hematopoiesis) is a process maintained by the hematopoietic stem cells (HSCs) from within the bone marrow. HSCs give rise to progenitors which in turn produce the vast number of cells located in the blood. HSCs are capable of self-renewal, and hence a sustained production of cells is possible, without exhaustion of the HSC pool.

Mutations in the HSC genome give rise to a wide range of hematologic malignancies, such as acute myeloid leukemia (AML) or the myeloproliferative neoplasms (MPNs). As HSCs are difficult to investigate experimentally, mathematical modelling of HSC and blood dynamics is a useful tool in the battle against blood cancers.

We have developed a mechanism-based mathematical model of the HSCs and their interaction with the bone marrow micro-environment. Specifically, the model directly considers the reversible binding of HSCs to their specific niches, often omitted in other modelling works. In my talk, I will discuss some of the aspects of developing the model and the immediate results that arise from the model, which include an expression of HSC fitness and insight about outcomes of bone marrow transplantation. To relate the HSC dynamics to observable measures such as blood-cell count, the model is reduced and incorporated into a separate model of the blood system. The combined model is compared with a vast data-set of blood measurements of MPN-diagnosed patients during treatment. By including the biological effects of the treatment used in the model, patient trajectories can be modelled to a satisfying degree. Such insights from the model show great promise for future predictions of patient responses and design of optimal treatment schemes.

**28/4, András Bálint, Chalmers: Mathematical methods in the analysis of traffic safety data**

**23/6, Chris Drovandi, Queensland University of Technology, Australia: Accelerating sequential Monte Carlo with surrogate likelihoods**

[This work is led by PhD student Joshua Bon (Queensland University of Technology) and is in collaboration with Professor Anthony Lee (University of Bristol)]

**13/10, Raphaël Huser, KAUST: Estimating high-resolution Red Sea surface temperature hotspots, using a low-rank semiparametric spatial model**

**20/10, Luigi Acerbi, University of Helsinki: Practical sample-efficient Bayesian inference for models with and without likelihoods**

**24/11, Peter Jagers and Sergey Zuyev, Chalmers: Galton was right: all populations die out**

**1/12, Umberto Simola, University of Helsinki: Adaptive Approximate Bayesian Computation Tolerance Selection**

**8/12, Magnus Röding, Chalmers and RISE: Mass transport in porous materials – combining physics, spatial statistics, machine learning, and data science**

## 2019

**24/1 - Mats Gyllenberg, Helsingfors Universitet: On models of physiologically structured populations and their reduction to ordinary differential equations**

Abstract: Considering the environmental condition as a given function of time, we formulate a physiologically structured population model as a linear non-autonomous integral equation for the, in general distributed, population level birth rate. We take this renewal equation as the starting point for addressing the following question: When does a physiologically structured population model allow reduction to an ODE without loss of relevant information? We formulate a precise condition for models in which the state of individuals changes deterministically, that is, according to an ODE. Specialising to a one-dimensional individual state, like size, we present various sufficient conditions in terms of individual growth-, death-, and reproduction rates, giving special attention to cell fission into two equal parts and to the catalogue derived in another paper of ours (submitted). We also show how to derive an ODE system describing the asymptotic large time behaviour of the population when growth, death and reproduction all depend on the environmental condition through a common factor (so for a very strict form of physiological age).

**31/1 - Christian A. Naesseth, Automatic Control, Linköping: Variational and Monte Carlo methods - Bridging the Gap**

**7/2 - Jonas Wallin, Lund University: Multivariate Type-G Matérn fields**

Abstract: I will present a class of non-Gaussian multivariate random fields formulated using systems of stochastic partial differential equations (SPDEs) with additive non-Gaussian noise. To facilitate computationally efficient likelihood-based inference, the noise is constructed using normal-variance mixtures (type-G noise). Similar, but simpler, constructions have been proposed earlier in the literature; however, they lack important properties such as ergodicity and flexibility of predictive distributions. I will show that for a specific system of SPDEs the marginals of the fields have Matérn covariance functions.

Further, I will present a parametrization of the system that one can use to separate the cross-covariance and the extra dependence coming from the non-Gaussian noise in the proposed model.

If time permits I will discuss some recent results on proper scoring rules (PS). PS are the standard tool for evaluating which model fits data best in spatial statistics (for example, Gaussian versus non-Gaussian models).

We have developed a new class of PS that I argue is better suited for evaluating models when one has observations at irregular locations.

**14/2 - Jes Frellsen, IT University of Copenhagen: Deep latent variable models: estimation and missing data imputation**

This is joint work with Pierre-Alexandre Mattei.

**21/2 - Riccardo De Bin, University of Oslo: Detection of influential points as a byproduct of resampling-based variable selection procedures**

Abstract: Influential points can cause severe problems when deriving a multivariable regression model. A novel approach to check for such points is proposed, based on the variable inclusion matrix, a simple way to summarize results from resampling-based variable selection procedures. These procedures rely on the variable inclusion matrix, which reports whether a variable (column) is included in a regression model fitted on a pseudo-sample (row) generated from the original data (e.g., bootstrap sample or subsample). The variable inclusion matrix is used to study the variable selection stability, to derive weights for model averaged predictors and in other investigations. Concentrating on variable selection, it also allows understanding whether the presence of a specific observation has an influence on the selection of a variable.

From the variable inclusion matrix, indeed, the inclusion frequency (I-frequency) of each variable can be computed only in the pseudo-samples (i.e., rows) which contain the specific observation. When the procedure is repeated for each observation, it is possible to check for influential points through the distribution of the I-frequencies, visualized in a boxplot, or through a Grubbs’ test. Outlying values in the former case and significant results in the latter point to observations having an influence on the selection of a specific variable and therefore on the finally selected model. This novel approach is illustrated in two real data examples.
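The I-frequency computation described in this abstract can be sketched as follows. This is a minimal illustration and not the speaker's implementation: the function name `inclusion_frequencies`, the toy inclusion matrix and the pseudo-sample memberships are all hypothetical, and the variable selection step that would normally produce the matrix is left out.

```python
def inclusion_frequencies(inclusion, members, n_obs):
    """Per-observation inclusion frequencies (I-frequencies).

    inclusion[r][j] is 1 if variable j was selected in pseudo-sample r.
    members[r] is the set of observation indices contained in pseudo-sample r.
    Returns freq[i][j]: the share of pseudo-samples containing observation i
    in which variable j was selected.
    """
    n_vars = len(inclusion[0])
    freq = []
    for i in range(n_obs):
        rows = [r for r, m in enumerate(members) if i in m]
        if not rows:
            # Observation never drawn: the frequency is undefined.
            freq.append([float("nan")] * n_vars)
            continue
        freq.append(
            [sum(inclusion[r][j] for r in rows) / len(rows) for j in range(n_vars)]
        )
    return freq


# Toy data (hypothetical): 4 pseudo-samples, 2 variables, 3 observations.
inclusion = [[1, 0], [1, 1], [0, 1], [1, 1]]
members = [{0, 1}, {1, 2}, {0, 2}, {0, 1}]
i_freq = inclusion_frequencies(inclusion, members, n_obs=3)
```

For each variable, the column of `i_freq` gives one value per observation. An observation whose value lies far from the others is a candidate influential point for the selection of that variable.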

**28/2 - Johan Henriksson: Single-cell perturbation analysis – the solution to systems biology?**

**7/3 - Larisa Beilina: Time-adaptive parameter identification in mathematical model of HIV infection with drug therapy**

**14/3 - Umberto Picchini: Accelerating MCMC sampling via approximate delayed-acceptance**

When pursuing Bayesian inference for model parameters, MCMC can be computationally very expensive, either when the dataset is large, or when the likelihood function is unavailable in closed form and itself requires Monte Carlo approximations. In these cases each iteration of Metropolis-Hastings may be intolerably slow. The so-called "delayed acceptance" MCMC (DA-MCMC) was suggested by Christen and Fox in 2005 and allows the use of a computationally cheap surrogate of the likelihood function to rapidly screen (and possibly reject) parameter proposals, while using the expensive likelihood only when the proposal has survived the "scrutiny" of the cheap surrogate. They show that DA-MCMC samples from the exact posterior distribution and returns results much more rapidly than standard Metropolis-Hastings. Here we design a novel delayed-acceptance algorithm, which is between 2 and 4 times faster than the original DA-MCMC, though ours results in approximate inference. Despite this, we show empirically that our algorithm returns accurate inference. A computationally intensive case study is discussed, involving ~25,000 observations from a protein folding reaction coordinate, fit by an SDE model with an intractable likelihood approximated using sequential Monte Carlo (that is, particle MCMC).

This is joint work with Samuel Wiqvist, Julie Lyng Forman, Kresten Lindorff-Larsen and Wouter Boomsma.

keywords: Bayesian inference; Gaussian process; intractable likelihood; particle MCMC; protein folding; SDEs
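The two-stage screening idea of the original (exact) DA-MCMC can be sketched as below. This illustrates the Christen and Fox scheme with a symmetric random-walk proposal, not the approximate algorithm presented in the talk. The target and surrogate densities are stand-ins chosen purely for illustration, and all function names are hypothetical.

```python
import math
import random


def log_target(theta):
    # Stand-in for an expensive log-posterior: standard normal, N(0, 1).
    return -0.5 * theta * theta


def log_surrogate(theta):
    # Stand-in for a cheap, slightly wrong approximation: N(0.2, 1.2^2).
    return -0.5 * ((theta - 0.2) / 1.2) ** 2


def da_mcmc(n_iter, step=1.0, seed=0):
    """Delayed-acceptance Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = 0.0
    lt, ls = log_target(theta), log_surrogate(theta)
    samples, n_expensive = [], 0
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        ls_prop = log_surrogate(prop)
        # Stage 1: screen the proposal with the cheap surrogate only.
        if math.log(1.0 - rng.random()) < ls_prop - ls:
            lt_prop = log_target(prop)  # expensive call, only for survivors
            n_expensive += 1
            # Stage 2: correct for the surrogate so the chain still targets
            # the exact posterior.
            if math.log(1.0 - rng.random()) < (lt_prop - lt) - (ls_prop - ls):
                theta, lt, ls = prop, lt_prop, ls_prop
        samples.append(theta)
    return samples, n_expensive


samples, n_expensive = da_mcmc(20000, seed=1)
```

The returned `n_expensive` counts how many times the expensive target was evaluated. It is smaller than the number of iterations whenever the surrogate rejects proposals early, which is where the speed-up comes from.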

**21/3 - Samuel Wiqvist, Lund University: Automatic learning of summary statistics for Approximate Bayesian Computation using Partially Exchangeable Networks**

Here we introduce a novel deep learning architecture (Partially Exchangeable Networks, PENs), with the purpose of automating the selection of summary statistics. We only need to provide our network with samples from the prior predictive distribution, and it will return summary statistics for ABC use. PENs are designed to have the correct invariance property for Markovian data, and are therefore particularly useful when learning summary statistics for such data.

Case studies show that our methodology outperforms other popular methods, resulting in more accurate ABC inference for models with intractable likelihoods. Empirically, we show that for some case studies our approach seems to work well also with non-Markovian and non-exchangeable data.

**28/3 - Hans Fahlin (Chief Investment Officer, AP2, Andra AP-fonden) and Tomas Morsing (Head of Quantitative Strategies, AP2, Andra AP-fonden): A scientific approach to financial decision making in the context of managing Swedish pension assets**

**11/4 - Daniele Silvestro: Birth-death models to understand the evolution of (bio)diversity**

Abstract: Our planet and its long history are characterized by a stunning diversity of organisms, environments and, more recently, cultures and technologies. To understand what factors contribute to generating diversity and shaping its evolution we have to look beyond diversity patterns. Here I present a suite of Bayesian models to infer the dynamics of origination and extinction processes using fossil occurrence data and show how the models can be adapted to the study of cultural evolution. Through empirical examples, I will demonstrate the use of this probabilistic framework to test specific hypotheses and quantify the processes underlying (bio)diversity patterns and their evolution.

**12/4 - Erika B. Roldan Roa, Department of Mathematics, The Ohio State University: Evolution of the homology and related geometric properties of the Eden Growth Model**

Abstract: In this talk, we study the persistent homology and related geometric properties of the evolution in time of a discrete-time stochastic process defined on the 2-dimensional regular square lattice. This process corresponds to a cell growth model called the Eden Growth Model (EGM). It can be described as follows: start with the cell square of the 2-dimensional regular square lattice of the plane that contains the origin; then make the cell structure grow by adding one cell at each time, chosen uniformly at random from the perimeter. We give a characterization of the possible change in the rank of the first homology group of this process (the "number of holes"). Based on this result we have designed and implemented a new algorithm that computes the persistent homology associated to this stochastic process and that also keeps track of geometric features related to the homology. Also, we present results of computational experiments performed with this algorithm, and we establish conjectures about the asymptotic behaviour of the homology and other related geometric random variables. The EGM can be seen as a First Passage Percolation model after a proper time-scaling. This is the first time that tools and techniques from stochastic topology and topological data analysis are used to measure the evolution of the topology of the EGM and, more generally, of FPP models.
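The growth rule described in this abstract can be sketched in a few lines. This is a minimal illustration, not the speaker's algorithm: it assumes that the perimeter is the set of empty cells sharing an edge with the cluster, and it counts holes by brute-force flood fill of the empty cells rather than by tracking persistent homology. The function names are hypothetical.

```python
import random


def neighbours(cell):
    """The four edge-adjacent lattice cells."""
    x, y = cell
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def eden_growth(n_cells, seed=0):
    """Grow a cluster from the origin by adding uniformly chosen perimeter cells."""
    rng = random.Random(seed)
    cells = {(0, 0)}
    perimeter = set(neighbours((0, 0)))
    while len(cells) < n_cells:
        new = rng.choice(sorted(perimeter))  # sorted for reproducibility
        cells.add(new)
        perimeter.discard(new)
        perimeter.update(nb for nb in neighbours(new) if nb not in cells)
    return cells


def count_holes(cells):
    """Number of bounded edge-connected components of the empty cells."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    x0, x1, y0, y1 = min(xs) - 1, max(xs) + 1, min(ys) - 1, max(ys) + 1

    def flood(start, seen):
        stack = [start]
        seen.add(start)
        while stack:
            for nb in neighbours(stack.pop()):
                inside = x0 <= nb[0] <= x1 and y0 <= nb[1] <= y1
                if inside and nb not in cells and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)

    seen = set()
    flood((x0, y0), seen)  # the padded border belongs to the unbounded part
    holes = 0
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            if (x, y) not in cells and (x, y) not in seen:
                holes += 1
                flood((x, y), seen)
    return holes


cluster = eden_growth(200, seed=3)
n_holes = count_holes(cluster)
```

Calling `count_holes` after every added cell would give the time evolution of the number of holes, although recomputing from scratch is much slower than an incremental approach.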

**16/5 - Susanne Ditlevsen, University of Copenhagen: Inferring network structure from oscillating systems with cointegrated phase processes**

**23/5 - Chun-Biu Li, Stockholms Universitet: Information Theoretic Approaches to Statistical Learning**

Abstract: Since its introduction in the context of communication theory, information theory has extended to a wide range of disciplines in both natural and social sciences. In this talk, I will explore information theory as a nonparametric probabilistic framework for unsupervised and supervised learning free from a priori assumptions on the underlying statistical model. In particular, the soft (fuzzy) clustering problem in unsupervised learning can be viewed as a tradeoff between data compression and minimizing the distortion of the data. Similarly, modeling in supervised learning can be treated as a tradeoff between compression of the predictor variables and retaining the relevant information about the response variable. To illustrate the usage of these methods, some applications in biophysical problems and time series analysis will be briefly addressed in the talk.

**13/6 - Sara Hamis, Swansea University: DNA Damage Response Inhibition: Predicting in vivo treatment responses using an in vitro-calibrated mathematical model**

In this talk I will present an individual based mathematical cancer model in which one individual corresponds to one cancer cell. This model is governed by a few observable and well documented principles, or rules. To account for differences between the in vitro and in vivo scenarios, these rules can be appropriately adjusted. By only adjusting the rules (whilst keeping the fundamental framework intact), the mathematical model can first be calibrated by in vitro data and thereafter be used to successfully predict treatment responses in mouse xenografts in vivo. The model is used to investigate treatment responses to a drug that hinders tumour proliferation by targeting the cellular DNA damage response process.

**19/9 - Ronald Meester, Vrije University, Amsterdam: The DNA Database Controversy 2.0**

**26/9 - Valerie Monbet, Université de Rennes: Time-change models for asymmetric processes**

**3/10 - Peter Jagers, Chalmers: Populations - from few independently reproducing individuals to continuous and deterministic flows. Or: From branching processes to adaptive population dynamics**

**17/10 - Richard Davis, Columbia University and Chalmers Jubileum Professor 2019: Extreme Value Theory Without the Largest Values: What Can Be Done?**

**24/10 - Erica Metheney, Department of Political Sciences, University of Gothenburg: Modifying Non-Graphic Sequences to be Graphic**

**31/10 - Sofia Tapani, AstraZeneca: Early clinical trial design - Platform designs with the patient at its center**

This feature of clinical trial design can also add value to other therapy areas due to its potential exploratory nature. The platform design allows for multi-arm clinical trials to evaluate several experimental treatments perhaps not all available at the same point in time. At the early clinical development stage, new drugs are rarely at the same stage of development. The alternative, several separate two-arm studies, is time consuming and can be a bottleneck in development due to budget limitations in comparison to the more efficient platform study where arms are added at several different time points after start of enrolment.

Platform designs within the heart failure therapy area in early clinical development are exploratory in nature. Clear prognostic and predictive biomarker profiles for disease are not available and need to be explored to be identified for each patient population. As an example, we’ll have a look at the HIDMASTER trial design for biomarker identification and compound graduation throughout the platform.

All platform trials need to be thoroughly simulated, and simulations should be used as a tool to decide among design options. Simulations of platform trials give the opportunity to investigate many scenarios, including the null scenario to establish overall type I error. We can evaluate bias estimation and sensitivity to patient withdrawals, missing data, enrolment rates/patterns, interim analysis timings, data access delays, data cleanliness, analysis delays, etc.

Simulations should also comprise decision operating characteristics to be able to make decisions on the design based on the objective of the trial: early stops of underperforming arms, early go for active arms, prioritise arms on emerging data or drawing insights from whole study data analysis.

Over time the trial learns about the disease, new endpoints, stratification biomarkers and prognostic vs predictive effects.

**6/11 - Richard Torkar, Software Engineering, Chalmers: Why do we encourage even more missingness when dealing with missing data?**

**7/11 - Krzysztof Bartoszek, Linköping University: Formulating adaptive hypotheses in multivariate phylogenetic comparative methods**

* after branching the traits evolve independently

* the distribution of the trait at time t, X(t), conditional on the ancestral value, X(s), at time s<t, is Gaussian with
  * E[X(t) | X(s)] = w(s,t) + F(s,t)X(s)
  * Var[X(t) | X(s)] = V(s,t),

where neither w(s,t), F(s,t), nor V(s,t) can depend on X(.) but may be further parametrized. Using the likelihood computational engine PCMBase [2, available on CRAN] the PCMFit [3, publicly available on GitHub] package allows for inference of models belonging to the GLInv family and furthermore allows for finding points of shifts between evolutionary regimes on the tree. What is particularly novel is that it allows not only for shifts between a model's parameters but for switches between different types of models within the GLInv family (e.g. a shift from a Brownian motion (BM) to an Ornstein-Uhlenbeck (OU) process and vice versa). Interactions between traits can be understood as magnitudes and signs of off-diagonal entries of F(s,t) or V(s,t). What is particularly interesting is that in this family of models one may obtain changes in the direction of the relationship, i.e. the long and short term joint dynamics can be of a different nature. This is possible even if one simplifies the process to an OU one. Here, one is able to very finely understand the dynamics of the process and propose specific model parameterizations [PCMFit and current CRAN version of mvSLOUCH, 1, which is based on PCMBase]. In the talk I will discuss how one can set up different hypotheses concerning relationships between the traits in terms of model parameters and how one can view the long and short term evolutionary dynamics. The software's possibilities will be illustrated by considering the evolution of fruit in the Ferula genus. I will also discuss some limit results that are, amongst others, useful for setting initial seeds of the numerical estimation procedures.

[1] A phylogenetic comparative method for studying multivariate adaptation. J. Theor. Biol. 314:204-215, 2012.

[2] V. Mitov, K. Bartoszek, G. Asimomitis, T. Stadler. Fast likelihood calculation for multivariate phylogenetic comparative methods: The PCMBase R package. arXiv:1809.09014, 2018.

[3] V. Mitov, K. Bartoszek, T. Stadler. Automatic generation of evolutionary hypotheses using mixed Gaussian phylogenetic models. PNAS, 201813823, 2019.
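To make the linear-Gaussian conditional moments in the abstract above concrete, here is a sketch of w(s,t), F(s,t) and V(s,t) for the two processes it names, restricted to a single trait. The closed forms are the standard ones for scalar Brownian motion and a scalar OU process. The function names and parameter names (`alpha`, `theta`, `sigma`) are illustrative and are not taken from PCMBase, PCMFit or mvSLOUCH.

```python
import math


def bm_moments(s, t, sigma):
    """(w, F, V) for scalar Brownian motion dX = sigma dW."""
    return 0.0, 1.0, sigma ** 2 * (t - s)


def ou_moments(s, t, alpha, theta, sigma):
    """(w, F, V) for a scalar OU process dX = -alpha (X - theta) dt + sigma dW."""
    dt = t - s
    # expm1 keeps precision when alpha * dt is very small.
    w = theta * -math.expm1(-alpha * dt)
    F = math.exp(-alpha * dt)
    V = sigma ** 2 / (2.0 * alpha) * -math.expm1(-2.0 * alpha * dt)
    return w, F, V


def conditional_mean_var(x_s, moments):
    """E[X(t) | X(s) = x_s] and Var[X(t) | X(s) = x_s] from (w, F, V)."""
    w, F, V = moments
    return w + F * x_s, V


# Over a long branch the OU process forgets the ancestral value.
long_mean, long_var = conditional_mean_var(10.0, ou_moments(0.0, 50.0, 1.0, 2.0, 0.5))
# With a very weak pull towards theta, OU is close to Brownian motion.
weak_mean, weak_var = conditional_mean_var(10.0, ou_moments(0.0, 1.0, 1e-8, 2.0, 0.5))
bm_mean, bm_var = conditional_mean_var(10.0, bm_moments(0.0, 1.0, 0.5))
```

In the multivariate setting of the talk, F(s,t) and V(s,t) are matrices and the signs of their off-diagonal entries carry the trait interactions. The scalar case only shows how the same (w, F, V) interface covers both model types.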

**20/11 - Paul-Christian Bürkner, Aalto University: Bayesflow: Software assisted Bayesian workflow**

A principled Bayesian workflow consists of several steps from the design of the study, gathering of the data, model building, estimation, and validation, to the final conclusions about the effects under study. I want to present a concept for a software package that assists users in following a principled Bayesian workflow for their data analysis by diagnosing problems and giving recommendations for sensible next steps. This concept gives rise to a lot of interesting research questions we want to investigate in the upcoming years.

**27/11 - Geir Storvik, Oslo University: Flexible Bayesian Nonlinear Model Configuration**

This is joint work with Aliaksandr Hubin (Norwegian Computing Center) and Florian Frommlet (CEMSIIS, Medical University of Vienna)

**4/12 - Moritz Schauer, Chalmers/GU: Smoothing and inference for high dimensional diffusions**

We apply this to the problem of tracking convective cloud systems from satellite data with low time resolution.

**11/12 - Johannes Borgqvist, Chalmers/GU: The polarising world of Cdc42: the derivation and analysis of a quantitative reaction diffusion model of cell polarisation**

In this project, we develop a quantifiable model of cell polarisation accounting for the morphology of the cell. The model consists of a coupled system of PDEs, more specifically Reaction Diffusion equations, with two spatial domains: the cytosol and the cell membrane. In this setting, we prove sufficient conditions for pattern formation. Using a “Finite Element”-based numerical scheme, we simulate cell polarisation for these two domains. Further, we illustrate the impact of the parameters on the patterns that emerge and we estimate the time until polarisation. Using this work as a starting point, it is possible to integrate data into the theoretical description of the process to gain a deeper mechanistic understanding of cell polarisation.

## 2018

**24/5 - Erwan Koch (EPFL): Spatial risk measures induced by powers of max-stable random fields**

A meticulous assessment of the risk of extreme environmental events is of great necessity for populations, civil authorities as well as the insurance/reinsurance industry. Koch (2017, 2018) introduced a concept of spatial risk measure and a related set of axioms which are well-suited to analyse and quantify the risk due to events having a spatial extent, precisely such as natural disasters. In this paper, we first carry out a detailed study of the correlation (and covariance) structure of powers of the Smith and Brown-Resnick max-stable random fields. Then, using the latter results, we thoroughly investigate spatial risk measures associated with variance and induced by powers of max-stable random fields. In addition, we show that spatial risk measures associated with several classical risk measures and induced by such cost fields satisfy (at least) part of the previously mentioned axioms under appropriate conditions on the max-stable fields. Considering such cost fields is particularly relevant when studying the impact of extreme wind speeds on buildings and infrastructure.

https://arxiv.org/pdf/1804.05694.pdf

- Koch, E. (2017). Spatial risk measures and applications to max-stable processes. Extremes, 20(3):635-670.

- Koch, E. (2018). Spatial risk measures and rate of spatial diversification. Available at https://arxiv.org/abs/1803.07041

**6/9 - Lukas Käll (KTH Genteknologi, SciLifeLab): Distillation of label-free quantitative mass spectrometry data by clustering and Bayesian modeling**

Abstract: Protein quantification by label-free shotgun proteomics experiments is complicated by a multitude of error sources. Typical pipelines for identifying differentially expressed proteins use intermediate filters in an attempt to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favour of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We will also discuss a method, MaRaQuant, in which we reverse the typical processing workflow into a quantification-first approach. Specifically, we apply unsupervised clustering on both MS1 and MS2 level to summarize all analytes of interest without assigning identities. This ensures that no valuable information is discarded due to analytes missing identification thresholds and also allows us to spend more effort on the identification process due to the data reduction achieved by clustering.

**14/9 - Alex Fletcher (University of Sheffield): Mathematical modelling and analysis of epithelial morphogenesis**

Abstract: The study of morphogenesis - the generation of biological shape and form - promises to shed light on a wide range of developmental defects and inform strategies for the artificial growth of organs. Recently, the experimental study of morphogenesis has thrived due to a rise in quantitative methods. The resulting avalanche of data motivates us to design quantitative hypotheses through mathematical models, make quantitative experimental predictions, devise methods for quantitative data analysis, and design methods for quantitative inference using models and data. In this talk, I describe our recent work on the integrative analysis of morphogenesis in epithelia, one of the major tissue types in animals. Focusing on a widely used cell-based model of epithelia, the vertex model, I discuss to what extent quantitative model predictions may be influenced by parameter values and implementation details. Next, I illustrate how such models can be used to help gain mechanistic insights into, and generate quantitative predictions on, morphogenetic processes such as tissue size control and axis extension. I then outline a method for estimating mechanical parameters of vertex models from imaging data and quantifying the uncertainty associated with such estimates. Together, these contributions help enable the quantitative study of epithelia for a wide range of applications.

**27/9 - Jukka Corander (Department of Biostatistics, University of Oslo): Resolving the mysteries of bacterial evolution by ultra-fast ABC inference.**

Abstract: DNA in bacteria is known to be subject to multiple evolutionary forces, including mutations, homologous recombination and horizontal transfer of genes. Such changes may be beneficial, deleterious or selectively neutral. Several models have been proposed to explain the variation we see in the genomes of bacteria across natural populations, including ecotype and neutral models. In particular, simple neutral models have been shown to have a surprisingly good fit to population surveys. However, in the light of most recent functional data we present conclusive evidence that both neutral and ecotype models provide poor explanations for the strong correlations discovered between accessory genome loci across multiple populations of Streptococcus pneumoniae, a major human pathogen. We introduce a mechanistic model of frequency-dependent selection operating via accessory genes which is able to accurately predict the changes to the composition of the populations following introduction of a vaccination campaign. Unrelated recent large-scale genome data from an E. coli population suggests that frequency-dependent selection may be a common mechanism regulating the evolution of bacterial populations of many species. These modeling advances have in practice been enabled by ultra-fast ABC inference based on Bayesian optimization, which can be up to 4 orders of magnitude faster than sequential population Monte Carlo. The general potential of this inference method is now harnessed by the new open-source software initiative ELFI, which offers automated parallelization and a flexible platform for algorithm developers.

https://www.biorxiv.org/content/early/2018/08/28/400374

http://jmlr.csail.mit.edu/papers/v17/15-017.html

http://jmlr.csail.mit.edu/papers/v19/17-374.html

**28/9 - Kenneth C. Millett (Department of Mathematics, University of California Santa Barbara): Knots and Links in Proteins**

Abstract: Some proteins contain important topological structures: knots, slipknots, and links as well as spatial graphs if one includes cysteine bonds. As a consequence, the geometrical and topological character of these spatial structures is of interest to mathematicians as well as molecular biologists, biochemists and biophysicists. We will describe characterizations of these spatial geometric and topological structures within proteins.

**4/10 - Marco Longfils: Single diffusing particles observed through a confocal microscope: an application of the doubly stochastic Poisson point process**

Abstract: Diffusing particles observed with a confocal laser scanning microscope give rise to a doubly stochastic Poisson point process. In particular, the number of photons detected by the microscope in one pixel follows a Poisson distribution with a parameter that depends on the particle positions in space, which are modelled as a Poisson point process. Many techniques such as Fluorescence correlation spectroscopy, Raster image correlation spectroscopy and photon counting histograms have been developed to study molecular transport in cells and solution. All these techniques are based on the statistics of the photon detection process.

We show that the moments of the photon detection process can be computed in terms of physically relevant parameters such as the diffusion coefficient of the particles, their brightness and others. As a direct consequence, the statistical accuracy of the above-mentioned techniques can be evaluated. Thus, we can relate the different experimental parameters that affect the photon detection process to the accuracy of each technique, allowing us to optimally design an experiment.

**5/10 - Charlotte Hemelrijk (University of Groningen): Collective motion of flocks in relation to a predator**

Abstract: Many species of animals live in groups. This is supposed to protect them against predation. Yet when animals aggregate, they are easier to detect from a distance due to the larger mass of the group. Two evolutionary computational models of collective motion (including ours) show that grouping is advantageous for survival of prey only when the predator can be confused as to whom to attack. We also studied this confusion effect in an experiment with human ‘predators’ attacking starlings that flock in a computer simulation called StarDisplay. Humans appeared to become more confused as to whom to attack the larger and denser the flocks were.

Grouping animals also seem to protect themselves actively against attacks by displaying many patterns of collective escape in relation to the presence of predators, such as herd, ball, flash expansion, agitation wave. Using two computational models, we explain how some of them may arise. Asking when patterns of collective escape appear and whether they offer extra protection to groups of prey, we recorded them for flocks of starlings in Rome. It became clear that some are a direct reaction to an attack of the raptor, others already to its mere presence. Remarkably, in our empirical data the display of patterns of collective escape does not reduce the raptor’s catch success, leaving interesting questions concerning their emergence and asking for new methods of studying them.

**25/10 - Harri Lähdesmäki (Department of Computer Science at Aalto University School of Science): Non-parametric methods for learning continuous-time dynamical systems**

Abstract: Conventional differential equation models are parametric. However, for many complex/real-world systems it is practically impossible to determine parametric equations or interactions governing the underlying dynamics, rendering conventional models impractical in many applications. To overcome this issue, we propose to use nonparametric models for differential equations by defining Gaussian process priors for the vector-field/drift and diffusion functions. We have developed statistical machine learning methods that can learn the underlying (arbitrary) ODE and SDE systems without prior knowledge. We formulate sensitivity equations for learning or use automatic differentiation with an explicitly defined forward simulator for efficient model inference. Using simulated and real data, we demonstrate that our non-parametric methods can efficiently learn the underlying differential equation system, and show the models' capabilities to infer unknown dynamics from sparse data and to simulate the system forward into the future. I will also highlight how our non-parametric models can learn stochastic differential equation transformations of inputs prior to a standard classification or regression function to implement state-of-the-art methods for continuous-time (infinitely) deep learning.

https://arxiv.org/abs/1803.04303

https://arxiv.org/abs/1807.05748

https://arxiv.org/abs/1810.04066

**1/11 - Petter Mostad (Chalmers): Error rates for unvalidated medical age assessment procedures**

Abstract: During 2014–2015, Sweden received asylum applications from more than 240,000 people of which more than 40,000 were termed unaccompanied minors. In a large number of cases, claims by asylum seekers of being below 18 years were not trusted by Swedish authorities. To handle the situation, the Swedish national board of forensic medicine (Rättsmedicinalverket, RMV) was assigned by the government to create a centralized system for medical age assessments.

RMV introduced a procedure including two biological age indicators; x-ray of the third molars and magnetic resonance imaging of the distal femoral epiphysis. In 2017, a total of 9617 males and 337 females were subjected to this procedure. No validation study for the procedure was however published, and the observed number of cases with different maturity combinations in teeth and femur were unexpected given the claims originally made by RMV. We present a general stochastic model enabling us to study which combinations of age indicator model parameters and age population profiles are consistent with the observed 2017 data for males. We find that, contrary to some RMV claims, maturity of the femur, as observed by RMV, appears on average well before maturity of teeth. According to our estimates, approximately 15% of the tested males were children. These children had an approximate 33% risk of being classified as adults. The corresponding risk for an adult to be misclassified as a child was approximately 7%.

We determine uncertainties and ranges of estimates under reasonable perturbations of the prior. https://rdcu.be/6PNI

**7/11 - Thomas Schön (Dept. of Information Technology, Uppsala University): Assembling stochastic quasi-Newton algorithms using Gaussian processes**

Abstract: In this talk I will focus on one of our recent developments where we show how the Gaussian process (GP) can be used to solve stochastic optimization problems. Our main motivation for studying these problems is that they arise when we are estimating unknown parameters in nonlinear state space models using sequential Monte Carlo (SMC). The very nature of this problem is such that we can only access the cost function (in this case the likelihood function) and its derivative via noisy observations, since there are no closed-form expressions available. Via SMC methods we can obtain unbiased estimates of the likelihood function. However, our development is fully general and hence applicable to any stochastic optimization problem. We start from the fact that many of the existing quasi-Newton algorithms can be formulated as learning algorithms, capable of learning local models of the cost functions. Inspired by this we can start assembling new stochastic quasi-Newton-type algorithms, applicable in situations where we only have access to noisy observations of the cost function and its derivatives. We will show how we can make use of the GP model to learn the Hessian, allowing for efficient solution of these stochastic optimization problems. Additional motivation for studying the stochastic optimization problem stems from the fact that it arises in almost all large-scale supervised machine learning problems, not least in deep learning. I will very briefly mention some ongoing work where we have removed the GP representation and scaled our ideas to much higher dimensions (both in terms of the size of the dataset and the number of unknown parameters).

**21/11 - Josef Wilzén: Physiological Gaussian Process Priors for the Hemodynamics in fMRI Analysis**

Abstract: Inference from fMRI data faces the challenge that the hemodynamic system, that relates the underlying neural activity to the observed BOLD fMRI signal, is not known. We propose a new Bayesian model for task fMRI data with the following features: (i) joint estimation of brain activity and the underlying hemodynamics, (ii) the hemodynamics is modelled nonparametrically with a Gaussian process (GP) prior guided by physiological information and (iii) the predicted BOLD is not necessarily generated by a linear time-invariant (LTI) system. We place a GP prior directly on the predicted BOLD time series, rather than on the hemodynamic response function as in previous literature. This allows us to incorporate physiological information via the GP prior mean in a flexible way. The prior mean function may be generated from a standard LTI system, based on a canonical hemodynamic response function, or a more elaborate physiological model such as the Balloon model. This gives us the nonparametric flexibility of the GP, but allows the posterior to fall back on the physiologically based prior when the data are weak. Results on simulated data show that even with an erroneous prior for the GP, the proposed model is still able to discriminate between active and non-active voxels in a satisfactory way. The proposed model is also applied to real fMRI data, where our Gaussian process model in several cases finds brain activity where a baseline model with fixed hemodynamics does not.

## 2017

**26/1, Sebastian Engelke, Ecole Polytechnique Fédérale de Lausanne: Robust bounds for multivariate extreme value distributions**

**2/2, Igor Rychlik: Spatio-temporal model for wind speed variability in Atlantic Ocean**

In the Northern Atlantic, wind speeds can be successfully modelled by means of a spatio-temporal transformed Gaussian field. Its dependence structure is localized by introducing time- and space-dependent parameters in the field. However, rarely occurring hurricanes in the Caribbean region are missed by the model. A new model is presented that covers both the Northern Atlantic and the Caribbean region, where hurricanes occur.

The model has the advantage of having a relatively small number of parameters. These parameters have a natural physical interpretation and are statistically fitted to represent the variability of observed wind speed in the ERA Interim reanalysis data set.

Some validations and applications of the model will be presented. Rice’s method is employed to estimate the 100-year wind speed in some spatial region. This talk presents ongoing research.

**16/2, Måns Magnusson, Linköping University: Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models**

**2/3, Henrike Häbel, Final seminar / Slutseminarium: Pairwise interaction in 3D – the interplay between repulsive and attractive forces**

**9/3, Ottmar Cronie, The second-order analysis of marked spatio-temporal point processes: applications to earthquake data**

**16/3, Krzysztof Bartoszek, Uppsala University: A punctuated stochastic model of adaptation**

**23/3, Artem Kaznatcheev, University of Oxford: The evolutionary games that cancers play and how to measure them**

**30/3, Robert Noble, ETH Basel: Evolution, ecology, and cancer risk: from naked mole rats to modern humans**

**6/4, John Wiedenhoeft, Rutgers University: Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression**

**18/5, Lloyd Demetrius, Harvard University, Max Planck Institut: An entropic selection principle in evolutionary processes**

*evolutionary entropy*, a statistical measure of the diversity of pathways of energy flow between these interacting elements.

(i) the evolution of life history

(ii) the origins and evolution of cooperation

(iii) the origin and propagation of age-related diseases.

*Physics Reports*, vol. 530 (2013)

**1/6, Daniel Nichol, The Institute of Cancer Research: Collateral Sensitivity is Contingent on the Repeatability of Evolution**

**8/6, Jie Yen Fan, Monash University: Limit theorems for size, age, type, and type structure dependent populations**

**12/10, Aila Särkkä: Anisotropy analysis of spatial point patterns**

This seminar will give an overview of nonparametric methods for anisotropy analysis of (stationary) point processes. Methods based on nearest neighbour and second order summary statistics as well as spectral and wavelet analysis will be discussed. The techniques will be illustrated on both a clustered and a regular example. Finally, one of the methods will be used to estimate the deformation history in polar ice using the measured anisotropy of air inclusions from deep ice cores.

**14/12, David Bolin: The SPDE approach for Gaussian random fields with general smoothness**

## 2016

**21/1, Håvard Rue, NTNU: Penalising model component complexity: A principled practical approach to constructing priors**

**26/1, Stephen Senn, Luxembourg Institute of Health: P-values: The Problem is not What You Think**

**28/1, Ola Izyumtseva, Kiev State University: Self-intersection local time of Gaussian processes. New approach**

**2/2, Erik-Jan Wagenmakers, University of Amsterdam: A Predictive Perspective on Bayesian Inference**

Here I provide a predictive interpretation of Bayes inference, encompassing not only Bayesian model selection, but also Bayesian parameter estimation. This predictive interpretation supports a range of insights about the fundamental properties of learning and rational updating of knowledge.

**25/2, Marcin Lis, Planar Ising model as a gas of cycles**

**3/3, Krzysztof Podgorski, Lund University, Event based statistics for dynamical random fields**

Extreme events occurring on such a surface are random and of interest to practitioners: ocean engineers are interested in large waves and the damage they may cause to an oil platform or a ship. Thus data on the ocean surface elevation are constantly collected by systems of buoys, ship- or air-borne devices, and satellites all around the globe. These vast data require statistical analysis to answer important questions about random events of interest. For example, one can ask about the statistical distribution of wave sizes, in particular how large waves are distributed or how steep they are. Waves often travel in groups, and a group of waves typically causes more damage to a structure or a ship than an individual wave, even if the latter is bigger than each one in the group. So one can be interested in how many waves there are per group or how fast groups travel in comparison to individual waves.

The methodology was initially applied to Gaussian models but is in fact also valid for quite general dynamically evolving stochastic surfaces. In particular, it is discussed how sampling distributions for non-Gaussian processes can be obtained through Slepian models that describe the distributional form of a stochastic process observed at level crossings of a random process. This is used for efficient simulations of the behaviour of random processes sampled at crossings of a non-Gaussian moving average process. It is observed that the behaviour of the process at high level crossings is fundamentally different from that in the Gaussian case, which is in line with some recent theoretical results on the subject.

**10/3, Bengt Johannesson, Volvo, Design Strategies of Test Codes for Durability Requirement of Disk Brakes in Truck Application**

**22/3, Hermann Thorisson, University of Iceland, Palm Theory and Shift-Coupling**

**14/4, Omiros Papaspiliopoulos, Universitat Pompeu Fabra, stochastic processes for learning and uncertainty quantification**

and computational efficiency.

**21/4, Alexandre Antonelli, Professor in Systematics and Biodiversity, Dept of Biological and Environmental Sciences, University of Gothenburg: Hundreds of millions of DNA sequences and species observations: Challenges for synthesizing biological knowledge**

**28/4, Peter J Diggle, CHICAS, Lancaster Medical School, Lancaster University: Model-Based Geostatistics for Prevalence Mapping in Low-Resource Settings**

low-resource settings (with Discussion). Journal of the American Statistical Association (to appear).

**12/5, Ute Hahn, Aarhus University: Monte Carlo envelope tests for high dimensional data or curves**

**26/5, Martin Foster, University of York: A Bayesian decision-theoretic model of sequential experimentation with delayed response**

**1/9, Fima Klebaner, Monash University: On the use of Sobolev spaces in limit theorems for the age of population**

**15/9, Joseba Dalmau, Université Paris-Sud: The distribution of the quasispecies**

**22/9, Sophie Hautphenne, EPFL: A pathwise iterative approach to the extinction of branching processes with countably many types**

**29/9, Umberto Picchini, Lund University: A likelihood-free version of the stochastic approximation EM algorithm (SAEM) for inference in complex models**

**6/10, Jakob Björnberg: Random loop models on trees**

**20/10, Georg Lindgren, Lund University: Stochastic properties of optical black holes**

The statistical properties near phase singularities in a complex wavefield are described as the conditional distributions of the real and imaginary Gaussian components, given a common zero crossing point. The exact distribution is expressed as a Slepian model, where a regression term provides the main structure, with parameters given by the gradients of the Gaussian components at the singularity, and Gaussian non-stationary residuals that provide local variability. This technique differs from the linearization (Taylor expansion) technique commonly used.

The empirically and theoretically verified elliptic eccentricity of the intensity contours in the vortex core is a property of the regression term, but with different normalization compared to the classical theory. The residual term models the statistical variability around these ellipses. The radii of the circular contours of the current magnitude are similarly modified by the new regression expansion and also here the random deviations are modelled by the residual field.

**27/10, Daniel Ahlberg, IMPA and Uppsala University: Random coalescing geodesics in first-passage percolation**

**3/11, KaYin Leung, Stockholm University, Dangerous connections: the spread of infectious diseases on dynamic networks**

This talk is based on joint work with Odo Diekmann and Mirjam Kretzschmar (Utrecht, The Netherlands)

**24/11, Yi He, Tilburg University: Asymptotics for Extreme Depth-based Quantile Region Estimation**

**1/12, Marianne Månsson: Statistical methodology and challenges in the area of prostate cancer screening**

**8/12, Holger Rootzén: Human life is unbounded -- but short**

**15/12, Martin Schlather, University of Mannheim, Simulation of Max-Stable Random Fields**

## 2015

**29/1, Trifon Missov, Max Planck Institute for Demography, Stochastic Models in Mortality Research: Recent Advancements and Applications**

**12/1, Sergei Zuyev, Optimal sampling of stochastic processes via measure optimisation technique**

Abstract: Let W(t) be a continuous non-stationary stochastic process on [0,1] which can be observed at times T=(t_0 < t_1 < ... < t_n) giving rise to a random vector W=(W(t_1),...,W(t_n)). The question we address is how to choose the sampling times T in such a way that the linear spline constructed through the points (T,W) deviates as little as possible from the trajectory (W(t), t in [0,1]). Namely, the average L_2 distance between the paths is minimised. The answer depends on the smoothness coefficient a(t), meaning that the average increment E|W(t+s)-W(t)| behaves like |s|^a(t) for small s. The local variant of the problem for a monotone a(t) was addressed previously by Hashorva, Lifshits and Seleznjev. By using the variation technique on measures we are able to extend the known results and potentially to attack the multi-dimensional case of optimal sampling of random fields.

**19/2, Alexey Lindo and Serik Sagitov, A special family of Galton-Watson processes with explosions**

Abstract: The linear-fractional Galton-Watson process is a well-known case in which many characteristics of a branching process can be computed explicitly. In this paper we extend the two-parameter linear-fractional family to a much richer four-parameter family of reproduction laws. The corresponding Galton-Watson processes also allow for explicit calculations, now with the possibility of an infinite mean, or even an infinite number of offspring. We study the properties of this special family of branching processes, and show, in particular, that in some explosive cases the time to explosion can be approximated by the Gumbel distribution.

**26/2, Jean-Baptiste Gouéré, Université d'Orléans: Continuum percolation on R^d**

Abstract: We consider the Boolean model on R^d. This is the union of i.i.d. random Euclidean balls centered at points of a homogeneous Poisson point process on R^d. Choose the intensity of the Poisson point process so that the Boolean model is critical for percolation. In other words, if we lower the intensity then all the connected components of the Boolean model are bounded, while if we increase the intensity then there exists one unbounded component. We are interested in the volumetric proportion of R^d which is covered by this critical Boolean model. This critical volumetric proportion is a function of the dimension d and of the common distribution of the radii. We aim to study this function.
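To make the notion of covered volumetric proportion concrete, here is a minimal simulation sketch of a Boolean model in the plane. It assumes a fixed (deterministic) radius and an arbitrary, non-critical intensity, both chosen here purely for illustration, and compares a Monte Carlo estimate of the covered fraction with the standard formula 1 - exp(-intensity × mean ball volume). It does not attempt to locate the critical intensity discussed in the talk; all function names are hypothetical.

```python
import math
import random


def poisson_count(mean, rng):
    """Sample a Poisson(mean) count by counting unit-rate exponential arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def covered_fraction(intensity, radius, n_realizations=100, n_probes=200, seed=0):
    """Estimate the fraction of the unit square covered by discs of fixed radius
    centred at a homogeneous Poisson process (assumption: d = 2, constant radius)."""
    rng = random.Random(seed)
    # Simulate centres in an enlarged window so discs centred just outside
    # the unit square are not missed (avoids edge effects).
    lo, hi = -radius, 1.0 + radius
    area = (hi - lo) ** 2
    hits = 0
    for _ in range(n_realizations):
        n = poisson_count(intensity * area, rng)
        centres = [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n)]
        for _ in range(n_probes):
            x, y = rng.random(), rng.random()
            if any((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 for cx, cy in centres):
                hits += 1
    return hits / (n_realizations * n_probes)


def theoretical_fraction(intensity, radius):
    """Covered fraction 1 - exp(-intensity * disc area) for the planar Boolean model."""
    return 1.0 - math.exp(-intensity * math.pi * radius ** 2)
```

For example, `covered_fraction(50.0, 0.1)` can be compared with `theoretical_fraction(50.0, 0.1)`; the two should agree up to Monte Carlo error.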

**12/3, Daniel Simpson, Norwegian University of Science and Technology: With low power comes great responsibility: challenges in modern spatial data analysis**

Abstract: Like other fields in statistics, spatial data analysis has undergone its own "big data" revolution. Over the last decade, this has resulted in new approximate algorithms and new approximate models being used to fit ever more complicated data. There is a particular role in this revolution for model-based statistics and, in particular, Bayesian analysis.

The trouble is that as both the data and the models expand, we can end up with complex, unidentifiable, hierarchical, unobserved nightmares. Hence we are starting to seriously ask the question "What can we responsibly say about this data?".

In this talk, I will go nowhere near answering this fundamental question, but I will provide a clutch of partial answers to simpler problems. In particular, I will outline the trade-offs that need to be considered when building approximate spatial models; the incorporation of weak expert knowledge into priors on the hyper-parameters of spatial models; the dangers of flexible non-stationarity; and the role of prior choice in interpreting posteriors.

This is joint work with Geir-Arne Fuglstad, Sigrunn Sørbye, Janine Illian, Finn Lindgren, and Håvard Rue.

**19/3, Emmanuel Schertzer, Sorbonne, France: The contour process of Crump-Mode-Jagers trees**

Abstract: The genealogy of a (planar) Galton-Watson branching process is encoded by its contour path, which is obtained by recording the height of an exploration particle running along the edges of the tree from left to right.

**23/4, David Dereudre, Université Lille-1, Consistency of likelihood estimation for Gibbs point processes**

Abstract: We prove the strong consistency of the maximum likelihood estimator (MLE) for parametric Gibbs point process models. The setting is very general and includes pairwise pair potentials, finite and infinite multibody interactions and geometrical interactions, where the range can be finite or infinite. Moreover the Gibbs interaction may depend linearly or non-linearly on the parameters, a particular case being hardcore parameters and interaction range parameters. As important examples, we deduce the consistency of the MLE for all parameters of the Strauss model, the hardcore Strauss model, the Lennard-Jones model and the area-interaction model.

**7/5, Olle Häggström, The current debate on p-values and null hypothesis significance testing**

Abstract: The use of p-values and null hypothesis significance testing has been under attack in recent years from practitioners of statistics in various disciplines. One highlight is the publication in 2008 of "The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives" by Stephen Ziliak and Deirdre McCloskey. Another is the ban of p-values from the journal Basic and Applied Social Psychology that its editors announced in February 2015. I will review (parts of) this debate, and stress how important I think it is that we, as statisticians, take part in it.

**21/5, Jürgen Potthoff, University of Mannheim, Sample Properties of Random Fields**

*Kolmogorov–Chentsov*–theorem is presented, which provides sufficient criteria for the existence of a (Hölder) continuous modification of a random field, which is indexed by a metric space admitting certain separability properties. For random fields on an open subset of the *d*–dimensional Euclidean space, sufficient criteria are presented which guarantee the existence of a sample differentiable modification. If time permits, results concerning the existence of separable and/or measurable modifications are mentioned.

**28/5, Ingemar Kaj, Uppsala universitet, The Poisson random field site frequency spectrum**

**11/6, Fima Klebaner, Monash University, Limit theorems for age distribution in populations with high carrying capacity**

Joint work with Fan, Hamza (Monash) and Jagers (Chalmers).

**23/6, Carmen Minuesa, University of Extremadura, Robust estimation for Controlled Branching Processes**

The aim of this work is to consider the estimation of the underlying offspring parameters via disparities, assuming that the offspring distribution belongs to a general parametric family.

From a frequentist viewpoint, we obtain the minimum disparity estimators under three possible samples: given the entire family tree up to a certain generation, given the total number of individuals and progenitors in each generation, and given only the population sizes. We examine their asymptotic and robustness properties.

From a Bayesian outlook, we develop an analogous procedure which provides robust Bayesian estimators of the offspring parameter through the use of disparities. The method consists of replacing the log likelihood with an appropriately scaled disparity in the expression of the posterior distribution. For the estimators associated to the resulting distribution, we study their asymptotic properties.

Finally, we illustrate the accuracy of the proposed methods by way of simulated examples developed with the statistical software R.

**8/9, K. Borovkov, The University of Melbourne, On the asymptotic behaviour of a dynamic version of the Neyman contagious point process**

We consider a dynamic version of the Neyman contagious point process that can be used for modelling the spatial dynamics of biological populations, including species invasion scenarios. Starting with an arbitrary finite initial configuration of points in R^d with nonnegative weights, at each time step a point is chosen at random from the process according to the distribution with probabilities proportional to the points' weights. Then a finite random number of new points is added to the process, each displaced from the location of the chosen "mother" point by a random vector and assigned a random weight. Under broad conditions on the sequences of the numbers of newly added points, their weights and displacement vectors (which include a random environments setup), we derive the asymptotic behaviour of the locations of the points added to the process at time step n and also that of the scaled mean measure of the point process after time step n, as n → ∞.
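The update rule described above can be sketched in a few lines. This is only an illustrative toy under loudly stated assumptions: points live in R^2, the number of new points per step is uniform on {0, 1, 2, 3}, displacements are standard Gaussian, and new weights are uniform on [0, 1). None of these distributional choices come from the talk, which allows much more general sequences; the function name is hypothetical.

```python
import random


def simulate_contagious_process(initial_points, n_steps, seed=0):
    """Toy dynamic Neyman-type process in the plane.

    initial_points: list of (x, y, weight) with nonnegative weights,
    at least one of which must be positive.
    """
    rng = random.Random(seed)
    points = list(initial_points)
    for _ in range(n_steps):
        # Choose a "mother" point with probability proportional to its weight.
        weights = [w for _, _, w in points]
        mx, my, _ = rng.choices(points, weights=weights, k=1)[0]
        n_new = rng.randint(0, 3)  # hypothetical law for the number of new points
        for _ in range(n_new):
            dx, dy = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # hypothetical displacement law
            new_weight = rng.random()                           # hypothetical weight law
            points.append((mx + dx, my + dy, new_weight))
    return points
```

Calling `simulate_contagious_process([(0.0, 0.0, 1.0)], 50)` returns the full configuration after 50 steps, from which the locations of late additions can be inspected.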

**24/9, Laurent Decreusefond, ENST, France, Distances between point processes**

**1/10, Tailen Hsing, Analysing Spatial Data Locally**

Abstract: Stationarity is a common assumption in spatial statistics. The justification is often that stationarity is a reasonable approximation if data are collected "locally." In this talk we first review various known approaches for modeling nonstationary spatial data. We then examine the notion of local stationarity in more detail. In particular, we will consider a nonstationary model whose covariance behaves like the Matern covariance locally and an inference approach for that model based on gridded data.

**8/10, Evsey Morozov, Inst. Applied Math. Research, Russia, Stability analysis of regenerative queues: some recent results**

**15/10, Johan Lindström, Lund University: Seasonally Non-stationary Smoothing Splines: Post-processing of Satellite data**

Abstract: Post-processing of satellite remote sensing data is often done to reduce noise and remove artefacts due to atmospheric (and other) disturbances. Here we focus specifically on satellite derived vegetation indices which are used for large scale monitoring of vegetation cover, plant health, and plant phenology. These indices often exhibit strong seasonal patterns, where rapid changes during spring and fall contrast to relatively stable behaviour during the summer and winter season. The goal of the post-processing is to extract smooth seasonal curves that describe how the vegetation varies during the year. This is however complicated by missing data and observations with large biases.

**20/10, Janine B Illian, University of St Andrews and NTNU Trondheim: Spatial point processes in the modern world – an interdisciplinary dialogue**

**22/10, Sach Mukherjee, German Center for Neurodegenerative Diseases (DZNE): High-dimensional statistics for personalized medicine**

**29/10, Peter Olofsson, Trinity University: A stochastic model of speciation through Bateson-Dobzhansky-Muller incompatibilities**

Abstract: Speciation is characterized by the development of reproductive isolating barriers between diverging groups. Intrinsic post-zygotic barriers of the type envisioned by Bateson, Dobzhansky, and Muller are deleterious interactions among loci that reduce hybrid fitness, leading to reproductive isolation. The first stochastic model of the development of these barriers was published by Orr in 1995. We generalize Orr's model by incorporating finite protein–protein interaction networks and by allowing for different fixation rates at different loci. Formulas for the speciation probability and the expected time until speciation are established.

**5/11, Murray Pollock, University of Warwick: A New Unbiased and Scalable Monte Carlo Method for Bayesian Inference**

Abstract: This talk will introduce novel methodology for exploring posterior distributions by modifying methodology for exactly (without error) simulating diffusion sample paths – the Scalable Langevin Exact Algorithm (ScaLE). This new method has remarkably good scalability properties (among other interesting properties) as the size of the data set increases (it has sub-linear cost, and potentially no cost), and therefore is a natural candidate for “Big Data” inference.

Joint work with Paul Fearnhead (Lancaster), Adam Johansen (Warwick) and Gareth Roberts (Warwick).

**12/11, Jimmy Olsson, KTH, Efficient particle-based online smoothing in general state-space hidden Markov models: the PaRIS algorithm**

Abstract: This talk discusses a novel algorithm, the particle-based, rapid incremental smoother (PaRIS), for efficient online approximation of smoothed expectations of additive state functionals in general hidden Markov models. The algorithm, which has a linear computational complexity under weak assumptions and very limited memory requirements, is furnished with a number of convergence results, including a central limit theorem. An interesting feature of PaRIS, which samples on-the-fly from the retrospective dynamics induced by the particle filter, is that it requires two or more backward draws per particle in order to cope with degeneracy of the sampled trajectories and to stay numerically stable in the long run with an asymptotic variance that grows only linearly with time.

**19/11, Patrik Albin: On Extreme Value Theory for Group Stationary Gaussian Processes**

**26/11, Youri K. Belyaev, Umeå University, The Hybrid Moments-Distance method for clustering observations with a mixture of two symmetrical distributions**

Abstract: Clustering cancer patients based on high-dimensional gene expression data is essential in discovering new subtypes of cancer. Here we present a novel univariate clustering approach that can be used for variable selection in high-dimensional clustering problems.

We observe gene expression data on one gene and n patients, where the jth patient has cancer of type 1 (tj =1) or type 2 (tj =2). The aim is to predict the unobservable list of types {t1,...,tn}.

Here {t1,...,tn} are values of i.i.d. random variables {T1,...,Tn} such that P[Tj =1]=w1, P[Tj =2]=w2 and w1+ w2=1. The gene expression data {x1,…,xn} are observations of i.i.d. random variables {X1,…,Xn}, where Xj has distribution F1 if tj=1 and F2 if tj=2, j=1,…,n. We assume that F1 and F2 are symmetrical distributions parameterized by their means (m1 and m2) and variances (v1 and v2). Thereby we have a statistical model with a mixture of two symmetrical distributions with five unknown parameters {w1, m1, v1, m2, v2}. Consistent estimates of all 5 parameters can be found by using the recursive EM-algorithm, and the responsibilities {q1(x1),…,qn(xn)} obtained via the estimated parameters can be used to predict the patients’ cancer types {t1,...,tn}. However, the EM-algorithm is sensitive to distribution assumptions that deviate from the real distributions F1, F2 and to the starting point in the recursion.

We propose an alternative method, the hybrid moment-distance (HMD) method, where the observations {x1,…,xn} are used for estimation of the first three moments. These moment estimates are used to reduce the dimension of the parameter space from 5 to 3. The optimal parameters within the lower-dimensional space are obtained by considering the distance between the empirical distribution and the fitted parametric distributions. Responsibilities {q1(x1),…,qn(xn)}, obtained via the HMD-method’s estimated parameters, are used to predict the patients’ cancer types. Note that the patient's q-value is the estimated probability that the patient has cancer of a certain type.

An extensive simulation study showed that the HMD-algorithm outperformed the EM-algorithm with respect to clustering performance. The HMD-method was flexible and performed well also under very imprecise model assumptions, which suggests that it is robust and well suited for real problems.
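The role of the responsibilities in predicting cancer types can be illustrated with a short sketch. It assumes, purely for illustration, that F1 and F2 are Gaussian (the abstract only requires symmetry) and that the five parameters {w1, m1, v1, m2, v2} have already been estimated by some method; it implements neither the EM recursion nor the HMD estimation step. The function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(xs, w1, m1, v1, m2, v2):
    """q(x): estimated probability that an observation x belongs to type 1,
    assuming Gaussian components (an assumption made for this sketch)."""
    w2 = 1.0 - w1
    out = []
    for x in xs:
        a = w1 * normal_pdf(x, m1, v1)
        b = w2 * normal_pdf(x, m2, v2)
        out.append(a / (a + b))
    return out


def predict_types(xs, w1, m1, v1, m2, v2):
    """Predict type 1 when the responsibility is at least 0.5, otherwise type 2."""
    return [1 if q >= 0.5 else 2 for q in responsibilities(xs, w1, m1, v1, m2, v2)]
```

For instance, `predict_types([-1.0, 3.0], 0.5, 0.0, 1.0, 2.0, 1.0)` assigns the first observation to type 1 and the second to type 2, since each lies closer to the corresponding mean.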

**10/12, Anna-Kaisa Ylitalo, University of Jyväskylä: Eye movements during music reading - A generalized estimating equation approach**

## 2014

**30/1, Andrea Ghiglietti, Milan University: A two-colour Randomly Reinforced Urn design targeting fixed allocations**

There are many experimental designs for clinical trials in which the proportion of patients allocated to treatments converges to a fixed value. Some of these procedures are response-adaptive, and the limiting allocation proportion can depend on treatment behaviours. This property makes these designs very attractive because they are able to achieve two simultaneous goals: (a) collecting evidence to determine the superior treatment, and (b) increasing the allocation of units to the superior treatment. We focus on a particular class of adaptive designs, described in terms of urn models which are randomly reinforced and characterized by a diagonal mean replacement matrix, called Randomly Reinforced Urn (RRU) designs. They usually present a probability to allocate units to the best treatment that converges to one as the sample size increases. Hence, many desirable asymptotic properties concerning designs that target a proportion in (0,1) are not straightforwardly fulfilled by these procedures. We therefore construct a modified RRU model which is able to target any asymptotic allocation in (0,1) fixed in advance. We prove the almost sure convergence of the urn proportion and of the proportion of colours sampled by the urn. We are able to compute the exact rate of convergence of the urn proportion and to characterize the limiting distribution. We also focus on the inferential aspects concerning this urn design. We consider different statistical tests, based either on adaptive estimators of the unknown means or on the urn proportion. Suitable statistics are introduced and studied to test the hypothesis on treatment difference.
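The behaviour of the basic (unmodified) RRU described above, where the urn proportion drifts towards the superior treatment, can be seen in a minimal simulation sketch. The assumptions here are illustrative only: responses are drawn uniformly on [0, 2 × mean] so that they are nonnegative, the urn starts with one unit of each colour, and the function name is hypothetical. The modified design that targets a fixed allocation in (0,1) is not specified in the abstract and is not implemented.

```python
import random


def simulate_rru(mean_red, mean_white, n_steps, red0=1.0, white0=1.0, seed=0):
    """Two-colour randomly reinforced urn with a diagonal replacement rule.

    At each step a colour is drawn with probability equal to its current
    proportion; the urn is then reinforced with a random nonnegative amount
    of the SAME colour (here uniform on [0, 2 * mean], an assumption).
    Returns the trajectory of the red proportion.
    """
    rng = random.Random(seed)
    red, white = red0, white0
    proportions = []
    for _ in range(n_steps):
        if rng.random() < red / (red + white):
            red += rng.uniform(0.0, 2.0 * mean_red)      # response to the "red" treatment
        else:
            white += rng.uniform(0.0, 2.0 * mean_white)  # response to the "white" treatment
        proportions.append(red / (red + white))
    return proportions
```

With a clearly superior red treatment, for example `simulate_rru(2.0, 0.2, 2000)`, the red proportion moves towards one, which is exactly the feature that prevents the basic design from targeting an interior allocation.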

**6/2, David Bolin, Chalmers: Multivariate latent Gaussian random field mixture models**

Abstract: A novel class of models is introduced, with potential areas of application ranging from land-use classification to brain imaging and geostatistics. The model class, denoted latent Gaussian random field mixture models, combines the Markov random field mixture model with latent Gaussian random field models. The latent model, which is observed under measurement noise, is defined as a mixture of several, possibly multivariate, Gaussian random fields. Which of the fields is observed at each location is modelled using a discrete Markov random field. In order to use the model class for the massive data sets that arise in many possible areas of application, such as brain imaging, a computationally efficient parameter estimation method is required. Such an estimation method, based on a stochastic gradient algorithm, is developed and the model is tested on a magnetic resonance imaging application.

**13/2, Ege Rubak, Aalborg University, Denmark: Determinantal point processes - statistical modeling and inference**

Time permitting, I will end the talk with a brief demonstration of how recent developments allow us to extend the software to handle stationary DPPs on a sphere (e.g. the surface of Earth).

The main part of the work has been carried out in collaboration with Jesper Möller from Aalborg University and Frederic Lavancier from Nantes University, while the final part concerning DPPs on spheres is an ongoing collaboration which also includes Morten Nielsen (Aalborg University).

**20/2, Anthony Metcalfe, KTH, Universality classes of lozenge tilings of a polyhedron**

Abstract: A regular hexagon can be tiled with lozenges of three different orientations. Letting the hexagon have sides of length n, and the lozenges have sides of length 1, we can consider the asymptotic behaviour of a typical tiling as n increases. Typically, near the corners of the hexagon there are regions of "frozen" tiles, and there is a "disordered" region in the centre which is approximately circular.

More generally one can consider lozenge tilings of polyhedra with more complex boundary conditions. In this talk we use steepest descent analysis to examine the local asymptotic behaviour of tiles in various regions. Tiles near the boundary of the equivalent "frozen" and "disordered" regions are of particular interest, and we give necessary conditions under which such tiles behave asymptotically like a determinantal random point field with the Airy kernel. We also classify necessary conditions that lead to other asymptotic behaviours, and examine the global asymptotic behaviour of the system by considering the geometric implications of these conditions.

**27/2, Sergei Zuyev, Chalmers: Discussion seminar: Probing harmony with algebra (or attractiveness with statistics)**

**6/3, Tuomas Rajala, Chalmers: Denoising polar ice data using point pattern statistics**

Abstract: Point pattern statistics analyses point configurations suspended in 2- and 3-dimensional volumes of continuous material or space. An example is given by the bubble patterns within polar ice samples, drilled from the ice sheets of Antarctica and Greenland in order to study the climate conditions of the past. The problem with the ice data is that the original configuration of bubbles is overlaid with artefacts that appear during the extraction, transit and storage of the physical samples. This talk will discuss the problem together with some ideas for removing the artefacts.

**13/3, Pierre Nyqvist, KTH: Importance sampling through a min-max representation for viscosity solutions to Hamilton-Jacobi equations**

Abstract:

In applied probability, a lot of effort has been put into the design of efficient simulation algorithms for problems where the standard Monte Carlo algorithm, for various reasons, becomes too inefficient for practical purposes. This happens particularly in the rare-event setting, in which poor precision and/or a high computational cost render the algorithm virtually useless. As a remedy, different techniques for variance reduction have been developed, such as importance sampling, interacting particle systems, multi-level splitting and MCMC techniques.

The focus of this talk will be importance sampling. One way to design efficient algorithms, first discovered by Dupuis and Wang, is the so-called subsolution approach: the sampling algorithm is based on a subsolution to a (problem-specific) Hamilton-Jacobi equation. The aim of the talk will be two-fold: First, to discuss the connections between importance sampling, large deviations and Hamilton-Jacobi equations. Second, to present a recent result of ours that concerns viscosity solutions to Hamilton-Jacobi equations and which enables the construction of efficient algorithms. Given time, the method will be illustrated with an example in the small diffusion setting (the Freidlin-Wentzell theory of large deviations).

The talk is based on joint work with Henrik Hult and Boualem Djehiche. It is as much an overview of the subsolution approach as a presentation of our results. In particular, it will encompass the talk that Professor Djehiche gave in November as well as discuss the relevant background.

**20/3, Arvind Singh, Orsay: Localization of a vertex-reinforced random walk on Z**

Abstract:

We consider the model of the vertex-reinforced random walk on the integer lattice. Roughly speaking, it is a process which moves, at each unit of time, toward a neighbouring vertex with a probability proportional to a function of the time already spent at that site. When the reinforcement function is linear, Pemantle and Volkov showed that the walk visits only finitely many sites. This result was subsequently improved by Tarrès, who showed that the walk gets stuck on exactly 5 sites almost surely. In this talk, we discuss the case of sub-linear and super-linear reinforcement weights and show that a wide range of localization patterns may occur.

**8/4, Aernout van Enter, Groningen, The Netherlands: Bootstrap percolation, the role of anisotropy: Questions, some answers and applications**

Abstract:

Bootstrap percolation models describe growth processes, in which in a metastable situation nucleation occurs from the creation of some kind of critical droplet.

Such droplets are rare, but once they appear, they grow to cover the whole of space. The occurrence of such critical droplets in large volumes is ruled by asymptotic probabilities. We discuss how the scaling of these probabilities with the volume is modified in the presence of anisotropy. Moreover, we discuss why numerics have a rather bad track record in the subject. This is based on joint work with Tim Hulshof, Hugo Duminil-Copin, Rob Morris and Anne Fey.

**10/4, Rasmus Waagepetersen, Department of Mathematical Sciences, Aalborg University: Quasi-likelihood for spatial point processes**

Fitting regression models for intensity functions of spatial point processes is of great interest in ecological and epidemiological studies of association between spatially referenced events and geographical or environmental covariates. When Cox or cluster process models are used to accommodate clustering not accounted for by the available covariates, likelihood based inference becomes computationally cumbersome due to the complicated nature of the likelihood function and the associated score function. It is therefore of interest to consider alternative more easily computable estimating functions. We derive the optimal estimating function in a class of first-order estimating functions. The optimal estimating function depends on the solution of a certain Fredholm integral equation which in practice is solved numerically. The derivation of the optimal estimating function has close similarities to the derivation of quasi-likelihood for standard data sets. The approximate solution is further equivalent to a quasi-likelihood score for binary spatial data. We therefore use the term quasi-likelihood for our optimal estimating function approach. We demonstrate in a simulation study and a data example that our quasi-likelihood method for spatial point processes is both statistically and computationally efficient.

**24/4, Oleg Seleznjev, Umeå University: Linear approximation of random processes with variable smoothness**

We consider the problem of approximation of a locally stationary random process with a variable smoothness index defined on an interval. An example of such a function is a multifractional Brownian motion, which is an extension of the fractional Brownian motion with path regularity varying in time. Probabilistic models based on locally stationary random processes with variable smoothness have recently become an object of interest for applications in various areas (e.g., Internet traffic, financial records, natural landscapes) due to their flexibility for matching local regularity properties, e.g., [3]. Approximation of continuous and smooth random functions with a unique singularity point is studied in [1].

Hermite splines.

[1] Abramowicz, K. and Seleznjev, O. (2011). Spline approximation of a random process with singularity. J. Statist. Plann. Inference 141, 1333–1342.

[2] Hashorva, E., Lifshits, M., and Seleznjev, O. (2012). Approximation of a random process with variable smoothness. ArXiv:1206.1251v1.

[3] Echelard, A., Lévy Véhel, J., Barrière, O. (2010). Terrain modeling with multifractional Brownian motion and self-regulating processes. In: Computer Vision and Graphics. LNCS, 6374, Springer, Berlin, 342–351.

**20/5, Simo Särkkä, Dept. of Biomedical Engineering and Computational Science, Aalto University, Finland: Theory and Practice of Particle Filtering for State Space Models**

The aim of this talk is to give an introduction to particle filtering, which refers to a powerful class of sequential Monte Carlo methods for Bayesian inference in state space models. Particle filters can be seen as approximate optimal (Bayesian) filtering methods which can be used to produce an accurate estimate of the state of a time-varying system based on multiple observational inputs (data). Interest in these methods has exploded in recent years, with numerous applications emerging in fields such as navigation, aerospace engineering, telecommunications and medicine. Smartphones have also created a recent demand for this kind of sophisticated sensor fusion and non-linear multichannel signal processing methods, as they provide a wide range of motion and environmental sensors together with the computational power to run the methods in real time. The talk covers particle filtering at both the theoretical and the algorithmic level and outlines the main results in the analysis of the convergence of particle filters.

**15/5, Marie-Colette van Lieshout, CWI, The Netherlands: A Spectral Mean for Point Sampled Closed Curves**

Abstract:

We propose a spectral mean for closed curves described by sample points on its boundary subject to misalignment and noise. First, we ignore misalignment and derive maximum likelihood estimators of the model and noise parameters in the Fourier domain. We estimate the unknown curve by back-transformation and derive the distribution of the integrated squared error. Then, we model misalignment by means of a shifted parametric diffeomorphism and minimise a suitable objective function simultaneously over the unknown curve and the misalignment parameters. Finally, the method is illustrated on simulated data as well as on photographs of Lake Tana taken by astronauts during a Shuttle mission.

**27/5, Nanny Wermuth, Johannes Gutenberg-University, Mainz, Traceable regressions: general properties and some special cases**

In this lecture, I discuss properties of corresponding distributions that are needed to read off the graph all implied independences, as well as the additional properties that permit similar conclusions about dependences. Some data analyses are shown and some results are discussed for star graphs, a very special type of graph.

**12/6, F. C. Klebaner, Monash University, Melbourne: When is a Stochastic Exponential of a Martingale a true Martingale?**

Abstract:

The question "When is a Stochastic Exponential E(M) of a Martingale M a true Martingale?" is important in financial mathematics. The best known sufficient condition is due to Novikov, and another one is due to Kazamaki. Here we give another condition, which is essentially a linear growth condition on the parameters of the original martingale M. These conditions generalize Benes' idea, but the proofs use a different approach. They are applicable when Novikov's or Kazamaki's conditions do not apply. Our approach works for processes with jumps, as well as non-Markov processes. This is joint work with Robert Liptser.

**2/9, Pavel Grabarnik, Laboratory of Ecosystems Modeling, the Russian Academy of Sciences: Spatial complexity of ecosystems: testing models for spatial point patterns**

Abstract:

Goodness-of-fit tests play a fundamental role in ecological statistics and modeling. Testing statistical hypotheses is an important step in building models. Often it is checked whether the data deviate significantly from a null model. In spatial point pattern analysis, typical null models are complete spatial randomness, independent marking or some fitted model. Unlike in classical statistics, where null models are usually represented by a single hypothesis, the hypotheses in spatial statistics have a spatial dimension and therefore a multiple character.

The classical device to overcome the multiple comparison problem in testing a spatial hypothesis is the deviation test, which summarizes the differences between an empirical test function and its expectation under the null hypothesis, both of which depend on a distance variable. Another test is based on simulation envelopes, where a data functional statistic is inspected for a range of distances simultaneously. It was noted that the type I error probability, when testing over an interval of distances, heavily exceeds that for individual scales, and therefore the conventional pointwise simulation envelope test cannot be recommended as a rigorous statistical tool.

To overcome this drawback the refined envelope test was proposed in (Grabarnik et al., 2011) and developed further in a recent work (Myllymäki et al., 2013). It is a procedure where the global type I error probability is evaluated by simulation and taken into account in making conclusions. In this way, it becomes a valuable tool both for statistical inference and for understanding the reasons for possible rejections of the tested hypothesis.

A problem related to testing the goodness-of-fit of fitted models is that the test may be extremely conservative. The remedy is the procedure proposed by Dao and Genton (2013). Based on their idea we suggest a way to adjust envelopes to make the empirical type I error equal to the nominal one.

We illustrate the applicability of the tests by examples from forest ecology.

References.

Dao, N. A., & Genton, M. G. (2013). A Monte Carlo adjusted goodness-of-fit test for parametric models describing spatial point patterns. Journal of Computational and Graphical Statistics, 23, 497-517.

Grabarnik, P., Myllymäki, M., Stoyan, D. (2011). Correct testing of mark independence for marked point patterns. Ecological Modelling 222, 3888-3894.

Myllymäki, M., Mrkvicka, T., Seijo, H., Grabarnik, P. (2013). Global envelope tests for spatial processes. arXiv preprint arXiv:1307.0239.

**18/9, Jean-François Coeurjolly, LJK, Grenoble, Stein's estimation of the intensity of a stationary spatial Poisson point process**

Abstract:

We revisit the problem of estimating the intensity parameter of a homogeneous Poisson point process observed in a bounded window of R^d, making use of a (now) old idea of James and Stein. For this, we prove an integration by parts formula for functionals defined on the Poisson space. This formula extends the one obtained by Privault and Réveillac (Statistical Inference for Stochastic Processes, 2009) in the one-dimensional case. As in Privault and Réveillac, this formula is adapted to a notion of gradient of a Poisson functional satisfying the chain rule, which is the key ingredient to propose new estimators able to outperform the maximum likelihood estimator (MLE) in terms of the mean squared error.

The new estimators can be viewed as biased versions of the MLE but with a well-constructed bias, which reduces the variance. We study a large class of examples and show that with a controlled probability the corresponding estimator outperforms the MLE. We will illustrate in a simulation study that for very reasonable practical cases (like an intensity of 10 or 20 of a Poisson point process observed in the Euclidean ball of dimension between 1 and 5) we can obtain a relative (mean squared error) gain of 20% of the Stein estimator with respect to the maximum likelihood estimator.

This is a joint work with M. Clausel and J. Lelong (Univ. Grenoble).

**2/10, Vadim Shcherbakov, Royal Holloway, University of London, Long term behaviour of locally interacting birth-and-death processes**

Abstract:

In this talk we consider the long-term evolution of a finite system of locally interacting birth-and-death processes labelled by vertices of a finite connected graph. A partial description of the asymptotic behaviour in the case of general graphs is given, and the cases of both constant vertex degree graphs and star graphs are considered in more detail. The model is motivated by modelling interactions between populations and adsorption-desorption processes, and is related to interacting particle systems, Gibbs models with unbounded spins, as well as urn models with interaction. Based on joint work with Stanislav Volkov (Lund University).

**16/10, Peter Guttorp, University of Washington, USA, Comparing regional climate models to weather data**

Climate models do not model weather, and there is no way to collect climate data. From a statistical point of view we can define climate as the distribution of weather. That allows us to compare the distribution of output from historical climate model runs (over time and space) to the distribution of weather observations (also over time and space). This type of comparison is made for extreme temperatures at a single site and over a network of sites in Sweden, as well as for precipitation over Norway. The observed temperature distribution can be well described by the output from a regional climate model, but Norwegian precipitation needs to be corrected in order to achieve any reasonable agreement.

**23/10, Tomasz Kozubowski, University of Nevada, USA, Certain bivariate distributions and random processes connected with maxima and minima**

It is well-known that [S(x)]^n and [F(x)]^n are the survival function and the distribution function of the minimum and the maximum of n independent, identically distributed random variables, where S and F are their common survival and distribution functions, respectively. These two extreme order statistics play an important role in countless applications, and are the central and well-studied objects of extreme value theory. In this work we provide stochastic representations for the quantities [S(x)]^α and [F(x)]^α, where α > 0 is no longer an integer, and construct a bivariate model with these margins. Our constructions and representations involve maxima and minima with a random number of terms. We also discuss generalisations to random processes and further extensions. This research was carried out jointly with K. Podgorski.
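For readers who want the integer-power case spelled out, the standard identity behind the first sentence follows directly from independence. The symbols X_1, ..., X_n for the i.i.d. variables are notation introduced here for illustration and do not appear in the abstract.

```latex
\[
P\Bigl(\min_{1\le i\le n} X_i > x\Bigr)
  = \prod_{i=1}^{n} P(X_i > x) = [S(x)]^{n},
\qquad
P\Bigl(\max_{1\le i\le n} X_i \le x\Bigr)
  = \prod_{i=1}^{n} P(X_i \le x) = [F(x)]^{n}.
\]
```

The talk concerns what replaces this product argument when the exponent n is swapped for a non-integer α > 0.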

**6/11, Torgny Lindvall, Chalmers, On coupling of certain Markov processes**

Abstract:

The coupling method is particularly powerful when it comes to, for example, birth and death processes and diffusions. We present applications of the method for ergodicity and stochastic monotonicity of such processes, in one and several dimensions.

**13/11, Giacomo Zanella, University of Warwick, UK, Bayesian complementary clustering, MCMC and Anglo-Saxon placenames**

Abstract: Common cluster models for multi-type point processes model the aggregation of points of the same type. In complete contrast, in the study of Anglo-Saxon settlements it is hypothesized that administrative clusters involving complementary names tend to appear. We investigate the evidence for such a hypothesis by developing a Bayesian Random Partition Model based on clusters formed by points of different types (complementary clustering).

As a result we obtain an intractable posterior distribution on the space of matchings contained in a k-partite hypergraph. We apply the Metropolis-Hastings (MH) algorithm to sample from this posterior. We consider the problem of choosing an efficient MH proposal distribution and we obtain consistent mixing improvements compared to the choices found in the literature. Simulated Tempering techniques can be used to overcome multimodality and a multiple proposal scheme is developed to allow for parallel programming. Finally, we discuss results arising from the careful use of convergence diagnostic techniques.

This allows us to study a dataset including locations and placenames of 1319 Anglo-Saxon settlements dated between 750 and 850 AD. Without strong prior knowledge, the model allows for explicit estimation of the number of clusters, the average intra-cluster dispersion and the level of interaction among placenames. The results support the hypothesis of organization of settlements into administrative clusters based on complementary names.

**27/11, Jennifer Wadsworth, University of Cambridge, Likelihood-based inference for max-stable processes: some recent developments**

Max-stable processes are an important class of models for extreme values of processes indexed by space and / or time. They are derived by taking suitably scaled limits of normalized pointwise maxima of stochastic processes; in practice therefore one uses them as models for maxima over many repetitions. However, the complicated nature of their dependence structures means that full (i.e., d-dimensional, where a process is observed at d locations) likelihood inference is not straightforward. Recent work has demonstrated that by including information on when the maxima occurred, full likelihood-based inference is possible for some classes of models. However, whilst this approach simplifies the likelihood enough to make the inference feasible, it can also cause or accentuate bias in parameter estimation for processes that are weakly dependent. In this talk I will describe the ideas behind full likelihood inference for max-stable processes, and discuss how this bias can occur. Understanding of the bias issue helps to identify potential solutions, and I will illustrate one possibility that has been successful in a high-dimensional multivariate model.

**2/12, David Perrett, Perception Lab, St Andrews University, UK: Statistical analysis of visual cues underlying facial attractiveness**

Abstract:

Our approach involves two phases: (a) identify visual cues correlated with judgments, (b) confirm the impact of those cues on perception by transforming cue values in images or models of faces. We also search for the biological basis or meaning of the cues. I will illustrate the approaches for how skin colour and 3-D face shape affect perception.

Attractiveness of natural facial images is positively correlated with skin yellowness. Carotenoid pigments from fruit and vegetables in our diet impart yellowness (or ‘golden glow’) to the skin: eating more fruit and vegetables is accompanied by an increase in skin yellowness within a few weeks. Transforming facial images simulating an increase in the colour associated with a high carotenoid diet increases the apparent health and attractiveness of most faces. These judgments hold across cultures and ages (from early childhood to late adulthood). Carotenoid ornaments are used in many species as a signal of health, and are sexually selected. In humans too we find that carotenoid colour may provide an index of wellbeing in terms of fitness, and resilience to illness.

To analyse face shape we record a depth map of individual faces. For each face we manually define the position of 50 3-D landmarks (e.g., eye corners) on the depth map and then resample the facial surface so that there are a standard number of vertices between landmarks. Next the dimensions of surface shape variation across different faces are reduced using Principal Components Analysis. The vector between the average male face shape and average female face shape defines an axis of sexual dimorphism (or femininity – masculinity). Transforming the shape of faces along this axis, we find a curvilinear (quadratic) relationship of women’s ratings of attractiveness to men’s facial masculinity, with a peak in attractiveness at +90% shape masculinity and aversion to very low and very high levels of masculinity. This research shows higher levels of masculinity to be attractive than prior work on the shape of faces using 2-D images, possibly because of the importance of volumetric details and the increased realism of 3-D head models.

Other topics to be discussed include the role of (over) generalization in perceptual judgments to specific face cues and non-uniform 3-D facial growth.

**4/12, Anna Kiriliouk, Université Catholique de Louvain, An M-estimator of spatial tail dependence**

Abstract: Tail dependence models for distributions attracted to a max-stable law are fitted using observations above a high threshold. To cope with spatial, high-dimensional data, a rank-based M-estimator is proposed relying on bivariate margins only. A data-driven weight matrix is used to minimize the asymptotic variance. Empirical process arguments show that the estimator is consistent and asymptotically normal. Its finite-sample performance is assessed in simulation experiments involving popular max-stable processes perturbed with additive noise. An analysis of wind speed data from the Netherlands illustrates the method.

**11/12, Nibia Aires, Astellas Pharma, Leiden, Statistics in drug development, what happens after submission?**

In drug development, a new candidate compound with potentially good properties to cure a disease condition needs to go through a long path to become a new medicine, new medical treatment or device. Starting at an exploratory phase where, for instance, a new molecule is identified and tested in a range of settings, it continues, if successful, to an early clinical development phase where it is tested in humans. At this stage, if its toxicity and patient safety are established successfully, the new compound will go through a series of tests in controlled experiments involving humans, so-called clinical trials, with the goal of launching the new drug on the market.

**16/12, Sören Christensen, Kiel, Representation Results for Excessive Functions and Application to Stochastic Control Problems**

Abstract:

Two approaches for solving sequential decision problems are presented. Both are based on representation results for excessive functions of Markov processes. In the first approach, we represent these functions as expected suprema up to an exponential time. This leads to generalizations of recent findings for Lévy processes obtained essentially via the Wiener-Hopf factorization to general strong Markov processes on the real line. In the second approach, the Riesz integral representation is utilized to solve sequential decision problems without the machinery of local time-space-calculus on manifolds. In the end, generalizations of these findings to impulse control problems are discussed.

Most results are based on joint work with Paavo Salminen.

**18/12, Mark van de Wiel, Dep. of Epidemiology & Biostatistics and Dep. of Mathematics, VU University medical center and VU university, How to learn from a lot: Empirical Bayes in Genomics**

Abstract:

The high-dimensional character of genomics data generally forces statistical inference methods to apply some form of penalization, e.g. multiple testing, penalized regression or sparse gene networks. The other side of the coin, however, is that the dimension of the variable space may also be used to learn across variables (like genes, tags, methylation probes, etc). Empirical Bayes is a powerful principle to do so. In both Bayesian and frequentist applications it comes down to estimation of the a priori distribution of parameter(s) from the data.

We briefly review some well-known statistical methods that use empirical Bayes to analyse genomics data. We believe, however, that the principle is often not used at its full strength. We illustrate the flexibility and versatility of the principle in three settings: 1) Bayesian inference for differential expression from count data (e.g. RNAseq), 2) prediction of binary response, and 3) network reconstruction.

For 1) we develop a novel algorithm, ShrinkBayes, for the efficient simultaneous estimation of multiple priors, allowing joint shrinkage of multiple parameters in differential gene expression models. This can be attractive when sample sizes are small or when many nuisance parameters like batch effects are present. For 2) we demonstrate how auxiliary information in the form of 'co-data', e.g. p-values from an external study or genomic annotation, can be used to improve prediction of binary response, like tumour recurrence. We derive empirical Bayes estimates of penalties of groups of variables in a classical logistic ridge regression setting, and show that multiple sources of co-data may be used. Finally, for 3) we combine empirical Bayes with computationally efficient variational Bayes approximations of posteriors for the purpose of gene network reconstruction by the use of structural equation models. These models regress each gene on all others, and hence this setting can be regarded as a combination of 1) and 2). We show the benefits of empirical Bayes on several real data sets.

**18/12, Lars Rönnegård, Dalarna University: Hierarchical generalized linear models – a Lego approach to mixed models**

Abstract: The method of hierarchical generalized linear models (HGLM) fits generalized linear models with random effects and was introduced by Lee & Nelder (1996). It is based on the extended likelihood principle and is a complete statistical framework including inference and model selection tools. In this presentation I give several examples from genetics where HGLM has been applied. I will also show that the HGLM approach allows extended modelling in a building-block type of structure; like Lego. Together with my colleagues, I have implemented the HGLM method in the R package hglm (available on CRAN) and I will show how this “Lego approach” can be used to fit quantitative genetic models and spatial CAR models in hglm.

## 2013

**31/1, Mikhail Lifshits, St. Petersburg State University: Small deviation probabilities and their interplay with operator theory and Bayesian statistics**

Small deviation, or small ball, probability simply means P(||X||<r) as r tends to zero for X being a random element of a Banach space. Typically X is a trajectory of a random process such as Wiener process, fractional Brownian motion, Levy process, etc., while ||.|| is some norm on a functional space. There is no general technique for evaluating small deviation probability but in some important cases interesting links lead from small deviations to entropy of linear operators, eigenvalues of Sturm-Liouville problems etc. We will discuss these links, supply examples, and will review some applications to Bayesian statistics.

**14/3, Anders Sandberg, Future of Humanity Institute, Oxford, Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes**

Some risks have extremely high stakes. For example, a worldwide pandemic or asteroid impact could potentially kill more than a billion people. Comfortingly, scientific calculations often put very low probabilities on the occurrence of such catastrophes. In this paper, we argue that there are important new methodological problems which arise when assessing global catastrophic risks and we focus on a problem regarding probability estimation. When an expert provides a calculation of the probability of an outcome, they are really providing the probability of the outcome occurring, given that their argument is watertight. However, their argument may fail for a number of reasons such as a flaw in the underlying theory, a flaw in the modeling of the problem, or a mistake in the calculations. If the probability estimate given by an argument is dwarfed by the chance that the argument itself is flawed, then the estimate is suspect. We develop this idea formally, explaining how it differs from the related distinctions of model and parameter uncertainty. Using the risk estimates from the Large Hadron Collider as a test case, we show how serious the problem can be when it comes to catastrophic risks and how best to address it. This is joint work with Toby Ord and Rafaela Hillerbrand.
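The core point about conditional estimates can be written with the law of total probability. The event names X (the catastrophe occurs) and A (the expert's argument is sound) are notation introduced here for illustration, and the numbers in the closing remark are hypothetical rather than taken from the talk.

```latex
\[
P(X) \;=\; P(X \mid A)\,P(A) \;+\; P(X \mid \neg A)\,P(\neg A).
\]
```

The expert's calculation supplies only P(X | A). If, say, P(X | A) = 10^{-9} while P(¬A) = 10^{-3}, the second term can dominate the sum unless P(X | ¬A) is itself extremely small, which is why the estimate alone becomes suspect.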

**4/4, Daniel Johansson, Fysisk resursteori, Chalmers: Climate sensitivity: Learning from observations**

Although some features of climate change are known with relative certainty, many uncertainties in climate science remain. The most important uncertainty pertains to the Climate Sensitivity (CS), i.e., the equilibrium increase in the global mean surface temperature that follows from a doubling of the atmospheric CO2 concentration. A probability distribution for the CS can be estimated from the observational record of global mean surface temperatures and ocean heat uptake together with estimates of anthropogenic and natural radiative forcings. However, since the CS is statistically dependent on other uncertain factors, such as the uncertainty in the direct and indirect radiative forcing of aerosols, it is difficult to constrain this distribution from observations. The primary aim of this presentation is to analyse how the distribution of the climate sensitivity changes over time as the observational record becomes longer. We use a Bayesian Markov Chain Monte Carlo approach together with an Upwelling Diffusion Energy Balance Model for this. We will also briefly discuss how sensitive the climate sensitivity estimate is to changes in the structure of the geophysical model and to changes in the observational time series.

**11/4, Johan Johansson, Chalmers: On the BK inequality**

A family of binary random variables is said to have the BK property if, loosely speaking, for any two events that are increasing in the random variables, the probability that they occur disjointly is at most the product of the probabilities of the two events. The classical BK inequality states that this holds if the random variables are independent. Since the BK property is stronger than negative association, it is a form of negative dependence property and one would expect other negatively dependent families to have the BK property. This has turned out to be quite a challenge and until very recently, no substantial examples besides the independent case were known. In this talk I will give two of these examples, the k-out-of-n measure and pivotal sampling, and sketch how to prove the BK inequality for these. I will also mention a few seemingly "simple questions" and how solutions to these would be profoundly important.

**16/4, Alexandra Jauhiainen: Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles**

Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data obtained from experiments that perturb genes by knockouts or RNA interference contain useful information for addressing this reconstruction problem. However, such data can be limited in size and/or expensive to acquire. On the other hand, observational data of the organism in steady state (e.g. wild-type) are more readily available, but their informational content is inadequate for the task at hand. We develop a computational approach to appropriately utilize both data sources for estimating a regulatory network.

The proposed approach is based on a three-step algorithm to estimate the underlying directed but cyclic network, that uses as input both perturbation screens and steady state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data, by combining an exhaustive search method with a fast heuristic that in turn couples a Monte Carlo technique with a fast search algorithm. In the second step, for each ordering, a regulatory network is estimated using a penalized likelihood based method, while in the third step a consensus network is constructed from the highest scored ones. Extensive computational experiments show that the algorithm performs well in uncovering the underlying network and clearly outperforms competing approaches that rely only on a single data source. Further, it is established that the algorithm produces a consistent estimate of the regulatory network.

**2/5, Maryam Zolghadr and Sergei Zuyev, Chalmers: Optimal design of dilution experiments under volume constraints**

Abstract:

We develop methods to construct a one-stage design of dilution experiments under the total available volume constraint typical for bio-medical applications. We consider different optimality criteria based on the Fisher information in both non-Bayesian and Bayesian settings. It turns out that the optimal design is typically one atomic, meaning that all the dilutions should be of the same size. Our proposed approach to solve such optimization problems is a variational analysis of functionals of a measure. The advantage of the measure optimization approach is that additional requirements like a total cost of experiment can be easily incorporated into the goal function.

**21/5, Johan Wallin, Lund: Spatial Matérn fields generated by non-Gaussian noise**

Abstract:

In this work, we study non-Gaussian extensions of a recently discovered link between certain Gaussian random fields, expressed as solutions to stochastic partial differential equations, and Gaussian Markov random fields. We show how to construct efficient representations of non-Gaussian random fields generated by generalized asymmetric Laplace noise and normal inverse Gaussian noise, and discuss parameter estimation and spatial prediction for these models. Finally, we look at an application to precipitation data from the US.

**23/5, Youri Davydov, Université Lille-1: On convex hulls of sequences of stochastic processes**

Abstract:

Let X_i = { X_i(t), t in T } be i.i.d. copies of a d-dimensional process X = { X(t), t in T }, where T is a general separable metric space. Assume that X has a.s. bounded paths and consider the convex hulls W_n constructed by the trajectories of X_i's. We are studying the existence of a limit shape W for the sequence {W_n} normalised by appropriate constants b_n. We show that in the case of Gaussian processes, W_n/b_n converges a.s. to W which is nonrandom, whereas for the processes satisfying a regular variation condition the convergence is in law and the limit set W in many cases is a random polytope.

**28/5, Patrik Rydén, Department of Mathematics and Mathematical statistics and Computational Life science Cluster (CLiC), Umeå University: Analysis of high-dimensional genomics data - challenges and opportunities**

Abstract:

High throughput technologies in life science such as high-throughput DNA and RNA sequencing, gene expression arrays, mass spectrometry, ChIP-chip and methylation arrays have allowed genome-wide measurements of complex cellular responses for a broad range of treatments and diseases. The modern technologies are powerful, but in order for them to reach their full potential new statistical tools need to be developed.

I will discuss pre-processing of microarray data (the discussion will also be relevant for other techniques), how pre-processing affects down-stream cluster analysis and why cluster analysis of samples (e.g. tumour samples) often fails to cluster the samples in a relevant manner. Finally, I will give my view on the future in the field of genomics research and what role statisticians can play.

**13/6, Fima Klebaner, Evaluations of expectations of functionals of diffusions by simulations**

Abstract:

We consider the problem of evaluating expectations by simulation. After a brief introduction, we point out that there is a problem with the standard approach if the functional in question is not continuous. Evaluation of the probability of absorption (or ruin probability) by simulations is shown as an example. We give a modification of the standard Euler-Maruyama scheme to obtain convergence. Open problems still remain. This is joint work with Pavel Chigansky, Hebrew University.
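As a rough illustration of the standard approach mentioned in the abstract (not the modified scheme presented in the talk), the sketch below estimates an absorption probability by simulating Euler-Maruyama paths and checking the barrier only at grid points. The model (Brownian motion with constant drift, absorbed at 0), the function name `absorption_probability` and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def absorption_probability(mu, sigma, x0, horizon, n_steps, n_paths, seed=0):
    """Naive Monte Carlo estimate of P(X hits 0 before `horizon`) for the
    assumed model dX = mu dt + sigma dW, X(0) = x0 > 0, using the standard
    Euler-Maruyama scheme."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    sqrt_dt = math.sqrt(dt)
    absorbed = 0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            # One Euler-Maruyama step: drift term plus scaled Gaussian increment.
            x += mu * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
            if x <= 0.0:  # the barrier is only checked at grid points
                absorbed += 1
                break
    return absorbed / n_paths

estimate = absorption_probability(
    mu=0.0, sigma=1.0, x0=1.0, horizon=1.0, n_steps=200, n_paths=2000
)
```

Because the barrier is checked only at grid points, crossings between grid points are missed, which is one way to see why a discontinuous functional such as absorption needs more care than a smooth one.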

**29/8, Jesper Möller, Aalborg University, Determinantal point process models and statistical inference**

Statistical models and methods for determinantal point processes (DPPs) seem largely unexplored, though they possess a number of appealing properties and have been studied in mathematical physics, combinatorics, and random matrix theory. We demonstrate that DPPs provide useful models for the description of repulsive spatial point processes, particularly in the 'soft-core' case. Such data are usually modelled by Gibbs point processes, where the likelihood and moment expressions are intractable and simulations are time consuming. We exploit the appealing probabilistic properties of DPPs to develop parametric models, where the likelihood and moment expressions can be easily evaluated and realizations can be quickly simulated. We discuss how statistical inference is conducted using the likelihood or moment properties of DPP models, and we provide freely available software for simulation and statistical inference.

The work has been carried out in collaboration with Ege Rubak, Aalborg University, and Frederic Lavancier, University of Nantes. The paper is available at arXiv:1205.4818.

**12/9, Christos Dimitrakakis, Chalmers, ABC Reinforcement Learning**

Abstract:

We introduce a simple, general framework for *likelihood-free* Bayesian reinforcement learning, through Approximate Bayesian Computation (ABC). The main advantage is that we only require a prior distribution on a class of simulators. This is useful in domains where a probabilistic model of the underlying process is too complex to formulate, but where detailed simulation models are available. ABC-RL allows the use of any Bayesian reinforcement learning technique in this case. In fact, it can be seen as an extension of simulation methods to both planning and inference.

We experimentally demonstrate the potential of this approach in a comparison with LSPI. Finally, we introduce a theorem showing that ABC is sound.

**19/9, Jeff Steif, Strong noise sensitivity and Erdos Renyi random graphs**

Abstract: Noise sensitivity concerns the question of when complicated events involving many i.i.d. random variables are (or are not) sensitive to small perturbations in these variables.

The Erdos Renyi random graph is the graph obtained by taking n vertices and connecting each pair of vertices independently with probability p_n.

This random graph displays very interesting behaviour. We will discuss some recent results concerning noise sensitivity for events involving the Erdos Renyi random graph. This is joint work with Eyal Lubetzky.
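The construction described in the abstract, and the kind of small perturbation that noise sensitivity refers to, can be sketched as follows. The function names `erdos_renyi_edges` and `resample_pairs` are hypothetical, and re-randomising each vertex pair independently with probability `eps` is assumed here as the usual way of modelling noise; the talk may use a different perturbation.

```python
import itertools
import random

def erdos_renyi_edges(n, p, seed=None):
    """Sample the edge list of a random graph on vertices 0..n-1:
    each pair is connected independently with probability p."""
    rng = random.Random(seed)
    return [pair for pair in itertools.combinations(range(n), 2)
            if rng.random() < p]

def resample_pairs(n, p, edges, eps, seed=None):
    """Assumed noise model: each pair is re-randomised independently
    with probability eps, otherwise its original state is kept."""
    rng = random.Random(seed)
    present = set(edges)
    noisy = []
    for pair in itertools.combinations(range(n), 2):
        if rng.random() < eps:
            keep = rng.random() < p      # fresh coin flip for this pair
        else:
            keep = pair in present       # unchanged
        if keep:
            noisy.append(pair)
    return noisy

edges = erdos_renyi_edges(n=200, p=0.05, seed=1)
noisy_edges = resample_pairs(n=200, p=0.05, edges=edges, eps=0.01, seed=2)
```

An event defined on the graph is then evaluated on both `edges` and `noisy_edges`; roughly speaking, the event is noise sensitive if the two outcomes become nearly independent as n grows, even for small `eps`.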

**24/9, Ben Morris, University of California, Mixing time of the card-cyclic to random shuffle**

Abstract: We analyse the following method for shuffling n cards. First, remove card 1 (i.e., the card with label 1) and then re-insert it randomly into the deck. Then repeat with cards 2, 3,..., n. Call this a round. R. Pinsky showed, somewhat surprisingly, that the mixing time is greater than one round. We show that in fact the mixing time is on the order of log n rounds. Joint work with Weiyang Ning and Yuval Peres.
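One round of the shuffle described above can be sketched in a few lines. The function name `card_cyclic_to_random_round` is hypothetical, and "re-insert it randomly" is assumed to mean a uniformly random position among the n available slots.

```python
import random

def card_cyclic_to_random_round(deck, rng):
    """Perform one round in place: for card = 1, 2, ..., n, remove the card
    with that label and re-insert it at a uniformly random position."""
    n = len(deck)
    for card in range(1, n + 1):
        deck.remove(card)
        # After removal there are n - 1 cards, hence n insertion slots (0..n-1).
        deck.insert(rng.randrange(n), card)
    return deck

rng = random.Random(0)
deck = list(range(1, 53))
for _ in range(5):  # a handful of rounds, for illustration only
    card_cyclic_to_random_round(deck, rng)
```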

**26/9, Ben Morris, University of California, Mixing time of the overlapping cycles shuffle and square lattice rotations shuffle**

Abstract: The overlapping cycles shuffle, invented by Johan Jonasson, mixes a deck of n cards by moving either the nth card or (n-k)th card to the top of the deck, with probability half each. Angel, Peres and Wilson determined the spectral gap for the location of a single card and found the following surprising behaviour. Suppose that k is the closest integer to cn for a fixed c in (0,1). Then for rational c, the spectral gap is on the order of n^{-2}, while for poorly approximable irrational numbers c, such as the reciprocal of the golden ratio, the spectral gap is on the order of n^{-3/2}. We show that the mixing time for all the cards exhibits the same behaviour (up to logarithmic factors), proving a conjecture of Jonasson.

The square lattice rotations shuffle, invented by Diaconis, is defined as follows. The cards are arrayed in a square. At each step a row or column is chosen, uniformly at random, and then cyclically rotated by one unit. We find the mixing time of this shuffle to within logarithmic factors. Joint work with Olena Blumberg.
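A single step of each of the two shuffles can be sketched as below. The function names are hypothetical. For the lattice shuffle, the abstract does not specify the direction of rotation or how the line is chosen, so the sketch assumes a uniform choice among all rows and columns and a fixed direction (rows to the right, columns downward).

```python
import random

def overlapping_cycles_step(deck, k, rng):
    """Move either the n-th card or the (n-k)-th card (1-indexed from the
    top, deck[0] being the top) to the top, with probability 1/2 each.
    Requires 1 <= k < len(deck)."""
    n = len(deck)
    position = n if rng.random() < 0.5 else n - k
    deck.insert(0, deck.pop(position - 1))
    return deck

def square_lattice_rotation_step(grid, rng):
    """Choose one of the m rows or m columns uniformly at random and
    rotate it cyclically by one unit (direction is an assumption)."""
    m = len(grid)
    index = rng.randrange(2 * m)
    if index < m:
        # Rotate row `index` one step to the right.
        grid[index] = [grid[index][-1]] + grid[index][:-1]
    else:
        # Rotate column `index - m` one step downward.
        c = index - m
        column = [grid[r][c] for r in range(m)]
        column = [column[-1]] + column[:-1]
        for r in range(m):
            grid[r][c] = column[r]
    return grid

rng = random.Random(0)
deck = list(range(1, 11))
overlapping_cycles_step(deck, k=3, rng=rng)

grid = [[3 * r + c for c in range(3)] for r in range(3)]
for _ in range(20):
    square_lattice_rotation_step(grid, rng)
```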

**3/10, Malwina Luczak, Queen Mary, University of London, The stochastic logistic epidemic**

(joint work with Graham Brightwell)

**10/10, Johan Tykesson, The Poisson cylinder model**

Abstract:

We consider a Poisson point process on the space of lines in R^d, where a multiplicative factor u>0 of the intensity measure determines the density of lines. Each line in the process is taken as the axis of a bi-infinite solid cylinder of radius 1. We show that there is a phase transition in the parameter u regarding the existence of infinite connected components in the complement of the union of the cylinders. We also show that given any two cylinders c_1 and c_2 in the process, one can find a sequence of d-2 other cylinders which creates a connection between c_1 and c_2.

The talk is based on joint works with Erik Broman and David Windisch.

**15/10, Manuel García Magariños, UDC, Spain, A new parametric approach to kinship testing**

Abstract:

Determination of family relationships from DNA data goes back decades. Statistical inference of relationships has traditionally followed a likelihood-based approach. In forensic science, hypothesis testing is usually formulated verbally so that it can be understood by non-experts. Nonetheless, this formulation lacks a proper mathematical parameterization, leading to controversy in the field. We propose an alternative hypothesis testing framework based on the likelihood calculations for pairwise relationships of Thompson, 1975. This is in turn based on the concept of identity-by-descent (IBD) genes shared between individuals. Pairwise relationships can be specified by (k0,k1,k2), the probabilities that two individuals share 0, 1 and 2 IBD alleles. The developed approach makes it possible to build a complete framework for statistical inference: point estimation, hypothesis testing and confidence regions for (k0,k1,k2). Theoretical properties have been studied. An extension to trios has been carried out in order to consider common problems in forensics. Results indicate that the hypothesis testing procedure is quite powerful, especially with trios. Accurate point estimates of (k0,k1,k2) are obtained. This holds even for a low number of markers and intricate relationships. Extensions to more than three individuals and inbreeding cases remain to be developed.

**17/10, 13.15-14.30, Tanja Stadler, ETH, Phylogenetics in action: Uncovering macro-evolutionary and epidemiological dynamics based on molecular sequence data**

Abstract:

What factors determine speciation and extinction dynamics? How can we explain the spread of an infectious disease? In my talk, I will discuss computational advances in order to address these key questions in the field of macro-evolution and epidemiology. In particular, I will present phylogenetic methodology to infer (i) macro-evolutionary processes based on species phylogenies shedding new light on mammal and bird diversification, and (ii) epidemiological processes based on genetic sequence data from pathogens shedding new light on the spread of HCV and HIV.

**17/10, Gordon Slade, University of British Columbia, Weakly self-avoiding walk in dimension four**

Abstract: We report on recent and ongoing work on the continuous-time weakly self-avoiding walk on the 4-dimensional integer lattice, with focus on a proof that the susceptibility diverges at the critical point with a logarithmic correction to mean-field scaling. The method of proof, which is of independent interest, is based on a rigorous renormalisation group analysis of a supersymmetric field theory representation of the weakly self-avoiding walk. The talk is based on collaborations with David Brydges, and with Roland Bauerschmidt and David Brydges.

**22/10, Prof. Gennady Martynov, Inst. for Information Transmission Problems, RAS, Moscow: Cramer-von Mises Gaussianity test for random processes on [0,1]**

Abstract: We consider the problem of testing the hypothesis that a process observed on the interval (0,1) is a Gaussian random process. A representation of the process in a Hilbert space is used. The proposed test is based on the classical Cramer-von Mises test. We also introduce a modification of the concept of the distribution function, and an asymmetric Cramer-von Mises test is developed. Methods for the exact calculation of tables of the limiting distributions of the proposed statistics are also considered.

**31/10, Alexandre Proutiere, KTH, Bandit Optimisation with Large Strategy Sets and Applications**

Abstract: Bandit optimisation problems constitute the most fundamental and basic instances of sequential decision problems with an exploration-exploitation trade-off. They naturally arise in many contemporary applications found in communication networks, e-commerce and recommendation systems. In this lecture, we present recent results on bandit optimisation problems with large strategy sets. For such problems, the number of possible strategies may not be negligible compared to the time horizon. Results are applied to the design of protocols and resource sharing algorithms in wireless systems.

**7/11, Boualem Djehiche, KTH, On the subsolution approach to efficient importance sampling**

Abstract: The widely used Monte Carlo simulation technique where all the particles are independent and statistically identical and their weights are constant is by no means universally applicable. The reason is that particles may wander off to irrelevant parts of the state space, leaving only a small fraction of relevant particles that contribute to the computational task at hand. Therefore it may require a huge number of particles to obtain a desired precision, resulting in a computational cost that is too high for all practical purposes. A control mechanism is needed to force the particles to move to the relevant part of the space, thereby increasing the importance of each particle and reducing the computational cost. The importance sampling technique offers a way to choose the sampling dynamics (the main difficulty) so as to steer the particles towards the relevant part of the state space. In this talk I will review some recent results on the so-called subsolution approach to importance sampling, which is able to tune the sampling dynamics at hopefully lower costs.

This is joint work with Henrik Hult and Pierre Nyquist.
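The basic idea of steering samples towards the relevant region can be illustrated on a toy problem. The sketch below is plain mean-shifted importance sampling for a Gaussian tail probability, not the subsolution approach of the talk; the target P(Z > 4) for standard normal Z, the function name and the sample size are assumptions made for illustration.

```python
import math
import random

def tail_prob_importance_sampling(threshold, n_samples, seed=0):
    """Estimate P(Z > threshold) for Z ~ N(0, 1) by sampling from
    N(threshold, 1) and reweighting with the likelihood ratio."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)  # sample from the shifted distribution
        if x > threshold:
            # Ratio of the N(0,1) density to the N(threshold,1) density at x.
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples

estimate = tail_prob_importance_sampling(threshold=4.0, n_samples=20000)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed form for comparison
```

With naive sampling from N(0, 1), only a few samples in a hundred thousand would exceed the threshold; under the shifted distribution about half of them do, and the weights correct for the change of measure.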

**12/11, Ioannis Papastathopoulos, Bristol University, Graphical structures in extreme multivariate events**

Abstract: Modelling and interpreting the behaviour of extremes is quite challenging, especially when the dimension of the problem under study is large. Initially, univariate extreme value models are used for marginal tail estimation and then the inter-relationships between random variables are captured by modelling the dependence of the extremes. Here, we propose graphical structures in extreme multivariate events of a random vector given that one of its components is large. These structures aim to provide better estimates and predictions of extreme quantities of interest as well as to reduce the problems with the curse of dimensionality. The imposition of graphical structures in the estimation of extremes is approached via a simplified parameter structure in a maximum likelihood setting and through Monte Carlo simulation from conditional kernel densities. The increase in efficiency of the estimators and the benefits of the proposed method are illustrated through simulation studies.

**21/11, Annika Lang, How does one computationally solve a stochastic partial differential equation?**

Abstract:

The solution of a stochastic partial differential equation can for example be seen as a Hilbert-space-valued stochastic process. In this talk I discuss discretizations in space, time, and probability to simulate the solution with a computer and I derive convergence rates for different types of approximation errors.

**28/11, Arne Pommerening, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf, Switzerland, What are the differences between competition kernels and traditional size-ratio based competition indices used in plant ecology?**

For a fair comparison of the two approaches we selected two fundamental and widespread types of competition indices based on distance weighted size ratios, an additional index without distance weighting as a control and developed the corresponding competition kernels. In contrast to the latter, competition indices require individual influence zones derived from tree crown-radius measurements. We applied these competition measures to three spatial tree time series in forest ecosystems in Europe and North America. Stem diameter increment served as a response variable.

Contrary to our expectation, the results of both methods indicated similar performance. However, the use of competition kernels produced slightly better results, with only one exception out of six comparisons.

Although the performance of both competition measures is not too different, competition kernels are based on more solid mathematical and ecological grounds. This is why applications of this method are likely to increase. The trade-off of the use of competition kernels, however, is the need for more sophisticated spatial regression routines that researchers are required to program.

**5/12, Kaspar Stucki, University of Göttingen, Germany, Continuum percolation for Gibbs point processes**

Abstract:

We consider percolation properties of the Boolean model generated by a Gibbs point process and balls with deterministic radius. We show that for a large class of Gibbs point processes there exists a critical activity, such that percolation occurs a.s. above criticality. For locally stable Gibbs point processes we show a converse result, i.e. they do not percolate a.s. at low activity.

**12/12, Olle Häggström, Are all ravens black? The problem of induction**

**19/12, Marianne Månsson, Astra Zeneca, Statistical paradoxes and curiosities in medical applications**

Abstract:

Why do my friends have more friends than I have? This feeling, shared by most people, was formulated in the 90s as the friendship paradox. Is it really true? Can it be used for prediction of epidemics? This is one of the paradoxes which will be discussed in this seminar.
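A small numerical check makes the paradox concrete. The sketch below compares the average number of friends with the average number of friends of a randomly chosen friend on a toy network; the star-shaped network and the function name `friendship_paradox` are assumptions made for illustration.

```python
def friendship_paradox(adjacency):
    """Return (mean degree, mean degree of a random friend).

    A 'random friend' is the endpoint of a uniformly chosen friendship,
    so a person with d friends is picked with probability proportional to d."""
    degrees = {person: len(friends) for person, friends in adjacency.items()}
    total_ends = sum(degrees.values())
    mean_degree = total_ends / len(degrees)
    mean_friend_degree = sum(d * d for d in degrees.values()) / total_ends
    return mean_degree, mean_friend_degree

# Toy network: person 0 is friends with everyone else, who know only person 0.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
mean_degree, mean_friend_degree = friendship_paradox(star)
```

On this network the average person has 1.6 friends while a random friend has 2.5, because popular people are over-represented among friends. The second quantity is never smaller than the first, with equality only when everyone has the same number of friends.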

## 2012

**19/1 Anna-Kaisa Ylitalo, University of Jyväskylä, Statistical inference for eye movements**

Eye movements can be measured by electronic eye trackers, which produce high-precision spatio-temporal data. Eye tracking has become an important and widespread indirect measure of reactions to stimuli both in planned experiments and in observational studies. In applications mainly conventional statistical methods have been used for the analysis of eye tracking data and often the methods are based on strong aggregation. Our aim is to utilize more advanced statistical approaches through modelling in order to extract detailed information on the data. The great challenges are heterogeneity and large variation within and between the units.

**26/1 Chris Jennison, University of Bath, Effective design of Phase II and Phase III trials: an over-arching approach**

This talk will report on work carried out by a DIA (formerly PhRMA) Working Group on Phase II/III Adaptive Programs.

**7/2, Paavo Salminen, Åbo Akademi, Optimal stopping of continuous time Markov processes**

After two motivating examples some methods/verification theorems for solving optimal stopping problems will be discussed. These are based on

- principle of smooth pasting,

- Riesz representation for excessive functions,

- representing excessive functions as expected supremum.

The talk is concluded with further examples, in particular for Lévy processes.

**7/2, Juha Alho, University of Eastern Finland, Statistical aspects of mortality modeling**

Declines of mortality have, during the past century, been clearly faster than anticipated. Mistaken judgment has been the primary reason for erroneous forecasts, but decisions made in statistical modeling can also play a remarkably large role. We will illustrate the problem with data and experiences from Sweden and other countries and comment on the implications on the sustainability of pension systems. In particular, the Finnish life expectancy adjustment and the Swedish NDC system will be mentioned.

**9/2, Stas Volkov, University of Lund, Forest fires on integers**

Consider the following version of the forest-fire model on a graph G. Each vertex of the graph becomes occupied with rate one. A fixed chosen vertex, say v, is hit by lightning with the same rate, and then the whole cluster of occupied vertices containing v is completely burnt out. I will show that when G = Z+, the times between consecutive burnouts, properly scaled, converge weakly to a random variable whose distribution is one minus the Dickman function.

**16/2, Anton Muratov, Bit Flipping Models**

In many areas of engineering and science one is faced with an array of devices which possess a few states. In the simplest case these could be on-off or idle-activated states; in other situations broken or 'dead' states are added. If the activation-deactivation (flipping) or breakage cycles occur in a random fashion, a natural question to ask is when, if at all, the system of devices, which we call bits, recovers to some initial or ground state. By this we usually mean the state when all the bits are not active, allowing only for idling and/or broken bits to be seen. When the number of bits is infinite, the time to recover may assume infinite values, when the system actually does not recover, or finite values. In the former case we speak of transient behaviour of the system. In the latter case, depending on whether the mean of the recovery time exists or not, we speak of positive or null-recurrence of the system. The terminology is borrowed from the Markov chain setting and the above classification is tightly related to the exact random mechanism governing the change of the bits' states.

**1/3, Oleg Sysoev, Linköping University, Monotonic regression for large multivariate datasets**

**27/3, Tatyana Turova, Lund University, Bootstrap percolation on some models of random graphs**

We shall first consider bootstrap percolation on a classical homogeneous random graph. It is proved in joint work with S. Janson, T. Luczak and T. Vallier that the phase transition is very sharp in this model. Then we discuss some modifications of the bootstrap process on inhomogeneous random graphs, related to the modelling of neuronal activity.

**17/4, Ronald Meester, Scaling limits in fractal percolation**

We use ideas from two-dimensional scaling limits to study curves in the limiting set of the so-called fractal percolation process. More precisely, we show that the set consisting of connected components larger than one point is a.s. the union of non-trivial Holder continuous curves, all with the same exponent. The interesting thing here is the relation between the almost sure convergence of the fractal to its limit set, seen as compact sets, and the weak convergence of curves in a different topology.

**24/4, Krzysztof Bartoszek and Serik Sagitov, Interspecies correlation for randomly evolving traits**

A simple way to model phenotypic evolution is to assume that after splitting, the trait values of the sister species diverge as independent Brownian motions or Ornstein-Uhlenbeck processes. Relying on a prior distribution for the underlying species tree (conditioned on the number of extant species) we study the vector of the observed trait values, treating it as a random sample of dependent observations. In this paper we derive compact formulae for the variance of the sample mean and the mean of the sample variance. The underlying species tree is modelled by a (supercritical or critical) conditioned branching process. In the critical case we modify the Aldous-Popovic model by assuming a proper prior for the time of origin.

**8/5, Reinhard Bürger, Universität Wien, The effects of linkage and gene flow on local adaptation in a subdivided population: a deterministic two-locus model**

In spatially structured populations, gene flow may counteract local adaptation. We explore the combined effects of recombination and migration on the maintenance of genetic polymorphism and the degree of local adaptation in a spatially subdivided population. To this aim, we study a deterministic continent-island model of gene flow in which a derived (island) population experiences altered environmental conditions and receives maladaptive gene flow from the ancestral (continental) population. It is assumed that locally advantageous mutations have arisen on the island at two linked loci. Gene flow in concert with selection induces substantial linkage disequilibrium which substantially affects evolution and adaptation. The central mathematical result is an explicit characterization of all possible equilibrium configurations and bifurcation structures in the underlying two-locus model. From this, we deduce the dependence of the maximum amount of gene flow that admits the preservation of the locally adapted haplotype on the strength of recombination and selection. We also study the invasion of beneficial mutants of small effect that are linked to an already present, locally adapted allele. Because of linkage disequilibrium, mutants of much smaller effect can invade successfully than predicted by naive single-locus theory. This raises interesting questions on the evolution of the genetic architecture, in particular, about the emergence of clusters of tightly linked, slightly beneficial mutations and the evolution of recombination and chromosome inversions.

**10/5, Reinhard Bürger, Universität Wien, Invasion and sojourn properties of locally beneficial mutations in a two-locus continent-island model of gene flow**

In subdivided populations, adaptation to a local environment may be hampered by maladaptive gene flow from other subpopulations. At an isolated locus, i.e., unlinked to other loci under selection, a locally beneficial mutation can be maintained only if its selective advantage exceeds the immigration rate of alternative allelic types. As explained in my other talk, recent deterministic theory in the context of a continent-island model shows that, if the beneficial mutation arises in linkage to a locus at which a locally adapted allele is already segregating in migration-selection balance, the new mutant can be maintained under much higher immigration rates than predicted by one-locus theory. This deterministic theory ignores stochastic effects which are especially important in the early phase during which the mutant is still rare. In this talk, I report on work in progress (jointly with Simon Aeschbacher) on a suite of stochastic models with the aim of quantifying the invasion and sojourn properties of mutants in one- and two-locus continent-island models. These models range from multitype branching processes to diffusion processes and Markov chains of Wright-Fisher type. Preliminary analytical and numerical results will be presented that highlight the influence of the various sources of stochasticity.

**15/5, Mari Myllymäki, Aalto University, Hierarchical modeling of second-order spatial structure of epidermal nerve fiber patterns**

This talk discusses analysis of the second-order properties of the epidermal nerve fibers (ENFs) located in the epidermis, which is the outermost part of the skin. It has been observed that the ENF density decreases as diabetic neuropathy progresses, while the spatial second-order analysis of ENFs has the potential to detect and diagnose diabetic neuropathy in early stages when the ENF density may still be within the normal range. The data are suction skin blister samples from two body locations of healthy subjects and of subjects with diabetic neuropathy. The second-order property of the ENF entry points, i.e. the locations where the ENFs penetrate the epidermis, is summarized by a spatial summary function, namely Ripley's K function. We then apply a hierarchical latent Gaussian process regression in order to investigate how disease status and other covariates such as gender affect the level and shape of the second-order function, i.e. the degree of clustering of the points. This is work in progress.

**22/5, Amandine Veber, Ecole Polytechnique, Paris, Evolution in a spatial continuum**

In this talk, we will present a general framework for studying the evolution of the genetic composition of a population scattered over some area of space. These models rely on a ’duality’ relation between the reproduction model and the corresponding genealogies of a sample, which is of great help in understanding the large scale behaviour of the local (or global) genetic diversities. Furthermore, a great variety of scenarios can be described, ranging e.g. from very local reproduction events to very rare and massive extinction/recolonization events. In particular, we shall see how the parameters of local evolution can be inferred despite the (possible) presence of massive events in the distant past having a significant impact. (Joint work with N. Barton, A. Etheridge and J. Kelleher)

**24/5, Amandine Veber, Ecole Polytechnique, Paris, Large-scale behaviour of the spatial Lambda-Fleming-Viot process**

The SLFV process is a population model in which individuals live in a continuous space. Each of them also carries some heritable type or allele. We shall describe the long-term behaviour of this measure-valued process and that of the corresponding genealogical process of a sample of individuals in two cases: one that mimics the evolution of the nearest-neighbour voter model (but in a spatial continuum), and one that allows some individuals to send offspring over very large distances. This is joint work with Nathanaël Berestycki and Alison Etheridge.

**31/5, Mattias Villani, Linköping University, Bayesian Methods for Flexible modeling of Conditional Distributions**

A general class of models and a unified Bayesian inference methodology are proposed for flexibly estimating the distribution of a continuous or discrete response variable conditional on a set of covariates. Our model is a finite mixture model with covariate-dependent mixing weights. The parameters in the mixture components are linked to sets of covariates, and special attention is given to the case where covariates enter the model nonlinearly through additive or surface splines. A new parametrization of the mixture and the use of an efficient MCMC algorithm with integrated Bayesian variable selection in all parts of the model successfully avoid over-fitting, even when the model is highly over-parameterized.

**7/6, Brunella Spinelli and Giacomo Zanella, Chalmers and University of Milan, Stable point processes: statistical inference and generalisations**

Stable point processes arise inevitably in various limiting schemes involving superposition of thinned point processes. When intensities of the processes are finite, the limit is Poisson, otherwise it is a discrete stable (DaS) point process with an infinite intensity measure and as such is an appealing model for various phenomena showing highly irregular (or bursty) behaviour. The first part of the talk will concentrate on estimation procedures for the distribution parameters of a stationary DaS process. The second part presents generalisations of the thinning procedure based on branching process characterisations of the corresponding branching-stable processes. This generalisation is maximal in the sense that any operation replacing thinning which is required to possess natural associativity and distributivity properties with respect to superposition is necessarily a branching.

**4/9, Giacomo Zanella, Warwick University, UK: Branching stable point processes**

Branching stability is a recent concept in point processes and describes the limiting regime in superposition of point processes where particles are allowed to evolve independently according to a subcritical branching process. It is a far-reaching generalisation of the F-stability for non-negative integer random variables introduced in 2004 by Steutel and Van Harn. We fully characterise such processes in terms of their generating functionals and give their cluster representation for the case of non-migrating particles, which corresponds to the Steutel and Van Harn case. We then extend our results to particular important examples of migration mechanisms of the particles and characterise the corresponding stability. Branching stable point processes are believed to be an adequate model for contemporary telecommunications systems which show spatial burstiness, like the position of mobile telephones during festival activities in a big city.

**6/9, Ilya Molchanov, University of Bern, Switzerland: Invariance properties of random vectors and stochastic processes based on the zonoid concept**

Two integrable random vectors in the Euclidean space are said to be zonoid equivalent if their projections on each given direction share the same first absolute moments. The paper analyses stochastic processes whose finite-dimensional distributions remain zonoid equivalent with respect to time shifts (zonoid stationarity) and permutations of time instances (swap-invariance). While the first concept is weaker than stationarity, the second one is a weakening of the exchangeability property. It is shown that nonetheless the ergodic theorem holds for swap-invariant sequences.

**25/9, Prof. Günter Last, Karlsruhe Institute of Technology, Germany: Fock space analysis of Poisson functionals - 1 & 2**

These are the first two lectures in a four-lecture course summarising some recent developments in the theory of general Poisson processes. It can also be taken as a PhD course "Poisson measures" organised by Prof. Sergei Zuyev. His introductory lectures on the field are held on Tuesday 18/09 13:15-15:00 and Thursday 20/09 13:15-15:00. In the first lecture we will prove an explicit Fock space representation of square-integrable functions of a general Poisson process based on iterated difference operators [1]. As general applications we shall discuss explicit Wiener-Itô chaos expansions and some basic properties of Malliavin operators [1]. In the second lecture we will derive covariance identities and the Clark-Ocone martingale representation for Poisson martingales [2]. Our first applications are short proofs of the Poincaré and the FKG inequalities for Poisson processes. A second application is Wu's [3] elegant proof of a general log-Sobolev inequality for Poisson processes. The final application is minimal variance hedging for financial markets driven by Lévy processes.

[1] Last, G. and Penrose, M.D. (2011). Fock space representation, chaos expansion and covariance inequalities for general Poisson processes. Probability Theory and Related Fields 150, 663-690.

[2] Last, G. and Penrose, M.D. (2011). Martingale representation for Poisson processes with applications to minimal variance hedging. Stochastic Processes and their Applications 121, 1588-1606.

[3] Wu, L. (2000). A new modified logarithmic Sobolev inequality for Poisson point processes and several applications. Probability Theory and Related Fields 118, 427-438.

**27/9, Prof. Günter Last, Karlsruhe Institute of Technology, Germany: Fock space analysis of Poisson functionals - 3 & 4**

These are the last two lectures in a four-lecture course summarising some recent developments in the theory of general Poisson processes. It can also be taken as a PhD course "Poisson measures" organised by Prof. Sergei Zuyev. His introductory lectures on the field are held on Tuesday 18/09 13:15-15:00 and Thursday 20/09 13:15-15:00. The third lecture presents some general theory for the perturbation analysis of Poisson processes [1] together with an application to multivariate Lévy processes. The fourth and final lecture discusses the recent central limit theorem from [4] that is based on a nice combination of Malliavin calculus and the Stein-Chen method. We will apply this result as well as those from [2] to Poisson flat processes from stochastic geometry [3].

[1] Last, G. (2012). Perturbation analysis of Poisson processes. arXiv:1203.3181v1.

[2] Last, G. and Penrose, M.D. (2011). Fock space representation, chaos expansion and covariance inequalities for general Poisson processes. Probability Theory and Related Fields 150, 663-690.

[3] Last, G., Penrose, M.D., Schulte, M. and Thäle, C. (2012). Moments and central limit theorems for some multivariate Poisson functionals. arXiv:1205.3033v1.

[4] Peccati, G., Solé, J.L., Taqqu, M.S. and Utzet, F. (2010). Stein's method and normal approximation of Poisson functionals. Annals of Probability 38, 443-478.

**28/9, Martin Rosvall, Umeå universitet: Mapping change in large networks**

Change is a fundamental ingredient of interaction patterns in biology, technology, the economy, and science itself: Interactions within and between organisms change; transportation patterns by air, land, and sea all change; the global financial flow changes; and the frontiers of scientific research change. Networks and clustering methods have become important tools to comprehend instances of these large-scale structures, but without methods to distinguish between real trends and noisy data, these approaches are not useful for studying how networks change. Only if we can assign significance to the partitioning of single networks can we distinguish meaningful structural changes from random fluctuations. Here we show that bootstrap resampling accompanied by significance clustering provides a solution to this problem. To connect changing structures with the changing function of networks, we highlight and summarize the significant structural changes with alluvial diagrams and realize de Solla Price's vision of mapping change in science: studying the citation pattern between about 7000 scientific journals over the past decade, we find that neuroscience has transformed from an interdisciplinary specialty to a mature and stand-alone discipline.

**11/10, Nanny Wermuth, Chalmers and International Agency for Research on Cancer, Lyon, France: Traceable regressions applied to the Mannheim study of children at risk**

We define and study the concept of traceable regressions and apply it to some examples. Traceable regressions are sequences of conditional distributions in joint or single responses for which a corresponding graph captures an independence structure and represents, in addition, conditional dependences that permit the tracing of pathways of dependence. We give the properties needed for transforming these graphs and graphical criteria to decide whether a path in the graph induces a dependence. The much stronger constraints on distributions that are faithful to a graph are compared to those needed for traceable regressions.

**18/10, Stas Volkov, Lund University: On random geometric subdivisions**

I will present several models of random geometric subdivisions, similar to that of Diaconis and Miclo (Combinatorics, Probability and Computing, 2011), where a triangle is split into 6 smaller triangles by its medians, and one of these parts is randomly selected as a new triangle, and the process continues ad infinitum. I will show that in a similar model the limiting shape of an indefinite subdivision of a quadrilateral is a parallelogram. I will also show that the geometric subdivisions of a triangle by angle bisectors converge (but only weakly) to a non-atomic distribution, and, time permitting, that the geometric subdivisions of a triangle obtained by choosing uniform random points on its sides converge to a “flat” triangle, similarly to the result of the paper mentioned above.

**1/11, Uwe Rösler, University of Kiel: On Stochastic Fixed Point Equations and the Weighted Branching Process**

Stochastic fixed point equations $X = f(U, (X_n)_{n\in\mathbb{N}})$, where $U, X_1, X_2, \dots$ are independent and each $X_i$ has the same distribution as $X$ (all equalities are in distribution), are now of interest in their own right. The starting point was the characterization of the limiting distribution of the sorting algorithm Quicksort as a solution of a fixed point equation. After that many more examples appeared: characterizations of known distributions such as the stable distributions, and many new ones in the analysis of algorithms by the contraction method, in population dynamics and in financial mathematics.
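For orientation, the Quicksort example can be written out explicitly. The equation below is the standard form from the literature on Quicksort, not a formula given in the abstract: here $X$ is the limit of the centred number of comparisons divided by the list length, $U$ is uniform on $(0,1)$, and $X'$ is an independent copy of $X$.

$$
X \overset{d}{=} U X + (1-U)\,X' + C(U), \qquad C(u) = 1 + 2u\ln u + 2(1-u)\ln(1-u),
$$

with $U$, $X$ and $X'$ independent and $\mathbb{E}X = 0$. In the notation of the abstract this corresponds to $f(U, (X_n)) = U X_1 + (1-U) X_2 + C(U)$.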

**8/11, Eugene Mamontov, Chalmers: Non-stationary invariant and dynamic-equilibrium Markov stochastic processes**

The present work considers continuous Markov stochastic processes defined on the entire time axis. They are of considerable importance in the natural/life sciences and engineering. The work draws attention to invariant Markov processes, which are non-stationary. It discusses the key features of the latter processes, their covariance and spectral-density functions, as well as some of the related notions such as dynamic equilibrium Markov processes and stability in distribution. The meaning of the dynamic equilibrium processes is also emphasized in connection with their role in living systems.

**13/11, Måns Henningson, Chalmers: Quantum theory and probability**

Classical physics falls in the framework of philosophical realism: There are objective facts, regardless of our knowledge about them. Einstein added that each such fact must be localized in space-time, and that its influence could not propagate faster than the speed of light. The result of an experiment is in principle determined by these facts, but possibly there are also "hidden variables" whose values we cannot directly determine. One could then introduce a probability distribution for these, from which follows a probability distribution for the result of our experiment. Quantum physics gives a rather different view of the world. Here "randomness" appears to enter at a more fundamental level and has nothing to do with our lack of knowledge of any hidden variables. John Bell constructed a Gedankenexperiment (which has later been performed in reality) to shed light on this. He derived, under the assumptions of classical physics together with Einstein's amendment, an inequality that must be obeyed by certain statistical correlations for experimental results. Quantum physics violates the Bell inequalities, and the real experiments confirm quantum physics. This conflict in a sense derives from the quantum notion of "entanglement", which does not have any classical counterpart: It reflects the impossibility of describing the state of a composite system in terms of the states of its constituent parts (which do not even have to "interact" with each other).

**15/11, Anders Johansson, Gävle: Existence of matchings in random sub-hypergraphs**

[...] *H* of a fixed hypergraph *G*. Such laws can be established when, say, *G* is complete and *H* is a Bernoulli process on *G*, using a local symmetry of the distribution of *H*. The same symmetry argument allows for the problem of finding factors in random graphs. I will also discuss problems regarding Latin Squares where this argument breaks down and where new ideas are needed.

**22/11, David Bolin, University of Lund: Excursion and contour uncertainty regions for latent Gaussian models**

An interesting statistical problem is to find regions where some studied process exceeds a certain level. Estimating these regions so that the probability for exceeding the level jointly in the entire set is some predefined value is a difficult problem that occurs in several areas of applications ranging from brain imaging to astrophysics. In this work, we propose a method for solving this problem, and the related problem of finding uncertainty regions for contour curves, for latent Gaussian models. The method is based on using a parametric family for the excursion sets in combination with integrated nested Laplace approximations and an importance sampling-based algorithm for estimating joint probabilities. The accuracy of the method is investigated using simulated data and two environmental applications are presented. In the first, areas where the air pollution in the Piemonte region in northern Italy exceeds the daily limit value, set by the European Union for human health protection, are estimated. In the second, regions in the African Sahel that experienced an increase in vegetation after the drought period in the early 1980s are estimated.

**29/11, Dietrich von Rosen, SLU Uppsala: From univariate linear to multilinear models**

The presentation is based on a number of figures illustrating appropriate linear spaces reflecting a tour from univariate to multilinear models. The start is the classical Gauss-Markov model from where we jump into the multivariate world, i.e. MANOVA. The next stop will be the Growth Curve model and then a quick exposure of Extended growth curves will take place. The tour ends with some comments on multilinear models.

**11/12, Jimmy Olsson, Lund University: Metropolising forward particle filtering backward simulation and Rao-Blackwellisation using multiple trajectories**

Smoothing in state-space models amounts to computing the conditional distribution of the latent state trajectory, given observations, or expectations of functionals of the state trajectory with respect to this distribution. In recent years there has been an increased interest in Monte Carlo-based methods, often involving particle filters, for approximate smoothing in nonlinear and/or non-Gaussian state-space models. One such method is to approximate filter distributions using a particle filter and then to simulate, using backward kernels, a state trajectory backwards on the set of particles. In this talk we show that by simulating multiple realizations of the particle filter and adding a Metropolis-Hastings step, one obtains a Markov chain Monte Carlo scheme whose stationary distribution is the exact smoothing distribution. This procedure expands upon a similar one recently proposed by Andrieu, Doucet, Holenstein, and Whiteley. We also show that simulating multiple trajectories from each realization of the particle filter can be beneficial from a perspective of variance versus computation time, and illustrate this idea using two examples.

**13/12, Erik Lindström, Lund University: Tuned Iterated Filtering**

Maximum Likelihood estimation for partially observed Markov process models is a non-trivial problem, as the likelihood function often is unknown. Iterated Filtering is a simple, yet very general algorithm for computing the Maximum Likelihood estimate. The algorithm is 'plug and play' in the sense that it can be used with rudimentary statistical knowledge. The purpose of this talk is to discuss the algorithm, pointing out practical limitations, and suggest extensions and/or modifications that will improve the robustness and/or performance of the algorithm. We will also discuss the connection between the Iterated Filtering algorithm, and algorithms commonly used in engineering (system identification, signal processing etc.), illustrating that a similar algorithm has been known for several decades.

## 2011

**13/1 Dr. Raphaël Lachièze-Rey, University of Lille-1, France: Ergodicity of STIT tessellations**

Random tessellations form a relevant class of models for many natural phenomena in biology, geology and materials science. STIT tessellations (for STable under ITeration), introduced in the 2000s, are characterised by their stability under an operation called "iteration", which gives them a privileged role in modelling phenomena of cracking or of division in nature. After a clear exposition of the model, we will present its main characteristics, establishing in particular its mixing properties.

**27/1 Alexey Lindo, Department of Mathematical Sciences, Chalmers University of Technology: A probabilistic analysis of Wagner's k-tree algorithm**

David Wagner introduced an algorithm for solving a k-dimensional generalization of the birthday problem (see [1]). It has wide applications in cryptography and cryptanalysis. A probabilistic model of Wagner's algorithm can be described as follows. Suppose that elements of the input lists are drawn from the additive group of integers modulo $n$. Let the random variable W represent the number of solutions found by Wagner's algorithm in the introduced model. We first observe that W is a sum of dependent indicators. Then, using the Chen-Stein method, we derive a Poisson approximation to the distribution of W. An upper bound on a total variation distance given in [2,3] is particularly essential for the proof. The bound allows one to estimate the strength of encoding by the algorithm in terms of its parameters.

**3/2 Sergei Zuyev, Chalmers: Discussion seminar - Optimal design of dilution experiments**

This is the first in (hopefully) a series of Discussion seminars: more questions than answers are expected, so come open-minded and be ready for discussion!

**17/2 Peter Gennemark, Mathematical sciences: Identifying and compensating for systematic errors in large-scale phenotypic screens**

We consider statistical questions concerning analysis of yeast growth curves. Each curve is based on measurements of the growth of a cell culture during 48 hours with three measurements per hour. The experimental set-up is large scale and allows 200 cultures to be monitored simultaneously. We study reproducibility in such large-scale experiments using a set of control experiments of only wild-type strains.

**24/2 Takis Konstantopoulos, Uppsala University: A stochastic ordered graph model**

We consider a stochastic directed graph on the integers whereby a directed edge between $i$ and a larger integer $j$ exists with probability $p_{j-i}$ depending solely on the distance between the two integers. Under broad conditions, we identify a regenerative structure that enables us to prove limit theorems for the maximal path length in a long chunk of the graph. We first discuss background literature of this stochastic model. The model is an extension of a special case of graphs studied by Foss and the speaker. We then consider a similar type of graph but on the 'slab' $\mathbb{Z} \times I$, where $I$ is a finite partially ordered set. We extend the techniques introduced in the first part of the paper to obtain a central limit theorem for the longest path. When $I$ is linearly ordered, the limiting distribution can be seen to be that of the largest eigenvalue of a $|I| \times |I|$ random matrix in the Gaussian unitary ensemble (GUE). This is joint work with S Foss and D Denisov.
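The first model is easy to simulate on a finite window, which may help to see what "maximal path length" means. The following is a minimal sketch, assuming the graph is restricted to the integers $0, \dots, n-1$; the function name `longest_path_length` and the argument `p_of_distance` (playing the role of $d \mapsto p_d$) are illustrative and not taken from the talk.

```python
import random


def longest_path_length(n, p_of_distance, seed=0):
    """Sample the directed graph on {0, ..., n-1} and return the number of
    edges on a longest directed path.

    An edge i -> j (i < j) is present with probability p_of_distance(j - i),
    independently of all other edges.
    """
    rng = random.Random(seed)
    # longest_ending_at[j] = number of edges on a longest path ending at j
    longest_ending_at = [0] * n
    for j in range(n):
        for i in range(j):
            if rng.random() < p_of_distance(j - i):
                longest_ending_at[j] = max(longest_ending_at[j],
                                           longest_ending_at[i] + 1)
    return max(longest_ending_at)


if __name__ == "__main__":
    # Constant edge probability, a special case of the model.
    print(longest_path_length(200, lambda d: 0.3, seed=1))
```

Because all edges point from smaller to larger integers, the vertices are already in topological order, so a single left-to-right pass suffices.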

**3/3 Victor Brovkin, Max Planck Institute for Meteorology, Hamburg, Germany: Land biosphere models for future climate projections**

The Earth System Models (ESMs) are the best tools available for projecting changes in the atmospheric CO2 concentration and climate in the coming decades and centuries. ESMs include models of land biosphere which are based on well established understanding of plant physiology and ecology, but these models typically use very few observations to constrain model parameters. Extensive ground-based measurements of plant biochemistry, physiology, and ecology have led to a much better quantification of ecosystem processes during the last decades. Recent assimilation of many thousands of measurements of species traits in global databases opens a new perspective to specify plant parameters used in ecosystem models which predominantly operate at the level of large-scale plant units such as plant functional types (PFTs). Instead of constraining model parameters using values from a few publications, a novel approach aggregates plant traits from the species level to the PFT level using trait databases. A study will be presented that employs two global databases linking plant functional types to decomposition rates of wood and leaf litter in order to improve future projections of climate and carbon cycle using an intermediate complexity ESM, CLIMBER-LPJ.

**10/3 Andrey Lange, Bauman Moscow State Technical University: Discrete stochastic systems with pairwise interaction**

A model of a system of interacting particles of types T_1, ... , T_n is considered as a continuous-time Markov process on a countable state space. Forward and backward Kolmogorov systems of differential equations are represented in a form of partial differential equations for the generating functions of transition probabilities. We study the limiting behaviour of probability distributions as time tends to infinity for two models of that type.

**10/3, Graham Jones, Durness, Scotland: Stochastic models for phylogenetic trees and networks**

The Tree of Life does not look as though it was generated by a constant rate birth-death process, since too many nodes show unbalanced splits where one branch leads to only a few tips and the other to very many. Generalizations of the constant rate birth-death process (age-dependent and multitype binary branching processes) can produce trees which look more like real phylogenetic trees. A method for numerical calculation of the probability distributions of these trees will be presented.

**31/3, Stig Larsson: Numerical approximation of stochastic PDEs**

Together with several co-workers during recent years I have studied numerical approximation of evolution PDEs perturbed by noise. You may consider this as a "discussion seminar" where I will review our work and ask for your advice and possible cooperation for future work.

**9/4, Dustin Cartwright, Berkeley: How are SNPs distributed in genes?**

The organization of genes into 3-base codons has certain consequences for the distribution of bases. Many of these consequences have been known for a long time. I will talk about a particular method for detecting these artefacts if we didn't already know the underlying cause. The central analytical tool will be the notion of the rank of a tensor.

**3/5, Maria Deijfen, Stockholm University: Scale-free percolation**

I will describe a model for inhomogeneous long-range percolation on Z^d with potential applications in network modeling. Each vertex is independently assigned a non-negative random weight and the probability that there is an edge between two given vertices is then determined by a certain function of their weights and of the distance between them. The results concern the degree distribution in the resulting graph, the percolation properties of the graph and the graph distance between remote pairs of vertices. The model interpolates between long-range percolation and inhomogeneous random graphs, and is shown to inherit the interesting features of both these model classes.

**12/5, Andras Balint: The critical value function in the divide and colour model**

The divide and colour model is a simple and natural stochastic model for dependent colourings of the vertex set of an infinite graph. This model has two parameters: an edge-parameter p, which determines how strongly the states of different vertices depend on each other, and a colouring parameter r, which is the probability of colouring a given vertex red. For each value of p, there exists a critical colouring value R such that there is almost surely no infinite red cluster for all r < R, and an infinite red cluster exists with positive probability for all r > R. In this talk, I will discuss some new results, obtained jointly with Vincent Beffara and Vincent Tassion, concerning different properties, such as (non-)continuity and (non-)monotonicity, of the critical colouring value as a function of the edge-parameter, as well as both deterministic and probabilistic bounds on the critical colouring value.
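The two-step construction behind the model can be sketched in a few lines. The code below is a minimal illustration under assumptions not stated in the abstract: it uses a finite $n \times n$ piece of the square lattice instead of an infinite graph, the "divide" step is independent bond percolation with parameter p, and non-red vertices are simply marked `False`. The function name `divide_and_colour` is hypothetical.

```python
import random


def divide_and_colour(n, p, r, seed=0):
    """Sample a divide and colour configuration on an n x n grid.

    Returns a dict mapping each vertex (i, j) to True (red) or False.
    """
    rng = random.Random(seed)
    parent = {(i, j): (i, j) for i in range(n) for j in range(n)}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Divide: keep each nearest-neighbour edge independently with probability p
    # and merge the clusters of its two endpoints.
    for i in range(n):
        for j in range(n):
            for nb in ((i + 1, j), (i, j + 1)):
                if nb in parent and rng.random() < p:
                    parent[find((i, j))] = find(nb)

    # Colour: each cluster is red with probability r, independently of the
    # other clusters, and all of its vertices receive that colour.
    cluster_is_red = {}
    colouring = {}
    for v in list(parent):
        root = find(v)
        if root not in cluster_is_red:
            cluster_is_red[root] = rng.random() < r
        colouring[v] = cluster_is_red[root]
    return colouring


if __name__ == "__main__":
    config = divide_and_colour(20, p=0.4, r=0.6, seed=2)
    print(sum(config.values()), "red vertices out of", len(config))
```

Each single vertex is red with probability r, but vertices in the same cluster always share a colour, which is where the dependence controlled by p comes from.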

**26/5, Vitali Wachtel, Mathematical Institute, LMU, München: Random walks in Weyl chambers**

We construct $k$-dimensional random walks conditioned to stay in a Weyl chamber at all times. The chief difficulty is to find a harmonic function for a random walk. It turns out that one needs different approaches under different moment assumptions on unconditioned random walks. We prove also limit theorems for random walks confined to a Weyl chamber.

**31/5, Jenny Jonasson: Discussion seminar - Can we use extreme value theory to analyse data from naturalistic driving studies**

The idea behind naturalistic driving studies is that ordinary people drive cars that are equipped with a number of measuring devices such as cameras both on the road and on the driver, radars, accelerometers, etc. The main question concerns accident prevention. Although the amount of data is enormous there are still not many accidents in the data sets and therefore near-accidents are also extracted from the data. Our task is to decide if accidents and near-accidents are similar in some sense. Near-accidents and accidents can be thought of as extreme events and hence we use extreme value theory.

**7/6, Chris Glasbey, Biomathematics & Statistics Scotland: Dynamic programming versus graph cut algorithms for fitting non-parametric models to image data**

Image restoration, segmentation and template matching are generic problems in image processing that can often be formulated as non-parametric model fitting: maximising a penalised likelihood or Bayesian posterior probability for an I-dimensional array of B-dimensional vectors. The global optimum can be found by dynamic programming provided I=1, with no restrictions on B, whereas graph cut algorithms require B=1 and a convex smoothness penalty, but place no restrictions on I. I compare conditions and results for the two algorithms, using restoration of a synthetic aperture radar (SAR) image for illustration.
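For the case I=1, the dynamic programming recursion can be sketched as follows. This is a minimal illustration, assuming each site on the chain chooses among K candidate labels (for instance a discretised set of B-dimensional vectors) and that the penalised likelihood is expressed as a cost to be minimised; the names `chain_dp`, `data_cost` and `pair_cost` are hypothetical and not from the talk.

```python
def chain_dp(data_cost, pair_cost):
    """Globally minimise sum_i data_cost[i][x_i] + sum_i pair_cost(x_i, x_{i+1}).

    data_cost: list of n lists, each of length K (cost of label k at site i).
    pair_cost: function (a, b) -> penalty for neighbouring labels a and b;
               it need not be convex.
    Returns (minimal total cost, list of optimal labels).
    """
    n, K = len(data_cost), len(data_cost[0])
    cost = list(data_cost[0])   # best cost of a prefix ending in each label
    back = []                   # back-pointers for recovering the optimum
    for i in range(1, n):
        new_cost, pointers = [], []
        for b in range(K):
            a_best = min(range(K), key=lambda a: cost[a] + pair_cost(a, b))
            new_cost.append(cost[a_best] + pair_cost(a_best, b) + data_cost[i][b])
            pointers.append(a_best)
        cost = new_cost
        back.append(pointers)
    # Trace the optimal labelling backwards from the best final label.
    last = min(range(K), key=lambda b: cost[b])
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return cost[last], labels


if __name__ == "__main__":
    noisy = [[0.0, 2.0], [1.5, 0.2], [0.1, 1.0], [2.0, 0.0]]
    print(chain_dp(noisy, lambda a, b: 0.5 * abs(a - b)))
```

The cost of this sketch is of order n·K² operations, and nothing in it depends on what the labels represent, which is consistent with the abstract's remark that dynamic programming places no restriction on B.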

**16/6, Stefan Hoberg and Malin Persson: Optimal design for pharmacokinetic trials**(Master thesis presentation)

When performing a pharmacokinetic study one measures the concentration of the drug several times. When, and how many times to do this, is not always easy to determine. Using optimal design theory, this thesis will show a method to find an optimal number of measurements and also the times to conduct them. The robustness of this design will be investigated by shifting the design points to determine if that will have a big effect on the estimations of the parameter values. For the model used in this thesis a design with three different design points was the optimal one. The second and third time points proved to be unaffected by most shifts on the times. If the first design point was moved close to or past the time when the concentration is at its maximum, problems appeared. This resulted in difficulties obtaining estimates for the parameters, and the ones acquired proved to be unreliable.

**1/9, Ilya Molchanov, University of Bern, Switzerland: Partially identified models and random sets**

A statistical model is partially identified if it does not make it possible to come up with a unique estimate of the unknown parameter, even if the sample size grows to infinity. The talk presents several examples of such models related to interval regression, statistical analysis of games and treatment response and explains how tools from the theory of random sets can be used to provide a unified solution to all these problems.

**4/10, Mari Myllymäki, Aalto University, Finland: Testing of mark independence for marked point patterns**

The talk discusses the testing of independence of marks for marked point patterns. Many researchers use the popular envelope test for this purpose. However, this may lead to unreasonably high type I error probabilities, because in this test spatial correlations are inspected for a range of distances simultaneously. Alternatively, the deviation test can be used, but it says little about the reason for rejection of the null hypothesis. In this talk, it is demonstrated how the envelope test can be refined so that it becomes a valuable tool both for statistical inference and for understanding the reasons for possible rejections of the independence hypothesis. This is joint work with Pavel Grabarnik and Dietrich Stoyan.

**6/10, Maria Deijfen, Stockholm University: Stable bigamy on the line**

Consider a vertex set that consists of the points of a Poisson process on R^d. How should one go about obtaining a translation invariant random graph with a prescribed degree distribution on this vertex set? When does the resulting graph percolate? One natural way of constructing the graph is based on the Gale-Shapley stable marriage, and the question of percolation has then turned out to be surprisingly difficult to answer. I will describe some existing results and a number of open problems, with focus on the case d=1 and constant degree 2. (Joint work with Olle Häggström, Alexander Holroyd and Yuval Peres.)

**20/10, Martin S Ridout, University of Kent, UK: Numerical Laplace transform inversion for statisticians**

We review some methods of inverting Laplace transforms numerically, focusing on methods that can be implemented effectively in statistical packages such as R. We argue that these algorithms are sufficiently fast and reliable to be used within iterative statistical inference procedures. Illustrative examples cover calculation of tail probabilities, random number generation and non-Gaussian AR(1) models.
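The abstract does not say which inversion methods are reviewed. As one illustration of how compact such an algorithm can be, the sketch below implements the classical Gaver-Stehfest method, which only needs values of the transform on the positive real axis. It is written in Python rather than R, and the function names are illustrative.

```python
from fractions import Fraction
from math import exp, factorial, log


def stehfest_coefficients(N):
    """Return the Gaver-Stehfest weights V_1, ..., V_N (N must be even)."""
    if N <= 0 or N % 2 != 0:
        raise ValueError("N must be a positive even integer")
    half = N // 2
    coeffs = []
    for k in range(1, N + 1):
        total = Fraction(0)
        for j in range((k + 1) // 2, min(k, half) + 1):
            numerator = j ** half * factorial(2 * j)
            denominator = (factorial(half - j) * factorial(j) * factorial(j - 1)
                           * factorial(k - j) * factorial(2 * j - k))
            total += Fraction(numerator, denominator)
        # Exact rational arithmetic avoids cancellation while building V_k.
        coeffs.append(float((-1) ** (k + half) * total))
    return coeffs


def invert_laplace(F, t, N=12):
    """Approximate f(t) from its Laplace transform F(s), for t > 0."""
    ln2 = log(2.0)
    V = stehfest_coefficients(N)
    return ln2 / t * sum(V[k - 1] * F(k * ln2 / t) for k in range(1, N + 1))


if __name__ == "__main__":
    # F(s) = 1 / (s + 1) is the transform of f(t) = exp(-t).
    approx = invert_laplace(lambda s: 1.0 / (s + 1.0), 1.0)
    print(approx, exp(-1.0))
```

The weights alternate in sign and grow quickly with N, so in double precision moderate values of N are usually a better choice than large ones.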

**27/10, Jorge Mateu, Department of Mathematics, University Jaume I, Castellon, Spain: Functional spatial statistics with a focus on geostatistics and point processes**

Observing complete functions as a result of random experiments is nowadays possible by the development of real-time measurement instruments and data storage resources. Functional data analysis deals with the statistical description and modeling of samples of random functions. Functional versions for a wide range of statistical tools have been recently developed. Here we are interested in the case of functional data presenting spatial dependence, and the problem is handled from the geostatistical and point process contexts. Functional kriging prediction and clustering are developed. Additionally, we propose functional global and local marked second-order characteristics.

**26/10, Erik Mellegård: Obtaining Origin/Destination-matrices from cellular network data**

"Mobile devices in America are generating something like 600 billion geo-spatially tagged transactions per day," says Jeff Jonas, chief scientist at IBM. A lot of this data passes through the mobile operators' systems and is collected for billing and networking purposes. This data could be used to obtain valuable information about people's movements, something that is not being done today. The main reason for this is that the operators are afraid of what would happen if someone misused this data to track people. This thesis presents a method for finding Origin/Destination-matrices from the mobile network data in a way that preserves the individuals' privacy. Since the operators are reluctant to let us use any real data, the method has been applied to synthetic data and some call data records. The results of this thesis show that it is feasible to obtain Origin/Destination-matrices from mobile network data.

**10/11, Adam Andersson: Malliavin's differential calculus for random variables**

I will present a simplified version of what is called Malliavin calculus. In probability theory, random variables are commonly defined on an abstract probability space, with minimal assumptions on the space. Here we choose the topology of the probability space to be the n-dimensional Euclidean space equipped with its Borel sigma-field and a Gaussian measure. Defining a smooth class of random variables that are differentiable in the underlying chance parameter of the probability space, we develop a differential calculus. With some effort, this is extended to somewhat less smooth random variables. As an application I will discuss the existence of densities for random vectors, by looking at properties of the so-called Malliavin matrix. The nice thing about this simplified setting is that it makes the differential calculus very clear. Moreover, the proofs of some key results are identical to those in the case of an abstract probability space. While all the material is basic and known, the presentation of the subject in this simple form is hardly found in the literature. The talk is a polished copy of the PhD seminar I gave in the spring.

**17/11, Jeffrey Steif: The behaviour of the lower tail of the distribution of a supercritical branching process at a fixed large time**

We discuss the above and in addition how a supercritical branching process behaves when it survives to a large fixed time but has much smaller size than expected. This is certainly all well known (by some) but there is a nice picture. I wanted to understand this myself since it serves as a 'toy model' for how the spectrum for critical percolation behaves; however, I won't discuss this latter thing.

**1/12, David Belius, ETH, Zürich, Switzerland: Fluctuations of certain cover times**

It is expected that the fluctuations of the cover times of several families of graphs converge to the Gumbel extreme value distribution. However, this has been proven in only a few cases and remains open for, e.g., the discrete torus in dimensions three and higher. In my talk I will present a recent result that proves Gumbel fluctuations in a different but closely related setting (namely the discrete cylinder), using the theory of random interlacements as a tool.