2015


Abstracts, see below.

27/1, Maud Thomas, Laboratoire de Probabilités et Modèles Aléatoires, Université Paris Diderot, Tail index estimation, concentration and adaptivity
29/1, Trifon Missov, Max Planck Institute for Demography, Stochastic Models in Mortality Research: Recent Advancements and Applications
12/2, Sergei Zuyev, Chalmers, Optimal sampling of stochastic processes via measure optimisation technique
19/2, Alexey Lindo and Serik Sagitov, Chalmers, A special family of Galton-Watson processes with explosions
26/2, Jean-Baptiste Gouéré, Université d'Orléans, Continuum percolation on R^d
12/3, Daniel Simpson, Norwegian University of Science and Technology, With low power comes great responsibility: challenges in modern spatial data analysis
19/3, Emmanuel Schertzer, Sorbonne, France, The contour process of Crump-Mode-Jagers trees
26/3, Ildar Ibragimov, Steklov Mathematical Institute, St. Petersburg, Around Skitovich-Darmois and Gourier-Olkin-Zinger theorems
23/4, David Dereudre, Université Lille-1, Consistency of likelihood estimation for Gibbs point processes
7/5, Olle Häggström, The current debate on p-values and null hypothesis significance testing
21/5, Jürgen Potthoff, Mannheim Universität, Germany, Sample Properties of Random Fields
28/5, Ingemar Kaj, Uppsala universitet, The Poisson random field site frequency spectrum
4/6, Igor Rychlik, How far is it to a coast? (An application of WAFO.)
11/6, Fima Klebaner, Limit theorems for age distribution in populations with high carrying capacity
23/6, Carmen Minuesa, University of Extremadura, Robust estimation for Controlled Branching Processes
8/9, K. Borovkov, The University of Melbourne, On the asymptotic behaviour of a dynamic version of the Neyman contagious point process
24/9, Laurent Decreusefond, ENST, France, Distances between point processes
1/10, Tailen Hsing, Analysing Spatial Data Locally
8/10, Evsey Morozov, Inst. Applied Math. Research, Russia, Stability analysis of regenerative queues: some recent results
15/10, Johan Lindström, Lund University, Seasonally Non-stationary Smoothing Splines: Post-processing of Satellite data
20/10, Janine Illian, University of St Andrews, Spatial point processes in the modern world – an interdisciplinary dialogue
22/10, Sach Mukherjee, German Center for Neurodegenerative Diseases (DZNE), High-dimensional statistics for personalized medicine
29/10, Peter Olofsson, Trinity University, A stochastic model of speciation through Bateson-Dobzhansky-Muller incompatibilities
5/11, Murray Pollock, University of Warwick, A New Unbiased and Scalable Monte Carlo Method for Bayesian Inference
12/11, Jimmy Olsson, KTH Royal Institute of Technology, Efficient particle-based online smoothing in general state-space hidden Markov models: the PaRIS algorithm
19/11, Patrik Albin, On Extreme Value Theory for Group Stationary Gaussian Processes
26/11, Youri K. Belyaev, Umeå University, The Hybrid Moments-Distance method for clustering observations with a mixture of two symmetrical distributions
10/12, Anna-Kaisa Ylitalo, University of Jyväskylä, Eye movements during music reading - A generalized estimating equation approach

 


 
29/1, Trifon Missov, Max Planck Institute for Demography, Stochastic Models in Mortality Research: Recent Advancements and Applications
Abstract: Stochastic models in mortality research aim to capture observed mortality dynamics over multiple time dimensions (ages, periods, and cohorts), on the one hand, and relate longevity to biological processes, on the other hand. Frailty models provide the basic mathematical tool for studying mortality curves and surfaces. This talk focuses on recent developments in fixed-frailty and changing-frailty (dynamic-frailty) models reflecting observed mortality phenomena: the mortality plateau, the persistent decline in age-specific mortality rates ("lifesaving"), etc.
 
12/2, Sergei Zuyev, Optimal sampling of stochastic processes via measure optimisation technique
Abstract: Let W(t) be a continuous non-stationary stochastic process on [0,1] which can be observed at times T=(t_0 < t_1 < ... < t_n), giving rise to a random vector W=(W(t_1),...,W(t_n)). The question we address is how to choose the sampling times T so that the linear spline constructed through the points (T,W) deviates as little as possible from the trajectory (W(t), t in [0,1]), in the sense that the average L_2 distance between the paths is minimised. The answer depends on the smoothness coefficient a(t), meaning that the average increment E|W(t+s)-W(t)| behaves like |s|^{a(t)} for small s. The local variant of the problem for a monotone a(t) was addressed previously by Hashorva, Lifshits and Seleznjev. By using the variation technique on measures we are able to extend the known results and potentially to attack the multi-dimensional case of optimal sampling of random fields.
 
19/2, Alexey Lindo and Serik Sagitov, A special family of Galton-Watson processes with explosions
Abstract: The linear-fractional Galton-Watson process is a well-known case in which many characteristics of a branching process can be computed explicitly. In this paper we extend the two-parameter linear-fractional family to a much richer four-parameter family of reproduction laws. The corresponding Galton-Watson processes also allow for explicit calculations, now with the possibility of an infinite mean, or even an infinite number of offspring. We study the properties of this special family of branching processes, and show, in particular, that in some explosive cases the time to explosion can be approximated by the Gumbel distribution.
 
26/2, Jean-Baptiste Gouéré, Université d'Orléans, Continuum percolation on R^d
Abstract:
We consider the Boolean model on R^d. This is the union of i.i.d. random Euclidean balls centered at points of a homogeneous Poisson point process on R^d. Choose the intensity of the Poisson point process so that the Boolean model is critical for percolation. In other words, if we lower the intensity then all the connected components of the Boolean model are bounded, while if we increase the intensity then there exists one unbounded component. We are interested in the volumetric proportion of R^d which is covered by this critical Boolean model. This critical volumetric proportion is a function of the dimension d and of the common distribution of the radii. We aim to study this function.
 
12/3, Daniel Simpson, Norwegian University of Science and Technology: With low power comes great responsibility: challenges in modern spatial data analysis
Abstract:
Like other fields in statistics, spatial data analysis has undergone its own "big data" revolution.  Over the last decade, this has resulted in new approximate algorithms and new approximate models being used to fit ever more complicated data. There is a particular role in this revolution for model-based statistics and, in particular, Bayesian analysis. 

The trouble is that as both the data and the models expand, we can end up with complex, unidentifiable, hierarchical, unobserved nightmares.  Hence we are starting to seriously ask the question "What can we responsibly say about this data?".

In this talk, I will go nowhere near answering this fundamental question, but I will provide a clutch of partial answers to simpler problems. In particular, I will outline the trade-offs that need to be considered when building approximate spatial models; the incorporation of weak expert knowledge into priors on the hyper-parameters of spatial models; the dangers of flexible non-stationarity; and the role of prior choice in interpreting posteriors.

This is joint work with Geir-Arne Fuglstad, Sigrunn Sørbye, Janine Illian,  Finn Lindgren, and Håvard Rue.
 
19/3, Emmanuel Schertzer, Sorbonne, France, The contour process of Crump-Mode-Jagers trees
Abstract: The genealogy of a (planar) Galton-Watson branching process is encoded by its contour path, which is obtained by recording the height of an exploration particle running along the edges of the tree from left to right.
Crump-Mode-Jagers (CMJ) branching processes are a generalization of Galton-Watson trees, for which generations can overlap. In general, the contour process of such trees is difficult to characterize. However, we will show that under certain assumptions, it is obtained by a simple transformation of the contour process of the underlying genealogical structure. This work sheds some new light on previous results obtained by Sagitov on the large time behaviour of CMJ branching processes. This is joint work with Florian Simatos.
 
23/4, David Dereudre, Université Lille-1, Consistency of likelihood estimation for Gibbs point processes
Abstract: We prove the strong consistency of the maximum likelihood estimator (MLE) for parametric Gibbs point process models. The setting is very general and includes pairwise potentials, finite and infinite multibody interactions and geometrical interactions, where the range can be finite or infinite. Moreover, the Gibbs interaction may depend linearly or non-linearly on the parameters, a particular case being hardcore parameters and interaction range parameters. As important examples, we deduce the consistency of the MLE for all parameters of the Strauss model, the hardcore Strauss model, the Lennard-Jones model and the area-interaction model.
 
7/5, Olle Häggström, The current debate on p-values and null hypothesis significance testing
Abstract: The use of p-values and null hypothesis significance testing has been under attack in recent years from practitioners of statistics in various disciplines. One highlight is the publication in 2008 of "The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives" by Stephen Ziliak and Deirdre McCloskey. Another is the ban of p-values from the journal Basic and Applied Social Psychology that its editors announced in February 2015. I will review (parts of) this debate, and stress how important I think it is that we, as statisticians, take part in it.
 
21/5, Jürgen Potthoff, University of Mannheim, Sample Properties of Random Fields
Abstract: A rather general version of the celebrated Kolmogorov–Chentsov theorem is presented, which provides sufficient criteria for the existence of a (Hölder) continuous modification of a random field indexed by a metric space admitting certain separability properties. For random fields on an open subset of the d-dimensional Euclidean space, sufficient criteria are presented which guarantee the existence of a sample differentiable modification. If time permits, results concerning the existence of separable and/or measurable modifications are mentioned.
 
28/5, Ingemar Kaj, Uppsala universitet, The Poisson random field site frequency spectrum
Abstract: We study a class of Poisson random measures with intensity measure given by the law of the Wright-Fisher diffusion process, which arises as a limiting model for genetic divergence between species.  Each species is a population of gene sequences subject to mutational change under neutral or selective evolution.  Using the duality relation between Wright-Fisher diffusions and Kingman's coalescent process we derive the non-equilibrium site frequency spectrum. Applications include certain genomic measures used to assess sequence divergence during speciation (such as F_{ST} and dN/dS).
 
11/6, Fima Klebaner, Monash University, Limit theorems for age distribution in populations with high carrying capacity
Abstract: We prove fluid and central limit approximations for measure valued ages under smooth demographic assumptions.

Joint work with Fan, Hamza (Monash) and Jagers (Chalmers).
 
23/6, Carmen Minuesa, University of Extremadura, Robust estimation for Controlled Branching Processes
Abstract: Controlled branching processes are appropriate probabilistic models for the description of population dynamics in which the number of individuals with reproductive capacity in each generation is controlled by a random mechanism. The probabilistic theory of these processes has been extensively developed, so it is now important to examine the inferential problems arising from them.
The aim of this work is to consider the estimation of the underlying offspring parameters via disparities, assuming that the offspring distribution belongs to a general parametric family.
From a frequentist viewpoint, we obtain the minimum disparity estimators under three possible samples: given the entire family tree up to a certain generation, given the total number of individuals and progenitors in each generation, and given only the population sizes. We then examine their asymptotic and robustness properties.
From a Bayesian outlook, we develop an analogous procedure which provides robust Bayesian estimators of the offspring parameter through the use of disparities. The method consists of replacing the log likelihood with an appropriately scaled disparity in the expression of the posterior distribution. For the estimators associated to the resulting distribution, we study their asymptotic properties.
Finally, we illustrate the accuracy of the proposed methods by way of simulated examples developed with the statistical software R.
 
8/9, K. Borovkov, The University of Melbourne, On the asymptotic behaviour of a dynamic version of the Neyman contagious point process
Abstract: We consider a dynamic version of the Neyman contagious point process that can be used for modelling the spatial dynamics of biological populations, including species invasion scenarios. Starting with an arbitrary finite initial configuration of points in R^d with nonnegative weights, at each time step a point is chosen at random from the process according to the distribution with probabilities proportional to the points' weights. Then a finite random number of new points is added to the process, each displaced from the location of the chosen "mother" point by a random vector and assigned a random weight. Under broad conditions on the sequences of the numbers of newly added points, their weights and displacement vectors (which include a random environments setup), we derive the asymptotic behaviour of the locations of the points added to the process at time step n and also that of the scaled mean measure of the point process after time step n, as n → ∞.
 
24/9, Laurent Decreusefond, ENST, France, Distances between point processes
Abstract: Point processes can be viewed mathematically either as sets of points or as combinations of Dirac measures. Depending on the point of view, the distance between two realizations of point processes can naturally be defined in different ways. This induces different distances between the distributions of random point processes. We show through several examples how these distances can be defined and estimated.
 
1/10, Tailen Hsing, Analysing Spatial Data Locally
Abstract: Stationarity is a common assumption in spatial statistics. The justification is often that stationarity is a reasonable approximation if data are collected "locally." In this talk we first review various known approaches for modeling nonstationary spatial data. We then examine the notion of local stationarity in more detail. In particular, we will consider a nonstationary model whose covariance behaves locally like the Matérn covariance, and an inference approach for that model based on gridded data.
 
8/10, Evsey Morozov, Inst. Applied Math. Research, Russia, Stability analysis of regenerative queues: some recent results
Abstract: We consider a general approach to stability (positive recurrence) of regenerative queueing systems, which is based on an asymptotic property of the embedded renewal process of regenerations. The renewal process obeys a useful characterization of the limiting remaining regeneration time, which allows us, for a wide class of queues, to establish minimal stability conditions by the following two-step procedure. At the first step, a negative drift condition is used to prove that the basic process does not go to infinity in probability. Then, at the second step, using a regeneration condition, we show that, starting within a compact set, the process regenerates in a finite time with a positive probability. This implies the finiteness of the mean regeneration period (positive recurrence). This approach is effective beyond the class of Markovian models.
To illustrate the approach, we present some recent results related, in particular, to retrial systems, state-dependent systems, and cascade systems.
 
15/10, Johan Lindström, Lund University: Seasonally Non-stationary Smoothing Splines: Post-processing of Satellite data
Abstract: Post-processing of satellite remote sensing data is often done to reduce noise and remove artefacts due to atmospheric (and other) disturbances. Here we focus specifically on satellite derived vegetation indices which are used for large scale monitoring of vegetation cover, plant health, and plant phenology. These indices often exhibit strong seasonal patterns, where rapid changes during spring and fall contrast to relatively stable behaviour during the summer and winter season. The goal of the post-processing is to extract smooth seasonal curves that describe how the vegetation varies during the year. This is however complicated by missing data and observations with large biases.
Here a method for post-processing of satellite-based time series is presented. The method combines a seasonally non-stationary smoothing spline with observational errors that are modelled using a normal-variance mixture. The seasonal non-stationarity allows us to capture the different behaviour during the year, and the error structure accounts for the biased and heavy-tailed errors induced by atmospheric disturbances. The model is formulated using Gaussian Markov processes and fitted using MCMC.
 
20/10, Janine B Illian, University of St Andrews and NTNU Trondheim: Spatial point processes in the modern world – an interdisciplinary dialogue
Abstract: In the past, complex statistical methods beyond those covered in standard statistics textbooks would be developed as well as applied by a statistician. Nowadays, freely available, sophisticated software packages such as R are in common use and at the same time increasing amounts of data are collected. As a result, users have both a stronger need to analyse these data themselves and an increasing awareness of the existence of advanced methodology, since it is no longer “hidden” from them in inaccessible statistical journals. As a result, statisticians make their methodology usable
In this talk, we argue that it is necessary to make methods usable and that, for this to be successful, there needs to be a strong interaction with the user community through interdisciplinary work. This implies not only making model fitting feasible by developing computationally efficient methodology to reduce running times, but also improving the practicality of other aspects of the statistical analysis such as model construction, prior choice and interpretation, as these are equally relevant for users with real data sets and real scientific questions. We discuss the importance of an intense interdisciplinary dialogue for statistics to become relevant in the real world, illustrating it with past and current examples of this ongoing dialogue in the context of spatial point processes and their application, mainly in ecological research.
 
22/10, Sach Mukherjee, German Center for Neurodegenerative Diseases (DZNE): High-dimensional statistics for personalized medicine
Abstract: Human diseases show considerable heterogeneity at the molecular level. Such heterogeneity is central to personalized medicine efforts that seek to exploit molecular data to better understand disease biology and inform clinical decision making. An emerging notion is that diseases and disease subgroups may differ with respect to patterns of molecular interplay. I will discuss our ongoing efforts to develop statistical methods to investigate such heterogeneity with an emphasis on high-dimensional and causal aspects.
 
 
 
29/10: Peter Olofsson, Trinity University: A stochastic model of speciation through Bateson-Dobzhansky-Muller incompatibilities
Abstract: Speciation is characterized by the development of reproductive isolating barriers between diverging groups. Intrinsic post-zygotic barriers of the type envisioned by Bateson, Dobzhansky, and Muller are deleterious interactions among loci that reduce hybrid fitness, leading to reproductive isolation. The first stochastic model of the development of these barriers was published by Orr in 1995. We generalize Orr's model by incorporating finite protein–protein interaction networks and by allowing for different fixation rates at different loci. Formulas for the speciation probability and the expected time until speciation are established. 
 
5/11, Murray Pollock, University of Warwick: A New Unbiased and Scalable Monte Carlo Method for Bayesian Inference
Abstract: This talk will introduce novel methodology for exploring posterior distributions by modifying methodology for exactly (without error) simulating diffusion sample paths – the Scalable Langevin Exact Algorithm (ScaLE). This new method has remarkably good scalability properties (among other interesting properties) as the size of the data set increases (it has sub-linear cost, and potentially no cost), and therefore is a natural candidate for “Big Data” inference. Joint work with Paul Fearnhead (Lancaster), Adam Johansen (Warwick) and Gareth Roberts (Warwick).
 
12/11, Jimmy Olsson, KTH, Efficient particle-based online smoothing in general state-space hidden Markov models: the PaRIS algorithm
Abstract:
This talk discusses a novel algorithm, the particle-based, rapid incremental smoother (PaRIS), for efficient online approximation of smoothed expectations of additive state functionals in general hidden Markov models. The algorithm, which has a linear computational complexity under weak assumptions and very limited memory requirements, is furnished with a number of convergence results, including a central limit theorem. An interesting feature of PaRIS, which samples on-the-fly from the retrospective dynamics induced by the particle filter, is that it requires two or more backward draws per particle in order to cope with degeneracy of the sampled trajectories and to stay numerically stable in the long run with an asymptotic variance that grows only linearly with time.

19/11, Patrik Albin: On Extreme Value Theory for Group Stationary Gaussian Processes
Abstract: We study extreme value theory of right stationary Gaussian processes with parameters in open subsets with compact closure of (not necessarily Abelian) locally compact topological groups. Even when specialized to Euclidean space, our results extend results on extremes of stationary Gaussian processes and fields in the literature, both by requiring weaker technical conditions and because group stationary processes need not be stationary in the usual sense (that is, with respect to addition as group operation).
 
26/11, Youri K. Belyaev, Umeå University, The Hybrid Moments-Distance method for clustering observations with a mixture of two symmetrical distributions
Joint work with D. Källberg and P. Rydén
Abstract:
Clustering cancer patients based on high-dimensional gene expression data is essential in discovering new subtypes of cancer. Here we present a novel univariate clustering approach that can be used for variable selection in high-dimensional clustering problems.

We observe gene expression data on one gene and n patients, where the jth patient has cancer of type 1 (tj =1) or type 2 (tj =2). The aim is to predict the unobservable list of types {t1,...,tn}.

Here {t1,...,tn} are values of i.i.d. random variables {T1,...,Tn} such that P[Tj =1]=w1, P[Tj =2]=w2 and w1+w2=1. The gene expression data {x1,…,xn} are observations of i.i.d. random variables {X1,…,Xn}, where Xj has distribution F1 if tj=1 and F2 if tj=2, j=1,…,n. We assume that F1 and F2 are symmetrical distributions parameterized by their means (m1 and m2) and variances (v1 and v2). Thereby we have a statistical model with a mixture of two symmetrical distributions with five unknown parameters {w1, m1, v1, m2, v2}. Consistent estimates of all five parameters can be found by using the recursive EM-algorithm, and the responsibilities {q1(x1),…,qn(xn)} obtained via the estimated parameters can be used to predict the patients’ cancer types {t1,...,tn}. However, the EM-algorithm is sensitive to distribution assumptions that deviate from the real distributions F1, F2 and to the starting point of the recursion.

We propose an alternative method, the hybrid moment-distance (HMD) method, where the observations {x1,…,xn} are used for estimation of the first three moments. These moment estimates are used to reduce the dimension of the parameter space from 5 to 3. The optimal parameters within the lower-dimensional space are obtained by considering the distance between the empirical distribution and the fitted parametric distributions. Responsibilities {q1(x1),…,qn(xn)}, obtained via the HMD-method’s estimated parameters, are used to predict the patients’ cancer types. Note that a patient’s q-value is the estimated probability that the patient has cancer of a certain type.

An extensive simulation study showed that the HMD-algorithm outperformed the EM-algorithm with respect to clustering performance. The HMD-method was flexible and performed well also under very imprecise model assumptions, which suggests that it is robust and well suited for real problems.
 
10/12: Anna-Kaisa Ylitalo, University of Jyväskylä: Eye movements during music reading - A generalized estimating equation approach
Abstract: Eye tracking has long research traditions in text reading and picture inspection, but studies on eye movements in music reading are still relatively rare. Thus there is no standardised methodology for analysing eye movements in music reading and, in fact, rather little is known about visual processing of musical notation at all. In our experiment participants read and performed simple melodies on an electric piano. Some of the melodies included melodic skips and we study how these skips affect visual processing of the melody. In addition, we are interested in the effects of tempo, participant’s expertise and placement of melodic skips. The eye movement data are analysed using a generalised estimating equation (GEE) approach, which is an extension of generalised linear models (GLMs) to longitudinal data.

Published: Fri 26 Apr 2019.