Participants

David R. Cox, Nuffield College, Oxford, UK, david.cox@nuffield.ox.ac.uk

Matthias Drton, University of Chicago, USA, drton@galton.uchicago.edu

Sung-Ho Kim, Korea Advanced Institute of Science and Technology, Daejeon, Korea, sh12kim@gmail.com

Giovanni Marchetti, University of Florence, Italy, giovanni.marchetti@ds.unifi.it

Peter McCullagh, University of Chicago, USA, pmcc@galton.uchicago.edu

Zongming Ma, Stanford University, USA, zongming@stanford.edu

Helen Massam, York University, Canada, massamh@mathstat.yorku.ca

Juni Palmgren, Stockholm University, Sweden, juni@math.su.se

Bala Rajaratnam, Stanford University, USA, brajarat@stanford.edu

Nancy Reid, University of Toronto, Canada, reid@utstat.toronto.edu

Kayvan Sadeghi, Chalmers/University of Gothenburg, Sweden, kayvan_s@yahoo.com

Rolf Sundberg, Stockholm University, Sweden, rolfs@math.su.se

Elena Stanghellini, University of Perugia, Italy, stanghel@stat.unipg.it

Ivonne Solis-Trapala, University of Lancaster, UK, i.solis-trapala@lancaster.ac.uk

Nanny Wermuth, Chalmers/University of Gothenburg, Sweden, wermuth@math.chalmers.se

Michael Wiedenbeck, ZUMA Mannheim, Germany, michael.wiedenbeck@gesis.org

Xianchao Xie, Harvard University, USA, xie1981@gmail.com

Abstracts

**David R. Cox**, Nuffield College, Oxford, UK

Derived variables and graphical models

Derived variables are deterministic functions of two or more directly measured variables. An example is body mass index, body weight divided by the square of height. The objective in using derived variables is to simplify and enhance interpretation. In graphical terms the objective is, starting from a graph in which nodes correspond to measured variables, to develop a new graph in which the nodes correspond to derived variables. Four ways of doing this are outlined, taking a chain block representation as the starting point. In the first, the objective is to combine variables in two different blocks, in effect to induce a regression adjustment. Body mass index used in a study of obesity is an example. A second type of analysis is internal, that is, it deals with a single block of variables which may be represented in an undirected graph. The objective is to produce a single new variable which captures the most important features of the data, in a sense to be defined. A third possibility is to require the derived variables to have some simple specified structure, such as that of a seemingly unrelated regression scheme. Finally, the relevance of the previous method for multiple time series analysis is set out. The detailed discussion throughout concerns linear transformations of continuous variables. Extension to binary variables is briefly considered.
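
As a small illustration of the body mass index example (ours, not taken from the talk), the sketch below computes the index and shows that on the log scale it is a linear combination of log weight and log height, which is one way to connect it to linear transformations. The function names and the units (kilograms, metres) are assumptions made for the example.

```python
import math

def body_mass_index(weight_kg, height_m):
    """Derived variable: weight divided by the square of height."""
    return weight_kg / height_m ** 2

def log_body_mass_index(weight_kg, height_m):
    """On the log scale the index is linear: log(w) - 2 * log(h)."""
    return math.log(weight_kg) - 2.0 * math.log(height_m)

bmi = body_mass_index(70.0, 1.75)
log_bmi = log_body_mass_index(70.0, 1.75)
print(bmi, math.exp(log_bmi))  # the two values agree up to rounding
```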

**Matthias Drton**, University of Chicago, USA

Discrete chain graph models

The statistical literature discusses different types of Markov properties for chain graphs that lead to four possible classes of chain graph Markov models. The different models are rather well understood when the observations are continuous and multivariate normal, and it is also known that one model class, referred to as models of LWF (Lauritzen-Wermuth-Frydenberg) or block concentration type, yields discrete models for categorical data that are smooth. We study the structural properties of the discrete models based on the three alternative Markov properties. It is shown by example that two of the alternative Markov properties can lead to non-smooth models. The remaining model class, which can be viewed as a discrete version of multivariate regressions, is proven to comprise only smooth models. The proof employs a simple change of coordinates that also reveals that the model's likelihood function is unimodal if the chain components of the graph are complete sets.

**Sung-Ho Kim**, Korea Advanced Institute of Science and Technology, Daejeon, Korea

A note on multivariate autoregressive modelling based on marginal models

In analysing fMRI (functional magnetic resonance imaging) data to model effective connectivity of the brain, we use the structural equation model (SEM) as one of the most suitable models. The fMRI data are given in the form of multivariate time series. Owing to mechanical or other external constraints, the time series are of limited length. This is a main source of restriction on the size and complexity of the SEM. As a way of coping with this restriction, we considered marginal models of a multivariate time series model and found that the marginal models are helpful in searching for the graphical structure of the SEM. A simulation experiment using a 10-dimensional vector autoregressive model of order 3 (VAR(3)) strongly supports the usefulness of the marginal model approach.

**Giovanni Marchetti**, University of Florence, Italy

Chain graph models for discrete data

Joint-response chain graph models are a flexible family of models representing dependencies in systems in which the variables are grouped in blocks as responses, intermediate responses and purely explanatory factors. Two basic chain graph models for discrete data are compared on the basis of their Markov properties, explaining the interpretation of missing edges and associated conditional independencies. Then the parameterization of the discrete chain graph models with the multivariate regression interpretation is discussed with several examples, in the framework of the complete hierarchical parameterization by Bergsma and Rudas.

**Peter McCullagh**, University of Chicago, USA

Random effects, selection bias and logistic models

The subject of this talk is correlation between responses on distinct units induced by common random effects. A Cox model is used to generate (X,Y) pairs sequentially in time. A quota sample is one in which an x-configuration is pre-specified, and pairs are selected to satisfy the quota. It is shown that the conditional distribution of Y given X for a sequential sample is not the same as the distribution for a quota sample unless certain rather stringent conditions are satisfied. The implications for applied work are discussed.

**Zongming Ma**, Stanford University, USA

Chain graph models selection

Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this talk, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse.

**Helen Massam**, York University, Canada

A conjugate prior for hierarchical log-linear models

In Bayesian analysis of multi-way contingency tables, the selection of a prior distribution for either the log-linear parameters or the cell probability parameters is a major challenge. In this talk we define a flexible family of conjugate priors for the wide class of discrete hierarchical log-linear models, which includes the class of graphical models. These priors are defined as the Diaconis-Ylvisaker conjugate priors on the log-linear parameters subject to 'baseline constraints' under multinomial sampling. We also derive the induced prior on the cell probabilities and show that the induced prior is a generalization of the hyper Dirichlet prior. We show that this prior has several desirable properties and illustrate its usefulness by identifying the most probable decomposable, graphical and hierarchical log-linear models for a six-way contingency table.

**Juni Palmgren**, Stockholm University, Sweden

Models for family data in genetic epidemiology

Steps in the genetic epidemiologic research process are reviewed: (i) segregation analysis and familial aggregation, (ii) linkage and association analysis and (iii) characterization of how genetic and environmental factors contribute to the burden of disease in the population. Under (i) twin models for complex phenotypes are discussed, including a correlated frailty model for an illness-death process and a random change point model for repeated measures. Under (ii) a parametric GLMM for binary phenotypes is introduced for testing association in the presence of linkage. The fixed effects term defines association, while the random effects term captures correlation due to linkage using inheritance vectors. Spurious association due to population structure is avoided by conditioning on parental genotypes. The test is shown to be more powerful than FBAT in simulations.

**Bala Rajaratnam**, Stanford University, USA

Marginal likelihood inference for the eigenvalues of covariance matrices

Inference about eigenvalues of covariance matrices is a fundamental topic with widespread applications. We specifically consider marginal likelihood inference for a p × p positive-definite covariance matrix Σ based on a sample of size n from the N_p(0, Σ) distribution. The eigenstructure of S, the usual maximum likelihood estimator (MLE) of Σ, tends to be systematically distorted unless p/n is quite small. Much recent work in the area of multivariate analysis has focused on improved estimation of Σ from both frequentist and Bayesian perspectives. A Laplace approximation to the joint density of the sample eigenvalues was developed by Anderson and Muirhead. Although the Laplace approximation cannot be used directly for marginal likelihood inference, Muirhead (1982) derived from it an estimator having bias reduction properties. In the talk, we investigate Muirhead's estimator and demonstrate that it maximizes an adjusted profile likelihood. Furthermore, by using additional computational techniques such as importance sampling, we correct the Laplace approximation to make it appropriate for maximum likelihood estimation. The optimality of the estimators based on the approximate marginal likelihood and their risk properties are studied. For higher dimensions, shrinkage properties of the marginal likelihood approach yield estimators that perform better than the standard MLE. Implications for hypothesis testing within the marginal likelihood framework will also be discussed. Finally, using recent advances in special function theory, we are able to perform exact marginal inference for eigenvalues in situations where p is small.
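
The distortion of the sample eigenstructure mentioned above can be seen in a small simulation. The following is a minimal sketch of ours, not the estimator studied in the talk: it assumes the true covariance is the identity, so every population eigenvalue equals 1, and uses illustrative values n = 40 and p = 20. The function name and the seed are hypothetical choices.

```python
import numpy as np

def sample_eigenvalues(n, p, seed=0):
    """Eigenvalues of S = X'X / n for n draws from N_p(0, I)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    S = X.T @ X / n  # MLE of the covariance when the mean is known to be 0
    return np.linalg.eigvalsh(S)  # returned in ascending order

eig = sample_eigenvalues(n=40, p=20)
# All true eigenvalues are 1, yet the sample eigenvalues are spread widely:
# the smallest are pulled towards 0 and the largest are pushed well above 1.
print(eig.min(), eig.mean(), eig.max())
```

The average of the sample eigenvalues stays close to 1, since it equals trace(S)/p, so the distortion lies in their spread rather than in their overall level.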

**Nancy Reid**, University of Toronto, Canada

Approximate likelihoods

Composite likelihoods are constructed by multiplying together the joint densities of low-dimensional marginal or conditional distributions, in cases where the full joint density may be difficult to calculate. Inference based on composite likelihoods seems to be efficient in many practical examples, especially for clustered longitudinal data. Approximate exponential family models are constructed by combining the log-likelihood function with a particular type of sample-space derivative of the log-likelihood. These models can be used for obtaining higher order approximations for likelihood based inference. This talk surveys some of the literature on composite likelihood and highlights some open questions, and describes the construction of the approximate exponential family and its use in higher order asymptotics.
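
To make the construction concrete, here is a minimal sketch of a pairwise likelihood, one common form of composite likelihood. The model is our assumption for illustration and is not taken from the talk: clusters of size q = 4 from a zero-mean, unit-variance normal distribution with a common correlation rho, estimated by a simple grid search. All names, the seed and the grid are hypothetical.

```python
import itertools
import numpy as np

def pairwise_loglik(rho, Y):
    """Sum of bivariate normal log-densities over all pairs within each row."""
    total = 0.0
    for j, k in itertools.combinations(range(Y.shape[1]), 2):
        x, y = Y[:, j], Y[:, k]
        quad = (x**2 - 2.0 * rho * x * y + y**2) / (1.0 - rho**2)
        total += np.sum(-np.log(2.0 * np.pi) - 0.5 * np.log(1.0 - rho**2) - 0.5 * quad)
    return total

def simulate(n, q, rho, seed=0):
    """Draw n clusters of size q with exchangeable correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.full((q, q), rho) + (1.0 - rho) * np.eye(q)
    return rng.multivariate_normal(np.zeros(q), cov, size=n)

Y = simulate(n=500, q=4, rho=0.5)
grid = np.linspace(-0.3, 0.95, 126)  # step 0.01, inside the valid range for q = 4
rho_hat = grid[np.argmax([pairwise_loglik(r, Y) for r in grid])]
print(rho_hat)
```

Only bivariate densities are evaluated, so the full q-dimensional joint density is never needed. In this toy model the joint density is of course available, and the example shows only the mechanics.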

**Kayvan Sadeghi**, Chalmers/University of Gothenburg, Sweden

Independence preserving graphs

Every joint density generated over a given parent graph satisfies a set of conditional independence statements, called its independence structure, which could be derived from a few known criteria. We introduce a graph theoretical approach to redefine two edge minimal graphs with different types of edge, called MC and summary graphs, and an appropriate criterion to preserve the independence structure after marginalizing over and conditioning on disjoint subsets of the node set of the graph. We also derive algorithms to generate MC and summary graphs from the given parent or other MC and summary graphs.

**Rolf Sundberg**, Stockholm University, Sweden

Multimodality of likelihoods

It was found by Drton and Richardson (2004) that the seemingly unrelated regressions (SUR) model can have a multimodal likelihood with maxima of the same magnitude. Here this phenomenon is studied further, and a result is shown for curved exponential families: when such multiple maxima appear, the likelihood deviance is bounded below by a quantity that depends on the curvature of the model and is proportional to the amount of information (e.g. sample size). This relates the phenomenon to model lack of fit.

**Elena Stanghellini**, University of Perugia, Italy

The effect of selection in graphical Gaussian models

We consider the effects induced by truncation on the concentration matrix and, therefore, on graphical Gaussian models. Truncation mechanisms on some of the variables, which may or may not be observed, have an important role in the construction of selection type models. Distortion induced by selection arises when the response variables are missing in a non-random fashion or the random allocation of units to a treatment is destroyed by post-treatment bias, such as non-compliance with the assignment. The derivations through graphical models allow us (a) to recover existing results in a unified framework and (b) to provide some insights into the distortion induced in linear regression coefficients by the selection mechanism. In some cases, this bias can also be consistently estimated from the observed data. We further focus on binary response models with selection. These models may arise when (a) the whole population can be thought of as generated by a selectivity condition or (b) only the response variable is missing in a selective way. For this class of models we detail the issue of inference based on ML, with particular reference to the EM algorithm.

**Ivonne Solis-Trapala**, University of Lancaster, UK

Graphical models with latent variables and their application in neuropsychology

In this talk we propose a strategy of statistical inference for graphical models with latent Gaussian variables, and observed variables that follow non-standard sampling distributions. We restrict our attention to those graphs in which the latent variables have a substantive interpretation. In addition, we adopt the assumption that the distribution of the observed variables may be meaningfully interpreted as arising after marginalising over the latent variables. We provide two examples of longitudinal studies that investigate developmental changes in cognitive functions of young children in one case and of cognitive decline of Alzheimer's patients in the other. These studies involve the assessment of competing causal models for several psychological constructs; and the observed measurements are gathered from the administration of batteries of tasks subject to complicated sampling protocols. Finally, we extend our strategy to the analysis of EEG (electroencephalogram) brain responses. As a motivating example, we discuss a longitudinal study that investigates interrelations between cortical regions that are thought to underlie the neural mechanisms of resistance to peer influence during adolescence.

**Nanny Wermuth**, Chalmers/University of Gothenburg, Sweden

Some properties of multivariate regression chains

Recursive sequences of generalized multivariate regressions are used in cohort studies, for multi-wave panel data, in controlled clinical trials with sequentially administered treatments, but also for cross-sectional and even retrospective studies. Their independence structure is captured by multivariate recursive regression chain graphs which are in turn a subclass of summary graphs. Operators for matrix representations of graphs are useful to study properties of summary graphs and of corresponding densities. Summary graphs preserve their form after marginalising and conditioning. They capture precisely the independence structure implied by a starting directed acyclic graph or by a multivariate regression chain graph for the variables that remain after marginalising and conditioning. They lead to interpretations of types of path and they may separate generating dependencies from distorting effects.

**Michael Wiedenbeck**, ZUMA Mannheim, Germany

Partial inversion, partial mapping and Moebius inversion

Partial inversion is introduced as an operator acting upon matrices as parametrizations of linear mappings. Its applicability to matrices with non-singular principal minors is discussed, as well as its group-theoretic properties. Further, it is shown that partially inverted matrices can be decomposed into a product of two factors which can be defined as partial mappings or as partial replications. This decomposition is used for representing any partially invertible matrix as a product of a lower block-triangular, a block-diagonal, and an upper block-triangular matrix, whose submatrices coincide with submatrices of partially inverted matrices. Finally it is shown that Moebius inversion can be interpreted via matrix inversion: a well-known recursive formula to compute the Moebius function coincides essentially with partial inversion applied repeatedly to a special upper triangular matrix.
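
A minimal numerical sketch of partial inversion follows. The block formula and sign convention used here are an assumption on our part (one common convention, in which M y = x is rearranged so that the a-components of x and y change sides); the abstract itself does not state the formula. The function name `partial_inversion` and the test matrix are hypothetical.

```python
import numpy as np

def partial_inversion(M, a):
    """Partially invert square matrix M on index set a (assumed convention).

    If M @ y = x, the result N satisfies N @ (x_a, y_b) = (y_a, x_b),
    where b is the complement of a. Requires M[a, a] to be non-singular.
    """
    M = np.asarray(M, dtype=float)
    a = list(a)
    b = [i for i in range(M.shape[0]) if i not in a]
    Maa_inv = np.linalg.inv(M[np.ix_(a, a)])
    Mab, Mba, Mbb = M[np.ix_(a, b)], M[np.ix_(b, a)], M[np.ix_(b, b)]
    N = np.empty_like(M)
    N[np.ix_(a, a)] = Maa_inv
    N[np.ix_(a, b)] = -Maa_inv @ Mab
    N[np.ix_(b, a)] = Mba @ Maa_inv
    N[np.ix_(b, b)] = Mbb - Mba @ Maa_inv @ Mab  # Schur complement of M_aa
    return N

M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
N = partial_inversion(M, [0])

# Exchange property: the first components of x and y swap sides.
y = np.array([1.0, -2.0, 0.5])
x = M @ y
print(N @ np.array([x[0], y[1], y[2]]), np.array([y[0], x[1], x[2]]))

# Partially inverting on the same set twice recovers M; partially
# inverting on the remaining indices completes the full inverse.
print(np.allclose(partial_inversion(N, [0]), M))
print(np.allclose(partial_inversion(N, [1, 2]), np.linalg.inv(M)))
```

Under this convention the operator is its own inverse on a fixed index set, and successive partial inversions on disjoint index sets combine into a partial inversion on their union.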

**Xianchao Xie**, Harvard University, USA

Collapsibility of directed acyclic graphs

Collapsibility means that certain aspects of a model are preserved after marginalization over some variables. In this talk, we present conditions for three kinds of collapsibility for directed acyclic graphs (DAGs): estimate collapsibility, conditional independence collapsibility and model collapsibility. We show that unlike in graphical log-linear models or hierarchical log-linear models, the estimate collapsibility and the model collapsibility are not equivalent in DAG models. We discuss the relationship among them and illustrate how the results obtained can be applied in simplifying the inference problems in DAGs. Algorithms are also given to find a minimum variable set containing a subset of variables of interest onto which a DAG model is collapsible.