Abstract: The elimination of onchocerciasis through community-based Mass Drug Administration (MDA) of ivermectin (Mectizan) is hampered by co-endemicity of Loa loa, as individuals who are highly co-infected with Loa loa parasites can suffer serious and occasionally fatal neurological reactions to the drug. Testing every individual participating in MDA faces operational constraints, ranging from cost to the limited availability of diagnostic tools. There is therefore a need for a way to establish whether an area is safe for MDA using the prevalence of loiasis derived from multiple diagnostic tools. Existing statistical methods focus on data from a single diagnostic tool and ignore the information that could be derived from other datasets. In this talk, I will describe how we address this issue by developing a joint geostatistical model that combines data from multiple Loa loa diagnostic tools. I will present how we developed the model and our method for inference. We applied this framework to Loa loa data from Gabon. We also propose a two-stage strategy to identify areas that are safe for MDA. Lastly, I will discuss how this work contributes to the global effort towards the elimination of onchocerciasis as a public health problem by potentially reducing the time and cost required to establish whether an area is safe for MDA.
The dynamics of a network are encoded not only in node variables but also in edge variables, such as currents and fluxes. The graph Laplacian has been used extensively in Machine Learning and Artificial Intelligence to process node signals; however, the graph Laplacian and its higher-order versions (the Hodge Laplacians) can only process node signals, edge signals and higher-dimensional topological signals separately. The Topological Dirac operator [1] allows cross-talk between the topological signals on nodes, edges, triangles and so on, encoded in the topological spinor, and is able to treat and process them jointly. The Topological Dirac operator is currently attracting growing attention in AI and Network Science applications. In this talk we will provide an overview of the theory behind the Topological Dirac operator and its applications in AI, focusing in particular on Dirac signal processing [2] and its physics-inspired generalizations [3]. [1] Bianconi, G., 2021. The topological Dirac equation of networks and simplicial complexes. Journal of Physics: Complexity, 2(3), p.035022. [2] Calmon, L., Schaub, M.T. and Bianconi, G., 2023. Dirac signal processing of higher-order topological signals. New Journal of Physics, 25(9), p.093013. [3] Wang, R., Tian, Y., Lio, P. and Bianconi, G., (in preparation).
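As a purely illustrative sketch (not taken from [1]), the code below builds the Dirac operator of a small toy graph in numpy from its node-edge incidence matrix B1, applies it to a topological spinor stacking a node signal and an edge signal, and checks that its square is block diagonal with the node Laplacian and the edge (down) Hodge Laplacian; the graph, signals and variable names are assumptions made for the example.

    import numpy as np

    # Tiny graph: triangle on nodes {0,1,2} with edges (0,1), (1,2), (0,2).
    # B1[i, e] = -1 if node i is the tail of edge e, +1 if it is the head.
    B1 = np.array([[-1,  0, -1],
                   [ 1, -1,  0],
                   [ 0,  1,  1]], dtype=float)
    n_nodes, n_edges = B1.shape

    # Topological Dirac operator coupling node and edge signals:
    #     D = [[0,    B1],
    #          [B1^T, 0 ]]
    D = np.block([[np.zeros((n_nodes, n_nodes)), B1],
                  [B1.T, np.zeros((n_edges, n_edges))]])

    # The topological spinor stacks a node signal and an edge signal.
    node_signal = np.array([1.0, -2.0, 1.0])
    edge_signal = np.array([0.5, 0.0, -0.5])
    spinor = np.concatenate([node_signal, edge_signal])

    # D mixes node and edge components (cross-talk between topological signals).
    print(D @ spinor)

    # D squared is block diagonal: the graph Laplacian B1 B1^T on nodes
    # and the down Hodge Laplacian B1^T B1 on edges.
    L_blocks = np.block([[B1 @ B1.T, np.zeros((n_nodes, n_edges))],
                         [np.zeros((n_edges, n_nodes)), B1.T @ B1]])
    print(np.allclose(D @ D, L_blocks))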
Abstract:
Qualitative descriptions of the shape of a function (e.g., convex, monotone, bimodal) and their use in developing and evaluating methodological tools abound in functional data analysis. The shape of a function is intimately tied to its amplitude, and a common approach to access shape information in a functional dataset is through a registration or alignment procedure to decouple amplitude from phase variations, which inevitably affects any downstream analysis. A hitherto unexplored alternative is to work directly with the shape space defined as the quotient of the function space under a group of shape-preserving transformations that treats phase as nuisance. I will discuss a stratified geometry for such a shape space that has regions of non-positive and (positive) unbounded curvature and discuss its statistical implications when analysing functional data.
Abstract
Sparse grids are used in stochastic finite element approximation, numerical integration and interpolation. We construct a polynomial interpolator by noting that a sparse grid is an echelon design that identifies a single model. The interpolator uses inclusion-exclusion (IE), and Betti numbers simplify IE while achieving the same result as the exhaustive formulæ.
Abstract: In this talk, I will introduce novel estimators for computing the curvature, tangent spaces, and dimension of data from manifolds, using tools from diffusion geometry. Although classical Riemannian geometry is a rich source of inspiration for geometric data analysis and machine learning, it has historically been hard to implement these methods in a way that performs well statistically. Diffusion geometry lets us develop Riemannian geometry tools that are accurate and, crucially, also extremely robust to noise and low-density data. The methods we introduce here are comparable to the existing state-of-the-art on ideal dense, noise-free data, but significantly outperform them in the presence of noise or sparsity. In particular, our dimension estimate improves on the existing methods on a challenging benchmark test when even a small amount of noise is added. Our tangent space and scalar curvature estimates do not require parameter selection and substantially improve on existing techniques.
Barycentric subspace analysis (BSA) is introduced for a set of unlabelled graphs, which are graphs with no correspondence between nodes. Identifying each graph with the set of its eigenvalues, the graph spectrum space is defined as a novel and computationally efficient quotient manifold of isospectral graphs. In such a manifold, the notion of BSA is extended. We showcase how BSA can be used as a powerful dimensionality reduction technique for complex data: BSA searches for a subspace of lower dimension that minimizes the projection error of the data points. As the subspace is identified by a set of reference points, the interpretation is straightforward. BSA is performed and compared with clustering and PCA on a simulated dataset and a real-world dataset of airline company networks. This is a joint work with Elodie Maignant, Alain Trouvé, and Xavier Pennec.
Seminar cancelled
The input to many problems in data analysis can be considered as a (possibly noisy) finite sample of a geometric object embedded in a high-dimensional Euclidean space (e.g., biomedical images or multichannel physiological recordings). Standard approaches to learning the topology of the underlying structure may suffer from high sensitivity to noise and outliers, and they rely heavily on the choice of the metric for the data.
Joint work with Sam Livingstone
Implied volatility (IV) forecasting is inherently challenging due to its high dimensionality across moneyness levels and maturities, and its nonlinearity in both spatial and temporal aspects. We utilize implied volatility surfaces (IVS) to represent comprehensive spatial dependence and model the nonlinear temporal dependencies within a series of IVS. Leveraging advanced kernel-based machine learning techniques, we introduce the functional Neural Tangent Kernel (fNTK) estimator within the Nonlinear Functional Autoregression framework, specifically tailored to capture intricate relationships within implied volatilities. We establish the connection between NTK and functional kernel regression, emphasizing its role in contemporary nonparametric statistical modeling. Empirically, we analyze S&P 500 Index options from January 2009 to December 2021, encompassing more than 6 million European calls and puts, thereby showcasing the superior forecast accuracy of fNTK. We demonstrate the significant economic value of having an accurate implied volatility forecaster within trading strategies. Notably, short delta-neutral straddle trading, supported by fNTK, achieves a Sharpe ratio ranging from 1.45 to 2.02, resulting in a relative enhancement in trading outcomes ranging from 77% to 583%.
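The following sketch is not the fNTK estimator of the talk; it is a generic kernel-ridge nonlinear autoregression on vectorized surfaces, included only to illustrate the structure of forecasting the next surface from the current one. The simulated data, grid size and kernel choice are all illustrative assumptions.

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    # Illustrative data: T implied-volatility surfaces on a fixed
    # moneyness x maturity grid, flattened to vectors of length p.
    rng = np.random.default_rng(0)
    T, p = 200, 5 * 8                 # hypothetical grid: 5 maturities x 8 moneyness levels
    ivs = np.cumsum(rng.normal(scale=0.01, size=(T, p)), axis=0) + 0.2

    # One-step-ahead nonlinear autoregression: surface at t -> surface at t+1.
    X, Y = ivs[:-1], ivs[1:]
    X_train, Y_train = X[:150], Y[:150]
    X_test, Y_test = X[150:], Y[150:]

    # Kernel ridge with an RBF kernel stands in for the kernel regression step;
    # the fNTK estimator would replace this kernel with a neural tangent kernel.
    model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=1.0 / p)
    model.fit(X_train, Y_train)

    pred = model.predict(X_test)
    rmse = np.sqrt(np.mean((pred - Y_test) ** 2))
    print(f"out-of-sample RMSE: {rmse:.4f}")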
Functional Time Series (FTS) are sequences of dependent random elements taking values in some functional space. Most of the research in this domain focuses on producing a predictor able to forecast the next function, having observed a part of the sequence. For this, the Autoregressive Hilbertian process is a suitable framework. Here, we address the problem of constructing simultaneous predictive confidence bands for a stationary FTS. The method is based on an entropy measure for stochastic processes. To construct the predictive bands, we use a Reproducing Kernel Hilbert Space (RKHS) to represent the functions, considering the basis associated with the reproducing kernel, and a functional bootstrap procedure that allows us to estimate the prediction law. We then classify the points in the projected space according to whether or not they belong to the minimum entropy set (MES). We map the minimum entropy set back to the functional space and construct a band using the regularity property of the RKHS. The proposed methodology is illustrated through artificial and real data sets.
Choice experiments are widely used in transportation, marketing, health and environmental research to measure consumer preferences. From these consumer preferences, we can calculate willingness to pay for an improved product or state, and hence make policy decisions based on these preferences. In a choice experiment, we present choice sets to the respondent sequentially. Each choice set consists of m options, each of which describes a product or state, which we generically call an item. Each item is described by a set of attributes, the features that we are interested in measuring. Respondents are asked to select the most preferred item in each choice set. We then use the multinomial logit model to determine the importance of each attribute. In some situations we may be interested in whether an item's position within the choice set affects the probability that the item is selected. This problem is reminiscent of donkey voting in elections, and can also be seen in the design of tournaments, where the home team is expected to have an advantage. In this presentation, we present a discussion of stated choice experiments, and then discuss a model that incorporates position effects for choice experiments with arbitrary m. This is an extension of the model proposed by Davidson and Beaver (1977) for m = 2. We give optimal designs for the estimation of attribute main effects plus the position effects under the null hypothesis of equal selection probabilities. We conclude with some simulations that compare how well optimal designs and near-optimal designs estimate the attribute main effects and position effects.
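As a hedged illustration of the kind of model discussed (a multinomial logit with additive position effects, in the spirit of a Davidson-and-Beaver-type extension), the sketch below computes choice probabilities for one choice set; the attribute values, coefficients and position effects are made-up numbers, not estimates from any study.

    import numpy as np

    def choice_probabilities(utilities, position_effects):
        """Multinomial logit probabilities for one choice set of m items, where
        the item shown in position k receives an additive position effect
        (illustrative parameterisation)."""
        v = np.asarray(utilities) + np.asarray(position_effects)
        expv = np.exp(v - v.max())          # stabilised softmax
        return expv / expv.sum()

    # Hypothetical choice set with m = 3 items, each described by two attributes.
    attributes = np.array([[1.0, 0.0],      # item in position 1
                           [0.0, 1.0],      # item in position 2
                           [1.0, 1.0]])     # item in position 3
    beta = np.array([0.8, 0.3])             # attribute main effects (made up)
    tau = np.array([0.2, 0.0, -0.2])        # position effects (made up)

    probs = choice_probabilities(attributes @ beta, tau)
    print(probs, probs.sum())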
This presentation summarises the main content of my PhD, researching into the robustness of incomplete block designs, which introduces a Vulnerability Measure to determine the likelihood of a design becoming disconnected, with inestimable treatment contrasts, as a result of random observation loss. For any general block design, formulae have been derived and a program has been written to calculate and output the vulnerability measures. Comparisons are made between the vulnerability and optimality of designs. The vulnerability measures can aid in design construction, be used as a pilot procedure to ensure the proposed design is sufficiently robust, or as a method of design selection by ranking the vulnerability measures of a set of competing designs in order to identify the least vulnerable design. In particular, this can distinguish between non-isomorphic BIBDs. By observing combinatorial relationships between concurrences and block intersections of designs, this ranking method is compared with other approaches in the literature that consider the effects on the efficiency of BIBDs, by either the loss of two complete blocks, or the loss of up to three random observations. The loss of whole blocks of observations is also considered, presenting improvements on bounded conditions for the maximal robustness of designs. Special cases of design classes are considered, e.g. complement BIBDs and repeated BIBDs, as well as non-balanced designs such as Regular Graph Designs.
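A minimal brute-force illustration of a vulnerability-type calculation is sketched below: for a small, hypothetical block design it enumerates every way of losing t observations and reports the proportion that leave the design disconnected (either the treatment-block incidence graph splits, or a treatment loses all its observations). It implements the definition directly rather than the thesis's formulae.

    from itertools import combinations
    from collections import defaultdict, deque

    # An illustrative block design (not one from the thesis): each block is a
    # tuple of treatments, giving one observation per treatment-block pair.
    blocks = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
    observations = [(t, b) for b, blk in enumerate(blocks) for t in blk]
    all_treatments = {t for t, _ in observations}

    def connected(obs):
        """True if every treatment is still observed and the bipartite
        treatment-block graph of the remaining observations is connected."""
        if {t for t, _ in obs} != all_treatments:
            return False
        graph = defaultdict(set)
        for t, b in obs:
            graph[("t", t)].add(("b", b))
            graph[("b", b)].add(("t", t))
        start = next(iter(graph))
        seen, queue = {start}, deque([start])
        while queue:
            for nb in graph[queue.popleft()]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        return len(seen) == len(graph)

    def vulnerability(t_lost):
        """Proportion of ways of losing t_lost observations that disconnect the design."""
        losses = list(combinations(range(len(observations)), t_lost))
        bad = sum(1 for lost in losses
                  if not connected([o for i, o in enumerate(observations)
                                    if i not in set(lost)]))
        return bad / len(losses)

    for t in range(1, 5):
        print(f"losing {t} observation(s): vulnerability = {vulnerability(t):.4f}")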
Smooth supersaturated polynomial interpolators (Bates et al. 2009) are an alternative approach for modelling computer simulations. They have the flexibility of polynomial modelling, while avoiding the inconvenience of undesired polynomial oscillations (i.e. Runge's phenomenon). Smooth polynomials have been observed to be most effective for small sample sizes, although their use is not restricted in this respect. The talk will survey the smooth polynomial technique, comparing it with traditional alternatives like kriging or thin-plate splines. Extensions and examples will be presented. This is joint work with Henry Wynn and Ron Bates (LSE).
The biased coin design introduced by Efron (1971, Biometrika) is a design for allocating patients in clinical trials which helps to maintain the balance and randomness of the experiment. Its power is studied by Chen (2006, Journal of Statistical Planning and Inference) and compared with that of repeated simple random sampling when there are two treatment groups and patients' responses are normally distributed. Another design similar to Efron's biased coin design, called the adjustable biased coin design, has been developed by Baldi Antognini and Giovagnoli (2004, Journal of the Royal Statistical Society Series C) for patient allocation. Both designs aim to balance the number of patients in the two treatment groups. It is shown theoretically by Baldi Antognini (2008, Journal of Statistical Planning and Inference) that the adjustable biased coin design is uniformly more powerful than Efron's biased coin design; that is, the adjustable biased coin design gives a more balanced trial than Efron's biased coin design. Moreover, the biased coin design methods can also be applied to patients grouped by prognostic factors in order to balance the number of patients in the two treatments for each of the factors. This is called the covariate-adaptive biased coin design by Shao, Yu and Zhong (2010, Biometrika). It is believed that the covariate-adaptive biased coin design gains more power than Efron's biased coin design, and recently the covariate-adjustable biased coin design has also come under investigation. However, the case where there is an interaction between covariates has not been looked at in detail for any of the above designs. This talk will consist of three parts. First, numerical values for the simulated power of the adjustable biased coin design, which has not been studied before, will be shown and compared with the simulated power of Efron's biased coin design. Then, the powers of repeated simple random sampling and the biased coin design will be studied when responses are binary. The theoretical calculations and exact numerical results will then be given for the unconditional powers of the two designs for binary responses. Finally, the expression for the power of covariate-adaptive randomization by normal approximation will be introduced. Numerical values for the normal approximation will also be given to compare with the exact value for the biased coin design. In addition, for the covariate-adaptive biased coin design, the ideas of global and marginal balance will be introduced and their differences compared when there are interactions between the covariates.
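A small simulation, assuming Efron's bias p = 2/3 and trial sizes chosen purely for illustration, shows how the biased coin design controls imbalance relative to simple random sampling; it is a sketch of the allocation rule only, not of the power comparisons described above.

    import numpy as np

    rng = np.random.default_rng(1)

    def efron_biased_coin(n_patients, p=2/3):
        """Allocate patients to treatments A/B with Efron's biased coin:
        toss a fair coin when the groups are balanced, otherwise favour the
        under-represented group with probability p."""
        diff = 0                       # (# allocated to A) - (# allocated to B)
        allocation = []
        for _ in range(n_patients):
            if diff == 0:
                prob_a = 0.5
            elif diff > 0:
                prob_a = 1 - p         # A is ahead, favour B
            else:
                prob_a = p             # B is ahead, favour A
            a = rng.random() < prob_a
            allocation.append("A" if a else "B")
            diff += 1 if a else -1
        return allocation, diff

    # Compare final imbalance with simple random sampling over many replicate trials.
    n, reps = 50, 2000
    bcd = [abs(efron_biased_coin(n)[1]) for _ in range(reps)]
    srs = [abs(2 * rng.binomial(n, 0.5) - n) for _ in range(reps)]
    print("mean |imbalance|: biased coin =", np.mean(bcd), " simple random =", np.mean(srs))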
A large clinical trial for testing a new drug usually involves a large number of patients and is carried out in different countries using multiple clinical centres. The patients are recruited in different centres; after a screening period they are randomized to different treatments according to some randomization scheme and then receive the prescribed drug. The design of a multicentre clinical trial consists of several stages, including the statistical design (choosing a statistical model for the analysis of patient responses, the randomization scheme, the sample size needed for testing hypotheses, etc.) and predicting patient recruitment and the drug supply needed to cover patient demand. The talk is devoted to the discussion of advanced statistical techniques for modelling and predicting the stochastic processes describing the behaviour of a trial in time. For modelling patient recruitment, an innovative predictive analytic statistical methodology is developed [1,2,3]. Patient flows are modelled using Poisson processes with random delays and gamma-distributed rates. ML and Bayesian techniques for estimating parameters from recruitment data, and asymptotic approximations for creating predictive bounds in time for the number of patients in centres/regions, are developed. This also allows one to evaluate the optimal number of clinical centres needed to complete the trial before a deadline with a given probability and to predict trial performance. The technique is extended further to predicting the number of different events in trials with waiting time to response. Closed-form analytic expressions for the predictive distributions are derived. Implementation in oncology trials is considered. The technique for predicting the number of patients randomized to different treatments under the basic randomization schemes – unstratified and centre/region-stratified – is developed, and the impact of the randomization process on the statistical power and sample size of the trial is also investigated [4]. Using these results, an innovative risk-based statistical approach to predicting the amount of drug supply required to cover patient demand with a given risk of stock-out is developed [3]. Software tools in R for patient recruitment, event modelling and drug supply modelling based on these techniques have been developed. These tools are on the way to implementation in GSK and have already led to significant benefits and cost savings.
References
[1] Anisimov, V.V., Fedorov, V.V., Modeling, prediction and adaptive adjustment of recruitment in multicentre trials. Statistics in Medicine, Vol. 26, No. 27, 2007, pp. 4958-4975.
[2] Anisimov, V.V., Recruitment modeling and predicting in clinical trials. Pharmaceutical Outsourcing, Vol. 10, Issue 1, March/April 2009, pp. 44-48.
[3] Anisimov, V.V., Predictive modelling of recruitment and drug supply in multicenter clinical trials. Proc. of the Joint Statistical Meeting, Washington, USA, August 2009, pp. 1248-1259.
[4] Anisimov, V., Impact of stratified randomization in clinical trials. In: Giovagnoli, A., Atkinson, A.C., Torsney, B. (Eds), MODA 9 - Advances in Model-Oriented Design and Analysis. Physica-Verlag/Springer, Berlin, 2010, pp. 1-8.
[5] Anisimov, V., Drug supply modeling in clinical trials (statistical methodology). Pharmaceutical Outsourcing, May/June 2010, pp. 50-55.
[6] Anisimov, V.V., Effects of unstratified and centre-stratified randomization in multicentre clinical trials. Pharmaceutical Statistics, Vol. 10, Issue 1, 2011, pp. 50-59.
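A minimal Monte Carlo sketch of the Poisson-gamma recruitment model is given below: centre rates are drawn from a gamma distribution, total recruitment is the superposed Poisson process, and predictive quantiles are reported for the time needed to reach a target sample size. The centre count, target and gamma parameters are illustrative assumptions, not values from the references.

    import numpy as np

    rng = np.random.default_rng(2)

    def simulate_time_to_target(n_centres, target, shape, rate, n_sims=5000):
        """Predictive distribution of the time needed to recruit `target` patients
        when each centre recruits as a Poisson process with a gamma(shape, rate)
        distributed rate (Poisson-gamma model)."""
        times = np.empty(n_sims)
        for s in range(n_sims):
            lam = rng.gamma(shape, 1.0 / rate, size=n_centres)   # centre rates
            total_rate = lam.sum()
            # Superposed recruitment is Poisson(total_rate * t), so the waiting
            # time to the target-th arrival is Gamma(target, total_rate).
            times[s] = rng.gamma(target, 1.0 / total_rate)
        return times

    times = simulate_time_to_target(n_centres=60, target=600, shape=2.0, rate=1.0)
    print("median time:", np.median(times), " 95% predictive bound:", np.quantile(times, 0.95))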
I consider a very simple prediction problem and contrast two classical approaches with the Bayesian approach: firstly in the case of no selection (or selection at random) and secondly with limited design information in the form of unequal probability weights for the sampled units. I find the Bayesian approach much less ad hoc than the alternatives!
The estimation of the variance components of a response surface model for a split-plot design has been of much interest in recent years. Different techniques are available for estimating these variance components, including REML, a Bayesian approach, the replication of the center point runs and a randomization-based approach. The available number of degrees of freedom is also an important issue when estimating these variance components. In our talk, we will present an algorithm for generating a D-optimal split-plot design such that the generated design has the required number of degrees of freedom for estimating the variance components using the randomization-based approach. One advantage of using this approach is that it gives pure error estimates of the variance components.
In some recent applications, the interest is in combining information about relationships between variables from independent studies performed under partially comparable circumstances. One possible way of formalising this problem is to consider the combination of families of distributions respecting conditional independence constraints with respect to a graph G, i.e., graphical models. In this talk I will start by giving a brief introduction to graphical models and by introducing some motivating examples of the research question. Then I will present some relevant types of combinations and associated properties. Finally I will discuss some issues related to the estimation of the parameters of the combination.
Risk-benefit assessment for decision-making based on evidence is a subject of continuing interest. However, randomised clinical trial evidence of risks and benefits is not always available, especially for drugs used in children, mainly due to the ethical concerns around children being subjects of clinical trials. This thesis appraises risk-benefit evidence from published trials in children for the case study; assesses the risk-benefit balance of drugs; proposes a framework for risk-benefit evidence synthesis; and demonstrates the extent of its contribution. The review shows that trial designs lack safety planning, leading to inconsistent safety reporting and a lack of efficacy evidence. The General Practice Research Database (GPRD) was exploited to synthesise evidence of the risks of cisapride and domperidone in children with gastro-oesophageal reflux as a case study. Efficacy data are only available through review evidence. Analysis of prescribing trends does not identify further risk-benefit issues but suggests that the lack of evidence has led to inappropriate prescribing in children. Known adverse events are defined from the British National Formulary and quantified. The proportional reporting ratio technique is applied to other clinical events to generate potential safety signals. Signals are validated, and analysed for confirmatory association through covariate adjustment in regressions. The degree of association between signals and drugs is assessed using Bradford Hill's criteria for causation. Verified risks are known adverse events with 95% statistical significance, and signals in the abdominal pain group and the bronchitis and bronchiolitis group. The drugs' risk-benefit profiles are illustrated using the two verified signals and an efficacy outcome. Sensitivity to the input parameters is studied via simulations. The findings are used to hypothetically advise on risk-benefit aspects of trial designs. The value of information from this study varies between stakeholders, and the keys to communicating risks and benefits lie in presentation and understanding. The generalisability and scope of the proposed methods are discussed.
Complex deterministic dynamical models are an important tool for climate prediction. Often though, such models are computationally too expensive to perform the many runs required. In this case one option is to build a Gaussian process emulator which acts as a surrogate, enabling fast prediction of the model output at specified input configurations. Derivative information may be available, either through the running of an appropriate adjoint model or as a result of some analysis previously performed. An emulator would likely benefit from the inclusion of this derivative information. Whether further efficiency is achieved, however, depends on the computational cost of obtaining the derivatives. Results of the emulation of a radiation transport model, with and without derivatives, are presented. The knowledge of the derivatives of complex models can add greatly to their utility, for example in the application of sensitivity analysis or data assimilation. One way of generating such derivatives, as suggested above, is by coding an adjoint model. In climate science in particular adjoint models are becoming increasingly popular, despite the initial overhead of coding the adjoint and the subsequent, additional computational expense required to run the model. We suggest an alternative method for generating partial derivatives of complex model output, with respect to model inputs. We propose the use of a Gaussian process emulator which can be used to estimate derivatives even without any derivative information known a priori. We show how an emulator can be employed to provide derivative information about an intermediate complexity climate model, C-GOLDSTEIN, and compare the performance of such an emulator to the C-GOLDSTEIN adjoint model.
Suppose we have a system that we wish to make repeated measurements on, but where measurement is expensive or disruptive. Motivated by an example of probing data networks, we model this as a black box system: we can either choose to open the box or not at any time period, and our aim is to infer the parameters that govern how the system evolves over time. By regarding this system evolution as an experiment that is to be optimised, we present a method for finding optimal time points at which to measure, and discuss some numerical results. We show how we can generalise this result to find optimal measurement times for any system that evolves according to the Markov principle. This is joint work with Steven Gilmour and John Schormans (Queen Mary).
In many real life applications, it is impossible to observe the feature of interest directly. For example, non-invasive medical imaging techniques rely on indirect observations to reconstruct an image of the patient's internal organs. In this paper, we investigate optimal designs for such indirect regression problems. We use the optimal designs as benchmarks to investigate the efficiency of designs commonly used in applications. Several examples are discussed for illustration. Our designs provide guidelines to scientists regarding the experimental conditions at which the indirect observations should be taken in order to obtain an accurate estimate for the object of interest. This is joint work with Nicolai Bissantz and Holger Dette (Bochum) and Edmund Jones (Bristol).
Sub-Saharan African populations are characterized by a relatively complex genetic architecture. Their excessive allele frequency differentiation, linkage disequilibrium patterns and haplotype sharing have been understudied. The aim of our newly launched project on African diversity is to understand the genetic diversity among sub-Saharan African populations and its correlation with ethnic, archaeological and linguistic variation. Ultimately, the study is hoping to disentangle past population histories and therefore detect the evolutionary history of sub-Saharan African populations, who are the origin of anatomically modern humans. Additionally, subsequent genome-wide association studies, mostly related to lipid metabolism, are expected to identify previously unsuspected biological pathways involved in disease etiology. The talk is meant to broadly address the scope of the study and to outline the associated statistical challenges.
Nanoparticle clustering within composite materials is known to affect the performance of the material, such as its toughness, and can ultimately cause its mechanical failure. The type of nanoparticle dispersion is often judged through micrographs of the material, obtained using an electron or atomic force microscope. However, no standard quantitative method is in use for classifying these materials into good (homogeneous) and poor (heterogeneous) dispersion. For material scientists it is of pressing concern that a suitable method is found to measure particle dispersion to enable further progress to be made in understanding the effect of morphology on the material properties. This talk aims to be of general interest, providing the engineering background, proposed method and measurement results of test cases.
The main focus of this seminar is the industrial application of tools for stochastic analysis within the standard engineering design process. Various applications will be discussed and the use of statistical methods in engineering will be highlighted. The talk will also explore aspects of uncertainty management, and highlight some of the challenges faced in delivering practical stochastic analysis methods for the engineering community.
In contrast to the usual sequential nature of response surface methodology (RSM), recent literature has proposed both screening and response surface exploration using a single three-level design. This approach is known as "one-step RSM". We discuss and illustrate shortcomings of the current one-step RSM designs and analysis. Subsequently, we propose a class of three-level designs and an analysis that will address these shortcomings. We illustrate the designs and analysis with simulated and real data.
Agronomic and forestry breeding trials tend to be large, often using hundreds of plants and showing considerable spatial variation. In this study, we present various alternatives for the design and analysis of field trials to identify "optimal" or "near optimal" experimental designs and statistical techniques for estimating genetic parameters through the use of simulated data for single-site analysis. These simulations investigated the consequences of different plot types (single- or four-plant row), experimental designs and patterns of environmental heterogeneity. Also, spatial techniques such as nearest neighbor methods and modeling of the error structure by specifying an autoregressive covariance were compared. Because spatial variation cannot usually be accounted for in the trial design, another strategy is to improve the trial analysis by using post-hoc blocking. We studied several typical experimental designs and compared their efficiency with post-hoc blocking of the same designs over a randomized complete block. Usually, in the early stages of a breeding program there is a large availability of genotypes that could be tested but limited resources. Here, unreplicated trials have been recommended as an option to support an early screening of genetic material that can be pre-selected and later tested in more formal replicated trials for single or multiple sites. In this study, we provide a better evaluation and understanding of the statistical and genetic advantages and disadvantages of the use of unreplicated trials by using simulated data, particularly for clonal trials, under different replication alternatives. We also measure the gain in precision from using spatial analysis in unreplicated trials and evaluate the effects of different genetic structures (additive, dominance and epistasis) on these analyses.
It is not possible to completely randomize the order of runs in some multi-factor factorial experiments. This often results in a generalization of the factorial designs called split-plot designs. Sometimes in industrial experiments complete randomization is not feasible because some factors have levels that are difficult to change. When properly taken into account at the design stage, hard-to-change factors lead naturally to multi-stratum structures. Mixed models are used to analyze multi-stratum designs, as each stratum may have random effects on the responses. We intend to design experiments and analyze categorical data with hard-to-set factors, motivated by the random effects structure in the mixed models. The current study is motivated by a polypropylene experiment run by four Belgian companies in which the responses are continuous and categorical. We have analyzed the data from the current experiment using mixed binary logit and mixed cumulative logit models in a Bayesian approach. We also obtained outputs following the simplified models of Goos and Gilmour (2010). When the simplified models were used, the outputs obtained by Bayesian methods were similar to those obtained by likelihood methods, as non-informative priors were considered for the fixed effects.
In 1979 Barnett derived a series of classical tests, based on simulations, to test whether extreme observations in a sample were outliers. He did this for a variety of different probability distributions, including the Normal, Exponential, Uniform and Pareto distributions. In 1988 Pettit considered this problem for Exponential samples using a Bayesian approach based on deriving Bayes factors to perform these tests. Then in 1990 he studied the multivariate Normal distribution in some detail and approached this problem by deriving various results using the conditional predictive ordinate. Since then this problem has been considered for both the Poisson and Binomial distributions. Recently I have been studying this problem for the Uniform and Pareto distributions, and this talk is based on the results that I have obtained for the Uniform case. The talk will be in two parts. First we look at the one-sided Uniform distribution, where I have shown that the largest observation in the sample minimises the conditional predictive ordinate, and then derived the Bayes factor to test whether it is an outlier. I then derived the Bayes factors for the cases where we have multiple outliers generated by the same probability distribution and generated by different probability distributions. For the one-sided Uniform distribution all the results that I obtained are exact, in that I did not have to approximate any integrals. The second part of the talk looks at the two-sided Uniform distribution, where the structure of the problem is exactly the same as for the one-sided Uniform distribution except that it is more complicated because it is a two-parameter problem. I dealt with this by using a transformation that made it a one-parameter problem and then used an analytical approach to approximate the Bayes factors by an infinite series, where a full derivation of the approximation and a proof that the series converges are given. Finally in this section, I extend my ideas to solve the problem for the multivariate Uniform distribution.
In this talk I will first introduce cumulants, which form a convenient language to describe and approximate probability distributions. A rich combinatorial structure of cumulants helps to understand them better. The combinatorial version of the definition of cumulants also gives a direct generalization to L-cumulants. Without going too much into technical details I will try to show how L-cumulants can be used in the analysis of certain statistical models. Our example focuses on phylogenetic tree models, which are graphical models with hidden data. I will also mention some links with free probability.
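For concreteness, here are the first few classical (univariate) cumulants written in terms of the moments m_k = E[X^k]; the general moment-cumulant relation quoted afterwards is the combinatorial structure whose generalisation leads to L-cumulants.

    \kappa_1 = m_1, \qquad
    \kappa_2 = m_2 - m_1^2, \qquad
    \kappa_3 = m_3 - 3 m_2 m_1 + 2 m_1^3,

    \kappa_4 = m_4 - 4 m_3 m_1 - 3 m_2^2 + 12 m_2 m_1^2 - 6 m_1^4,

    \kappa_n = \sum_{\pi} (-1)^{|\pi|-1}\,(|\pi|-1)!\,\prod_{B \in \pi} m_{|B|},

where the last sum runs over all set partitions \pi of \{1,\dots,n\} (Möbius inversion over the partition lattice).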
This paper considers the problem of parameter estimation in a model for a continuous response variable y when an important ordinal explanatory variable x is missing for a large proportion of the sample. Non-missingness of x, or sample selection, is correlated with the response variable and/or with the unobserved values the ordinal explanatory variable takes when missing. We suggest solving the endogenous selection, or `not missing at random' (NMAR), problem by modelling the informative selection mechanism, the ordinal explanatory variable, and the response variable together. The use of the method is illustrated by re-examining the problem of the ethnic gap in school achievement at age 16 in England using linked data from the National Pupil Database (NPD), the Longitudinal Study of Young People in England (LSYPE), and the Census 2001.
The aim of the presentation is to present a novel, integrated theoretical framework for the analysis of stochastic biochemical reaction models. The framework includes efficient methods for statistical parameter estimation from experimental data, as well as tools to study parameter identifiability, sensitivity and robustness. The methods provide novel conclusions about the functionality and statistical properties of stochastic systems. I will introduce a general model of chemical reactions described by the Chemical Master Equation, which I approximate using the linear noise approximation. This allows us to write explicit expressions for the likelihood of experimental data, which lead to an efficient inference algorithm and a quick method for calculation of the Fisher Information Matrices. A number of experimental and theoretical examples will be presented to show how the techniques can be used to extract information from the noise structure inherent to experimental data. Examples include inference of the parameters of gene expression using fluorescent reporter gene data, a Bayesian hierarchical model for estimation of transcription rates, and a study of the p53 system. Novel insights into the causes and effects of stochasticity in biochemical systems are obtained by the analysis of the Fisher Information Matrices.
References:
Komorowski, M., Finkenstädt, B., Rand, D. A. (2010); Using a single fluorescent reporter gene to infer half-life of extrinsic noise and other parameters of gene expression. Biophysical Journal, Vol. 98, Issue 12, 2759-2769.
Komorowski, M., Finkenstädt, B., Harper, C. V., Rand, D. A. (2009); Bayesian inference of biochemical kinetic parameters using the linear noise approximation. BMC Bioinformatics, 10:343, doi:10.1186/1471-2105-10-343.
Finkenstädt, B., Heron, E. A., Komorowski, M., Edwards, K., Tang, S., Harper, C. V., Davis, J. R. E., White, M. R. H., Millar, A. J., Rand, D. A. (2008); Reconstruction of transcriptional dynamics from gene reporter data using differential equations. Bioinformatics, 24: 2901-2907.
Graphical models provide a very promising avenue for making sense of large, complex datasets. In this talk I review strategies for learning Bayesian networks, the most popular graphical models currently in use, and introduce a new graphical model, the chain event graph, which is an improvement on using the Bayes net in many cases but which introduces its own challenges for learning, prediction and causation.
We discuss a numerical analysis of the parametric identifiability of electrochemical systems. Firstly, we analyze global identifiability of the entire set of parameters in a single AC voltammetry experiment and examine the effect of different waveforms (square, sawtooth) on the accuracy of the identification procedure. The analysis of global identifiability is equivalent to finding the global optimum of a specially designed function. The optimization problem is solved by a random search method, and a statistical analysis of the obtained solution allows for selection of a subset of the parameters (or their linear combinations) which can be identified. Finally, we discuss optimization of the waveform for better identifiability.
Cluster analysis is a well established statistical technique which aims to detect groups in data. Its main use is as an exploratory tool rather than a conclusive technique. Recently there has been growing interest in extensions of this technique under the general umbrella name of "persistent homology". This topic lies at the crossroads between statistics and topology; the main aim is to describe features other than groups present in multivariate data, and thus it is a natural extension of clustering. Betti numbers are used to describe the data: applying the first Betti number coincides with cluster analysis, whereas subsequent Betti numbers enable detection of "holes" or loops in the data. For example, cluster analysis is unable to detect whether data are gathered around a circle, but with persistent homology this feature is immediately detected. The seminar aims to survey both techniques and illustrate them with some examples. This seminar is the result of the EPSRC Vacation Bursary Scheme 2010, won by QMUL undergraduate Ramon Rizvi.
For a single-phase experiment, we allocate treatments to experimental units using a systematic plan, and then randomize by permuting the experimental units by a permutation chosen at random from a suitable group. This leads to the theory developed in J. A. Nelder's 1965 Royal Society papers. Recently, C. J. Brien and I have been extending this theory to experiments such as two-phase experiments, where the produce, or outputs, from the first phase are randomized to a new set of experimental units in the second phase. This brings in new difficulties, especially with standard software.
We describe a standardised audio listening test known as MUSHRA, used to evaluate the perceptual quality of intermediate-quality audio algorithms (for example MP3 compression). The nature of the test involves aspects of continuous rating as well as ranking of items. We discuss the statistics used to analyse test data, in light of recent experiences conducting a user group study.
An increasing number of microarray experiments produce time series of expression levels for many genes. Some recent clustering algorithms respect the time ordering of the data and are, importantly, extremely fast. The aim is to cluster and classify the expression profiles in order to identify genes potentially involved in, and regulated by, the circadian clock. In this presentation we report new developments associated with this methodology. The partition space is intelligently searched, placing most effort in refining the partition where genes are likely to be of most scientific interest.
There is an increasing demand to test more than one new treatment in the hope of finding at least one that is better than the control group in clinical trials. A likelihood ratio test is developed using order restricted inference, a family of tests is defined, and it is shown that the LRT and Dunnett-type tests are members of this family. Tests are compared using power and a simple loss function which takes incorrect selection, and its impact, into account. The optimal allocation of patients to treatments was sought to maximize power and minimize expected loss. For small samples, the LRT statistic for binary data based on order restricted inference is derived and used to develop a conditional exact test. Two-stage adaptive designs for comparing two experimental arms with a control are developed, in which the trial is stopped early if the difference between the best treatment and the control is less than C1; otherwise, it continues, with one arm if one experimental treatment is better than the other by at least C2, or with both arms otherwise. Values of the constants C1 and C2 are compared, and the adaptive design is found to be more powerful than the fixed design.
In this talk, we consider Bayesian shrinkage predictions for the Normal regression problem under the frequentist Kullback-Leibler risk function. This result is an extension of Komaki (2001, Biometrika) and George (2006, Annals of Statistics).
Firstly, we consider the multivariate Normal model with an unknown mean and a known covariance. The covariance matrix can be changed after the first sampling. We assume rotation invariant priors of the covariance matrix and the future covariance matrix and show that the shrinkage predictive density with the rescaled rotation invariant superharmonic priors is minimax under the Kullback-Leibler risk. Moreover, if the prior is not constant, the Bayesian predictive density based on the prior dominates the one with the uniform prior. In this case, the rescaled priors are independent of the covariance matrix of future samples. Therefore, we can calculate the posterior distribution and the mean of the predictive distribution (i.e. the posterior mean and the Bayesian estimate for quadratic loss) based on some of the rescaled Stein priors without knowledge of the future covariance. Since the predictive density with the uniform prior is minimax, the one with each rescaled Stein prior is also minimax.
Next we consider Bayesian predictions whose prior can depend on the future covariance. In this case, we prove that the Bayesian prediction based on a rescaled superharmonic prior dominates the one with the uniform prior without assuming the rotation invariance. Applying these results to the prediction of response variables in the Normal regression model, we show that there exists a prior distribution such that the corresponding Bayesian predictive density dominates that based on the uniform prior. Since the prior distribution depends on the future explanatory variables, both the posterior distribution and the mean of the predictive distribution may depend on the future explanatory variables.
The Stein effect has robustness in the sense that it depends on the loss function rather than the true distribution of the observations. Our result shows that the Stein effect has robustness with respect to the covariance of the true distribution of the future observations.
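As a concrete instance of the superharmonic priors referred to above (given here as an example, assuming dimension d >= 3), the Stein prior on the d-dimensional mean is

    \pi_S(\mu) \;\propto\; \|\mu\|^{-(d-2)}, \qquad \mu \in \mathbb{R}^d,\ d \ge 3,

which is harmonic away from the origin and superharmonic on \mathbb{R}^d in the distributional sense; rescaled versions of this prior are the "rescaled Stein priors" whose predictive densities dominate the one based on the uniform prior, as described above.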
Enzymes are biological catalysts that act on substrates. The speed of reaction as a function of substrate concentration typically follows the nonlinear Michaelis-Menten model. The reactions can be modified by the presence of inhibitors, which can act by several different mechanisms, leading to a variety of models, all also nonlinear. The paper describes the models and derives optimum experimental designs for model building. These include D-optimum designs for all the parameters and Ds-optimum designs for subsets of parameters. The Ds-optimum designs may be singular and so do not provide estimates of all parameters; designs are suggested which have both good D- and Ds-efficiencies. Also derived are designs for testing the equality of parameters.
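For reference, the Michaelis-Menten model and, as one example of the inhibition mechanisms mentioned, its standard competitive-inhibition extension are

    v = \frac{V_{\max}\,[S]}{K_m + [S]}, \qquad\qquad
    v = \frac{V_{\max}\,[S]}{K_m\left(1 + [I]/K_i\right) + [S]},

where [S] and [I] are the substrate and inhibitor concentrations, V_max is the maximum velocity, K_m the Michaelis constant and K_i the inhibition constant.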
The classical problem of ascertaining the connectivity status of an m-way design has received much attention, particularly in the cases where m=2 and m=3. In the general case, a new approach yields the connectivity status for the overall design and for each of the individual factors directly from the kernel space of the design matrix. Furthermore, the set of estimable parametric functions in each factor is derived from a segregated component of this kernel space. The kernel space approach enables a simple derivation of some classical results. Examples are given to illustrate the main results.
The information approach to optimal experimental design is widened to include information-based learning more generally, drawing on the classical work of Renyi, Lindley, de Groot and others. Learning is considered as occurring when the posterior distribution of the quantity of interest is more peaked than the prior, in a certain sense. A key theorem states when this is expected to occur. Some special examples are considered which show the boundary between when learning occurs and when it does not.
Transformation on both sides of a nonlinear regression model has been used in practice to achieve, for example, linearity in the parameters of the model, approximately normally distributed errors, and constant error variance. The method of maximum likelihood is the most common method for estimating the parameters of the nonlinear model and the transformation parameter. In this talk we will discuss a new method, which we call the Anova method, for estimating all the parameters of the transform-both-sides nonlinear model. The Anova method is computationally simpler than the maximum likelihood approach and allows a more natural separation of different sources of lack-of-fit. Considering the Michaelis-Menten model as an example, we will show the results of a simulation study comparing the maximum likelihood and Anova methods, where the Box-Cox transformation is used for transforming both sides of the Michaelis-Menten model. We will also show the use of the Anova method in fitting more complex transform-both-sides nonlinear models, such as transform-both-sides nonlinear mixed effects models and transform-both-sides nonlinear models with random block effects. At the end of the talk, we will briefly present a new approach to design for the transform-both-sides nonlinear Michaelis-Menten model.
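As a point of comparison only, the sketch below fits a transform-both-sides Michaelis-Menten model by the standard maximum likelihood route (profiling out the error variance and including the Box-Cox Jacobian), not by the Anova method of the talk; the simulated data and starting values are assumptions.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)

    def michaelis_menten(s, vmax, km):
        return vmax * s / (km + s)

    def boxcox(y, lam):
        return np.log(y) if abs(lam) < 1e-8 else (y**lam - 1.0) / lam

    # Simulated data from a hypothetical enzyme kinetics experiment.
    s = np.linspace(0.2, 5.0, 40)
    y = michaelis_menten(s, vmax=2.0, km=0.8) * np.exp(rng.normal(scale=0.1, size=s.size))

    def neg_log_lik(params):
        """Profile (over sigma^2) negative log-likelihood of the
        transform-both-sides model  h(y, lam) = h(f(s, theta), lam) + error."""
        lam, vmax, km = params
        if vmax <= 0 or km <= 0:
            return np.inf
        resid = boxcox(y, lam) - boxcox(michaelis_menten(s, vmax, km), lam)
        n = y.size
        rss = np.sum(resid**2)
        # Negative log-likelihood up to constants, including the Box-Cox Jacobian term.
        return 0.5 * n * np.log(rss / n) - (lam - 1.0) * np.sum(np.log(y))

    fit = minimize(neg_log_lik, x0=[1.0, 1.0, 1.0], method="Nelder-Mead")
    print("estimated (lambda, Vmax, Km):", fit.x)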
Zhou et al. (2006) developed Bayesian dose-escalation procedures for early phase I clinical trials in oncology. They are based on discrete measures of undesirable events and continuous measures of therapeutic benefit. The objective is to find the optimal dose associated with some low probability of an adverse event. To understand their methodology I tried to reproduce their results using a hierarchical linear model (Lindley and Smith (1972)) with different orderings of the data. Computations were done in R. I found my results were consistent with one another but different from the published results. I then also programmed the model using WinBUGS and again found the results to be consistent with mine. I concluded that the published results were in error. My main interests are in Bayesian approaches for the design and analysis of dose-escalation trials, which involve prior information concerning parameters of the relationship between dose and the risk of an adverse event, with the chance to update after every dosing period using Bayes theorem. In this talk I will discuss some of these issues and also report on my current work.
Dating back to Dixon and Mood (1948), an Up-and-Down procedure is a sequential experiment used in binary response trials for identifying the stress level (treatment) corresponding to a pre-specified probability of positive response. In Phase I clinical trials, U&D rules can be seen as a development of the traditional dose-escalation procedure (Storer, 1998). Recently Baldi Antognini et al. (2008) have proposed a group version of U&D procedures whereby at each stage a group of m units is treated at the same level and the number of observed positive responses determines how to randomize the level assignment of the next group. This design generalizes a vast class of U&Ds previously considered (Derman, 1957; Durham and Flournoy, 1994; Giovagnoli and Pintacuda, 1998; Gezmu and Flournoy, 2006). The properties of the design change as the randomization method varies: appropriate randomization schemes guarantee desirable results in terms of the asymptotic behaviour of the experiment (see also Bortot and Giovagnoli, 2005). Results can be extended to continuous responses (Ivanova and Kim, 2009).
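A short simulation of a group Up-and-Down rule in the spirit described above: groups of m subjects are treated at the current level, and the number of positive responses decides whether the next group moves down, stays, or moves up. The dose grid, dose-response curve and decision cut-offs are illustrative assumptions, not those of the cited designs.

    import numpy as np

    rng = np.random.default_rng(4)

    doses = np.linspace(0.1, 1.0, 10)            # ordered stress levels (illustrative)

    def prob_response(d):                        # hypothetical dose-response curve
        return 1.0 / (1.0 + np.exp(-8.0 * (d - 0.55)))

    def group_up_and_down(m=3, n_groups=200, start=0):
        """Group U&D: move down if >= 2 of m respond, up if 0 respond, else stay.
        Different cut-offs target different response probabilities; these are an
        illustrative choice."""
        level = start
        visits = np.zeros(len(doses), dtype=int)
        for _ in range(n_groups):
            visits[level] += 1
            responses = rng.binomial(m, prob_response(doses[level]))
            if responses >= 2 and level > 0:
                level -= 1
            elif responses == 0 and level < len(doses) - 1:
                level += 1
        return visits

    visits = group_up_and_down()
    target_level = int(np.argmax(visits))
    print("most visited level:", doses[target_level])
    print("response probability there:", round(prob_response(doses[target_level]), 3))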
Other approaches for identifying a target dose, alternative to the nonparametric U&D, are the parametric Continual Reassessment Method introduced by O'Quigley et al. (1990), and several recent modifications thereof. The debate on dose escalation procedures in the recent statistical literature continues to be very lively.
In the social, behavioral, educational, economic, and biomedical sciences, data are often collected in ways that introduce dependencies in the observations to be compared. For example, the same respondents are interviewed at several occasions, several members of networks or groups are interviewed within the same survey, or, within families, both children and parents are investigated. Statistical methods that take the dependencies in the data into account must then be used, e.g., when observations at time one and time two are compared in longitudinal studies. At present, researchers almost automatically turn to multi-level models or to GEE estimation to deal with these dependencies. Despite the enormous potential and applicability of these recent developments, they require restrictive assumptions on the nature of the dependencies in the data.
The marginal models of this talk provide another way of dealing with these dependencies, without the need for such assumptions, and can be used to answer research questions directly at the intended marginal level. The maximum likelihood method, with its attractive statistical properties, is used for fitting the models. This talk is based on a recent book by the authors in the Springer series Statistics for the Social Sciences, see www.cmm.st.
Conformal Prediction (CP) is a distribution-free and non-asymptotic uncertainty estimation method, i.e. it does not rely on assumptions on the underlying data distribution and provides finite-sample guarantees. Given any pre-trained prediction algorithm and a test sample, a CP algorithm produces a Prediction Set (PS), i.e. a subset of the label space, that is guaranteed to contain the test label with lower-bounded marginal probability. We address the problem of making the PS locally adaptive. The proposed new strategy produces PS that are marginally valid but have input-dependent sizes. The localization process is cast into a smooth minimization problem and can be solved through standard gradient methods.
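A minimal split-conformal sketch, with toy data and an arbitrary base learner, illustrates the marginal coverage guarantee; the locally adaptive construction of the talk would replace the single calibration quantile with an input-dependent one.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(5)

    # Toy regression data (illustrative).
    X = rng.uniform(-3, 3, size=(1500, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=1500)

    X_train, y_train = X[:800], y[:800]
    X_cal, y_cal = X[800:1200], y[800:1200]        # calibration split
    X_test, y_test = X[1200:], y[1200:]

    model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

    # Nonconformity scores on the calibration set.
    scores = np.abs(y_cal - model.predict(X_cal))
    alpha = 0.1
    n_cal = len(scores)
    q = np.quantile(scores, np.ceil((1 - alpha) * (n_cal + 1)) / n_cal, method="higher")

    # Prediction sets (intervals) with marginal coverage at least 1 - alpha.
    pred = model.predict(X_test)
    lower, upper = pred - q, pred + q
    coverage = np.mean((y_test >= lower) & (y_test <= upper))
    print(f"empirical coverage: {coverage:.3f} (target {1 - alpha})")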
Pre-smoothing is a technique aimed at increasing the signal-to-noise ratio in data to improve subsequent estimation and model selection in regression problems. However, pre-smoothing has thus far been limited to the univariate response regression setting. Motivated by the widespread interest in multi-response regression analysis in many scientific applications, this article proposes a technique for data pre-smoothing in this setting based on low rank approximation. We establish theoretical results on the performance of the proposed methodology, and quantify its benefit empirically in a number of simulated experiments. We also demonstrate our proposed low rank pre-smoothing technique on real data arising from the environmental sciences.
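A hedged sketch of the basic idea follows, under the assumption that pre-smoothing is done by a truncated SVD of the response matrix before an ordinary least-squares fit; the rank, dimensions and noise level are illustrative, and this is not the authors' exact procedure.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(6)

    # Illustrative multi-response data: q responses driven by a rank-r signal.
    n, p, q, r = 200, 10, 15, 3
    X = rng.normal(size=(n, p))
    B = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))     # rank-r coefficient matrix
    Y = X @ B + rng.normal(scale=2.0, size=(n, q))            # noisy responses

    def low_rank_smooth(Y, rank):
        """Pre-smooth the response matrix by its best rank-`rank` approximation."""
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        return (U[:, :rank] * s[:rank]) @ Vt[:rank]

    Y_smooth = low_rank_smooth(Y, rank=r)

    # Fit ordinary least squares to the raw and to the pre-smoothed responses.
    B_raw = LinearRegression(fit_intercept=False).fit(X, Y).coef_.T
    B_pre = LinearRegression(fit_intercept=False).fit(X, Y_smooth).coef_.T

    print("coefficient error, raw responses         :", np.linalg.norm(B_raw - B))
    print("coefficient error, pre-smoothed responses:", np.linalg.norm(B_pre - B))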
Deep learning-based image reconstruction approaches have demonstrated considerable success in many imaging modalities. However, their reliance on abundant high-quality paired training data remains a significant hurdle in many problem domains where such datasets are not available, for example in medical imaging. Moreover, deep learning approaches in data scarce scenarios often fail to generalise and are prone to reconstruction artefacts in case of distributional shifts. In this talk we present an unsupervised/self-supervised deep learning approach aimed to address these challenges through a two-stage methodology. In the first stage the network is pretrained on simulated training data of ground truth images and measurements. In the second stage the parameters are fine-tuned on the target image, adapting the model to the shift in distribution. Experimental results showcase the effectiveness of our approach, revealing accelerated deployment, improved stability, and competitive performance despite limited training data.
I will present a recent generalisation of the popular interior-penalty discontinuous Galerkin (dG) method discretizing general classes of linear and nonlinear advection-diffusion-reaction problems to meshes comprising extremely general, essentially arbitrarily-shaped element shapes. In particular, our analysis allows for curved element shapes, without the use of non-linear elemental maps. The feasibility of the method relies on the definition of a suitable choice of the discontinuity-penalisation, which turns out to be explicitly dependent on the particular element shape, but essentially independent of small shape variations. A priori error bounds for the resulting method will be given, under very mild structural assumptions restricting the magnitude of the local curvature of element boundaries. I also plan to discuss briefly computer implementation aspects of the framework. Numerical experiments will be also presented throughout the talk aiming to motivate and showcase the practicality and the potential advantages of the proposed numerical framework.
In multivariate time series systems, key insights can be obtained by discovering lead-lag relationships inherent in the data, which refer to the dependence between two time series shifted in time relative to one another, and which can be leveraged for the purposes of control, forecasting or clustering. We develop a clustering-driven methodology for robust detection of lead-lag relationships in lagged multi-factor models. Within our framework, the envisioned pipeline takes as input a set of time series, and creates an enlarged universe of extracted subsequence time series from each input time series, via a sliding window approach. This is then followed by an application of various clustering techniques (such as k-means++ and spectral clustering), employing a variety of pairwise similarity measures, including nonlinear ones. Once the clusters have been extracted, lead-lag estimates across clusters are robustly aggregated to enhance the identification of the consistent relationships in the original universe. We establish connections to the multireference alignment problem for both the homogeneous and heterogeneous settings. Since multivariate time series are ubiquitous in a wide range of domains, we demonstrate that our method is not only able to robustly detect lead-lag relationships in financial markets, but can also yield insightful results when applied to an environmental data set.
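The sketch below is a much-simplified stand-in for the pipeline described: sliding-window subsequences are extracted, clustered with k-means, and per-cluster lag estimates are aggregated by the median of pairwise cross-correlation lags. The window width, number of clusters and similarity measure are placeholder assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(7)

    # Two toy series where series b lags series a by 5 steps.
    T, lag = 600, 5
    a = np.cumsum(rng.normal(size=T))
    b = np.roll(a, lag) + rng.normal(scale=0.5, size=T)

    def sliding_windows(x, width=60, step=10):
        return np.array([x[i:i + width] for i in range(0, len(x) - width + 1, step)])

    def best_lag(x, y, max_lag=20):
        """Lag (in steps) at which y best matches x, by cross-correlation."""
        lags = list(range(-max_lag, max_lag + 1))
        corrs = [np.corrcoef(x[max_lag:-max_lag],
                             np.roll(y, -l)[max_lag:-max_lag])[0, 1] for l in lags]
        return lags[int(np.argmax(corrs))]

    wa, wb = sliding_windows(a), sliding_windows(b)

    # Cluster the (standardised) subsequences from both series jointly.
    windows = np.vstack([wa, wb])
    windows = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(axis=1, keepdims=True)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(windows)

    # Aggregate lag estimates between paired windows, cluster by cluster.
    lags_by_cluster = {}
    for k in range(3):
        idx = [i for i in range(len(wa)) if labels[i] == k]
        if idx:
            lags_by_cluster[k] = np.median([best_lag(wa[i], wb[i]) for i in idx])
    print("robust (median) lag estimates per cluster:", lags_by_cluster)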
The inverse problem of tomographic imaging is the reconstruction of a 3D sample from 2D projection images. An estimate for the 3D reconstruction of a sample is usually obtained by discretizing the reconstruction volume using a voxel grid. This discretization may not be ideal in scenarios where additional prior knowledge is available. In this talk, we look at two applications where grid-free alternatives are advantageous: first, we look at the problem of reconstructing a nanocrystal at atomic resolution from electron microscopy images taken at a few tilt angles. We propose a grid-free algorithm that allows for continuous deviations of the atom locations. We show that this allows for a meaningful incorporation of additional prior knowledge about the system, in particular the potential energy of the configuration, and is able to resolve lattice defects in simulated data. In addition, we show how augmenting such an approach with a model for deformation allows us to propose a grid-free algorithm for tilt-series alignment in cryo-electron tomography. We compare this second approach with existing approaches for tilt-series alignment and show that we can reliably estimate marker locations and deformations without labelling markers in projection data.
Due in part to a wider acceptance of advanced convex optimization methods, nonsmooth regularization terms are now a mainstay of variational approaches in inverse problems, optimal control, and beyond. A majority of those used in practice are positively one-homogeneous, which means that they can be seen as the Minkowski or gauge functional of an infinite-dimensional convex set, the generalized unit ball associated to the regularizer.
Under compactness assumptions which are in any case required for the regularization method to be well-posed, these balls can be described as the convex hull of their extremal points. Making such a description explicit has a multitude of applications mostly revolving around sparsity, which is usually the motivation for introducing such regularization functionals in the first place. These include results showing existence of solutions that can be expressed using finitely many of these extremal points, and optimization algorithms based on such iterates, which often admit fast convergence guarantees and grid-free implementations.
In this talk we will consider this description of extremal points in some specific cases. We provide a full characterization for two infimal convolution-type functionals, the total generalized variation in one dimension and Kantorovich-Rubinstein norms in spaces of signed measures in Euclidean space, as well as some results on different variants of the total (gradient) variation.
Based on joint works with Daniel Walter, Marcello Carioni, Giacomo Cristinelli and Kristian Bredies.
Agriculture is both one of the sectors most susceptible to climate change and a significant contributor to it. Therefore, it is essential to consider both mitigation and adaptation strategies, as well as transforming agricultural practices to promote sustainability and resilience in the agricultural sector. A key objective of applying artificial intelligence (AI) and satellite imagery in agricultural settings is to develop more reliable and scalable methods for monitoring global crop conditions promptly and transparently, while also exploring how agriculture can be adapted to mitigate the effects of climate change. Agricultural monitoring with earth observation data provides a timely and reliable way to assess the state of a field or farm and the surrounding territories, and is used for gathering data and producing forecasts. Computer vision and signal processing techniques play a crucial role in extracting meaningful information from raw satellite data. The growing adoption of AI and machine learning (ML) tools has significantly influenced the expansion of Earth Observation (EO) and remote sensing into agricultural management. In this talk, I will discuss advanced techniques employed throughout the entire data processing cycle, encompassing tasks such as data compression, transmission, image recognition, and forecasting of environmental factors such as land cover, land use, biomass, soil organic carbon, soil nutrients and more.
Network data are commonly observed in a wide variety of applications. Such data may arise in the form of a single network observed at a given point in time, or as multiple networks on the same set of nodes, for example, social networks on the same set of individuals over time, or from different social platforms at a given point in time. A nonparametric approach to studying structure in unlabeled networks is offered by the graphon function. There has been growing interest in the problem of graphon estimation as well as in its application to important problems such as bootstrapping networks and estimation of missing links. In this talk, I will present results on graphon estimation from a single network observed with node covariates and a natural extension of the graphon model to the bivariate setting where a pair of possibly correlated networks on the same set of nodes are observed.
Approximation properties of infinitely wide neural networks have been studied by several authors in the last few years. New function spaces have been introduced that consist of functions that can be efficiently (i.e., with dimension-independent rates) approximated by neural networks of finite width, e.g. Barron spaces for networks with a single hidden layer. These functions typically act between Euclidean spaces, usually with a high-dimensional input space and a lower-dimensional output space. As neural networks gain popularity in inherently infinite-dimensional settings such as inverse problems and imaging, it becomes necessary to analyse the properties of neural networks as nonlinear operators acting between infinite-dimensional spaces. In this talk, I will discuss a generalisation of Barron spaces to functions that map between Banach spaces and present Monte-Carlo (1/sqrt(n)) approximation rates.
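For orientation, the classical Euclidean Barron-space bound that this work generalises can be written schematically, for a class of single-hidden-layer networks of width n and a Barron norm of f, as

```latex
\inf_{f_n \in \mathcal{F}_n} \; \| f - f_n \|_{L^2(\mu)} \;\le\; \frac{C \,\| f \|_{\mathcal{B}}}{\sqrt{n}},
```

with the Banach-space-valued generalisation discussed in the talk achieving a Monte-Carlo rate of the same 1/sqrt(n) form.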
We consider statistical inference for a single coordinate of a high-dimensional parameter in sparse linear regression. It is well-known that high-dimensional procedures such as the LASSO can provide biased estimators for this problem and thus require debiasing. Motivated by recent theoretical advances on debiased Bayesian inference, we propose a scalable variational Bayes approach to this problem. We investigate the numerical performance of this algorithm and establish accompanying theoretical guarantees for estimation and uncertainty quantification. Joint work with Ismael Castillo, Alice L’Huillier and Luke Travis.
This talk will study spatiotemporal data collected from “drifters” which are instruments designed to freely float around our ocean, mimicking particles of water. While the focus of the talk is in oceanography, this form of data is ubiquitous, for example the spatiotemporal data collected from wearable devices (e.g. medical wristwatches), therefore much of the methodology presented can translate to other applications. The focus is on data-driven statistical solutions, the presentation will not be too technical, and no prerequisite knowledge of oceanography is expected from the audience!
I will present two statistical techniques that were specifically designed to address problems in network analysis. The first is a statistical algorithm to determine whether a network meets the prerequisite conditions to be meaningfully summarized through clusters. Clustering algorithms will always identify clusters; unfortunately, if a network does not possess a clustered structure, the (node) clustering exercise will not only be a waste of time but will also result in misleading conclusions. The second technique is a statistical routine that seeks to answer the question "is network G1 similar to network G2?". To answer this question, we transform each graph into a probability distribution and apply a standard Kolmogorov-Smirnov test.
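As a toy illustration of the second routine (the abstract does not specify which probability distribution is extracted from a graph, so the degree distribution below is purely a hypothetical stand-in):

```python
import networkx as nx
from scipy.stats import ks_2samp

# Two hypothetical networks to compare.
G1 = nx.erdos_renyi_graph(500, 0.02, seed=1)
G2 = nx.barabasi_albert_graph(500, 5, seed=2)

# Transform each graph into an empirical distribution (here: node degrees).
deg1 = [d for _, d in G1.degree()]
deg2 = [d for _, d in G2.degree()]

# Standard two-sample Kolmogorov-Smirnov test on the two empirical distributions.
stat, pval = ks_2samp(deg1, deg2)
print(f"KS statistic = {stat:.3f}, p-value = {pval:.3g}")
```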
Vector autoregressions (VARs) are popular in analyzing economic time series. However, VARs can be over-parameterized if the numbers of variables and lags are moderately large. Tensor VAR, a recent solution to overparameterization, treats the coefficient matrix as a third-order tensor and estimates the corresponding tensor decomposition to achieve parsimony. In this paper, the inference of Tensor VARs is inspired by the literature on factor models. Firstly, we determine the rank by imposing the Multiplicative Gamma Prior to margins, i.e. elements in the decomposition, and accelerate the computation with an adaptive inferential scheme. Secondly, to obtain interpretable margins, we propose an interweaving algorithm to improve the mixing of margins and introduce a post-processing procedure to solve column permutations and sign-switching issues. In the application of the US macroeconomic data, our models outperform standard VARs in point and density forecasting and yield interpretable results consistent with the US economic history.
This article develops a novel approach for estimating a high-dimensional and nonparametric graphical model for functional data. Our approach is built on a new linear operator, the functional additive partial correlation operator, which extends the partial correlation matrix to both the nonparametric and functional settings. We show that its nonzero elements can be used to characterize the graph, and we employ sparse regression techniques for graph estimation. Moreover, the method does not rely on any distributional assumptions and does not require the computation of multi-dimensional kernels, thus avoiding the curse of dimensionality. We establish both estimation consistency and graph selection consistency of the proposed estimator, while allowing the number of nodes to grow with the increasing sample size. Through simulation studies, we demonstrate that our method performs better than existing methods in cases where the Gaussian or Gaussian copula assumption does not hold. We also demonstrate the performance of the proposed method by a study of an electroencephalography data set to construct a brain network.
This paper introduces Type I and Type II Tobit Bayesian Additive Regression Trees (TOBART-1 and TOBART-2). Simulation results and applications to real data sets demonstrate that TOBART-1 produces more accurate predictions than competing methods, and provides posterior intervals for the conditional expectation and other quantities of interest.
TOBART-2 extends the Type II Tobit model to account for nonlinearities and model uncertainty by including sums of trees in both the selection and outcome equations. A Dirichlet Process Mixture distribution for the error term allows for departure from the assumption of bivariate normally distributed errors. Simulation studies suggest that TOBART-2 can produce more accurate treatment effect estimates than competing approaches. We illustrate the method with an application to the RAND Health Insurance Experiment.
Excess hazard modelling is one of the main tools in population-based cancer survival research. This setting allows for direct modelling of the survival due to cancer in the absence of reliable information on the cause of death, which is common in population-based cancer epidemiology studies. We propose a unifying link-based additive modelling framework for the excess hazard that allows for the inclusion of many types of covariate effects, including spatial and time-dependent effects, using any type of smoother, such as thin plate, cubic splines, tensor products and Markov random fields. Three case studies that illustrate the type of applications of interest in practice will be presented. We will conclude with a discussion on available software tools (in R), as well as a general discussion on the use of the relative survival framework.
Stepped wedge trials are cluster randomised clinical trials in which each “cluster” of participants (e.g. all users of a local health service) is randomised, not to one treatment condition or another, but to a particular schedule for crossing from the control condition to the intervention condition. Some clusters might cross over before data collection even begins; some might cross over at some point during the prospective data collection interval, and some might not cross over at all during that interval. In a stepped wedge trial the cross-over is always unidirectional: you can cross from control to intervention, but never back again from intervention to control. In some stepped wedge trials participants are recruited from each cluster in one long, consecutive stream; in others they are recruited once at the start of the trial and followed prospectively as a cohort; in still others they are sampled in a series of cross-sectional snapshots of the cluster through time. The unidirectional cross-over, the constraints on how many people you can recruit and when, and the way you model the correlation between health outcomes of individuals from the same cluster (the intra-cluster correlation) all lead to some fascinating problems in the design of experiments, with some equally fascinating solutions. These solutions are of great practical interest to applied health researchers trying to evaluate public health interventions and quality improvement programmes. In my own methods research programme I am particularly interested in two kinds of stepped wedge design: “incomplete” designs, where data collection effort is focused at particular times in particular clusters, and designs with continuous recruitment of participants. I will present some of the findings from this work.
Adaptive numerical quadrature is used to normalize posterior distributions in many Bayesian models. We provide the first stochastic convergence rate for the error incurred when normalizing a posterior distribution under typical regularity conditions. We give approximations to moments, marginal densities, and quantiles, and provide convergence rates for several of these summaries. Low- and high-dimensional applications are presented, the latter using adaptive quadrature as one component of a more sophisticated approximation framework, for which limited theory is given. Extension of the theory to the high-dimensional framework for the Laplace approximation (a specific instance of an adaptive quadrature method) is considered and guarantees are provided under additional regularity assumptions.
Markov chain Monte Carlo samplers still converge to the correct posterior distribution of the model parameters when an unbiased estimator is available for the likelihood. Whilst this allows inference for a very wide variety of intractable problems, a critical issue for performance is the choice of the number of particles (or samples).
We make the following contributions. We provide analytically derived, practical guidelines on the optimal number of particles to use in general scenarios. We show that the results in the article apply more generally to Markov chain Monte Carlo sampling schemes in which the likelihood is estimated in an unbiased manner. We introduce recent results on the asymptotic limits as T (the length of the time series) becomes large. Applications include stochastic volatility models in which the volatility follows a stochastic differential equation.
In the talk, I will provide an overview of different approaches I have applied to model the unfolding of the COVID-19 pandemic and its effects. In doing so, I will discuss the insights obtained by studying the initial phases of the pandemic, the first wave, and the vaccine rollout in the USA, Europe as well as Latin America. I will also discuss the key role of non-pharmaceutical interventions.
Vector autoregression is an essential tool in empirical macroeconomics and finance, providing simple yet insightful information, such as the impulse response function of different shocks. This paper extends the scope of vector autoregression under a multivariate distribution regression framework and proposes the distribution impulse response function, which provides a more comprehensive picture of dynamic heterogeneity. As an empirical application, we apply the proposed method to study the conditional joint distribution of the GDP growth rate and financial conditions in the U.S. The results from our new framework confirm some existing findings in the literature: 1) tight financial conditions create multimodality in the conditional joint distribution, and 2) restricting the upper tail of the financial condition has a noticeable impact on long-term GDP growth. Yet, the extracted information on the effect of restricting the lower tail of GDP during the global financial crisis suggests an alternative conclusion, namely a negligible impact on the financial condition.
Information theory has interesting connections to the population dynamics of self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clearer, more general formulation of Fisher's fundamental theorem of natural selection.
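For reference, the information of a distribution p relative to q, i.e. the Kullback–Leibler divergence, is

```latex
D_{\mathrm{KL}}(p \,\|\, q) \;=\; \sum_i p_i \log \frac{p_i}{q_i}
\qquad \text{or, for densities,} \qquad
\int p(x) \log \frac{p(x)}{q(x)} \, \mathrm{d}x .
```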
We will discuss a series of bilevel optimisation problems that use a suitable statistics-based upper level objective and lead to automatic selection of spatially dependent parameters for regularisation functionals used in image reconstruction. The spatial dependence of the parameters generally leads to a better recovery of high-detailed areas in the reconstructed image. We will introduce the framework by considering initially as a regulariser the weighted Total Variation, and subsequently discuss its artifact-free, higher order extension, weighted Total Generalised Variation. We will then present some recent results regarding extension of the framework to regularisation functionals that involve a more general class of differential operators. The applicability of this extension will be demonstrated with numerical results in image denoising for a Huber Total Variation functional where also the underlying Huber parameter is chosen to be spatially dependent. This provides further flexibility in the regularisation process and eventually results in an improved reconstruction quality.
It has long been suggested that the mid-latitude atmospheric circulation possesses what has come to be known as "weather regimes", which can roughly be categorised as regions of phase space with above-average density. Their existence and behaviour has been extensively studied in meteorology and climate science, due to their potential for drastically simplifying the complex and chaotic mid-latitude dynamics. Several well-known, simple non-linear dynamical systems have been used as toy-models of the atmosphere in order to understand and exemplify such regime behaviour. Nevertheless, no agreed-upon and clear-cut definition of a "regime" exists in the literature, and unambiguously detecting their existence in the atmospheric circulation is often hindered by the high dimensionality of the system.
In this talk I will first give an overview of some of the approaches used to study and define weather regimes. I will then proceed to propose a definition of weather regime that equates the existence of regimes in a dynamical system with the existence of non-trivial topological structure of the system's attractor. I will discuss how this approach is computationally tractable, practically informative, and identifies the relevant regime structure across a range of examples. This talk is based on the paper https://doi.org/10.1007/s00382-022-06395-x
At the moment, in the UK the majority of women go through the same breast cancer screening programme, but different women have different levels of risk of getting breast cancer.
We look at how we can assess risk using mammograms to enable new breast cancer screening programmes that are more suited to the level of risk faced by each woman.
This work is part of the CRUK-funded project (reference 49757/A28689) “An Artificial Intelligence System for Real-time Risk Assessment at Mammography Screening (Mammo AI)”.
Rank-ordered data are popular in many fields, including sports, marketing, finance, politics, and health economics. Most of the existing approaches rely on the restrictive assumption of a linear specification for the latent scores that drive the observed ranks. Besides, despite being provided over time by one or multiple rankers, the temporal dimension and properties of these orderings have rarely been investigated in the literature. To deal with these issues, we introduce two novel families of nonparametric order-statistics models that consider a static (ROBART) or an autoregressive (ARROBART) process for the latent scores and allow for a nonlinear impact of each covariate on the latent scores. This is achieved by modeling the regression function via a Bayesian additive regression tree (BART), which defines the overall fit as the sum of the fits of many small regression trees. As generalizations of the Thurstone family, the proposed ROBART and ARROBART models preserve interpretability and include several popular frameworks as special cases. Joint work with Eoghan O’Neill and Luca Rossini.
Performing exact Bayesian inference for complex models is computationally intractable. Markov chain Monte Carlo (MCMC) algorithms can provide reliable approximations of the posterior distribution but are expensive for large data sets and high-dimensional models. A standard approach to mitigate this complexity consists in using subsampling techniques or distributing the data across a cluster. However, these approaches are typically unreliable in high-dimensional scenarios. We focus here on a recent alternative class of MCMC schemes exploiting a splitting strategy akin to the one used by the celebrated alternating direction method of multipliers (ADMM) optimization algorithm. These methods appear to provide empirically state-of-the-art performance but their theoretical behavior in high dimension is currently unknown. In this paper, we propose a detailed theoretical study of one of these algorithms known as the split Gibbs sampler. Under regularity conditions, we establish explicit convergence rates for this scheme using Ricci curvature and coupling ideas. We support our theory with numerical illustrations. This is joint work with Maxime Vono (Criteo AI Lab) and Arnaud Doucet (Oxford).
We use Cech closure spaces, also known as pretopological spaces, to develop a uniform framework that encompasses the discrete homology of metric spaces, the singular homology of topological spaces, and the homology of (directed) clique complexes, along with their respective homotopy theories. We obtain nine homology and six homotopy theories of closure spaces. We show how metric spaces and more general structures such as weighted directed graphs produce filtered closure spaces. For filtered closure spaces, our homology theories produce persistence modules. We extend the definition of Gromov-Hausdorff distance to filtered closure spaces and use it to prove that our persistence modules and their persistence diagrams are stable. We also extend the definitions of Vietoris-Rips and Cech complexes to closure spaces and prove that their persistent homology is stable.
This is joint work with Nikola Milicevic.
Echelon designs were first described in the monograph by Pistone et al. (2000). These designs are defined for continuous factors and include, amongst others, factorial designs. They have the appealing property that the saturated polynomial model associated with the design mirrors the geometric configuration of the design. Perhaps surprisingly, the interpolators for such designs are based upon the Hilbert series of the monomial ideal associated with the polynomial model, and thus the interpolators satisfy properties of inclusion-exclusion.
Echelon designs are quite flexible for modelling and include the recently developed designs known as Smolyak sparse grids. In our talk we present the designs, describe their properties and show examples of application.
This is joint work with H. Wynn (CATS, LSE).
Reference: Pistone et al. (2000) Algebraic Statistics. Chapman & Hall/CRC
Key words: Sparse grids, experimental design, algebraic statistics, polynomial models.
Informed Markov chain Monte Carlo (MCMC) methods have been proposed as scalable solutions to Bayesian posterior computation on high-dimensional discrete state spaces, but theoretical results about their convergence behavior in general settings are lacking. In this talk, we first consider the variable selection problem. We propose a novel informed Metropolis-Hastings algorithm which can achieve a mixing rate that is independent of the number of covariates, under mild high-dimensional conditions. The mixing time proof relies on a novel method called "two-stage drift condition". This result shows that the mixing rate of locally informed MCMC methods can be fast enough to offset the computational cost of local posterior evaluation, and thus such methods scale well to high-dimensional data. Second, we consider MCMC sampling on general finite state spaces. We propose a class of methods called informed importance tempering (IIT) and develop generally applicable spectral gap bounds that characterize the convergence rate of IIT. Our theory provides important insights into how to choose the proposal weighting scheme for an informed MCMC method. If time permits, we will also briefly discuss the application of our theory to the high-dimensional structure learning problem. This talk is based on joint works with A. Smith, H. Chang, J. Yang, D. Vats, G. Roberts and J. Rosenthal.
Often, for a given patient population, there will be more than one treatment available for testing at the Phase III stage. Rather than conducting separate randomised controlled trials for each of these treatments (which could require prohibitively high numbers of patients), this study proposes a multi-arm trial assessing the performance of all the available treatments.
A surrogate biomarker/endpoint will be used to judge the performance of the treatments at interims, where, if a treatment under-performs with regard to the surrogate biomarker compared to the control, it will be removed and recruitment instead given to a new promising treatment.
I explored trials of this design that continue over a long period of time, comparing the power of such a trial with that of several consecutive parallel Phase III trials, to determine which design fared best in terms of type I error, power, survival of patients on the trials and long-term survival of patients with the given disease.
Frailty models are typically used when there are unobserved covariates. Here we explore their use in two situations where they provide important insights into epidemiologic questions.
The first involves the question of type replacement after vaccination against the human papilloma virus (HPV). At least 13 types of HPV are known to cause cancer, especially cervical cancer. Recently vaccines have been developed against some of the more important types, notably types 16 and 18. These vaccines have been shown to prevent infection by the targeted types with almost 100% efficacy. However, a concern has been raised that by eliminating these more common types, a niche will be created in which other types could now flourish, and that the benefits of vaccination could be less than anticipated if this were to occur. It will be years before definitive data are available on this, but preliminary evidence could be obtained if it could be shown that there is a negative association between the occurrence of multiple infections in the same individual. The virus is transmitted by sexual contact and testing for it has become part of cervical screening. As infection increases with greater sexual activity, a woman with one type is more likely to also harbour another type, so the question can be phrased as whether there is a negative association between specific pairs of types in the context of an overall positive association. A frailty model is used for this, in which the total number of infections a woman has is an unobserved covariate, and the question can be rephrased to ask whether specific types are negatively correlated conditional on the number of types present in a woman. This is modelled by assuming a multiplicative random variable τ having a log-gamma distribution with unit mean and one additional parameter θ, so that the probability of occurrence of type j in individual i is modified to be τ_i p_j, where the τ_i are iid copies of τ, and the joint probability of being infected by types 1,…,k is θ_k p_1⋯p_k with θ_k = E(τ^k). A likelihood is obtained and moment-based estimation procedures are developed and applied to a large data set.
A second example pertains to an extension of the widely used proportional hazards model for analysing time-to-event data with censoring. In practice, hazards are often not proportional over time: converging hazards are observed, and the effect of a covariate is stronger in early follow-up than it is subsequently. This can be modelled by assuming an unobserved multiplicative factor in the hazard function, again having a log-gamma distribution with unit mean and one additional parameter θ. Integrating out this term leads to a Pareto survival distribution. A (partial) likelihood is obtained and estimation procedures are developed and applied to a large data set.
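As a minimal illustration of the mechanism (using the textbook gamma-frailty case with unit mean and variance θ, which may differ in detail from the log-gamma specification used in the talk): if the conditional hazard is Z h(t) with cumulative hazard H(t), integrating out the frailty Z gives a Pareto-type (Lomax) survival function whose hazards converge over time,

```latex
S(t) \;=\; \mathbb{E}\bigl[e^{-Z H(t)}\bigr] \;=\; \bigl(1 + \theta H(t)\bigr)^{-1/\theta},
\qquad Z \sim \mathrm{Gamma}(1/\theta,\, 1/\theta),
```

which recovers the ordinary proportional hazards model as θ → 0.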
One problem that arises with spatial data is that spatial correlation often exists among the observations, since spatial units close to each other are likely to share similar socio-economic, infrastructure or other characteristics. Statistical models that ignore spatial correlation may lead to biased parameter estimates. In the econometrics literature, there are several methods to measure and model such spatially correlated effects. We demonstrate some of these statistical tools using real-world data by exploring factors affecting cancer screening coverage in England. In this particular study, we are interested in the impact of car ownership and public transport usage on breast and cervical cancer screening coverage. District-level cancer screening coverage data (in proportions) and UK census data have been collected and linked.
A non-spatial model (estimated by ordinary least squares, OLS) was fitted first, and Moran's I statistic showed that significant spatial correlation remains even after controlling for a range of predictors. Two alternative spatial models were then tested, namely: 1) the spatial autoregressive (SAR) model, and 2) the SAR error model, known simply as the spatial error model (SEM).
Comparing the spatial models with the non-spatial model, we find that some coefficient estimates differ, and that the spatial models outperform the non-spatial model in terms of goodness-of-fit. In particular, the SEM is the best model for both types of cancer.
Finally, we discuss some general issues in spatial analysis, such as the modifiable areal unit problem (MAUP), different spatial weighting schemes, and other spatial modelling strategies such as a gravity model and a spatially varying coefficient model.
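A hand-rolled sketch of the Moran's I diagnostic described above (all data, weights and coefficients are hypothetical placeholders; fitting the SAR and SEM specifications would in practice be done with a dedicated spatial econometrics package):

```python
import numpy as np

def morans_I(z, W):
    """Moran's I for a vector z (here OLS residuals) and spatial weight matrix W."""
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(0)
n = 200
idx = np.arange(n)

# Placeholder 'ring' contiguity weights, row-standardised; a real analysis would
# build W from district adjacency (e.g. a shapefile).
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 1.0
W[idx, (idx - 1) % n] = 1.0
W /= W.sum(axis=1, keepdims=True)

# Hypothetical district-level predictors (say, car ownership and public transport
# use) and a screening-coverage outcome with spatially correlated errors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
u = rng.normal(scale=0.02, size=n)
y = X @ np.array([0.60, 0.05, -0.03]) + u + W @ u   # neighbours share part of the noise

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
print("Moran's I of OLS residuals:", round(morans_I(resid, W), 3))
```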
Phase I clinical trials are an essential step in the development of anticancer drugs. The main goal of these studies is to establish the recommended dose and/or schedule of new drugs or drug combinations for phase II trials. The guiding principle for dose escalation in phase I trials is to avoid exposing too many patients to subtherapeutic doses while maintaining rapid accrual and preserving safety by limiting toxic side-effects. STARPAC is a phase I trial examining the use of ATRA, a vitamin A-like compound, in combination with established cancer drugs in combatting pancreatic cancer, a cancer with a dismal survival record that is the fourth-highest cancer killer worldwide. A challenge for toxicity trials that prescribe doses for newly recruited patients based on the dose and toxicity data from previous patients is that patients are recruited before previous patients have reported toxicity data. In order to escalate doses safely, we employ a two-stage process, with the first stage an accelerated rule-based procedure and the second stage a modified approach based on the Bayesian Continual Reassessment Method that combines a prior toxicity-dose curve with the accumulating patient dose/toxicity data.
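A stripped-down sketch of the second-stage dose update (a one-parameter power-model CRM evaluated on a grid; the skeleton, target toxicity rate and prior below are hypothetical placeholders rather than the STARPAC specification, and dose-skipping restrictions and the accelerated first stage are omitted):

```python
import numpy as np

# Hypothetical skeleton (prior toxicity guesses) for 5 dose levels and target toxicity rate.
skeleton = np.array([0.05, 0.10, 0.20, 0.30, 0.45])
target = 0.25

# One-parameter power model: p_d(a) = skeleton_d ** exp(a), with a normal prior on a
# (variance 1.34 is a conventional choice). The posterior is computed on a grid.
a_grid = np.linspace(-4.0, 4.0, 801)
prior = np.exp(-a_grid**2 / (2 * 1.34))

def next_dose(doses_given, tox_observed):
    """Update the grid posterior for a, then recommend the dose whose estimated
    toxicity probability is closest to the target."""
    p = skeleton[doses_given][None, :] ** np.exp(a_grid)[:, None]     # (grid, patients)
    lik = np.prod(np.where(tox_observed, p, 1.0 - p), axis=1)
    post = prior * lik
    post /= post.sum()
    p_tox = (skeleton[None, :] ** np.exp(a_grid)[:, None] * post[:, None]).sum(axis=0)
    return int(np.argmin(np.abs(p_tox - target))), p_tox

doses = np.array([0, 0, 1, 1, 2, 2])            # dose levels given so far (0-indexed)
tox = np.array([0, 0, 0, 1, 0, 1], dtype=bool)  # toxicities observed so far
d, p_tox = next_dose(doses, tox)
print("posterior toxicity estimates:", np.round(p_tox, 3))
print("recommended next dose level:", d)
```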
Most cancer registries choose not to rely on cause of death when presenting survival statistics on cancer patients, but instead look at overall mortality after diagnosis and adjust for the expected mortality in the cohort had they not been diagnosed with cancer. For many years the relative survival (observed survival divided by expected survival) was estimated by the Ederer-II method. More recently statisticians have used the theory of classical competing risks to estimate the net survival – that is, the survival that would be observed in cancer patients if it were possible to remove all competing causes of death. Pohar-Perme showed that, in general, estimators of the relative survival are not consistent for the net survival, and proposed a new consistent estimator of the net survival. Pohar-Perme's estimator can have much larger variance than Ederer-II (and may not be robust). Thus, whereas some statisticians have argued that one must use the Pohar-Perme estimator because it is the only one that is consistent for the net survival, others have argued that there is a bias-variance trade-off and Ederer-II may still be preferred even though it is inconsistent.
We draw an analogy from the literature on robust estimation of location. If one wants to estimate the mean of a distribution consistently, then it may be difficult to improve on the sample mean. But if one simply wants a measure of location then other estimators are possible and might be preferred to the sample mean. We define a measure of net survival to be a functional satisfying certain equivariance and order conditions. The limits of neither Ederer-II nor Pohar-Perme satisfy our definition of being an invariant measure of net survival. We introduce two families of functionals that do satisfy our definition. Consideration of minimum variance and robustness then allows us to select a single member of each family as the preferred measure of net survival.
Noting that in a homogeneous population the relative survival and the net survival are identical and correspond to the survival associated with the excess hazard, we can then view our functionals as weighted averages of stratum-specific relative survival, net survival or excess hazards. These can be viewed as standardised estimators with standardising weights that are time-dependent. The preferred measures use weights that depend on the numbers at risk in each stratum from a standard population as a function of time. For example, when the strata are defined by age at diagnosis, the standardising weights will depend on the age-specific prevalence of the cancer in the standard population.
We show through simulation that, unlike both Ederer-II and Pohar-Perme, our estimators are invariant and robust under changing population structures, and also that they are consistent and reasonably efficient. Although our estimator does not (consistently) estimate the (marginal) net hazard it performs as well or better than both the crude and standardised versions of both Ederer-II and Pohar-Perme in all simulations.
Joint work with Adam Brentnall
Studies of risk factors in epidemiology often use a case-control design. The concordance index (or area under the receiver operating characteristic (ROC) curve (AUC)) may be used in unmatched case-control studies to measure how well a variable discriminates between cases and controls. The AUC is sometimes used in matched case-control studies by ignoring matching, but it lacks interpretation because it is not based on an estimate of the ROC for the population of interest. An alternative measure of the concordance of risk factors conditional on the matching factors will be introduced, and applied to data from breast and lung cancer case-control studies.
Another common design in epidemiology is the cohort study, where the aim might be to estimate the concordance index for predictors of censored survival data. A popular method only considers pairs of individuals where the smaller outcome is uncensored (Harrell's c-statistic). While this statistic can be useful for comparing different models on the same data set, it depends on the censoring distribution. Methods to address this issue will be considered and applied to data from a breast cancer trial.
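For concreteness, Harrell's c-statistic counts concordant pairs among the "usable" pairs whose smaller observed time is an uncensored event; a minimal sketch on simulated data (with ties handled crudely) is below. Because the set of usable pairs shrinks as censoring becomes heavier, the resulting value depends on the censoring distribution, which is precisely the issue discussed above.

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's c: among pairs where the smaller observed time is an uncensored event,
    the proportion in which the earlier failure has the higher predicted risk."""
    concordant, usable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:      # pair usable: i fails first, uncensored
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable

rng = np.random.default_rng(1)
risk = rng.normal(size=300)
t_true = rng.exponential(np.exp(-risk))             # higher risk -> shorter survival
censor = rng.exponential(1.5, size=300)
time = np.minimum(t_true, censor)
event = t_true <= censor
print("Harrell's c:", round(harrell_c(time, event, risk), 3))
```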
Ideally, cancer screening is initially evaluated through randomised controlled trials, the analysis of which should be straightforward. The statistical challenges arise when one is either trying to combine the results of several trials with different designs, or trying to evaluate routine service screening (which may use improved technologies compared to the original randomised controlled trials).
We will briefly discuss the following problems.
1. Estimation from interval-censored data based on imperfect observations. When screening for asymptomatic pre-cancerous disease, one will only identify the disease if the individual with the disease is screened and if the screening test is positive (leading to further investigations and a definitive diagnosis). In the simple model one may have periodic screening with a fixed sensitivity. A more sophisticated analysis would take account of the possibility that as the precancerous lesion grows the sensitivity of the screening test increases.
2. Estimating over-diagnosis (defined as a screen-detected cancer that would not have been diagnosed, before the individual died, in the absence of screening) from a trial in which the control arm are all offered screening at the end of the trial. The idea is that with extended follow-up data one may be able to apply methods designed for non-compliance to estimate over-diagnosis.
3. Improving ecological studies and trend analyses to try to estimate the effects of screening on incidence (over-diagnosis or cancer prevention) and mortality, taking into account secular trends in incidence and mortality.
4. Meta-analysis of randomised trials of screening that are heterogeneous in terms of screening interval, duration of follow-up after the last screen, and whether or not the control group were offered screening at the end of the trial. The idea that we explore is whether, by modelling the expected behaviour of the incidence function over time, one can combine estimates of the same quantity in the meta-analysis.
5. How should one quantify exposure to screening in an observational (case-control) study of cancer screening? The issue is whether one can use such studies to accurately estimate the benefit of screening at different intervals. We will discuss a few options and suggest that they may best be studied by applying them to simulated data.
We consider the problem of clustering in two important families of networks: signed and directed, both relatively less well explored than their unsigned and undirected counterparts. Both problems share an important common feature: they can be solved by exploiting the spectrum of certain graph Laplacian matrices or derivations thereof. In signed networks, the edge weights between the nodes may take either positive or negative values, encoding a measure of similarity or dissimilarity. We consider a generalized eigenvalue problem involving graph Laplacians, with performance guarantees under the setting of a signed stochastic block model. The second problem concerns directed graphs. Imagine a (social) network in which you spot two subsets of accounts, X and Y, for which the overwhelming majority of messages (or friend requests, endorsements, etc.) flow from X to Y, and very few flow from Y to X; would you get suspicious? To this end, we also discuss a spectral clustering algorithm for directed graphs based on a complex-valued representation of the adjacency matrix, which is able to capture the underlying cluster structures for which the information encoded in the direction of the edges is crucial. We evaluate the proposed algorithm in terms of a cut-flow-imbalance-based objective function which, for a given pair of clusters, captures the propensity of the edges to flow in a given direction. Experiments on a directed stochastic block model and real-world networks showcase the robustness and accuracy of the method when compared to other state-of-the-art methods. Time permitting, we briefly discuss potential extensions to the sparse setting and regularization, as well as applications to lead-lag detection in time series and to ranking from pairwise comparisons.
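A sketch of one common complex-valued representation and the associated spectral step (an illustration under simplified assumptions, not necessarily the exact construction used in the talk):

```python
import numpy as np
from sklearn.cluster import KMeans

def hermitian_spectral_clustering(A, k):
    """Cluster a directed graph from a Hermitian complex-valued representation of its
    adjacency matrix: H[u, v] = i if u->v and -i if v->u (one common choice)."""
    H = 1j * (A - A.T)
    eigvals, eigvecs = np.linalg.eigh(H)          # real eigenvalues, complex eigenvectors
    top = np.argsort(-np.abs(eigvals))[:k]        # keep the largest-magnitude eigenvalues
    feats = np.hstack([eigvecs[:, top].real, eigvecs[:, top].imag])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats)

# Toy digraph in which edges flow overwhelmingly from block X (nodes 0-49) to block Y.
rng = np.random.default_rng(0)
n = 100
A = (rng.random((n, n)) < 0.01).astype(float)     # sparse background noise
A[:50, 50:] += (rng.random((50, 50)) < 0.15)      # dominant X -> Y flow

labels = hermitian_spectral_clustering(A, k=2)
print("cluster sizes:", np.bincount(labels))
print("labels in block X:", np.bincount(labels[:50]), "labels in block Y:", np.bincount(labels[50:]))
```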
For practising Data Science researchers and practitioners, the COVID-19 pandemic has highlighted both the need for data-driven decision making and the reality of what it really takes to get to that point. It is not only about throwing data and models at a problem. It is about understanding the environment that one is in and then strategising on what might best work for that environment. In this talk I look back at some of the work we have done in responding to different challenges within both Data Science and Natural Language Processing. I place people at the centre, arguing that they are the important piece in our practice.
Please note different time from usual seminar time.
A Peskun ordering between two samplers, implying the dominance of one over the other, is known among the Markov chain Monte Carlo community as a remarkably strong result, but also as one that is notably difficult to establish. Indeed, one has to prove that the probability of reaching a state using one sampler is greater than or equal to the probability using the other sampler, and this must hold for all states except the current state. We provide in this paper a weaker version that does not require the inequality to hold for all these states: the dominance holds asymptotically, as a varying parameter grows without bound, provided the states for which the inequality holds belong to a mass-concentrating set. The weak ordering turns out to be useful for comparing lifted samplers for partially-ordered discrete state spaces with their Metropolis–Hastings counterparts. An analysis yields a qualitative conclusion: they asymptotically perform better in certain situations (and we are able to identify these situations), but not necessarily in others (and the reasons why are made clear). The difference in performance is evaluated quantitatively in important applications such as graphical-model simulation and variable selection.
Joint work with Florian Maire (Université de Montréal).
The pre-print is available at: https://arxiv.org/abs/2003.05492. In the talk, I will focus on the motivations behind our work, which will in turn help to motivate our theoretical result.
Time series and spatial data are ubiquitous in many application areas, such as environmental data, geosciences, astronomy, and finance. A key statistical modelling and estimation challenge for these data is that of dependence between points at different times or locations. While parametric models of covariance can be estimated via exact likelihood, this is ill-suited for many practical problems due to the heavy computational cost.
A standard approach to address this relies on approximate likelihood methods. The Whittle likelihood is one such approximation for gridded data, based on the Discrete Fourier Transform of the data. It is popular due to its n log n computational cost, robustness to non-Gaussian data, and amenability to interpretation in the spectral domain. However, Whittle likelihood estimates can suffer from a strong bias due to the finite and discrete sampling. This is true in particular for spatial data, where the bias dominates the standard deviation in dimension two and greater. Additionally, practical sampling patterns often diverge from theoretical requirements, due to non-square observational domains or missing data. In this presentation we discuss a recently proposed modification to the Whittle likelihood which addresses all these issues at once.
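For a gridded series X_1, …, X_n with periodogram I(ω_k) at the Fourier frequencies and parametric spectral density f_θ, the standard Whittle log-likelihood takes the familiar form (up to normalisation conventions)

```latex
\ell_W(\theta) \;=\; -\sum_{k} \left\{ \log f_\theta(\omega_k) + \frac{I(\omega_k)}{f_\theta(\omega_k)} \right\},
\qquad
I(\omega_k) \;=\; \frac{1}{n} \Bigl| \sum_{t=1}^{n} X_t \, e^{-\mathrm{i}\omega_k t} \Bigr|^2 ,
```

and it is the mismatch between f_θ and the finite, possibly incomplete sampling implicit in I(ω_k) that the modification discussed here targets.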
We provide asymptotic results under a framework which we call Significant Correlation Contribution, which allows us to understand the interplay between the sampling pattern and the covariance model. We demonstrate that our modification renders our estimate asymptotically efficient and normal for a wide class of settings and present some practical use cases.
Today we are confronted with huge and highly complex data, and one main challenge is to determine the "structure" of complex networks or the "shape" of data. In the past few years, geometric and topological methods, powerful tools originating from Riemannian geometry, have become popular for data analysis. In this seminar, after introducing Ollivier-Ricci curvature for (directed) hypergraphs, I will present, as one of the main recent applications, results from implementing this tool for the analysis of chemical reaction networks. We will see that this notion, alongside Forman-Ricci curvature, provides complementary edge-based tools for detecting important structures in the network.
High dimensional distributions, especially those with heavy tails, are notoriously difficult for off the shelf MCMC samplers: the combination of unbounded state spaces, diminishing gradient information and local moves, results in empirically observed "stickiness" and poor theoretical mixing properties - lack of geometric ergodicity. In this paper we introduce a new class of MCMC samplers that map the original high dimensional problem in Euclidean space onto a sphere and remedy these notorious mixing problems. In particular, we develop Random Walk Metropolis type algorithms as well as versions of Bouncy Particle Sampler that are uniformly ergodic for a large class of light and heavy tailed distributions and also empirically exhibit rapid convergence.
Joint work with Krzysztof Latuszynski and Gareth O. Roberts.
Everyone knows what the uniform probability distribution is on a real interval or on a finite set, but it is not so obvious what we should understand "uniform distribution" to mean on a completely arbitrary space. I will give a general definition, taking "space" to mean something slightly more general than compact metric space. The definition rests on a maximum entropy theorem for distributions on metric spaces, which in turn arose from questions about the measurement of biodiversity. This idea of seeking a systematic general notion of uniform distribution is similar in spirit to the quest for an objective prior, and indeed, is at least loosely related to it, as I will explain. (Joint work with Emily Roff.)
In this talk I will discuss recent and ongoing work on using topology to define and study weather regimes. The talk is based on joint work with K. Strommen, M. Chantry and J. Dorrington, with preprint available at https://arxiv.org/abs/2104.03196.
Zoom link: https://qmul-ac-uk.zoom.us/j/82103051171?pwd=NjJRckR5Z3lJRzRRZlFlblhDNGFzZz09
Abstract: In this talk, I summarise progress in building models for social networks that capture many of their well-known structural features. I focus on a modelling approach which construes global network structure as the outcome of dynamic, potentially realisation-dependent, interactive processes occurring within local neighbourhoods of a network. I describe a hierarchy of models implied by the approach and their estimation from partial network data structures obtained through certain types of network sampling schemes. I illustrate how these models can be used to enrich our understanding of community network structures and hence of processes such as the transmission of infectious diseases.
About the speaker: Prof Pip Pattison is a quantitative psychologist by background and the primary focus of her research is the development and application of mathematical and statistical models for social networks and network processes. She is currently the Deputy Vice-Chancellor (Education) at the University of Sydney.
Factor copula models have been recently proposed for describing the joint distribution of a large number of variables in terms of a few common latent factors. A Bayesian procedure is employed in order to make fast inferences for multi-factor and structured factor copulas. To deal with the high-dimensional structure, a Variational Inference (VI) algorithm is applied to estimate different specifications of factor copula models. Compared to the Markov Chain Monte Carlo (MCMC) approach, the variational approximation is much faster and can handle sizeable problems in limited time. Another issue with factor copula models is that the bivariate copula functions connecting the variables are unknown in high dimensions. An automatic procedure is derived to recover the hidden dependence structure. By taking advantage of the posterior modes of the latent variables, the bivariate copula functions are selected by minimizing the Bayesian Information Criterion (BIC). Simulation studies in different contexts show that the procedure for bivariate copula selection can be very accurate in comparison to the true generated copula model. The proposed procedure is illustrated with two high-dimensional real data sets.
Informally, a ‘spurious correlation’ is the dependence of a model on some aspect of the input data that an analyst thinks shouldn’t matter. In machine learning, these have a know-it-when-you-see-it character, e.g., changing the gender of a sentence’s subject changes a sentiment predictor’s output. I'll talk about counterfactual invariance, a causal formalization of the requirement that changing irrelevant parts of the input shouldn’t change model predictions. We connect counterfactual invariance to out-of-domain model performance, and provide schemes for learning (approximately) counterfactual invariant predictors (without access to counterfactual examples). It turns out that both the means and meaning of counterfactual invariance depend fundamentally on the true underlying causal structure of the data. Distinct causal structures require distinct regularization schemes to induce counterfactual invariance. Similarly, counterfactual invariance implies different domain shift guarantees depending on the underlying causal structure. This theory is supported by empirical results on text classification.
Time-varying parameter (TVP) models are a popular tool for handling data with smoothly changing parameters. However, in situations with many parameters the flexibility underlying these models may lead to overfitting models and, as a consequence, to a severe loss of statistical efficiency. This occurs, in particular, if only a few parameters are indeed time-varying, while the remaining ones are constant or even insignificant. As a remedy, hierarchical shrinkage priors have been introduced for TVP models to allow shrinkage both of the initial parameters as well as their variances toward zero.
The talk reviews various approaches to introducing shrinkage priors for TVP models. Recently, Cadonna et al. (2020) introduced the (hierarchical) triple Gamma prior, which includes other popular shrinkage priors such as the double Gamma prior and the horseshoe prior as special cases. The talk also discusses efficient methods for MCMC inference and investigates the close resemblance of the triple Gamma prior to BMA. For illustration, hierarchical shrinkage priors are applied to TVP-VAR-SV models, a popular tool for modelling multivariate macroeconomic time series. The results clearly indicate that shrinkage priors reduce the risk of overfitting and increase statistical efficiency in a TVP modelling framework.
(based on joint work with Annalisa Cadonna and Peter Knaus, Vienna University of Economics and Business)
Full version of Cadonna et al (2020): https://doi.org/10.3390/econometrics8020020
At the moment, in the UK all women go through the same breast cancer screening programme, but different women have different levels of risk of getting breast cancer. In our project we are looking at how we can adjust breast cancer screening programmes so that they are more suited to the level of risk faced by each woman - this is known as risk-adapted screening. One important risk factor is breast density, the amount of white and bright regions seen on a mammogram. High breast density can make it harder for doctors to detect breast cancer on a screening mammogram and also increases the risk of developing breast cancer. I am going to talk about how we are planning to use AI algorithms to objectively measure breast density and answer the question: how can we tell when a woman might be at risk of getting a false negative during a standard mammogram, and should be offered an alternative screening method? This is part of the CRUK-funded project “An Artificial Intelligence System for Real-time Risk Assessment at Mammography Screening (Mammo AI)”.
We propose a new autoregressive model for the analysis of time series with periodic interdependencies. The model is based on the application of a vector autoregressive model to univariate data that is partitioned into ‘blocks’ of observations. For this reason, we refer to it as the block-autoregressive (BAR) model. The untransformed BAR model nests several other autoregressive models, such as the regular AR model, the periodic AR model, the (mixed) seasonal AR model, and the scale-specific AR model introduced by Bandi et al. (2019). In addition, the BAR model can be transformed using orthonormal bases to unveil dependencies between weighted averages of observations in subsequent blocks. This yields parsimonious model representations that enhance interpretability and improve predictive performance. The model is estimated using OLS and parametric bootstrapping methods in the case of large samples, complemented by a basis-specific LASSO step for smaller samples. Both simulated and empirical examples are used to illustrate the model. Joint work with Dick van Dijk and Karel de Wit.
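A minimal sketch of the blocking idea (hypothetical data; statsmodels' VAR is used purely to illustrate the untransformed model):

```python
import numpy as np
from statsmodels.tsa.api import VAR

# Hypothetical univariate daily series with weak serial dependence.
rng = np.random.default_rng(0)
eps = rng.normal(size=1260)
x = np.empty(1260)
x[0] = eps[0]
for t in range(1, 1260):
    x[t] = 0.6 * x[t - 1] + eps[t]

# Partition the univariate series into non-overlapping 'blocks' (here weeks of 5
# observations): each row is one block, each column one within-block position.
block = 5
X = x[: len(x) // block * block].reshape(-1, block)

# The untransformed BAR model in its simplest form: a VAR(1) on the blocked data,
# whose 5x5 coefficient matrix encodes periodic dependencies between positions of
# consecutive blocks. Orthonormal transformations of the columns (e.g. averages and
# contrasts) can then be applied for more parsimonious representations.
res = VAR(X).fit(1)
print(res.coefs.shape)   # (1, 5, 5)
```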
With larger amounts of data at their disposal, scientists are emboldened to tackle complex questions that require sophisticated statistical models. It is not unusual for the latter to have likelihood functions that elude analytical formulations. Even under such adversity, when one can simulate from the sampling distribution, Bayesian analysis can be conducted using approximate methods such as Approximate Bayesian Computation (ABC) or Bayesian Synthetic Likelihood (BSL). A significant drawback of these methods is that the number of required simulations can be prohibitively large, thus severely limiting their scope. We propose perturbed MCMC samplers that can be used within the ABC and BSL paradigms to significantly accelerate computation while maintaining control on computational efficiency. The proposed strategy relies on recycling samples from the chain’s past. The algorithmic design is supported by a theoretical analysis while practical performance is examined via a series of simulation examples and data analyses. This is joint work with Dr. Evgeny Levi.
The Bayesian approach to inference stands out for naturally allowing borrowing of information across heterogeneous populations or studies. Several popular classes of models in this setting induce a dependence structure on the observations that can be seen as a mixture between the two extreme cases of exchangeability and unconditional independence. As an illustrative example in this direction, a recent proposal based on the Dirichlet process will be described. Such a structure leads one to consider the problem of measuring dependence in terms of the distance of the actual prior specification from the two extremes. The talk will describe a novel approach that relies on the Wasserstein distance and is suitably tailored to random measure based models. An application to some noteworthy models in the literature provides some useful insights.
Sparsity promoting regularizers are widely used to impose low-complexity structure (e.g. the l1-norm for sparsity) on the regression coefficients in supervised learning. In the realm of deterministic optimization, the sequences generated by iterative algorithms (such as proximal gradient descent) exhibit "finite activity identification": they identify the low-complexity structure in a finite number of iterations. However, most online algorithms (such as proximal stochastic gradient descent) do not have this property, owing to the vanishing step size and non-vanishing variance. In this talk, by combining with a screening rule, I will show how to eliminate useless features from the iterates generated by online algorithms and thereby enforce finite activity identification. One consequence is that, when combined with any convergent online algorithm, the sparsity properties imposed by the regularizer can be exploited for computational gains. Numerically, significant acceleration can be obtained.
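A toy sketch of the phenomenon the talk starts from (illustrative code under simplified assumptions, not the screening algorithm of the talk): deterministic proximal gradient descent on a lasso problem identifies the active set after finitely many iterations, because the soft-thresholding prox sets coordinates to exact zero; a stochastic-gradient variant would instead keep stray nonzeros.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (produces exact zeros)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Toy sparse regression problem: only the first 5 of 50 coefficients are active.
rng = np.random.default_rng(0)
n, p, s = 200, 50, 5
A = rng.normal(size=(n, p))
x_true = np.zeros(p)
x_true[:s] = 2.0 + rng.normal(size=s)
y = A @ x_true + 0.1 * rng.normal(size=n)

lam = 0.1 * np.max(np.abs(A.T @ y)) / n     # l1 regularisation strength
step = n / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the data-fit gradient

# Deterministic proximal gradient descent on (1/2n)||Ax - y||^2 + lam ||x||_1.
x = np.zeros(p)
for _ in range(300):
    grad = A.T @ (A @ x - y) / n
    x = soft_threshold(x - step * grad, step * lam)

print("identified support:", np.flatnonzero(x))
print("true support:      ", np.arange(s))
```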
Complex simulators have become a ubiquitous tool in many scientific disciplines, providing high-fidelity, implicit probabilistic models of natural and social phenomena. Unfortunately, they typically lack the tractability required for conventional statistical analysis. Approximate Bayesian computation (ABC) has emerged as a key method in simulation-based inference, wherein the true model likelihood and posterior are approximated using samples from the simulator. In this talk, we will first draw connections between ABC and generalized Bayesian inference (GBI) by re-interpreting the accept/reject step in ABC as an implicitly defined error model. Then we argue that these implicit error models will invariably be misspecified.
While ABC posteriors are often treated as a necessary evil for approximating the standard Bayesian posterior, this allows us to re-interpret ABC as a potential robustification strategy. In a second step, we will turn our attention to some recent machine learning approaches to simulation-based inference. While those methods are designed to be exact when the true data generating mechanism is known, we will show that neural density estimators can perform poorly when this assumption is violated. Using our findings on ABC we will argue for a combination of machine-learning and statistics approach to obtain a reliable, but highly efficient algorithm for posterior inference in intractable models.
Numerous Bayesian network structure learning algorithms have been proposed in the literature over the past few decades. Each algorithm is based on a set of assumptions, such as complete data and causal sufficiency, and tends to be evaluated with synthetic data that conform to these assumptions, however unrealistic these assumptions may be in the real world. As a result, it is widely accepted that synthetic performance overestimates real performance, although to what degree this may happen remains unknown. This presentation will provide a brief introduction to the two main classes of structure learning, called constraint-based and score-based, and illustrate how different assumptions about data noise influence structure learning performance.
Tensor-valued data are becoming increasingly available in economics, and this calls for suitable econometric tools. We propose a new dynamic linear model for tensor-valued response variables and covariates that encompasses some well-known econometric models as special cases. Our contribution is manifold. First, we define a tensor autoregressive process (ART), study its properties, and derive the associated impulse response function. Second, we exploit the PARAFAC low-rank decomposition to provide a parsimonious parametrization and to incorporate sparsity effects. We also contribute to inference methods for tensors by developing a Bayesian framework which allows for including extra-sample information and for introducing shrinkage effects. We apply the ART model to time-varying multilayer networks of international trade and capital stock and study the propagation of shocks across countries, over time and between layers.
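Schematically, and in notation that is mine rather than the authors' exact parametrization, a first-order tensor autoregression with a rank-R PARAFAC coefficient tensor can be written as follows.

```latex
% Schematic first-order tensor autoregression: Y_t is the tensor-valued
% response, B a coefficient tensor acting on Y_{t-1} by contraction, and
% E_t a tensor-valued error term.
Y_t \;=\; B \,\bar{\times}\, Y_{t-1} \;+\; E_t,
\qquad
B \;=\; \sum_{r=1}^{R} \beta_1^{(r)} \circ \beta_2^{(r)} \circ \cdots \circ \beta_K^{(r)},
```

where $\bar{\times}$ denotes a tensor contraction (a multilinear product of $B$ with $Y_{t-1}$) and $\circ$ the outer product; the rank-$R$ decomposition is what delivers the parsimonious parametrization mentioned above.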
The use of large datasets for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this talk, we propose a nested Common Atoms Model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. If time allows, we will also discuss an application to the analysis of time series calcium imaging experiments in awake behaving animals. We further investigate the performance of our model in capturing true distributional structures in the population by means of simulation studies.
This talk presents a mathematical and computational methodology for performing Bayesian inference in problems where prior knowledge is available in the form of a training dataset or set of training examples. This prior information is encoded into the model by using a deep neural network, which is combined with an explicit likelihood function by using Bayes' theorem to derive the posterior distribution for the quantities of interest given the available data. Bayesian computation is then performed by using appropriate Markov chain Monte Carlo stochastic algorithms. We study the properties of the proposed models and computation algorithms and illustrate performance on a range of inverse problems related to imaging sciences, where they are used to perform Bayesian point estimation, uncertainty quantification, hypothesis testing, and model misspecification diagnosis.
This talk is based on joint work with Matthew Holden and Kostas Zygalakis.
From a Bayesian perspective, mixture models have been characterised by a restrictive prior modelling since their ill-defined nature makes most of the improper priors not acceptable. In particular, recent results have shown the inconsistency of the posterior distribution on the number of components when using standard nonparametric prior processes.
We propose an analysis of prior choices characterised by their conservativeness with respect to the number of components. Among the proposals, we derive a prior distribution on the number of clusters which considers the loss one would incur if the true value representing the number of components were not considered. The prior has an elegant and easy-to-implement structure, which allows one to naturally include any prior information one may have, as well as to opt for a default solution in cases where this information is not available.
The methods are then applied to two real datasets. The first dataset consists of retrieval times for monitoring IP packets in computer network systems. The second dataset consists of measures registered in antimicrobial susceptibility tests for 14 compounds used in the treatment of M. tuberculosis. In both situations, the number of clusters is uncertain and different solutions lead to different interpretations.
This talk introduces a method for selecting high-dimensional models based on a truncation mechanism to generate sparse estimating equations. Given a set of low-dimensional estimating equations for the model parameters, a high-dimensional model is selected by minimizing the distance between a composite estimating equation and the full likelihood scores subject to an L1-type penalty. The proposed strategy reduces the overall model complexity by dropping the noisy terms in the estimating equations. Unlike other approaches to model selection, our penalty involves the inclusion of low-dimensional equations rather than model parameters; this implies that consistency of the final parameter estimates is unaffected by the selection mechanism. Numerical and statistical efficiency of the new methodology is illustrated through examples on simulated and real data.
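In schematic form (the notation here is mine, not necessarily the authors'), the selection step can be pictured as an L1-penalized fit of a weighted combination of low-dimensional estimating equations to the full likelihood score:

```latex
% g_1,...,g_m: low-dimensional estimating equations; u(theta): full likelihood
% score; w: inclusion weights, shrunk to zero by the L1 penalty.
\hat{w} \;=\; \arg\min_{w}\;
\mathbb{E}\Bigl\|\, u(\theta) - \sum_{j=1}^{m} w_j\, g_j(\theta) \Bigr\|^{2}
\;+\; \lambda \sum_{j=1}^{m} |w_j|,
```

so that the penalty acts on equations rather than on model parameters, which is why the consistency of the final estimates is unaffected by the selection mechanism.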
Image search with text feedback has promising impact in various real-world applications, such as e-commerce and internet search. Given a reference image and text feedback from the user, the goal is to retrieve images that not only resemble the input image, but also change certain aspects in accordance with the given text. This is a challenging task as it requires a synergistic understanding of both image and text. In this work, we tackle this task with a novel Visiolinguistic Attention Learning (VAL) framework. Specifically, we propose a composite transformer that can be seamlessly plugged into a CNN to selectively preserve and transform the visual features conditioned on language semantics. By inserting multiple composite transformers at varying depths, VAL is incentivised to encapsulate multi-granular visiolinguistic information, thus yielding an expressive representation for effective image search. We conduct a comprehensive evaluation on three datasets: Fashion200k, Shoes and FashionIQ. Extensive experiments show our model exceeds existing approaches on all datasets, demonstrating consistent superiority in coping with various forms of text feedback, including attribute-like and natural language descriptions.
This work was presented at CVPR 2020.
In the face of rapidly changing data, a range of case fatality ratio estimates for coronavirus disease 2019 (COVID-19) have been produced that differ substantially in magnitude. We aimed to provide robust estimates, accounting for censoring and ascertainment biases. These early estimates give an indication of the fatality ratio across the spectrum of COVID-19 disease and show a strong age gradient in risk of death.
In this talk, we propose a novel asymmetric continuous probabilistic score (ACPS) for evaluating and comparing density forecasts. We also define a weighted version of the score, which emphasizes regions of interest, such as the tails or the center of a variable's range. A test is also introduced to statistically compare the predictive ability of different forecasts. The ACPS is of general use in any situation where the decision maker has asymmetric preferences in the evaluation of the forecasts. In an artificial experiment, the implications of varying the level of asymmetry in the ACPS are illustrated. Then, the proposed score and test are applied to assess and compare density forecasts of macroeconomically relevant datasets (US employment growth) and of commodity prices (oil and electricity prices), with particular focus on the recent COVID-19 crisis period.
This is joint work with Matteo Iacopini and Francesco Ravazzolo.
Covariate-adjusted response-adaptive (CARA) designs use available responses to skew the treatment allocation in an ongoing clinical trial in favour of the treatment arm found at an interim stage to be best for a patient’s covariate profile.
There has recently been extensive research on CARA designs, mainly involving binary responses. Though exponential survival responses have also been considered, the constant hazard property of the exponential model makes the mean residual life for patients constant, which is too restrictive for wide-ranging applicability. To overcome this limitation, designs are developed for Weibull-distributed survival responses by deriving two variants of optimal designs based on an optimality criterion.
The optimal designs are based on the covariate-adjusted doubly-adaptive biased coin design (CADBCD) in one case, and the covariate-adjusted efficient randomised adaptive design (CAERADE) in the other. The observed treatment allocation proportions for these designs converge to the expected targeted values, which are derived from constrained optimization problems. The existing large sample theory for CARA designs relies on a Taylor expansion of the allocation probability function, which does not apply to the CAERADE, as it is a discrete and discontinuous function. To overcome this difficulty of discontinuity, and to establish the asymptotic properties of the CAERADE, a stopping time of a martingale process has been introduced. A comparative analysis of these two optimal designs is also discussed. Given the treatment allocation history, response histories, previous covariate information and the covariate profile of the incoming patient, an expression for the conditional probability of a patient being allocated to a particular treatment has been obtained. To apply such designs, the treatment allocation probabilities are sequentially modified based on the history of previous patients' treatment assignments, responses, covariates and the covariates of the new patient.
For a Phase III clinical trial, the CAERADE is preferable to the CADBCD when the main objective is to minimise the asymptotic variance of the allocation procedure. However, the former procedure being discrete tends to be slower in converging towards the expected target allocation proportion. Since the CAERADE provides a design with minimum variance, it is better than the CADBCD as far as the power of the Wald test for testing treatment differences is concerned. An extensive simulation study of the operating characteristics of the proposed designs supports these findings. It is concluded that the proposed CARA procedures can be suitable alternatives to the traditional balanced randomization designs in survival trials, provided that response data are available during the recruitment phase to enable adaptations to the designs. The findings are illustrated extensively by redesigning an existing clinical trial for treating colorectal cancer.
Keywords: Censored Responses; Optimum allocation; Power; Variability; Covariate Profile
Cancelled because of coronavirus.
Tests are a building block of our modern education system. Many tests are high-stakes, such as admission, licensing, and certification tests, which can significantly change one's life trajectory. For this reason, ensuring fairness in educational tests is becoming an increasingly important problem. This paper concerns the issue of item preknowledge in educational tests due to item leakage. That is, a proportion of test takers have access to leaked items before a test is administered, which leads to inflated performance on the set of leaked items. We develop methods for the simultaneous detection of cheating test takers and compromised items based on data from a single test administration, when both sets are completely unknown. Latent variable models are proposed for the modelling of (1) data consisting only of item-level binary scores and (2) data consisting of both item-level binary scores and response time, where the former is commonly available in paper-and-pencil tests and the latter is widely encountered in computer-based tests. The proposed model adds a latent class model component on top of a factor model (also known as an item response theory model) component, where the factor model component captures item response behaviour driven by test takers' ability and the latent class model component captures item response behaviour due to item preknowledge. We further propose a statistical decision framework, under which compound decision rules are developed that control local false discovery/non-discovery rates. Statistical inference is carried out under a Bayesian framework. The proposed method is applied to data from a computer-based nonadaptive licensure assessment.
This is joint work with Prof. Irini Moustaki and Ms. Yan Lu (PhD student).
We investigate semiparametric Bayesian inference for average treatment effects based on observational data, which is a challenging problem due to the missing counterfactuals and selection bias. This model has applications in biostatistics and causal inference.
We show that standard Gaussian process priors satisfy a semiparametric Bernstein-von Mises theorem under sufficient smoothness conditions, thereby showing that the posterior can yield optimal inference. We further propose a novel propensity score-based prior modification that corrects for the first-order posterior bias. Numerical simulations confirm significant improvement in both estimation accuracy and uncertainty quantification compared to using an unmodified Gaussian process.
We present an approach to Bayesian semiparametric inference for Gaussian multivariate response regression. We are motivated by various small and medium dimensional problems from the physical and social sciences. The statistical challenges revolve around dealing with the unknown mean and variance functions and, in particular, the correlation matrix. To tackle these problems, we have developed priors over the smooth functions and a Markov chain Monte Carlo algorithm for inference and model selection. Specifically, Dirichlet process mixtures of Gaussian distributions are used as the basis for a cluster-inducing prior over the elements of the correlation matrix. The smooth, multidimensional means and variances are represented using radial basis function expansions. The complexity of the model, in terms of variable selection and smoothness, is then controlled by spike-and-slab priors. A simulation study is presented, demonstrating performance as the response dimension increases. Finally, the model is fitted to a number of real-world datasets.
Study designs where data have been aggregated by geographical areas are popular in environmental epidemiology. These studies are commonly based on administrative databases and, as they provide complete spatial coverage, are particularly appealing for making inference on the entire population. However, the resulting estimates are often biased and difficult to interpret due to unmeasured confounders, which typically are not available from routinely collected data. We propose a framework to improve inference drawn from such studies by exploiting information derived from individual-level survey data. The latter are summarized in an area-level scalar score by mimicking at the ecological level the well-known propensity score methodology. The literature on propensity scores for confounding adjustment is mainly based on individual-level studies and assumes a binary exposure variable. Here we generalize its use to cope with area-referenced studies characterized by a continuous exposure. Our approach is based upon Bayesian hierarchical structures specified in a two-stage design: (i) geolocated individual-level data from survey samples are up-scaled to the ecological level, then the latter are used to estimate a generalized ecological propensity score (EPS) in the in-sample areas; (ii) the generalized EPS is imputed in the out-of-sample areas under different assumptions about the missingness mechanisms, then it is included in the ecological regression linking the exposure of interest to the health outcome. This delivers area-level risk estimates which allow a fuller adjustment for confounding than traditional areal studies. The methodology is illustrated using simulations and a case study investigating the risk of lung cancer mortality associated with nitrogen dioxide in England (UK).
In shape analysis, objects are often represented as configurations of points, known as landmarks. The case where the correspondence between landmarks on different objects is unknown is called unlabelled shape analysis. The alignment task is then to simultaneously identify the correspondence between landmarks and the transformation aligning the objects. In this talk, I will discuss the alignment of unlabelled shapes, and discuss two applications to problems in structural bioinformatics. The first is a problem in drug discovery, where the main objective is to find the shape information common to all, or subsets of, a set of active compounds. The approach taken resembles a form of clustering, which also gives estimates of the mean shapes of each cluster. The second application is the alignment of protein structures, which will also serve to illustrate how the modelling framework can incorporate very general information regarding the properties we would like alignments to have; in this case, expressed through the sequence order of the points (amino acids) of the proteins.
In hierarchical polynomial regression, an interaction term such as x1x2 is included in the model only if both main effects x1 and x2 are also included in the model. We note that the divisibility conditions implicit in polynomial hierarchy give way to natural constraints for the model parameters. Our work uses this idea to derive versions of strong and weak hierarchy and to extend existing work in the literature, which at the moment is only concerned with models of degree two. We discuss how to estimate parameters in lasso using standard quadratic programming techniques and apply our proposal to some examples. This is joint work with S. Lunagomez (Lancaster University).
Automated decision-making for medical diagnosis consists of producing differentials for various diseases based on evidence about the state of the patient. A particular way to encode the various relationships between symptoms, risk-factors, and diseases is by using a Bayesian network, where the edge structure reflects the underlying causal mechanisms between the nodes. Due to the combinatorial explosion of computing posterior distributions exactly, various approximate inference schemes have been proposed to tackle this problem, such as variational inference and importance sampling, among others. In addition, amortisation techniques allow us to reduce the cost of inference by carrying out and storing some computations offline. In the medical-diagnosis task, producing highly-accurate marginals is key to differential diagnosis. Importance sampling is particularly suited for this, as it is asymptotically exact and a good choice of proposal can provide a reduction in variance. In this talk, I will discuss how we can construct various data-driven proposals by using an inverse factorisation of the model’s joint distribution. The proposal distributions are based on a neural network that is trained with samples from the generative model before inference takes place, whereas the inverse factorisation provides the sampling schedule for the importance sampling scheme. We explored the impact of different inverse factorisations in terms of variance reduction. Our findings reveal that the new scheme produces competitive data-driven proposals for importance sampling.
This is joint work with Divya Gautam, Kostis Gourgoulias, Saurabh Johri and Maneesh Sahani.
Short bio:
Maria Lomeli is currently a research scientist at Babylon Health, UK. Previously, she was a research associate at the Machine Learning group, University of Cambridge, working with Zoubin Ghahramani. She obtained her PhD from the Gatsby Unit, UCL under the supervision of Yee Whye Teh.
The LASSO has recently attracted attention in the context of models with hierarchy restrictions. In these models, an interaction term is allowed only if both main effects are active (strong hierarchy) or if at least one main effect is active (weak hierarchy). For example, under strong hierarchy appearance of the term x1x2 in a model requires both x1 and x2, while under weak hierarchy at least one of x1, x2 is needed. Our work is motivated by possible higher-order interactions in linear regression models. We are concerned with enhancing the performance of LASSO for square-free hierarchical polynomial models when combining validation error with a measure of model complexity. The measure of the complexity is the sum of Betti numbers of the model, seen as a simplicial complex. We represent the polynomial regression model in terms of components and cycles, borrowing from recent developments in computational topology. We use LASSO as our model selection method combined with Betti numbers. We study and propose an algorithm which combines statistical and algebraic criteria. This compound criterion would allow us to deal with model selection problems of higher-order interactions in polynomial regression models.
The preferential attachment (PA) network is a popular model for social networks, collaboration networks and so on. The PA network model is an evolving network model in which new nodes keep arriving. When a new node arrives, it establishes exactly one connection with an existing node. The random choice of the existing node is made via a multinomial distribution with probability weights based on a preferential function f of the degrees (a schematic form of this attachment rule is given below). f maps the natural numbers to the positive real line and is assumed a priori non-decreasing, which means the nodes with high degrees are more likely to receive new connections, i.e. "the rich get richer". Under sublinear parametric assumptions on the PA function, we propose the maximum likelihood estimator (MLE) of f. We show that the MLE yields optimal performance, with accompanying asymptotic normality results. Despite the optimal properties of the MLE, it depends on the history of the network evolution, which is often difficult to obtain in practice. To avoid this shortcoming of the MLE, we propose the quasi maximum likelihood estimator (QMLE), a history-free remedy. To prove the asymptotic normality of the QMLE, a connection between the PA model and Svante Janson's urn models is exploited.
This is (partially) joint work with Aad van der Vaart.
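For concreteness, the attachment step of the PA model described above can be written as follows; the parametric family shown afterwards is a common sublinear choice and is given only as an illustration.

```latex
% When a new node arrives, it connects to existing node i with probability
% proportional to the preferential function f evaluated at i's degree d_i.
\Pr(\text{new node attaches to } i \mid \text{current graph})
 \;=\; \frac{f(d_i)}{\sum_{j} f(d_j)},
\qquad f \ \text{non-decreasing},
```

for example $f(k) = k^{\alpha}$ with $0 < \alpha < 1$ in a sublinear parametric setting.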
One of the major challenges for teachers of Statistics to non-statisticians is the high level of statistical anxiety amongst students, students' perceptions of their past experiences of learning statistics or mathematics, and the potential negative impact of these attitudes or beliefs on how students learn statistics.
This talk will aim to showcase how to use different types of inclusive practice activities and assessment methods, constructively aligned with the learning outcomes, to support the development of students' confidence in the classroom, by providing a supportive learning environment that builds trust, sets expectations and makes statistics fun.
The problem of univariate mean change point detection and localization based on a sequence of n independent observations with piecewise constant means has been intensively studied for more than half a century, and serves as a blueprint for change point problems in more complex settings. We provide a complete characterization of this classical problem in a general framework in which the upper bound sigma^2 on the noise variance, the minimal spacing Delta between two consecutive change points and the minimal magnitude kappa of the changes are allowed to vary with n. We first show that consistent localization of the change points is impossible when the signal-to-noise ratio kappa * sqrt(Delta) / sigma is uniformly bounded from above. In contrast, when kappa * sqrt(Delta) / sigma diverges in n at any arbitrarily slow rate, we demonstrate that two computationally efficient change point estimators, one based on the solution to an L0-penalized least squares problem and the other on the popular WBS algorithm, are both consistent and achieve a localization rate of the order log(n) * (sigma / kappa)^2. We further show that this rate is minimax optimal, up to a log(n) term.
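In symbols, the setting described above is the following (this is simply a restatement of the quantities in the abstract):

```latex
% Univariate mean change point model: independent observations with a
% piecewise constant mean; Delta is the minimal spacing between change points
% and kappa the minimal jump size.
Y_i \;=\; \theta_i + \varepsilon_i, \qquad i = 1,\dots,n,
\qquad \theta_i \ \text{piecewise constant}, \qquad
\operatorname{Var}(\varepsilon_i) \le \sigma^2,
```

with consistent localization possible only when $\kappa\sqrt{\Delta}/\sigma \to \infty$, in which case the attainable localization error is of order $\log(n)\,(\sigma/\kappa)^2$.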
Several studies on heritability in twins aim at understanding the different contributions of environmental and genetic factors to specific traits. Considering the National Merit Twin Study, our purpose is to correctly analyse the influence of socio-economic status on the relationship between twins' cognitive abilities. Our methodology is based on conditional copulas, which enable us to model the effect of a covariate driving the strength of dependence between the main variables. We propose a flexible Bayesian non-parametric approach for the estimation of conditional copulas, which can model any conditional copula density. Our methodology extends the work of Wu, Wang and Walker (2015) by introducing dependence on a covariate in an infinite mixture model. Our results suggest that environmental factors are more influential in families with lower socio-economic position.
In this talk I will discuss recent work on the problem of testing the independence of two multivariate random vectors, given a sample from the underlying population. Classical measures of dependence such as Pearson correlation or Kendall's tau are often found not to capture the complex dependence between variables in modern datasets, and in recent years a large literature has developed on defining appropriate nonparametric measures of dependence and associated tests. We take the information-theoretic quantity mutual information as our starting point, and define a new test, which we call MINT, based on the estimation of this quantity, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances.
The proposed critical values of our test, which may be obtained by simulation in the case where an approximation to one marginal is available or by permuting the data otherwise, facilitate size guarantees, and we provide local power analyses, uniformly over classes of densities whose mutual information satisfies a lower bound. Our ideas may be extended to provide new goodness-of-fit tests of normal linear models based on assessing the independence of our vector of covariates and an appropriately-defined notion of an error vector. The theory is supported by numerical studies on both simulated and real data.
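The decomposition that MINT exploits is the standard identity below, with each entropy term estimated from nearest-neighbour distances.

```latex
% Mutual information written as a combination of marginal and joint entropies;
% it vanishes exactly when X and Y are independent.
I(X;Y) \;=\; H(X) + H(Y) - H(X,Y),
\qquad
I(X;Y) = 0 \iff X \ \text{and} \ Y \ \text{are independent},
```

so the test rejects independence when the estimated mutual information is large, with critical values obtained by simulation or permutation as described above.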
The juvenile life stage is a crucial determinant of forest dynamics and a first indicator of changes to species’ ranges under climate change. However, paucity of detailed re-measurement data of seedlings, saplings and small trees means that their demography is not well understood at large scales. In this study we quantify the effects of climate and density dependence on recruitment and juvenile growth and mortality rates of thirteen species measured in the Spanish Forest Inventory. Single-census sapling count data is used to constrain demographic parameters of a simple forest juvenile dynamics model using a likelihood-free parameterisation method, Approximate Bayesian Computation. Our results highlight marked differences between species, and the important role of climate and stand structure, in controlling juvenile dynamics. Recruitment had a hump-shaped relationship with conspecific density, and for most species conspecific competition had a stronger negative effect than heterospecific competition. Recruitment and mortality rates were positively correlated, and Mediterranean species showed on average higher mortality and lower growth rates than temperate species. Under climate change our model predicted declines in recruitment rates for almost all species. Defensible predictive models of forest dynamics should include realistic representation of critical early life-stage processes and our approach demonstrates that existing coarse count data can be used to parameterise such models. Approximate Bayesian Computation approaches have potentially wide ecological application, in particular to unlock information about past processes from current observations.
Smooth supersaturated polynomials have been used for building emulators in computer experiments. The response surfaces built with this method are simple to interpret and have spline-like properties (Bates et al., 2014). We extend the methodology to build smooth logistic regression models. The approach we follow is to regularize the likelihood with a penalization term that accounts for the roughness of the regression model.
The response surface follows the data closely, yet it is smooth and does not oscillate. We illustrate the method with simulated data and we also present a recent application to build a prediction rule for psychiatric hospital readmissions of patients with a diagnosis of psychosis. This application uses data from the OCTET clinical trial (Burns et al., 2013).
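A schematic version of the penalized criterion (in my notation, not necessarily the authors' exact penalty) is:

```latex
% s: polynomial predictor; p(x) = 1/(1 + exp(-s(x))); the integral penalizes
% the roughness of s over the design region X.
\hat{s} \;=\; \arg\max_{s}\;
\sum_{i=1}^{n}\Bigl[ y_i \log p(x_i) + (1-y_i)\log\bigl(1 - p(x_i)\bigr)\Bigr]
\;-\; \lambda \int_{\mathcal{X}} \bigl\|\nabla^{2} s(x)\bigr\|_F^{2}\, dx,
```

so that larger $\lambda$ yields a flatter, non-oscillating surface while smaller $\lambda$ lets the surface track the data more closely.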
Latent ability models relate a set of observed variables to a set of latent ability variables; examples include paired and multiple comparison models and item response theory models. In this talk, I will first present an online Bayesian approximate method for online gaming analysis using paired and multiple comparison models. Experiments on game data show that the accuracy of the proposed online algorithm is competitive with state-of-the-art systems such as TrueSkill. Second, an efficient algorithm is proposed for Bayesian parameter estimation in item response theory models. Experiments show that the algorithm works well for real Internet ratings data. The proposed method is based on the Woodroofe-Stein identity.
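As a standard concrete instance of a paired comparison model in this class (given here only as an example, not as the specific model used in the talk), the Bradley-Terry model links latent abilities to observed match outcomes:

```latex
% Bradley-Terry model: player i beats player j with probability determined by
% the difference of their latent abilities theta_i and theta_j.
\Pr(i \ \text{beats} \ j)
 \;=\; \frac{e^{\theta_i}}{e^{\theta_i} + e^{\theta_j}}
 \;=\; \operatorname{logit}^{-1}(\theta_i - \theta_j).
```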
Actor-event data are common in sociological settings, whereby one registers the pattern of attendance of a group of social actors at a number of events. We focus on 79 members of the Noordin Top terrorist network, who were monitored attending 45 events. The attendance or non-attendance of the terrorists at events defines the social fabric, such as group coherence and social communities. The aim of the analysis of such data is to learn about this social structure. Actor-event data are often transformed to actor-actor data in order to be further analysed by network models, such as stochastic block models. This transformation and such analyses lead to a natural loss of information, particularly when one is interested in identifying, possibly overlapping, subgroups or communities of actors on the basis of their attendance at events. In this paper we propose an actor-event model for overlapping communities of terrorists, which simplifies interpretation of the network. We propose a mixture model with overlapping clusters for the analysis of the binary actor-event network data, called manet, and develop a Bayesian procedure for inference. After a simulation study, we show how this analysis of the terrorist network has clear interpretative advantages over the more traditional approaches of network analysis.
Derivatives can be presented in a nice, simple and intuitive way that everyone can relate to. In my talk I will not just concentrate on the specific topic of communicating derivatives to the general public but will give other examples, partly taken from the project "The Big Van Theory" that uses comedy as a vehicle to bring science to the general public. For further context, see the pages www.bigvanscience.com/index_en.html and www.youtube.com/channel/UCH-Z8ya93m7_RD02WsCSZYA.
New results about large sample approximations for statistical inference and change point analysis of high dimensional vector time series are presented. The results deal with related procedures that can be based on an increasing number of bilinear forms of the sample variance-covariance matrix as arising, for instance, when studying change-in-variance problems for projection statistics and shrinkage covariance matrix estimation.
Contrary to many known results, e.g. from random matrix theory, the results hold true without any constraint on the dimension, the sample size or their ratio, provided the weighting vectors are uniformly l1-bounded. Those results are in terms of (strong resp. weak) approximations by Gaussian processes for partial sum and CUSUM type processes, which imply (functional) central limit theorems under certain conditions. It turns out that the approximations by Gaussian processes hold not only without any constraint on the dimension, the sample size or their ratios, but even without any such constraint with respect to the number of bilinear forms. For the unknown variances and covariances of these bilinear forms nonparametric estimators are proposed and shown to be uniformly consistent.
We present related change-point procedures for the variance of projection statistics as naturally arising in principal component analyses and dictionary learning, amongst others. Further, we discuss how the theoretical results lead to novel distributional approximations and sequential methods for shrinkage covariance matrix estimators in the spirit of Ledoit and Wolf.
This is joint work with Rainer v. Sachs, UC Louvain, Belgium. The work of Ansgar Steland was supported by a grant from Deutsche Forschungsgemeinschaft (DFG), grant STE 1034/11-1.
I will present a statistical approach to distinguish and interpret the complex relationship between several predictors and a response variable at the small area level, in the presence of i) high correlation between the predictors and ii) spatial correlation for the response. Covariates which are highly correlated create collinearity problems when used in a standard multiple regression model. Many methods have been proposed in the literature to address this issue. A very common approach is to create an index which aggregates all the highly correlated variables of interest. For example, it is well known that there is a relationship between social deprivation measured through the Multiple Deprivation Index (IMD) and air pollution; this index is then used as a confounder in assessing the effect of air pollution on health outcomes (e.g. respiratory hospital admissions or mortality). However it would be more informative to look specifically at each domain of the IMD and at its relationship with air pollution to better understand its role as a confounder in the epidemiological analyses. In this paper we illustrate how the complex relationships between the domains of IMD and air pollution can be deconstructed and analysed using profile regression, a Bayesian non-parametric model for clustering responses and covariates simultaneously. Moreover, we include an intrinsic spatial conditional autoregressive (ICAR) term to account for the spatial correlation of the response variable.
I will discuss the power of analytics in a political landscape. How can we revolutionise and communicate politics using analytics and data visualisation?
I will cover the downfalls of our current interaction with politics and then discuss the power that analytics and visuals could hold if presented well.
Clinical trials typically randomise patients to the different treatment arms using a fixed randomisation scheme, such as equal randomisation. However, such schemes mean that a large number of patients will continue to be allocated to inferior treatments throughout the trial. To address this ethical issue, response-adaptive randomisation schemes have been proposed, which update the randomisation probabilities using the accumulating response data so that more patients are allocated to treatments that are performing well. A long-standing barrier to using response-adaptive trials in practice, particularly from a regulatory viewpoint, is concern over bias and type I error inflation. In this talk, I will describe recent methodological advances that aim to address both of these concerns.
First, I give a summary of a paper by Bowden and Trippa (2017) on unbiased estimation for response-adaptive trials. The authors derive a simple expression for the bias of the usual maximum likelihood estimator, and propose three procedures for bias-adjusted estimation. I then present recent work on adaptive testing procedures that ensure strong familywise error control. The approach can be used for both fully-sequential and block randomised trials, and for general adaptive randomisation rules. We show there can be a high price to pay in terms of power to achieve familywise error control for randomisation schemes with extreme allocation probabilities. However, for proposed Bayesian adaptive randomisation schemes in the literature, our adaptive tests maintain or increase the power of the trial.
Heart failure is characterised by recurrent hospitalisations and yet often only the first is considered in clinical trial reports. In chronic diseases, such as heart failure, analysing all such hospitalisations gives a more complete picture of treatment benefit.
An increase in heart failure hospitalisations is associated with a worsening condition meaning that a comparison of heart failure hospitalisation rates, between treatment groups, can be confounded by the competing risk of death. Any analyses of recurrent events must take into consideration informative censoring that may be present. The Ghosh and Lin (2002) non-parametric analysis of heart failure hospitalisations takes mortality into account whilst also adjusting for different follow-up times and multiple hospitalisations per patient. Another option is to treat the incidence of cardiovascular death as an additional event in the recurrent event process and then adopt the usual analysis strategies. An alternative approach is the use of joint modelling techniques to obtain estimates of treatment effects on heart failure hospitalisation rates, whilst allowing for informative censoring.
This talk shall outline the different methods available for analysing recurrent events in the presence of dependent censoring and the relative merits of each method shall be discussed.
In this talk we discuss a class of M-estimators of parameters in GARCH models. The class of estimators contains the least absolute deviation and Huber's estimator as well as the well-known quasi maximum likelihood estimator. For some estimators, the asymptotic normality results are obtained only under a fractional unconditional moment assumption on the error distribution and some mild smoothness and moment assumptions on the score function. Next we analyse the bootstrap approximation of the distribution of M-estimators. It is seen that the bootstrap distribution (given the data) is a consistent estimate (in probability) of the distribution of the M-estimators. We propose an algorithm for the computation of M-estimates which at the same time is software-friendly for computing the bootstrap replicates from the given data. We illustrate our algorithm through a simulation study and the analysis of recent financial data.
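For reference, the GARCH(1,1) specification and the Gaussian quasi-likelihood criterion that these M-estimators generalize can be written as follows.

```latex
% GARCH(1,1): returns X_t with conditional variance sigma_t^2 driven by the
% previous squared return and the previous conditional variance.
X_t = \sigma_t\,\varepsilon_t,
\qquad
\sigma_t^{2} = \omega + \alpha X_{t-1}^{2} + \beta \sigma_{t-1}^{2},
\qquad
\hat{\theta}_{\mathrm{QML}} = \arg\min_{\theta}\sum_{t}
\Bigl[\frac{X_t^{2}}{\sigma_t^{2}(\theta)} + \log \sigma_t^{2}(\theta)\Bigr],
```

with the M-estimators of the talk replacing this Gaussian criterion by more general score functions, such as those of least absolute deviation or Huber type.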
Precise knowledge of the tail behaviour of a distribution as well as predicting capabilities about the occurrence of extremes are fundamental in many areas of applications, for instance environmental sciences and finance. Standard inferential routines for extremes require the imposition of arbitrary assumptions which may negatively affect the statistical estimates. The model class of extreme value mixture models, on the other hand, allows for the precise estimation of the tail of a distribution without requiring any arbitrary assumption. After reviewing these models, the talk will discuss two extensions of this approach I have been involved in. First, situations where different extreme structures may be useful to perform inference over the extremes of a time series will be discussed. These are dealt with a novel changepoint approach for extremes, where the changepoints are estimated via Bayesian MCMC routines. Second, an extension of extreme value mixture models to investigate extreme dependence in multivariate applications is introduced and its usefulness is demonstrated using environmental data.
The regression discontinuity design (RDD) is a quasi-experimental design that estimates the causal effects of a treatment when its assignment is defined by a threshold value for a continuous assignment variable. The RDD assumes that subjects with measurements within a bandwidth around the threshold belong to a common population, so that the threshold can be seen as a randomising device assigning treatment to those falling just above the threshold and withholding it from those who fall just below.
Bandwidth selection represents a compelling decision for the RDD analysis, since there is a trade-off between its size and the bias and precision of the estimates: if the bandwidth is small, the bias is generally low but so is precision; if the bandwidth is large, the reverse is true. A number of methods to select the "optimal" bandwidth have been proposed in the literature, but their use in practice is limited. We propose a methodology that, tackling the problem from an applied point of view, considers units' exchangeability, i.e., their similarity with respect to measured covariates, as the main criterion to select subjects for the analysis, irrespective of their distance from the threshold. We use a clustering approach based on a Dirichlet process mixture model and then evaluate homogeneity within each cluster using the posterior distribution for the parameters defining the mixture, including in the final RDD analysis only clusters which show high homogeneity. We illustrate the validity of our methodology using a simulated experiment.
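For completeness, the estimand underlying the sharp version of the design is the jump in the outcome regression at the threshold c:

```latex
% Sharp RDD: the discontinuity of E[Y | X = x] at the threshold c identifies
% the local causal effect of treatment.
\tau_{\mathrm{RDD}}
 \;=\; \lim_{x \downarrow c} \mathbb{E}\bigl[Y \mid X = x\bigr]
 \;-\; \lim_{x \uparrow c} \mathbb{E}\bigl[Y \mid X = x\bigr],
```

estimated in practice from units falling within a bandwidth of $c$, which is where the bias-precision trade-off and the proposed exchangeability-based selection enter.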
This talk will have three differentiated parts. In the first one, I will present some results in which we compare LASSO model selection methods with classical ones in sparse multi-dimensional contingency tables formed with binary variables with a log-linear modelling parametrization. In the second one, I will talk about Mendelian randomization in the presence of multiple instruments and will present results of an application to a data set with multiple metabolites. In the third one, I will talk about a clustering method that uses high-dimensional network theory (Markov Stability), and an application of it to a data set that contains messenger ribonucleic acids (mRNAs) and micro ribonucleic acids (miRNAs) from a toxicological experiment.
The expectation part of a linear model is often presented as an equation with unknown parameters, and the reader is supposed to know that this is shorthand for a whole family of expectation models (for example, is there interaction or not?). I find it helpful to show the family of models on a Hasse diagram. By changing the lengths of the edges in this diagram, we can go a stage further and use it as a visual display of the analysis of variance.
The analyses of randomised controlled trials (RCTs) with missing data typically assume that, after conditioning on the observed data, the probability of missing data does not depend on the patient's outcome, and so the data are ‘missing at random’ (MAR). This assumption is often questionable, for example because patients in relatively poor health may be more likely to drop-out. In these cases, methodological guidelines recommend sensitivity analyses to recognise data may be ‘missing not at random’ (MNAR), and call for the development of practical, accessible, approaches for exploring the robustness of conclusions to MNAR assumptions.
We propose a Bayesian framework for this setting, which includes a practical, accessible approach to sensitivity analysis and allows the analyst to draw on expert opinion. To facilitate the implementation of this strategy, we are developing a new web-based tool for eliciting expert opinion about outcome differences between patients with missing versus complete data. The IMPROVE study, a multicentre trial which compares endovascular strategy (EVAR) with open repair for patients with ruptured abdominal aortic aneurysm, was used in the initial development work. In this seminar, we will discuss our proposed framework and demonstrate our elicitation tool, using the IMPROVE trial for illustration.
Development of treatments for rare diseases is challenging due to the limited number of patients available for participation. Learning about treatment effectiveness with a view to treat patients in the larger outside population, as in the traditional fixed randomised design, may not be a plausible goal. An alternative goal is to treat the patients within the trial as effectively as possible. Using the framework of finite-horizon Markov decision processes and dynamic programming (DP), a novel randomised response-adaptive design is proposed which maximises the total number of patient successes in the trial. Several performance measures of the proposed design are evaluated and compared to alternative designs through extensive simulation studies. For simplicity, a two-armed trial with binary endpoints and immediate responses is considered. However, further evaluations illustrate how the design behaves when patient responses are delayed, and modifications are made to improve its performance in this more realistic setting.
Simulation results for the proposed design show that: (i) the percentage of patients allocated to the superior treatment is much higher than in the traditional fixed randomised design; (ii) relative to the optimal DP design, the power is largely improved upon and (iii) the corresponding treatment effect estimator exhibits only a very small bias and mean squared error. Furthermore, this design is fully randomised which is an advantage from a practical point of view because it protects the trial against various sources of bias.
Overall, the proposed design strikes a very good balance between the power and patient benefit trade-off which greatly increases the prospects of a Bayesian bandit-based design being implemented in practice, particularly for trials involving rare diseases and small populations.
Keywords: Clinical trials; Rare diseases; Bayesian adaptive designs; Sequential allocation; Bandit models; Dynamic programming; Delayed responses.
We consider estimation of the causal treatment effects in randomised trials with non-adherence, where there is interest in treatment effect modification by baseline covariates.
Assuming randomised treatment is a valid instrument, we describe two doubly robust (DR) estimators of the parameters of a partially linear instrumental variable model for the average treatment effect on the treated, conditional on baseline covariates. The first method is a locally efficient g-estimator, while the second is a targeted minimum loss-based estimator (TMLE).
These two DR estimators can be viewed as a generalisation of the two-stage least squares (TSLS) method in the instrumental variable methodology to a semiparametric model with weaker assumptions. We exploit recent theoretical results to extend the use of data-adaptive machine learning to the g-estimator. A simulation study is used to compare the estimators' finite-sample performance (1) when fitted using parametric models, and (2) using Super Learner, with the TSLS.
Data-adaptive DR estimators have lower bias and improved precision, when compared to incorrectly specified parametric DR estimators. Finally, we illustrate the methods by obtaining the causal effect on the treated of receiving cognitive behavioural therapy training on pain-related disability, with heterogeneous treatment by depression at baseline, using the COPERS (COping with persistent Pain, Effectiveness Research in Self-management) trial.
Many modern-day datasets exhibit multivariate dependence structure that can be modelled using networks or graphs. In the social sciences, biomedical studies, financial applications and elsewhere, datasets with latent network structure are ubiquitous. Many of these datasets are time-varying in nature, and this motivates the modelling of dynamic networks. In this talk I will present some of our recent research which looks at the challenging task of recovering such networks, even in high-dimensional settings.
Our approach studies the canonical Gaussian graphical model, whereby patterns of variable dependence are encoded through the partial correlation structure. I will demonstrate how regularisation ideas such as the graphical lasso may be implemented when data are drawn i.i.d., but how this may fail in non-stationary settings. I will then present an overview of our work (with Sandipan Roy, UCL) which extends such methods to dynamic settings. By furnishing appropriate convex M-estimators that enforce smoothness and sparsity assumptions on the underlying Gaussian model, we demonstrate an ability to recover the true underlying network structure. I will present both synthetic experiments and theoretical analysis which shed light on the performance of these methods.
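As a point of reference for the i.i.d. case mentioned above, the sketch below fits a graphical lasso with scikit-learn to data simulated from a sparse precision matrix. The dynamic, smoothness-penalized extension discussed in the talk is not implemented here, and the simulated chain graph and penalty level are purely illustrative.

```python
# Minimal i.i.d. graphical lasso sketch: estimate a sparse precision matrix,
# whose nonzero off-diagonal entries encode the conditional-dependence graph.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p = 10

# Build a sparse, positive-definite precision matrix (a chain graph).
precision = np.eye(p)
for i in range(p - 1):
    precision[i, i + 1] = precision[i + 1, i] = 0.4
cov = np.linalg.inv(precision)

X = rng.multivariate_normal(mean=np.zeros(p), cov=cov, size=500)

model = GraphicalLasso(alpha=0.05)       # l1 penalty on the precision matrix
model.fit(X)

# Recovered edges: nonzero off-diagonal entries of the estimated precision.
est_edges = np.abs(model.precision_) > 1e-3
np.fill_diagonal(est_edges, False)
print(np.argwhere(np.triu(est_edges)))
```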
We study the problem of optimal oculomotor control during the execution of visual search tasks. We introduce a computational model of human eye movements, which takes into account various constraints of the human visual and oculomotor systems. In the model, the choice of the subsequent fixation location is posed as a problem of stochastic optimal control, which relies on reinforcement learning methods. We show that if biological constraints are taken into account, the trajectories simulated under the learned policy share both basic statistical properties and scaling behaviour with human eye movements. We validated our model simulations with human psychophysical eye-tracking experiments.
Multi-arm trials are increasingly being recommended for use in diseases where multiple experimental treatments are awaiting testing. This is because they allow a shared control group, which considerably reduces the sample size required compared to separate randomised trials. Further gains in efficiency can be obtained by introducing interim analyses (multi-arm multi-stage, MAMS trials). At the interim analyses, a variety of modifications are possible, including changing the allocation to different treatments, dropping of ineffective treatments or stopping the trial early if sufficient evidence of a treatment being superior to control is found. These modifications allow focusing of resources on the most promising treatments, and thereby increase both the efficiency and ethical properties of the trial.
In this talk I will describe some different types of MAMS designs and how they may be useful in different situations. I will also discuss the design of trials that test the efficacy of multiple treatments in different patient subgroups. I propose a design that incorporates biological hypotheses about links between treatments and treatment effects in biomarker subgroups, but allows alternative links to be formed during the trial. The statistical properties of this design compare well to alternative approaches available.
Most statistical methodology for confirmatory phase III clinical trials focuses on the comparison of a control treatment with a single experimental treatment, with selection of this experimental treatment made in an earlier exploratory phase II trial. Recently, however, there has been increasing interest in methods for adaptive seamless phase II/III trials that combine the treatment selection element of a phase II clinical trial with the definitive analysis usually associated with phase III clinical trials. A number of methods have been proposed for the analysis of such trials to address the statistical challenge of ensuring control of the type I error rate. These methods rely on the independence of the test statistics used in the different stages of the trial.
In some settings the primary endpoint can be observed only after long-term follow-up, so that at the time of the first interim analysis primary endpoint data are available for only a relatively small proportion of the patients randomised. In this case if short-term endpoint data are also available, these could be used along with the long-term data to inform treatment selection. The use of such data breaks the assumption of independence underlying existing analysis methods. This talk presents new methods that allow for the use of short-term data. The new methods control the overall type I error rate, either when the treatment selection rule is pre-specified, or when it can be fully flexible. In both cases there is a gain in power from the use of the short-term endpoint data when the short and long-term endpoints are correlated.
Causal inference from observational data requires untestable assumptions. As assumptions may fail, it is important to be able to understand how conclusions vary under different premises. Machine learning methods are particularly good at searching for hypotheses, but they do not always provide ways of expressing a continuum of assumptions from which causal estimands can be proposed. We introduce one family of assumptions and algorithms that can be used to provide alternative explanations for treatment effects. If we have time, I will also discuss some other developments on the integration of observational and interventional data using a nonparametric Bayesian approach.
A personal perspective, gained from nearly 30 years of applying statistical methods in a variety of industries (FMCG, Defence, Paper, Pharmaceuticals and Vaccines).
The emphasis is on the application (not the theory) of statistics to support the manufacturing, quality control and R&D functions.
The objective of this session is to present real life examples/situations to raise awareness and stimulate discussion.
Buzz words: Experimental Design (DoE), Taguchi Methods, LeanSigma, Design for Manufacture (DfM), Process Capability, Statistical Process Control (SPC), Analytical Method Validation and Good Manufacturing Practice (GMP).
The funnel plot is a graphical visualisation of summary data estimates from a meta-analysis, and is a useful tool for detecting departures from the standard modelling assumptions. Although perhaps not widely appreciated, a simple extension of the funnel plot can help to facilitate an intuitive interpretation of the mathematics underlying a meta-analysis at a more fundamental level, by equating it to determining the centre of mass of a physical system. We exploit this fact to forge new connections between statistical inference and bias adjustment in the evidence synthesis and causal inference literatures. An on-line web application (named the `Meta-Analyzer') is introduced to further facilitate this physical analogy. Finally, we demonstrate the utility of the Meta-Analyzer as a tool for detecting and adjusting for invalid instruments within the context of Mendelian randomization.
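The physical analogy rests on the pooled estimate being a weighted average, i.e. the balance point of the study estimates when each is given its inverse-variance weight; for the fixed-effect model, for instance:

```latex
% Fixed-effect pooled estimate as a centre of mass: study estimate y_i carries
% weight w_i = 1/v_i, the inverse of its variance.
\hat{\mu} \;=\; \frac{\sum_{i=1}^{k} w_i\, y_i}{\sum_{i=1}^{k} w_i},
\qquad w_i = \frac{1}{v_i},
```

which is exactly the centre of mass of the system of point masses displayed on the funnel plot.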
Statistical design and operation of clinical trials are affected by stochasticity in patient enrolment and in the occurrence of various events. The complexity of large trials and the multi-state hierarchic structure of various operational processes require developing modern predictive analytical techniques, using stochastic processes with random parameters in the empirical Bayesian setting, for efficient modelling and prediction of trial operation.
Forecasting patient enrolment is one of the bottleneck problems, as uncertainties in enrolment substantially affect trial completion time, the supply chain and associated costs. An analytic methodology for predictive patient enrolment modelling using a Poisson-gamma model was developed by Anisimov and Fedorov (2005–2007). This methodology is extended further to risk-based monitoring of interim trial performance across different metrics associated with enrolment, screen failures, various events and adverse events, and to detecting outliers.
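In schematic form, the Poisson-gamma enrolment model treats each centre's recruitment as a Poisson process with a centre-specific rate drawn from a gamma distribution:

```latex
% N_i(t): cumulative enrolment at centre i by time t; the gamma distribution
% captures between-centre variation in recruitment rates.
N_i(t) \mid \lambda_i \;\sim\; \mathrm{Poisson}(\lambda_i\, t),
\qquad
\lambda_i \;\sim\; \mathrm{Gamma}(\alpha, \beta),
```

so that marginally each count is negative binomial and predictive enrolment distributions and credibility bounds are available in closed form, with $(\alpha,\beta)$ updated from interim data in the empirical Bayesian manner described above.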
As the next stage of generalization, a new methodology using evolving stochastic processes is proposed to model the complicated hierarchic processes built on top of enrolment. This class of processes provides a rather general and unified framework for describing various operational processes, including patient follow-up, patients' visits, various events and associated costs.
The technique for evaluating predictive distributions, means and credibility bounds for evolving processes is developed (Anisimov, 2016). Some applications to modelling operational characteristics in clinical trials are considered. For these models, predictive characteristics are derived in a closed form, thus, Monte Carlo simulation is not required.
Reference: Anisimov V., Predictive hierarchic modelling of operational characteristics in clinical trials. Communications in Statistics - Simulation and Computation, 45(5), 2016, 1477–1488.
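The methodology above derives predictive characteristics in closed form; purely as an illustration of the underlying Poisson-gamma structure, the following sketch approximates the predictive distribution of future enrolment by Monte Carlo, with hypothetical hyperparameters, centre count and horizon.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Poisson-gamma enrolment model: centre-specific rates
# lambda_i ~ Gamma(shape=alpha, rate=beta); patients arrive at each centre
# as a Poisson process with rate lambda_i per month.
alpha, beta = 2.0, 4.0        # illustrative hyperparameters
n_centres = 50
horizon = 6.0                 # months of future enrolment to predict
n_sim = 10_000

rates = rng.gamma(alpha, 1.0 / beta, size=(n_sim, n_centres))
totals = rng.poisson(rates * horizon).sum(axis=1)

print("predictive mean:", totals.mean())
print("80% predictive interval:", np.percentile(totals, [10, 90]))
```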
Data sets with many variables (often, in the hundreds, thousands, or more) are routinely collected in many disciplines. This has led to interest in variable selection in regression models with a large number of variables. A standard Bayesian approach defines a prior on the model space and uses Markov chain Monte Carlo methods to sample the posterior. Unfortunately, the size of the space (2^p if there are p variables) and the use of simple proposals in Metropolis-Hastings steps have led to samplers that mix poorly over models. In this talk, I will describe two adaptive Metropolis-Hastings schemes which adapt an independence proposal to the posterior distribution. This leads to substantial improvements in the mixing over standard algorithms in large data sets. The methods will be illustrated on simulated and real data with hundreds or thousands of possible variables.
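As a heavily simplified sketch of the idea (not the speaker's algorithms), the code below runs an independence-proposal Metropolis-Hastings sampler over inclusion indicators, adapting each variable's proposal inclusion probability towards its running inclusion frequency; the model score is a BIC-based placeholder rather than a proper marginal likelihood, and all data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_score(gamma, X, y):
    """Placeholder model score (-BIC/2), standing in for the log posterior
    model probability that the adaptive samplers would target."""
    n = len(y)
    idx = np.flatnonzero(gamma)
    Z = np.column_stack([np.ones(n), X[:, idx]]) if idx.size else np.ones((n, 1))
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    sigma2 = max(resid @ resid / n, 1e-12)
    return -0.5 * (n * np.log(sigma2) + (idx.size + 1) * np.log(n))

def adaptive_vs_sampler(X, y, n_iter=5000):
    """Independence-proposal MH over inclusion indicators; the proposal inclusion
    probabilities A adapt towards the running inclusion frequencies."""
    p = X.shape[1]
    gamma = np.zeros(p, dtype=int)
    A = np.full(p, 0.5)
    cur = log_score(gamma, X, y)
    incl = np.zeros(p)
    for t in range(1, n_iter + 1):
        prop = (rng.random(p) < A).astype(int)
        new = log_score(prop, X, y)
        log_q = lambda g: np.sum(g * np.log(A) + (1 - g) * np.log(1 - A))
        # Independence-proposal MH ratio: pi(prop) q(gamma) / (pi(gamma) q(prop)).
        if np.log(rng.random()) < new - cur + log_q(gamma) - log_q(prop):
            gamma, cur = prop, new
        incl += gamma
        A = np.clip(A + (gamma - A) / (t + 10), 0.01, 0.99)  # diminishing adaptation
    return incl / n_iter   # estimated posterior inclusion probabilities

X = rng.normal(size=(100, 10))
y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=100)
print(adaptive_vs_sampler(X, y).round(2))
```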
By using advanced motion capture systems, human movement data can be collected densely over time. We construct a functional mixed-effects model to analyse such data. This model is flexible enough to study functional data collected from orthogonal designs. Covariance structure plays a central role in functional data analysis. In this method, the within-curve covariance is analysed from a stochastic process perspective, while the between-curve covariance structure of the functional responses is determined by the design. In particular, we are interested in the problem of hypothesis testing and generalize the functional F test to the mixed-effects analysis of variance.
We apply this method to analyse movement patterns in patients with cerebral palsy. Hasse diagrams are used to represent the structure of these gait data from an orthogonal block design. In order to assess the effects of ankle-foot orthoses, which are commonly prescribed to patients with abnormal gait patterns, pointwise F tests and functional F tests are used. To explore further how ankle-foot orthoses influence human movement, we are collecting more gait data in a split-plot design. The randomization of this design is based on Bailey (2008).
Online changepoint detection has its origins in statistical process control, where once a changepoint is detected the process is stopped, the fault rectified and process monitoring begins again in control. In modern-day applications such as network traffic and medical monitoring it is infeasible to adopt this strategy. In particular, monitoring the out-of-control period is often vital for diagnosing the problem: rather than stopping for fault analysis, monitoring continues throughout the period of change, and a second change is indicated when the process returns to the in-control state.
Recent offline changepoint detection literature has demonstrated the importance of considering the changepoints globally and not focusing on detecting a single changepoint in the presence of several. In this talk we will argue that this is also the case for online changepoint detection and discuss what is meant by a "global" view in online detection. This presents several problems as the standard definitions of average run length and detection delay are not clearly applicable. Following consideration of this we show the increased accuracy in future (and past) changepoint detections when taking this viewpoint and demonstrate the method on real world applications.
In Bayesian inference the choice of prior distribution is important. The prior represents beliefs and knowledge about the parameter(s). For data from an exponential family a convenient prior is a conjugate one. This can be updated to find the posterior distribution and experts can choose the parameters as equivalent to imaginary samples. This technique can also be used to combine results from different studies. A disadvantage is that we are unaware of any degree of incompatibility between the prior chosen and the data obtained. This could represent overconfidence by selecting too small a variance or indicate differences between studies.
We suggest employing a mixture of conjugate priors which have the same mean but different finite variances. We give a large weight to the component of the mixture with the smaller variance. The posterior weight on the first component of the mixture is then a measure of how discordant the data and the expert's prior are, or of how different the two studies are. We choose the size of the larger variance by considering the difference in information between the two priors. We also investigate the effect of different parameterisations of the parameter of interest. We consider a number of distributions and compare this method for measuring discordancy with previously suggested diagnostics. This is joint work with Mitra Noosha.
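A minimal sketch of the discordancy idea, assuming a Beta-binomial setting with hypothetical hyperparameters: the posterior mixture weights follow from the component marginal likelihoods, and the weight on the concentrated component drops when the data disagree with the common prior mean.

```python
import numpy as np
from scipy.special import betaln

def posterior_weights(y, n, components):
    """components: list of (prior_weight, a, b) Beta components sharing a mean.
    Returns the posterior mixture weights after observing y successes in n trials."""
    logw = []
    for w, a, b in components:
        # log marginal likelihood of a Beta(a, b) component under binomial sampling
        # (the binomial coefficient cancels across components).
        logw.append(np.log(w) + betaln(a + y, b + n - y) - betaln(a, b))
    logw = np.array(logw)
    logw -= logw.max()
    return np.exp(logw) / np.exp(logw).sum()

# Hypothetical example: both components have mean 0.2; the first is concentrated
# (the confident expert prior), the second is diffuse.
components = [(0.9, 20.0, 80.0), (0.1, 2.0, 8.0)]
print(posterior_weights(y=15, n=30, components=components))  # data discordant with the prior mean
print(posterior_weights(y=6, n=30, components=components))   # data consistent with the prior mean
```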
Asymptotic behaviour of a wide class of stochastic approximation procedures will be discussed. This class of procedures has three main characteristics: truncations with random moving bounds, a matrix-valued random step-size sequence, and a dynamically changing random regression function. A number of examples will be presented to demonstrate the flexibility of this class, with the main emphases on on-line procedures for parametric statistical estimation. The proposed method ensures an efficient use of auxiliary information in the estimation process, and is consistent and asymptotically efficient under certain regularity conditions.
When you encounter a flock of birds, with individuals calling to each other, it is often clear that the birds are influencing one another through their calls. Can we infer the structure of their social network, simply by analysing the timing of calls? We introduce a model-based analysis for temporal patterns of animal call timing, originally developed for networks of firing neurons. This has advantages over previous methods in that it can correctly handle common-cause confounds and provides a generative model of call patterns with explicit parameters for the parallel influences between individuals. We illustrate with data recorded from songbirds, to make inferences about individual identity and about patterns of influence in communication networks.
How should statisticians interact with the media? What should statisticians know about how the media operate? For several years I have worked (occasionally) with journalists, and provided expert statistical comments on press releases and media stories. I will describe my experience of the many-sided relationship between researchers, press officers, journalists, and the public they are writing for, from the point of view of the statisticians who are also involved. I will discuss the complicated nature of numbers as facts. Using examples such as the question of whether mobile phones cause brain tumours, I will explain how none of the parties in this relationship makes things easy for the others. Finally I will present a few reasons for being optimistic about the position of statistics in the media.
The final results of a multi-centre clinical trial of a vaccine against malaria, RTS,S, were published in 2015. Along with three other groups, we had access to the trial data to use as inputs into mathematical models of malaria transmission. Public health funding bodies and policy makers would like to know how the trial results generalise to other settings. This talk describes how we made use of the data to predict the population-level impact across Africa that vaccination might have, and how the uncertainty from various sources was incorporated.
The tail index is an important measure for gauging the heavy-tailed behaviour of a distribution, and tail-index regression is used when covariate information is available. Existing models may face two challenges: extreme-value or tail modelling with small to moderate sample sizes usually suffers from small-sample bias, while storage and computational efficiency become an issue for tail-index regression with massive data sets. In this talk we present new tail-index regression methods which give unbiased estimates of both the regression coefficients and the tail index with small data, and which support online analytical processing (OLAP) without accessing the raw data in massive data analysis.
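For orientation only, the sketch below fits a generic tail-index regression by maximum likelihood, treating log-exceedances over a threshold as exponential with a covariate-dependent rate; this is a standard formulation, not the new small-sample-unbiased or online estimators of the talk, and the data are simulated.

```python
import numpy as np
from scipy.optimize import minimize

def fit_tail_index_regression(y, X, u):
    """Generic tail-index regression sketch: for exceedances over threshold u,
    approximate log(y/u) | y > u as Exponential with rate alpha(x) = exp(x'beta),
    so alpha(x) is the covariate-dependent tail index."""
    keep = y > u
    z = np.log(y[keep] / u)
    Xk = np.column_stack([np.ones(keep.sum()), X[keep]])

    def neg_loglik(beta):
        eta = Xk @ beta
        return -np.sum(eta - np.exp(eta) * z)

    res = minimize(neg_loglik, x0=np.zeros(Xk.shape[1]), method="BFGS")
    return res.x   # tail index at covariate x is exp([1, x] @ beta)

# Hypothetical heavy-tailed data whose tail index depends on one covariate.
rng = np.random.default_rng(7)
n = 5000
x = rng.uniform(-1, 1, n)
alpha = np.exp(0.5 + 0.8 * x)     # true covariate-dependent tail index
y = rng.pareto(alpha) + 1.0       # Pareto samples with scale 1
print(fit_tail_index_regression(y, x, u=np.quantile(y, 0.9)))
```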
In a selective recruitment design not every patient is recruited onto a clinical trial. Instead, we evaluate how much statistical information a patient is expected to provide (as a function of their covariates) and only recruit patients that will provide a sufficient level of expected information. Patients deemed statistically uninformative are rejected. Allocation to a treatment arm is also done in a manner that maximises the expected information gain.
The benefit of selective recruitment is that a successful trial can potentially be achieved with fewer recruits, thereby leading to economic and ethical advantages. We will explore various methods for quantifying how informative a patient is based on uncertainty sampling, the posterior entropy, the expected generalisation error and variance reduction. The protocol will be applied to both time-to-event outcomes and binary outcomes. Results from experimental data and numerical simulations will be presented.
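As a simple sketch of one possible recruitment criterion (in the spirit of the variance-reduction option mentioned above, not the exact protocol), the code below measures the gain in the log-determinant of a logistic-regression Fisher information matrix from recruiting a candidate patient; all names and values are hypothetical.

```python
import numpy as np

def info_gain(x, X_enrolled, beta, ridge=1e-6):
    """Gain in log-determinant of the logistic-regression Fisher information
    from recruiting a patient with covariates x (a variance-reduction-style
    criterion; entropy and generalisation-error criteria are alternatives)."""
    def fisher(Xm):
        p = 1.0 / (1.0 + np.exp(-Xm @ beta))
        return (Xm * (p * (1 - p))[:, None]).T @ Xm + ridge * np.eye(Xm.shape[1])
    current = fisher(X_enrolled)
    proposed = fisher(np.vstack([X_enrolled, x]))
    return np.linalg.slogdet(proposed)[1] - np.linalg.slogdet(current)[1]

# Hypothetical use: recruit only if the expected gain exceeds a pre-set threshold.
rng = np.random.default_rng(2)
X_enrolled = rng.normal(size=(40, 3))
beta = np.array([0.5, -1.0, 0.2])       # current parameter estimate (illustrative)
candidate = np.array([2.0, 0.1, -1.5])
print(info_gain(candidate, X_enrolled, beta))
```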
Dual-agent trials are now increasingly common in oncology research, and many proposed dose-escalation designs are available in the statistical literature. Despite this, the translation from statistical design to practical application is slow, as has been highlighted in single-agent phase I trials, where a 3+3 rule-based design is often still used. To expedite this process, new dose-escalation designs need to be not only scientifically beneficial but also easy for clinicians to understand and implement. We propose a curve-free (nonparametric) design for a dual-agent trial in which the model parameters are the probabilities of toxicity at each of the dose combinations. We show that it is relatively trivial to incorporate a clinician's prior beliefs or historical information in the model, and that updating is fast and computationally simple through the use of conjugate Bayesian inference. Monotonicity is ensured by considering only a set of monotonic contours for the distribution of the maximum tolerated contour, which defines the dose-escalation decision process. Varied experimentation around the contour is achievable, and multiple dose combinations can be recommended to take forward to phase II. Code for R, Stata and Excel is available for implementation.
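A minimal sketch of the conjugate updating step, assuming independent Beta priors on the toxicity probability at each dose combination and hypothetical prior pseudo-counts; the restriction to monotonic contours, which drives the actual escalation decisions, is omitted here.

```python
import numpy as np

# Hypothetical prior pseudo-data for a 3 x 3 grid of dose combinations:
# a[i, j] prior toxicities, b[i, j] prior non-toxicities at combination (i, j).
a = np.full((3, 3), 0.5)
b = np.full((3, 3), 1.5)

def update(a, b, i, j, n_tox, n_patients):
    """Conjugate Beta-binomial update at dose combination (i, j)."""
    a, b = a.copy(), b.copy()
    a[i, j] += n_tox
    b[i, j] += n_patients - n_tox
    return a, b

a, b = update(a, b, i=1, j=0, n_tox=1, n_patients=3)
posterior_mean_tox = a / (a + b)   # posterior toxicity probability at each combination
print(posterior_mean_tox)
```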
The main purpose of dose-escalation trials is to identify the dose(s) that are safe and efficacious for further investigations in later studies. Therefore, dose-limiting events (DLEs) and indicative responses of efficacy should be considered in the dose-escalation procedure.
In this presentation, Bayesian adaptive approaches that incorporate both safety and efficacy will be introduced. A logistic regression model is used for modelling the probabilities of the occurrence of a DLE at the corresponding dose levels, while a linear log-log or a non-parametric model is used for efficacy. Escalation decisions are based on the combination of both models through a gain function that balances efficacy utilities against costs for safety risks. These dose-escalation procedures aim to achieve either one objective, estimating the optimal dose, calculated via the gain function and interpreted as the safe dose which gives the maximum beneficial therapeutic effect; or two objectives, estimating accurately both the maximum tolerated dose (MTD), the highest dose considered safe, and the optimal dose at the end of a dose-escalation study. The recommended dose(s) obtained under these procedures provide information about the safety and efficacy profile of the novel drug to facilitate later studies. We evaluate the different strategies via simulations based on an example constructed from a real trial. To assess the robustness of the single-objective approach, we consider scenarios where the efficacy responses of subjects are generated from an Emax model but treated as coming from a linear log-log model. We also find that the non-parametric model estimates the efficacy responses well for a large range of different underlying true shapes. The dual-objective approaches give promising results in terms of making most of their recommendations at the two real target doses.
Time series models are often fitted to the data without preliminary checks for stability of the mean and variance, conditions that may not hold in much economic and financial data, particularly over long periods. Ignoring such shifts may result in fitting models with spurious dynamics that lead to unsupported and controversial conclusions about time dependence, causality, and the effects of unanticipated shocks. In spite of what may seem obvious differences between a time series of independent variates with changing variance and a stationary conditionally heteroskedastic (GARCH) process, such processes may be hard to distinguish in applied work using basic time series diagnostic tools. We develop and study some practical and easily implemented statistical procedures to test the mean and variance stability of uncorrelated and serially dependent time series. Application of the new methods to analyze the volatility properties of stock market returns leads to some surprising findings concerning the advantages of modelling time-varying changes in unconditional variance.
Joint work with V. Dalla and P. C. B. Phillips
Due to relatively recent algorithmic breakthroughs, Bayesian networks have become an increasingly popular technique for risk assessment and decision analysis. This talk will provide an overview of successful applications (including transport safety, medicine, law/forensics, operational risk, and football prediction). What is common to all of these applications is that the Bayesian network models are built using a combination of expert judgment and (often very limited) data. I will explain why Bayesian networks 'learnt' purely from data - even when 'big data' is available - generally do not work well, and will also explain the impediments to wider use of Bayesian networks.
Small clinical trials are sometimes unavoidable, for example in the setting of rare diseases, specifically targeted subpopulations and vulnerable populations. The most common designs used in these trials are based on the frequentist paradigm, with either a large hypothesized effect size or relaxed type I and/or II error rates. One novel design that has been proposed is the Bayesian decision-theoretic approach, which is more intuitive for trials whose aim is to decide whether or not to conduct further clinical research with the experimental treatment. In this talk, I will start with a review of Bayesian decision-theoretic designs, followed by a more detailed discussion of designing a series of trials using this framework.
Graphical models have been studied and formalised across many communities of researchers (artificial intelligence, machine learning and statistics, to name just a few), and nowadays they represent a powerful tool for tackling many diverse applications. They remain an exciting area of research, and many new types of graphical models have been introduced to accommodate more complex situations arising from more challenging research questions and from the data available. Even the interpretation of graphical models can be quite different in different contexts. If we think, for example, of high-dimensional settings, the original notion of conditional independence between random variables encoded by the conditional dependence graph is generally lost, and the interest is in finding the most important components of thousands of random variables.
In this talk we will present some of the challenges we face when using graphical models to address research questions coming from interdisciplinary collaborations. We will present two case studies arising from collaborations with researchers in Biology and Neuropsychology and will try to elucidate some of the new frameworks arising. In particular, we will show how graphical models can be very powerful both for exploratory statistical analysis and for answering more advanced questions in statistical modelling and prediction.
Statistical design of experiments allows empirical studies in science and engineering to be conducted more efficiently through careful choice of the settings of the controllable variables under investigation. Much conventional work in optimal design of experiments begins by assuming a particular structural form for the model generating the data, or perhaps a small set of possible parametric models. However, these parametric models will only ever be an approximation to the true relationship between the response and controllable variables, and the impact of this approximation step on the performance of the design is rarely quantified.
We consider response surface problems where it is explicitly acknowledged that a linear model approximation differs from the true mean response by the addition of a discrepancy function. The most realistic approaches to this problem develop optimal designs that are robust to discrepancy functions from an infinite-dimensional class of possible functions. Typically it is assumed that the class of possible discrepancies is defined by a bound on either (i) the maximum absolute value, or (ii) the squared integral, of all possible discrepancy functions.
Under assumption (ii), minimax prediction error criteria fail to select a finite design. This occurs because all finitely supported deterministic designs have the problem that the maximum, over all possible discrepancy functions, of the integrated mean squared error of prediction (IMSEP) is infinite.
We demonstrate a new approach in which finite designs are drawn at random from a highly structured distribution, called a designer, of possible designs. If we also average over the random choice of design, then the maximum IMSEP is finite. We develop a class of designers for which the maximum IMSEP is analytically and computationally tractable. Algorithms for the selection of minimax efficient designers are considered, and the inherent bias-variance trade-off is illustrated.
Joint work with Dave Woods, Southampton Statistical Sciences Research Institute, University of Southampton
In this seminar, a novel concept in mathematical statistics is proposed. Ordinarily, topological factors such as the Gromov-Hausdorff distance, dilatation and distortion are defined within a single metric space. The proposed idea places a probability operator with such topological factors over different metric spaces, which can then be projected onto one common measure space. Assuming a compact Polish space, the original information is first projected to a metric space. Then the projected information from the different spaces is mapped to one common space where the topological factors are applied, and inference and estimation can be carried out.
The merit of the proposed idea is the ability to compare values and other qualities from different fields as if they belonged to a single metric space. This novel concept can make post-Big-Data information more natural for people to work with.
In the seminar, the situation of the Great Earthquake in Japan on 11 March 2011 is also introduced.
I will focus on the derivation of a generalized linear mixed model (GLMM) in the context of a completely randomized design (CRD), based on randomization ideas for linear models. The randomization approach used to derive linear models is adapted to link-transformed mean responses that include both fixed and random effects.
Typically, the random effects in a GLMM are uncorrelated and assumed to follow a normal distribution mainly for computational simplicity. However, in our case, due to the randomization the random effects are correlated. We develop the likelihood function and an estimation algorithm where we do not assume that the random effects have a normal distribution.
I will present and compare the simulation results of a simple example with GLM (generalized linear model) and HGLM (hierarchical generalized linear model) which is suitable for normally distributed correlated random effects.
In 1948 the MRC streptomycin trial established the principles of the modern clinical trial, and for longer still the idea of a control or comparison group recruited concurrently to the intervention group has been recognised as essential to obtaining sound evidence for clinical effectiveness. But must a clinical trial proceed by running an intervention and comparator in parallel? In this seminar I will focus on trials where participants are randomised in clusters. This is common when evaluating health service interventions that are delivered within an organisational unit such as a school or general practice. I will look in particular at trials where the comparator is routine care: these trials effectively ask how individuals' outcomes would compare before and after introducing the new treatment in a cluster. I will discuss some surprisingly efficient alternatives to parallel group trial designs in this case, made possible by delaying introduction of the intervention in some clusters after randomisation, with these clusters continuing in the meantime to receive routine care.
The multi-armed bandit problem describes a sequential experiment in which the goal is to achieve the largest possible mean reward by choosing from different reward distributions with unknown parameters. This problem has become a paradigmatic framework to describe the dilemma between exploration (learning about distributions' parameters) and exploitation (earning from distributions that look superior based on limited data), which characterises any data based learning process.
Over the past 40 years bandit-based solutions, and particularly the concept of an index policy introduced by Gittins and Jones, have been fruitfully developed and deployed to address a wide variety of stochastic scheduling problems arising in practice. Across this literature, the use of bandit models to optimally design clinical trials became a typical motivating application, yet little of the resulting theory has ever been used in the actual design and analysis of clinical trials. In this talk I will illustrate, both theoretically and via simulations, the advantages and disadvantages of bandit-based allocation rules in clinical trials. Based on that, I will reflect on the reasons why these ideas have not been used in practice and describe a novel implementation of the Gittins index rule that overcomes these difficulties, trading off a small deviation from optimality for a fully randomized, adaptive group allocation procedure which offers substantial improvements in terms of patient benefit, especially relevant for small populations.
This talk is based on recent joint work with Jack Bowden and James Wason.
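For a flavour of Bayesian bandit allocation (and explicitly not the Gittins index, which requires a separate dynamic-programming computation, nor the randomized implementation described in the talk), the sketch below allocates patients in a two-armed Bernoulli bandit using a simple upper-quantile index on the Beta posteriors; response probabilities and prior are hypothetical.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(3)

def allocate(successes, failures, q=0.9):
    """Choose an arm by an upper-quantile index on the Beta(1+s, 1+f) posterior.
    This is a crude stand-in for the Gittins index."""
    idx = beta.ppf(q, 1 + successes, 1 + failures)
    return int(np.argmax(idx))

# Hypothetical two-armed trial with unknown response probabilities.
true_p = [0.35, 0.55]
s = np.zeros(2)
f = np.zeros(2)
for _ in range(100):
    arm = allocate(s, f)
    outcome = rng.random() < true_p[arm]
    s[arm] += outcome
    f[arm] += 1 - outcome
print("allocations per arm:", s + f, "observed response rates:", s / (s + f))
```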
Current aerospace applications exhibit several features that are not yet adequately addressed by the available optimisation tools:
- Large scale (~1000 design variables) optimisation problems with expensive (10+ hours) response function evaluations
- Discrete optimisation with even moderately expensive response functions
- Optimisation with non-deterministic responses
- Multidisciplinary optimisation in an industrial setting.
The presentation discusses recent progress towards addressing these issues, identifying general trends and metamodel-based methods for solving large-scale optimisation problems.
Issues that have to be addressed to obtain high-quality metamodels of computationally expensive responses include establishing appropriate Designs of Experiments (DOEs), with a focus on optimum Latin hypercube DOEs, including nested DOEs. Several metamodel types will be reviewed, focusing on those obtained by the Moving Least Squares method, owing to its controlled noise-smoothing capability, and by Genetic Programming, owing to its ability to arrive at explicit functions of the design variables. The use of variable-fidelity responses for establishing high-accuracy metamodels is also considered.
Examples of recent aerospace applications include:
- Turbomachinery applications
- Optimisation of composite wing panels
- Topology optimisation and parametric optimisation in the preliminary design of a lattice composite fuselage
- Optimisation and stochastic analysis of a landing system for the ESA ExoMars mission
In this talk I will discuss a generalized Gaussian process concurrent regression model for functional data, where the functional response variable has a binomial, Poisson or other non-Gaussian distribution from an exponential family, while the covariates are mixed functional and scalar variables. The proposed model offers a nonparametric generalized concurrent regression method for functional data with multi-dimensional covariates, and provides a natural framework for modelling the common mean structure and the covariance structure simultaneously for repeatedly observed functional data. The mean structure provides overall information about the observations, while the covariance structure captures the characteristics of each individual batch. The prior specification of the covariance kernel enables us to accommodate a wide class of nonlinear models. The definition of the model, the inference and the implementation, as well as its asymptotic properties, will be discussed. I will also present several numerical examples with different types of non-Gaussian response variables.
A generalised linear model is considered in which the design variables may be functions of previous responses. Interest lies in estimating the parameters of the model. Approximations are derived for the bias and variance of the maximum likelihood estimators of the parameters. The derivations involve differentiating the fundamental identity of sequential analysis. The normal linear regression model, the logistic regression model and the dilution-series model are used to illustrate the approximations.
Design and analysis of experiments is sometimes seen as an area of statistics in which there are few new problems. I will argue that modern biological and industrial experiments, often with automatic data collection systems, require advances in the methodology of designed experiments if they are to be applied successfully in practice. The basic philosophy of design will be reexamined in this context. Experiments can now be designed to maximise the information in the data without computational restrictions limiting either the data analysis that can be done or the search for a design. Very large amounts of data may be collected from each experimental unit and various empirical modelling techniques may be used to analyse these data. In order to ensure that the data contain the required information, it is vital that attention be paid to the experimental design, the sampling design and any mechanistic information that can be built into the model. The application of these ideas to some particular processes will be used to illustrate the kinds of method that can be developed.
In this talk, I will present a Bayesian approach to the problem of comparing two independent binomial proportions and its application to the design and analysis of proof-of-concept clinical trials.
First, I will discuss numerical integration methods to compute exact posterior distribution functions, probability densities, and quantiles of the risk difference, relative risk, and odds ratio. These numerical methods are building blocks for applying exact Bayesian analysis in practice. Exact probability calculations provide improved accuracy compared to normal approximations and are computationally more efficient than simulation-based approaches, especially when these calculations have to be invoked repeatedly as part of another simulation study.
Second, I will show the applicability of exact Bayesian calculations in the context of a proof-of-concept clinical trial in ophthalmology. A single-stage design and a two-stage adaptive design, based on the posterior predictive probability of achieving proof-of-concept under dual criteria of statistical significance and clinical relevance, will be presented. A two-stage design allows early stopping for either futility or efficacy, thereby providing a higher level of cost-efficiency than a single-stage design. A take-home message is that exact Bayesian methods provide an elegant and efficient way to facilitate the design and analysis of proof-of-concept studies.
Reference:
Sverdlov O, Ryeznik Y, Wu S. (2015). Exact Bayesian inference comparing binomial proportions, with application to proof-of-concept clinical trials. Therapeutic Innovation and Regulatory Science 49(1), 163-174.
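A small sketch of the kind of exact calculation referred to above, assuming independent Beta priors and hypothetical trial counts: the posterior probability that the treatment response rate exceeds control is obtained by one-dimensional numerical integration rather than simulation.

```python
import numpy as np
from scipy.stats import beta
from scipy.integrate import quad

# Hypothetical proof-of-concept data: y successes out of n on treatment and control,
# with independent Beta(0.5, 0.5) priors on the two response probabilities.
y1, n1, y2, n2 = 14, 20, 8, 20
a1, b1 = 0.5 + y1, 0.5 + n1 - y1
a2, b2 = 0.5 + y2, 0.5 + n2 - y2

# Exact posterior probability that treatment beats control:
# P(p1 > p2 | data) = integral over t of f_{p1}(t) * F_{p2}(t) dt.
prob, _ = quad(lambda t: beta.pdf(t, a1, b1) * beta.cdf(t, a2, b2), 0.0, 1.0)
print(f"P(p1 > p2 | data) = {prob:.4f}")
```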
This study was motivated by two ongoing clinical trials run by EMR, examining whether B cell pathotype causes the response rate to differ between two biological therapies for Rheumatoid Arthritis patients. Both trials used B cell pathotype as a stratification factor in the randomizations, and the effect of interest was the interaction between treatments and B cell pathotype. The B cell pathotype was classified by a synovial biopsy that each patient received before randomization. The categories were B cell rich, B cell poor and Unknown (if the biopsy result was delayed). The biopsy results of Unknown patients were revealed once they became available during the trial.
Randomizations studied include complete randomization, covariate-adaptive randomization, hierarchical dynamic randomization, permuted block randomization and Begg-Iglewicz randomization. The comparison was based on simulations using the following measures: selection bias, imbalance, power for testing treatment and interaction effects, and inefficiency of the randomization. Because the outcome was a binary variable indicating whether a patient was a responder, logistic regression was the natural choice for the analysis. Treatment and interaction effects, as well as the power to detect their significance, were estimated using the logistic model with independent variables: treatments, pathotype and their interaction.
As is well-known, the maximum likelihood method overfits regression models when the dimension of the model is large relative to the sample size. To address this problem, a number of approaches have been used, such as dimension reduction (as in, e.g., multiple regression selection methods or the lasso method), subjective priors (which we interpret broadly to include random effects models or Gaussian process regression), or regularization. In addition to the model assumptions, these three approaches introduce, by their nature, further assumptions for the purpose of estimating the model.
The first main contribution of this talk is an alternative method which, like maximum likelihood, requires no assumptions other than those pertaining to the model of interest. Our proposal uses a new information-theoretic proper Gaussian prior for the regression function, derived from the Fisher information. We call it the I-prior, the 'I' referring to information. The method is no more difficult to implement than random effects models or Gaussian process regression models.
Our second main contribution is a modelling methodology made possible by the I-prior, which is applicable to classification, multilevel modelling, functional data analysis and longitudinal data analysis. For a number of data sets that have previously been analyzed in the literature, we show our methodology performs competitively with existing methods.
Chain event graphs (CEGs) extend graphical models to address situations in which, after one variable takes a particular value, possible values of future variables differ from those following alternative values. These graphs are a useful framework for modelling discrete processes which exhibit strong asymmetric dependence structures, and are derived from probability trees by merging the vertices in the trees together whose associated conditional probabilities are the same.
We exploit this framework to develop new classes of models where missingness is influential and data are unlikely to be missing at random. Context-specific symmetries are captured by the CEG. As models can be scored efficiently and in closed form, standard Bayesian selection methods can be used to search over a range of models. The selected maximum a posteriori model can be easily read back to the client in a graphically transparent way.
The efficacy of our methods is illustrated using a longitudinal study from birth to age 25 of children in New Zealand, analysing their hospital admissions at ages 18-25 years with respect to family functioning, education, and substance abuse at ages 16-18 years. Of the initial 1265 people, 25% had missing data at age 16, and 20% had missing data on hospital admissions at ages 18-25 years. More outcome data were missing for poorer scores on social factors: for example, 21% for mothers with no formal education compared to 13% for mothers with tertiary qualifications.
This is joint work with Lorna Barclay and Jim Smith.
The majority of model-based clustering techniques are based on multivariate Normal models and their variants. This talk introduces and studies the framework of copula-based finite mixture models for clustering applications. In particular, the use of copulas in model-based clustering offers two direct advantages over current methods:
i) the appropriate choice of copulas provides the ability to obtain a range of exotic shapes for the clusters, and
ii) the explicit choice of marginal distributions for the clusters allows the modelling of multivariate data of various modes (discrete, continuous, both discrete and continuous) in a natural way.
Estimation in the general case can be performed using standard EM, and, depending on the mode of the data, more efficient procedures can be used that can fully exploit the copula structure. The closure properties of the mixture models under marginalisation will be discussed, and for continuous, real-valued data parametric rotations in the sample space will be introduced, with a parallel discussion on parameter identifiability depending on the choice of copulas for the components. The exposition of the methodology will be accompanied by the analysis of real and artificial data.
This is joint work with Dimitris Karlis at the Athens University of Economics and Business.
Related preprint: http://arxiv.org/abs/1404.4077
Symmetric positive semi-definite (SPD) matrices have recently seen several new applications, including Diffusion Tensor Imaging (DTI) in MRI, covariance descriptors and structure tensors in computer vision, and kernels in machine learning.
Depending on the application, various geometries have been explored for statistical analysis of SPD-valued data. We will focus on DTI, where the naive Euclidean approach was generally criticised for its “swelling” effect in interpolation, and for violations of positive definiteness in extrapolation and estimation. The affine invariant and log-Euclidean Riemannian metrics were subsequently proposed to remedy the above deficiencies. However, practitioners have recently argued that these geometric approaches are overkill in some relevant noise models.
We will examine a couple of related alternative approaches that in a sense reside in between the two aforementioned extremes. These alternatives are based on the square root Euclidean and Procrustes size-and-shape metrics. Unlike the Riemannian approach, our approaches, we think, operate more naturally with respect to the boundary of the cone of SPD matrices. In particular, we prove that the Procrustes metric, when used to compute weighted Frechet averages, preserves ranks. We also establish and prove a key relationship between these two metrics, as well as inequalities ranking traces (mean diffusivity) and determinants of the interpolants based on the Riemannian, Euclidean, and our alternative metrics. Remarkably, traces and determinants of our alternative interpolants compare differently. A general proof of the determinant inequality was just developed and may also be of value to the more general matrix analysis community.
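To make the metrics concrete, the sketch below interpolates two hypothetical 2x2 diffusion tensors under the Euclidean, log-Euclidean and square-root Euclidean metrics and compares traces and determinants; the Procrustes size-and-shape interpolant is omitted for brevity.

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm

def interpolants(A, B, t=0.5):
    """Interpolate two SPD matrices at weight t under three metrics:
    Euclidean, log-Euclidean, and the square-root Euclidean metric."""
    euclid = (1 - t) * A + t * B
    log_euclid = expm((1 - t) * logm(A) + t * logm(B))
    root = (1 - t) * sqrtm(A) + t * sqrtm(B)
    sqrt_euclid = root @ root
    return euclid, log_euclid, sqrt_euclid

# Hypothetical diffusion tensors.
A = np.array([[3.0, 0.8], [0.8, 1.0]])
B = np.array([[1.0, -0.5], [-0.5, 2.5]])
for name, M in zip(["Euclidean", "log-Euclidean", "sqrt-Euclidean"], interpolants(A, B)):
    print(name, "trace:", np.trace(M).real, "det:", np.linalg.det(M).real)
```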
Several experimental illustrations will be shown based on synthetic and real human brain DT MRI data.
No special background in statistical analysis on non-Euclidean manifolds is assumed.
This is a joint work with Prof Ian Dryden (University of Nottingham) and Dr Diwei Zhou (Loughborough University), with a more recent contribution by Dr Koenraad Audenaert (RHUL).
I call attention to what I call the “crisis of evaluation” in music information retrieval (MIR) research. Among other things, MIR seeks to address the variety of needs for music information of listeners, music recording archives, and music companies. A large portion of MIR research has thus been devoted to the automated description of music in terms of genre, mood, and other meaningful terms. However, my recent work reveals four things: 1) many published results unknowingly use datasets with faults that render them meaningless; 2) state-of-the-art (“high classification accuracy”) systems are fooled by irrelevant factors; 3) most published results are based upon an invalid evaluation design; and 4) a lot of work has unknowingly built, tuned, tested, compared and advertised “horses” instead of solutions. (The true story of the horse Clever Hans provides the most appropriate illustration.) I argue why these problems have occurred, and how we can address them by adopting the formal design and evaluation of experiments, and other best practices.
Relevant publications:
[1] B. L. Sturm, “Classification accuracy is not enough: On the evaluation of music genre recognition systems,” J. Intell. Info. Systems, vol. 41, no. 3, pp. 371–406, 2013. http://link.springer.com/article/10.1007%2Fs10844-013-0250-y
[2] B. L. Sturm, “A simple method to determine if a music information retrieval system is a “horse”,” IEEE Trans. Multimedia, vol. 16, no. 6, pp. 1636–1644, 2014. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6847693
[3] B. L. Sturm, “The state of the art ten years after a state of the art: Future research in music information retrieval,” J. New Music Research, vol. 43, no. 2, pp. 147–172, 2014. http://www.tandfonline.com/doi/abs/10.1080/09298215.2014.894533#.VMDT0KZ...
A relaxed design is a continuous design whose replications can be any nonnegative real number. The talk introduces the method of relaxed designs and identifies its applications to sample size determination, cost-efficient design, constrained design and multi-stage Bayesian design. The main focus is on applications that could be intractable with standard optimal design.
Consider a population where subjects are susceptible to a disease (e.g. AIDS). The objective is to perform inferences on a population quantity (such as the prevalence of HIV in a high-risk subpopulation, e.g. intravenous drug abusers) via sampling mechanisms based on a social network (link-tracing designs, RDS). We develop a general framework for making Bayesian inference on the population quantity that: models the uncertainty in the underlying social network using a random graph model, incorporates dependence among the individual responses according to the social network via a Markov Random Field, models the uncertainty regarding the sampling on the social network, and deals with the non-ignorability of the sampling design. The proposed framework is general in the sense that it allows a wide range of different specifications for the components of the model just mentioned. Samples from the posterior distribution are obtained via Bayesian model averaging. Our model is compared with standard methods in simulation studies and is applied to real data.
I will give an overview of my work in music informatics research (MIR) with some applications to singing research and tracking the evolution of music. I first will give a very high-level overview of my work, starting with my Dynamic Bayesian Network approach to chord recognition, a system for lyrics-to-audio alignment (SongPrompter), and some other shiny applications of Music Informatics (Songle.jp, Last.fm Driver's Seat). Secondly, I will talk about some scientific applications of music informatics, including the study of singing intonation and intonation drift as well as the evolution of music both in the lab and in the real charts.
Markov chain Monte Carlo methods are essential tools for solving many modern day statistical and computational problems, however a major limitation is the inherently sequential nature of these algorithms. In this talk I'll present some work I recently published in PNAS on a natural generalisation of the Metropolis-Hastings algorithm that allows for parallelising a single chain using existing MCMC methods. We can do so by proposing multiple points in parallel, then constructing and sampling from a finite state Markov chain on the proposed points such that the overall procedure has the correct target density as its stationary distribution. The approach is generally applicable and straightforward to implement. I'll demonstrate how this construction may be used to greatly increase the computational speed and statistical efficiency of a variety of existing MCMC methods, including Metropolis-Adjusted Langevin Algorithms and Adaptive MCMC. Furthermore, I'll discuss how it allows for a principled way of utilising every integration step within Hamiltonian Monte Carlo methods; our approach increases robustness to the choice of algorithmic parameters and results in increased accuracy of Monte Carlo estimates with little extra computational cost.
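As a simplified special case of the multiple-proposal idea (an independence proposal only, not the general finite-state-chain construction in the paper), the sketch below refreshes several candidate points in parallel and resamples the current state with weights proportional to the target-to-proposal density ratio; the target and proposal here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

def multi_proposal_step(x, log_pi, sample_q, log_q, n_prop=8):
    """One multiple-proposal MCMC step with an independence proposal q:
    draw N candidates from q in parallel, then resample the current state
    from the pooled N+1 points with weights proportional to pi(x)/q(x)."""
    points = np.concatenate([[x], sample_q(n_prop)])
    logw = np.array([log_pi(p) - log_q(p) for p in points])
    w = np.exp(logw - logw.max())
    return rng.choice(points, p=w / w.sum())

# Hypothetical target: standard normal; proposal: N(0, 2^2).
log_pi = lambda x: -0.5 * x**2
log_q = lambda x: -0.5 * (x / 2.0)**2
sample_q = lambda n: 2.0 * rng.normal(size=n)

x, draws = 0.0, []
for _ in range(5000):
    x = multi_proposal_step(x, log_pi, sample_q, log_q)
    draws.append(x)
print(np.mean(draws), np.var(draws))   # should be near 0 and 1
```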
Adaptive designs that are based on group-sequential approaches have the benefit of being efficient, as stopping boundaries can be found that lead to good operating characteristics with test decisions based solely on sufficient statistics. The drawback of these so-called “pre-planned adaptive” designs is that unexpected design changes are not possible without impacting the error rates. “Flexible adaptive designs”, and in particular designs based on p-value combination, can on the other hand cope with a large number of contingencies at the cost of reduced efficiency.
In this presentation we focus on so-called multi-arm multi-stage trials which compare several active treatments against control at a series of interim analyses. We will focus on the methods by Stallard and Todd [1] and Magirr et al. [2], two different approaches which are based on group-sequential ideas, and discuss how these “pre-planned adaptive” designs can be modified to allow for flexibility. We then show how the added flexibility can be used for treatment selection and evaluate the impact on power in a simulation study. The results show that a combination of a well chosen pre-planned design and an application of the conditional error principle to allow flexible treatment selection results in an impressive overall procedure.
[1] Stallard, N, & Todd, S. 2003. Sequential designs for phase III clinical trials incorporating treatment selection. Statistics in Medicine, 22, 689-703.
[2] Magirr, D, Jaki, T, & Whitehead, J. 2012. A generalised Dunnett test for multi-arm, multi-stage clinical studies with treatment selection. Biometrika, 99, 494-501.
Two routes most commonly proposed for accurate inference on a scalar interest parameter in the presence of a (possibly high-dimensional) nuisance parameter are parametric simulation ('bootstrap') methods, and analytic procedures based on normal approximation to adjusted forms of the signed root likelihood ratio statistic. Both methods yield, under some null hypothesis of interest, p-values which are uniformly distributed to error of third-order in the available sample size. But, given a specific inference problem, what is the formal relationship between p-values calculated by the two approaches? We elucidate the extent to which the two methodologies actually just give the same inference.
The design of many experiments can be considered as implicitly Bayesian, with prior knowledge being used informally to aid decisions such as which factors to vary and the choice of plausible causal relationships between the factors and measured responses. Bayesian methods allow uncertainty in such decisions to be incorporated into design selection through prior distributions that encapsulate information available from scientific knowledge or previous experimentation. Further, a design may be explicitly tailored to the aim of the experiment through a decision-theoretic approach with an appropriate loss function.
We will present novel methodology for two problems in this area, related through the application of Gaussian process (GP) regression models. Firstly, we consider Bayesian design for prediction from a GP model, as might be used for the collection of spatial data or for a computer experiment to interrogate a numerical model. Secondly, we address Bayesian design for parametric regression models, and demonstrate the application of GP emulators to mitigate the computational issues that have traditionally been a barrier to the application of these designs.
Synchronisation phenomena in their various disguises are among the most prominent features in coupled dynamical structures. Within this talk we first introduce how the vague notion of a phase can be given a more precise meaning using what has been coined analytic signal processing. This approach then allows one to distinguish different types of synchronisation phenomena, and in particular to detect synchronisation of the phase of signals where amplitudes remain uncorrelated. These ideas are finally applied to data sets to explore whether phase synchronisation plays a role in the interpretation of physiological movement data.
Advances in scientific computing have allowed the development of complex models that are routinely applied to problems in physics, engineering, biology and other disciplines. The utility of these models depends on how well they are calibrated to empirical data. Their calibration is hindered, however, both by large numbers of input and output parameters and by run times that increase with the model's complexity. In this talk we present a calibration method called History Matching, which is iterative and scales well with the dimensionality of the problem. History Matching is based on the concept of an emulator, which is a Bayesian representation of our beliefs about the model, given the runs that are available to us. Capitalising on the efficiency of the emulator, History Matching iteratively discards regions of the input space that are unlikely to provide a good match to the empirical data, proceeding through successive runs of the computer model in narrowing regions of the input space known as waves. This calibration technique can be embedded in a comprehensive error-modelling framework that takes into account various sources of uncertainty, due to the parameters, the model itself, the observations and so on. A calibration example of a high-dimensional HIV model will be used to illustrate the method.
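A minimal sketch of the standard history-matching implausibility calculation, with hypothetical emulator outputs and variances: candidate inputs are ruled out when the standardised distance to the observation exceeds the usual cutoff of three.

```python
import numpy as np

def implausibility(z, emulator_mean, emulator_var, obs_var, discrepancy_var):
    """Standard history-matching implausibility: the standardised distance between
    the observation z and the emulator prediction, combining emulator, observation
    and model-discrepancy variances."""
    return np.abs(z - emulator_mean) / np.sqrt(emulator_var + obs_var + discrepancy_var)

# Hypothetical wave: keep only inputs whose implausibility is below the usual cutoff of 3.
emulator_mean = np.array([4.1, 7.9, 2.3, 5.6])     # emulator predictions at candidate inputs
emulator_var = np.array([0.30, 0.10, 0.50, 0.05])
I = implausibility(z=5.0, emulator_mean=emulator_mean, emulator_var=emulator_var,
                   obs_var=0.2, discrepancy_var=0.3)
not_ruled_out = I < 3.0
print(I.round(2), not_ruled_out)
```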
Smooth supersaturated models are a class of emulators with a supersaturated polynomial basis; that is, there are more model terms than design points. In this talk I will give some key results regarding the structure and solvability of these models, as well as some insights regarding the numerical stability of fitting these large models. Sensitivity analysis using Sobol indices is often used to reduce the parameter space of expensive computer experiments, and a simple formula is given for computing these indices for a smooth supersaturated model. Finally, I present the results of some simulation studies exploring ways to use the emulated response surface to generate new design points.
Following the correct selection of a therapy based on the indication, an optimal dose regimen is the most important determinant of therapeutic success of a medical therapy. After giving an introduction to the Efficient Dosing (ED) algorithm we developed to compute dose regimens which keep the blood concentration of the drug in the body close to the target level, I will show how the algorithm can be applied to pharmacodynamic models for infectious diseases. The optimized dose regimens satisfy three conditions: (1) minimize the concentration of the anti-infective drug lying outside the therapeutic window (if any), (2) ensure a target reduction in viral load, and (3) minimize drug exposure once the goal of viral load reduction has been achieved. The algorithm can also be used to compute the number of doses required for treatment.
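Not the ED algorithm itself, but as a reminder of the elementary pharmacokinetic arithmetic that dose-regimen calculations build on: under a one-compartment model with repeated dosing, the average steady-state concentration is F·dose/(CL·τ), so a dose targeting a given average concentration can be read off directly. Parameter values below are hypothetical.

```python
# Generic one-compartment sketch: dose needed to hold the average steady-state
# concentration at C_target when dosing every tau hours.
F, CL, tau = 0.8, 5.0, 12.0       # bioavailability, clearance (L/h), dosing interval (h)
C_target = 2.0                    # target average concentration (mg/L)
dose = C_target * CL * tau / F
print(f"dose to maintain {C_target} mg/L on average: {dose:.1f} mg every {tau:.0f} h")
```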
One key quantity of interest in skin sensitisation hazard assessment is the mean threshold for skin sensitisation for some defined population (called the sensitising potency). Before considering the sensitising potency of the chemical, hazard assessors consider whether the chemical has the potential to be a skin sensitiser in humans. Bayesian belief network approaches to this part of the assessment, which handles the disparate lines of evidence within a probabilistic framework, have been applied successfully. The greater challenge comes in the quantification of uncertainty about the sensitising potency.
To make inferences about sensitising potency, we used a Bayes linear framework to model hazard assessors' expectations and uncertainties and to update those beliefs in the light of some competing data sources. In producing a tool for synthesising multiple lines of evidence and estimating hazard, we developed a transparent mechanism to help defend and communicate risk management decisions. In this talk, I will attempt to describe the principles of this Bayesian modelling and formal processes for capturing expert knowledge. And, hopefully, I will be able to highlight their applicability where fast decisions are needed and data are sparse.
In this talk, models based on orthonormal systems for experimental design are presented. In such models, it is possible to use fast Fourier Transforms (FFT) to calculate the parameters, which are independent, and which are complex numbers expressed as Fourier coefficients.
Theorems for the relation between the Fourier coefficients and the effect of each factor are also given. Using these theorems, the effect of each factor can be easily obtained from the computed Fourier coefficients. The paper finally shows that the analysis of variance can be used on the proposed models without the need to calculate the degrees of freedom.
Human social systems show unexpected patterns when studied from a collective point of view. In this talk I will present a few examples of collective behaviour in social systems: human movement patterns, social encounter networks and music collaboration networks, all of which are data driven. I'll try to make the talk short and aim to start a discussion.
This talk will focus on the challenges faced in mixed modelling with ordinal response variables. Topics covered will include: the advantages of an ordinal approach over the more generic continuous approach that is widely used in practice; the implications of the proportional odds assumption and more flexible alternatives such as the partial proportional odds assumption; and the implementation of mixed models in this context. Both simulations and applications to real data regarding perceptions of environmental matters will be shown.
It is often computationally infeasible to re-estimate a large-scale model afresh when a small number of observations is sequentially modified. Furthermore, in some cases a dataset is too large to fit in a computer's memory, and out-of-core algorithms need to be developed. Similarly, data might not be available all at once, and recursive estimation strategies need to be applied. Within this context the aim is to design computationally efficient and numerically stable algorithms.
Initially, the re-estimation of the generalized least squares (GLS) solution after observations are deleted, known as downdating, is examined. The new method to estimate the downdated general linear model (GLM) updates the original GLM with the imaginary deleted observations. This results in a non-positive definite dispersion matrix which comprises complex covariance values. This updated GLM with imaginary values has been proven to yield the same GLS estimator as solving the original GLM afresh after downdating. The estimation of the downdated GLM is formulated as a generalized linear least squares problem (GLLSP), whose solution yields the GLS estimator even when the dispersion matrix is singular. The main computational tool is the generalized QR decomposition, which is employed based on hyperbolic Householder transformations; however, no complex arithmetic is used in practice.
The special case of computing the GLS estimator of the downdated SUR (seemingly unrelated regressions) model is considered. The method is extended to the problem of concurrently adding and deleting observations from the model. The special structure of the matrices and the properties of the SUR model are efficiently exploited in order to reduce the computational burden of the estimation algorithm. The proposed algorithms are applied to synthetic and real data. Their performance, when compared with algorithms that estimate the same model afresh, confirms their computational efficiency.
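For contrast with the general GLS/SUR algorithms above, the sketch below shows the textbook special case of downdating an ordinary least squares fit via a Sherman-Morrison rank-one update, checked against a full refit on simulated data; it does not use the generalized QR decomposition or hyperbolic Householder transformations.

```python
import numpy as np

def downdate_ols(XtX_inv, Xty, x, y):
    """Remove one observation (x, y) from an OLS fit via a Sherman-Morrison
    rank-one downdate of (X'X)^{-1}, avoiding a full refit."""
    Ax = XtX_inv @ x
    XtX_inv_new = XtX_inv + np.outer(Ax, Ax) / (1.0 - x @ Ax)
    Xty_new = Xty - y * x
    return XtX_inv_new, Xty_new, XtX_inv_new @ Xty_new

# Check against refitting from scratch on hypothetical data.
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 4))
yv = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=200)
A = np.linalg.inv(X.T @ X)
b = X.T @ yv
_, _, beta_down = downdate_ols(A, b, X[-1], yv[-1])
beta_refit = np.linalg.lstsq(X[:-1], yv[:-1], rcond=None)[0]
print(np.allclose(beta_down, beta_refit))
```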
We study the problem of identifying bat species from echolocation calls in order to build automated bioacoustic monitoring algorithms. We employ the Dynamic Time Warping algorithm, which has been successfully applied to bird flight call identification, and show that classification performance is superior to the hand-crafted call shape parameters used in previous research. This highlights that generic bioacoustic software with good classification rates can be constructed with little domain knowledge. We conduct a study with field data of 21 bat species from north and central Mexico using a multinomial probit regression model with a Gaussian process prior and a full EP approximation of the posterior of the latent function values. Results indicate high classification accuracy across almost all classes, while the misclassification rate across families of species is low, highlighting the common evolutionary path of echolocation in bats.
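A bare-bones dynamic time warping distance between two hypothetical call frequency contours, shown only to fix ideas; the full study uses the multinomial probit Gaussian process classifier described above.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences
    (e.g. call frequency contours), with unit steps and no band constraint."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Hypothetical frequency contours (kHz) extracted from two echolocation calls.
call1 = np.array([55.0, 52.0, 48.0, 43.0, 40.0, 38.0])
call2 = np.array([54.0, 50.0, 44.0, 41.0, 39.0])
print(dtw_distance(call1, call2))
```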
Several problems of genomic analysis involve detection of local genomic signals. When the data are generated by sequence-based methods, the variability of read depth at different positions on the genome suggests point process models involving non-homogeneous Poisson processes, or perhaps negative binomial processes if there is excess variability. We discuss a number of examples, and consider in detail a model for detection of insertions and deletions (indels) based on paired end reads. This is joint research with Nancy Zhang and Benjamin Yakir.
The talk considers a cluster detection methodology which describes the cluster status of each area, and provides alternative/complementary perspectives to spatial scan cluster detection. The focus is on spatial health risk patterns (area disease prevalence, area mortality, etc) when area relative risks are unknown parameters. The method provides additional insights with regard to cluster centre areas vs. cluster edge areas. The method also considers both low risk clustering and high risk clustering in an integrated perspective, and measures high/low risk outlier status. The application of the method is considered with simulated data (and known spatial clustering), and with real examples, both univariate and bivariate.
In this talk, I will present a brief summary of several classes of univariate flexible distributions employed to model skewness and kurtosis. We will discuss a simple classification of these distributions in terms of their tail behaviour. This classification motivates the introduction of a new family of distributions (double two-piece distributions), which is obtained by using a transformation defined on the family of unimodal symmetric continuous distributions containing a shape parameter. The proposed distributions contain five interpretable parameters that control the mode, as well as the scale and shape in each direction. Four-parameter subfamilies of this class of transformations are also discussed. An interpretable scale- and location-invariant benchmark prior is also presented, together with conditions for the existence of the corresponding posterior distribution. Finally, the use of these models is illustrated with a real data example.
A novel methodology for the detection of abrupt changes in the generating mechanisms (stochastic, deterministic or mixed) of a time series, without any prior knowledge of them, will be presented. This methodology has two components: the first is the novel concept of epsilon-complexity, and the second is a method for change point detection. In the talk, we will give the definition of the epsilon-complexity of a continuous function defined on a compact segment. We will show that for the Hölder class of functions there exists an effective characterization of the epsilon-complexity. The results of simulations and applications to electroencephalogram data and financial time series will be presented. (The talk is based on joint work with Boris Darkhovsky of the Russian Academy of Sciences.)
tba
Climate change research methods, particularly those aspects involving projection of future climatic conditions, depend heavily upon statistical techniques but are still at an early stage of development. I will discuss what I see as the key statistical challenges for climate research, including the problems of too little and too much data, the principles of inference from model output, and the relationship of statistics with dynamics (physics). With reference to some specific examples from the latest IPCC report and beyond, I will show that there is a need for statisticians to become more involved with climate research and to do so in a manner that clarifies, rather than obscures, the role and influence of physical constraints, of necessary simplifying assumptions, and of subjective expert judgement.
It is increasingly realised that many industrial experiments involve some factors whose levels are harder to reset than others, leading to multi-stratum structures. Designs are usually chosen to optimise the point estimation of fixed effects parameters, such as polynomial terms in a response surface model, using criteria such as D- or A-optimality. Gilmour and Trinca (2012) introduced the DP- and AP-optimality criteria, which optimise interval estimation, or equivalently hypothesis testing, by ensuring that unbiased (pure) error estimates can be obtained. We now extend these ideas to multi-stratum structures, by adapting the stratum-by-stratum algorithm of Trinca and Gilmour (2014) to ensure optimal interval estimation in the lowest stratum. It turns out that, in most practical situations, this also ensures that adequate pure error estimates are available in the higher strata. Several examples show that good practical designs can be obtained, even with fairly small run sizes.
Smooth supersaturated models (SSM) are interpolation models in which the underlying model size, and typically the degree, is higher than would normally be used in statistics, but where the extra degrees of freedom are used to make the model smooth.
I will describe the methodology, discuss briefly the role of orthogonal polynomials and then address two design problems. The first is selection of knots and the second a more traditional design problem using SSM to obtain the kernels of interest for D-optimality.
This is joint work with Ron Bates (Rolls-Royce), Peter Curtis (QMUL) and Henry Wynn (LSE).
We study the problem of modelling and the short-term forecasting of electricity loads. Regarding the electricity load on each day as a curve, we propose to model the dependence between successive daily curves via curve linear regression. The key ingredient in curve linear regression modelling is the dimension reduction based on a singular value decomposition in a Hilbert space, which reduces the curve regression problem to several ordinary (i.e. scalar) linear regression problems. We illustrate the method by performing one-day ahead forecasting of the electricity loads consumed by the customers of EDF between 2011 and mid-2012, where we also compare our method with other available models.
This is joint work with Yannig Goude, Xavier Brossat and Qiwei Yao.
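A schematic sketch of the dimension-reduction idea on toy curves is given below: project the daily curves onto leading singular directions and fit ordinary scalar regressions component by component. This is only a caricature of the Hilbert-space methodology, and the EDF data are not used; the grid, number of components and synthetic curves are arbitrary choices.

```python
import numpy as np

# Toy daily curves: rows are days, columns are within-day grid points.
rng = np.random.default_rng(2)
grid = np.linspace(0, 2 * np.pi, 48)
days = 365
curves = np.array([np.sin(grid + 0.01 * d) + 0.1 * rng.normal(size=grid.size)
                   for d in range(days)])

X, Y = curves[:-1], curves[1:]           # predict tomorrow's curve from today's

# Dimension reduction by SVD: keep the leading k singular directions.
k = 3
_, _, Vx = np.linalg.svd(X - X.mean(0), full_matrices=False)
_, _, Vy = np.linalg.svd(Y - Y.mean(0), full_matrices=False)
x_scores = (X - X.mean(0)) @ Vx[:k].T
y_scores = (Y - Y.mean(0)) @ Vy[:k].T

# The curve regression collapses to k ordinary (scalar) regressions.
coef, *_ = np.linalg.lstsq(x_scores, y_scores, rcond=None)

# One-day-ahead forecast from the final observed day.
new_score = (curves[-1] - X.mean(0)) @ Vx[:k].T
forecast = Y.mean(0) + (new_score @ coef) @ Vy[:k]
print(forecast[:5])
```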
Control charts are well known tools in industrial statistical process control. They are used to distinguish between random error and systematic variability. The use of these tools in medicine has only started in recent years.
In this seminar we present a project in which we explore the ability of Shewhart's control rules to predict severe manic and depressive episodes in bipolar disorder patients. In our study, we consider three types of control charts and a variety of scenarios using real data.
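For illustration, a minimal individuals chart with the usual three-sigma Shewhart limits estimated from a baseline period is sketched below on hypothetical mood scores; the actual charts and rules compared in the project are not reproduced.

```python
import numpy as np

def shewhart_individuals(series, baseline_n=20, sigma_mult=3.0):
    """Flag points outside mean +/- sigma_mult * sd, with both statistics
    estimated from an in-control baseline period."""
    series = np.asarray(series, dtype=float)
    baseline = series[:baseline_n]
    centre, sd = baseline.mean(), baseline.std(ddof=1)
    upper, lower = centre + sigma_mult * sd, centre - sigma_mult * sd
    alarms = np.where((series > upper) | (series < lower))[0]
    return centre, (lower, upper), alarms

# Hypothetical daily mood scores with a late shift mimicking an episode.
rng = np.random.default_rng(3)
scores = np.concatenate([rng.normal(5, 1, 40), rng.normal(9, 1, 10)])
centre, limits, alarms = shewhart_individuals(scores)
print("control limits:", limits, "alarm days:", alarms)
```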
In the first part of the talk we study saturated fractions of factorial designs from the perspective of Algebraic Statistics. Exploiting the identification of a fraction with a binary contingency table, we define a criterion to check whether a fraction is saturated or not with respect to a given model. The proposed criterion is based on combinatorial algebraic objects, namely the circuit basis of the toric ideal associated to the design matrix of the model. This is joint work with Fabio Rapallo (Università del Piemonte Orientale, Italy) and Maria Piera Rogantin (Italy).
In the second part of the talk we study optimal saturated designs, mainly D-optimal designs. Efficient algorithms for searching for optimal saturated designs are widely available. They maximize a given efficiency measure (such as D-optimality) and provide an optimum design. Nevertheless, they do not guarantee a globally optimal design. Indeed, they start from an initial random design and find a local optimum; if the initial design is changed the optimum found will, in general, be different. A natural question arises: should we stop at the design found or should we run the algorithm again in search of a better design? This work uses very recent methods and software for discovery probability to support the decision to continue or stop the sampling. A software tool written in SAS has been developed.
In order to accelerate drug development, adaptive seamless designs (ASDs) have been proposed. In this talk, I will consider two-stage ASDs, where in stage 1, data are collected to perform treatment selection or sub-population selection. In stage 2, additional data are collected to perform confirmatory analysis for the selected treatments or sub-populations. Unlike the traditional testing procedures, for ASDs, stage 1 data are also used in the confirmatory analysis. Although ASDs are efficient, using stage 1 data both for selection and confirmatory analysis poses statistical challenges in making inference.
I will focus on point estimation at the end of trials that use ASDs. Estimation is challenging because multiple hypotheses are considered at stage 1, and the experimental treatment (or the sub-population) that appears to be the most effective is selected, which may lead to bias. Estimators derived need to account for this fact. In this talk, I will describe estimators we have developed.
In England and Wales, a large-scale multiple statistical surveillance system for infectious disease outbreaks has been in operation for nearly two decades. This system uses a robust quasi-Poisson regression algorithm to identify aberrances in weekly counts of isolates reported to the Health Protection Agency. In this paper, we review the performance of the system with a view to reducing the number of false reports, while retaining good power to detect genuine outbreaks. We undertook extensive simulations to evaluate the existing system in a range of contrasting scenarios. We suggest several improvements relating to the treatment of trends, seasonality, re-weighting of baselines and error structure. We validate these results by running the existing and proposed new systems in parallel on real data. We find that the new system greatly reduces the number of alarms while maintaining good overall performance and in some instances increasing the sensitivity.
We consider experiments where the experimental units are arranged in a circle or in a single line in space or time. If neighbouring treatments may affect the response on an experimental unit, then we need a model which includes the effects of direct treatments, left neighbours and right neighbours. It is desirable that each ordered pair of treatments occurs just once as neighbours and just once with a single unit in between. A circular design with this property is equivalent to a special type of quasigroup.
In one variant of this, self-neighbours are forbidden. In a further variant, it is assumed that the left-neighbour effect is the same as the right-neighbour effect, so all that is needed is that each unordered pair of treatments occurs just once as neighbours and just once with a single unit in between.
I shall report progress on finding methods of constructing the three types of design.
In many areas of scientific research complex experimental designs are now routinely used. With the advent of mixed model algorithms, implemented in many statistical software packages, the analysis of data generated from such experiments has become more accessible. However, failing to correctly identify the experimental design used can lead to incorrect model selection and misleading inferences. A procedure is described that identifies the structure of the experimental design and, given the randomisation, generates a maximal mixed model. This model is determined before the experiment is conducted and provides a starting point for the final statistical analysis. The whole process can be illustrated using a generalisation of the Hasse diagram called the Terms Relationships diagram. Most parts of the algorithm have been implemented in a program written in R. It is shown that the model selection process can be simplified by placing experimental design (crossed/nested structure and randomisation) at the centre of a systematic procedure.
Suppose that two treatments are being compared in a clinical trial. Then, if complete randomisation is used, the next patient is equally likely to be assigned to one of the two treatments. So this randomisation rule does not take into account the previous treatment assignments, responses and covariate vectors, and the current patient's covariate vector. The use of an adaptive biased coin which takes some or all of this information into account can lead to a more powerful trial.
The different types of such designs which are available are reviewed and the consequences for inference discussed. Issues related to both point and interval estimation will be addressed.
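As one concrete example of such a rule (chosen here for simplicity, and not necessarily among the designs reviewed in the talk), the sketch below implements Efron's biased coin, which favours the under-represented arm with probability p.

```python
import numpy as np

def efron_biased_coin(n_patients, p=2/3, seed=0):
    """Efron's biased coin: assign the under-represented treatment with
    probability p; toss a fair coin when the arms are balanced."""
    rng = np.random.default_rng(seed)
    assignments = []
    for _ in range(n_patients):
        imbalance = sum(1 if a == "A" else -1 for a in assignments)
        if imbalance == 0:
            prob_a = 0.5
        elif imbalance > 0:          # A over-represented, favour B
            prob_a = 1 - p
        else:                        # B over-represented, favour A
            prob_a = p
        assignments.append("A" if rng.random() < prob_a else "B")
    return assignments

seq = efron_biased_coin(20)
print(seq, "imbalance:", seq.count("A") - seq.count("B"))
```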
We present a novel strategy of statistical inference for graphical models with latent Gaussian variables, and observed variables that follow non-standard sampling distributions. We restrict our attention to those graphs in which the latent variables have a substantive interpretation. In addition, we adopt the assumption that the distribution of the observed variables may be meaningfully interpreted as arising after marginalising over the latent variables. We illustrate the method with two studies that investigate developmental changes in cognitive functions of young children in one case and of cognitive decline of Alzheimer’s patients in the other. These studies involve the assessment of competing causal models for several psychological constructs; and the observed measurements are gathered from the administration of batteries of tasks subject to complicated sampling protocols.
In this presentation I would like to introduce a multimodal database which was created within the scope of my PhD study on “Cello Performer Modelling Using Timbre Features”.
The database consists of bowing gestures and music samples of six cello players recorded on two different cellos. The gesture and audio measurements were collected in order to identify performer-dependent sound features of the players performing on the same instrument and to investigate a potentially existing correlation between the individual sound features and specific bowing control parameters necessary for production of desired richness of tone. The current study goal is to find such combinations of respective bowing gestures and acoustical features which can be seen as patterns and are able to characterise each player in the database.
Following the data presentation I would like to state some other research questions that clearly emerge and open a discussion on analysis methods which could help to answer them.
Missing data are common in clinical trials but often analysis is based on “complete cases”. Complete-case analyses (which delete observations with missing information on any studied covariate) are inefficient and may be biased. Methodological guidelines recommend using multiple imputation (MI). However, for MI to provide valid inferences, the imputation model must recognise the study design. In this talk, we will survey current missing data practice in the clinical trials literature and describe current good practice methodology for hierarchical data.
Using real data from a cluster randomized trial as an example, we see how treatment effects can be sensitive to the choice of method to address the missing data problem. We finish by presenting a few results from a large simulation study, designed to compare the performance of: (a) multilevel MI that accounts for clustering through cluster random effects, (b) MI that includes a fixed effect for each cluster and (c) single-level MI that ignores clustering.
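The following simplified sketch mimics strategy (b), imputation including a fixed effect for each cluster, and pools the imputed analyses by Rubin's rules. The data are simulated, and a full MI implementation would also draw the imputation-model parameters rather than fixing them at their estimates.

```python
import numpy as np

rng = np.random.default_rng(4)
n_clusters, per_cluster = 10, 20
cluster = np.repeat(np.arange(n_clusters), per_cluster)
x = rng.normal(size=cluster.size)
cluster_effect = rng.normal(0, 1, n_clusters)[cluster]
y = 1.0 + 2.0 * x + cluster_effect + rng.normal(0, 1, cluster.size)
y_obs = y.copy()
y_obs[rng.random(y.size) < 0.3] = np.nan            # ~30% missing outcomes

# Design matrix with intercept, covariate and cluster dummies
# (strategy (b): a fixed effect for each cluster).
dummies = (cluster[:, None] == np.arange(1, n_clusters)).astype(float)
Z = np.column_stack([np.ones(y.size), x, dummies])

obs = ~np.isnan(y_obs)
beta_hat, *_ = np.linalg.lstsq(Z[obs], y_obs[obs], rcond=None)
resid_sd = np.std(y_obs[obs] - Z[obs] @ beta_hat, ddof=Z.shape[1])

M, estimates, variances = 20, [], []
for _ in range(M):
    y_imp = y_obs.copy()
    # Simplified draw: predictive mean plus residual noise only.
    y_imp[~obs] = Z[~obs] @ beta_hat + rng.normal(0, resid_sd, (~obs).sum())
    b, *_ = np.linalg.lstsq(Z, y_imp, rcond=None)
    res = y_imp - Z @ b
    cov = np.linalg.inv(Z.T @ Z) * res.var(ddof=Z.shape[1])
    estimates.append(b[1])                           # coefficient of interest
    variances.append(cov[1, 1])

# Rubin's rules: total variance = within + (1 + 1/M) * between.
qbar = np.mean(estimates)
total_var = np.mean(variances) + (1 + 1 / M) * np.var(estimates, ddof=1)
print("pooled estimate:", qbar, "pooled SE:", np.sqrt(total_var))
```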
In this talk we introduce the need to estimate the cross-sectional dependence, or "network", of a panel of time series. In spatial econometrics and other disciplines, the so-called spatial weight matrix in a spatial lag model is always assumed known, while it is still debated whether results of estimation can be sensitive to such assumed known weight matrices. Since these weight matrices are often sparse, we propose to regularize them from the data using a by-now well-known technique -- the adaptive LASSO. The technique for quantifying time dependence is relatively new to the statistics and time series literatures. Non-asymptotic inequalities, as well as asymptotic sign consistency for the weight matrix elements, are presented with explicit rates of convergence spelt out. A block coordinate descent algorithm is presented together with results from simulation experiments and a real data analysis.
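An illustrative sketch of the adaptive LASSO reweighting step on a generic sparse regression is given below (a pilot estimate supplies the penalty weights). It is not the panel spatial-lag estimator of the talk, and the scikit-learn usage and tuning values are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Sparse ground truth standing in for a row of a sparse weight matrix.
rng = np.random.default_rng(5)
n, p = 200, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[[2, 7, 19]] = [1.5, -2.0, 0.8]
y = X @ beta + 0.5 * rng.normal(size=n)

# Step 1: a pilot estimate (OLS here) defines the adaptive weights.
pilot = LinearRegression(fit_intercept=False).fit(X, y).coef_
gamma = 1.0
weights = np.abs(pilot) ** gamma

# Step 2: adaptive LASSO = ordinary LASSO on columns rescaled by the
# weights, with the coefficients transformed back afterwards.
lasso = Lasso(alpha=0.05, fit_intercept=False).fit(X * weights, y)
beta_adaptive = lasso.coef_ * weights

print("estimated support:", np.flatnonzero(np.abs(beta_adaptive) > 1e-8))
```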
George Box famously said “All models are wrong, some are useful”. The challenges are to determine which models are useful and to quantify how wrong is “wrong”. In this talk I will explore the problem of determining model adequacy in the context of health economic decision making.
In health economics, models are used to predict the costs and health benefits under the competing treatment options (e.g. drug A versus drug B).
The decision problem is typically of the following form. An expensive new drug, A, has arrived on the market. Should the NHS use it? How much additional health will society gain if the NHS uses new drug A over existing drug B? What will the extra cost be? What healthcare activity will be displaced if we use drug A rather than drug B? Will this be a good use of scarce healthcare resources? I will describe a general approach to determining model adequacy that is based on quantifying the “expected value of model improvement”. I will illustrate the method in a case study.
I was once a QMUL Maths student, quite lost on what I wanted to do next. I strove to get into investment banking and it wasn't all it seemed, so I had to make a change. I now programme; I'm now creative.
Not everyone knows what to do after finishing university or how to make the best of themselves. Join me to talk about tips on how to do well in university and how to do well after, drawn from my own mistakes.
Slides for the seminar are available following the link: http://prezi.com/_hpds1p9jrqx/making-the-best-out-of-things/
We examine optimal design for parameter estimation of Gaussian process regression models under input-dependent noise. Such a noise model leads to heteroscedastic models, as opposed to homoscedastic models where the noise is assumed to be constant. Our motivation stems from the area of computer experiments, where computationally demanding simulators are approximated using Gaussian process emulators as statistical surrogates. In the case of stochastic simulators, the simulator may be evaluated repeatedly for a given parameter setting, allowing for replicate observations in the experimental design. Our findings are, however, applicable in the wider context of design for Gaussian process regression and kriging where the parameter variance is to be minimised. Designs are proposed with the aim of minimising the variance of the Gaussian process parameter estimates, that is, we seek designs that enable us to best learn about the Gaussian process model.
We construct heteroscedastic Gaussian process representations and propose an experimental design technique based on an extension of Fisher information to heteroscedastic models. We show empirically that, although a strict ordering of the Fisher information with respect to the maximum likelihood parameter variance is not exact, the approximation error is reduced as the proportion of replicated points is increased. Through a series of simulation experiments on both synthetic data and a systems biology model, the replicate-only optimal designs are shown to outperform both replicate-only and non-replicate space-filling designs as well as non-replicate optimal designs. We consider both local and Bayesian D-optimal designs in our experiments.
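A hedged sketch of the underlying Fisher information computation for Gaussian process covariance parameters, FI_ij = 0.5 tr(K^-1 dK/dtheta_i K^-1 dK/dtheta_j), is given below for a homoscedastic squared-exponential kernel on a one-dimensional input; the heteroscedastic extension discussed in the talk adds noise-model parameters but is not reproduced here, and the two designs compared are invented.

```python
import numpy as np

def fisher_information(x, variance=1.0, lengthscale=0.3, noise=0.1):
    """FI_ij = 0.5 tr(K^-1 dK/dtheta_i K^-1 dK/dtheta_j) for a zero-mean GP
    with squared-exponential kernel; theta = (variance, lengthscale, noise)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    E = np.exp(-0.5 * d2 / lengthscale**2)
    K = variance * E + noise * np.eye(len(x))
    dK = [E,                                    # d/d variance
          variance * E * d2 / lengthscale**3,   # d/d lengthscale
          np.eye(len(x))]                       # d/d noise
    Kinv = np.linalg.inv(K)
    A = [Kinv @ D for D in dK]
    return 0.5 * np.array([[np.trace(Ai @ Aj) for Aj in A] for Ai in A])

# Compare a space-filling design with one that replicates half its points.
space_filling = np.linspace(0, 1, 20)
replicated = np.repeat(np.linspace(0, 1, 10), 2)
for name, design in [("space-filling", space_filling), ("replicated", replicated)]:
    fi = fisher_information(design)
    print(name, "log det FI:", np.linalg.slogdet(fi)[1])
```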
The study aims to evaluate the response of children to a new treatment, Multidimensional Treatment Foster Care in England (MTFCE). Trajectories of child behaviour were studied over time to identify subgroups of treatment response. Growth Mixture Modelling (GMM) was used to find subgroups in the data. A GMM describes longitudinal measures of a single outcome measure as being driven by a set of subject-varying continuous unobserved or latent variables - the so-called growth factors. The growth factors define the individual trajectories. GMM estimates mean growth curves for each class, and individual variation around these growth curves. This allows us to find clusters in the data. Starting characteristics of children were included into the GMM to see if these predicted class membership. Class membership was also checked to see if it predicted outcomes of interest.
Computer simulators
A computer experiment consists of simulation of a computer model which is expected to mimic or represent some aspect of reality. The analysis of computer simulations is a relatively recent newcomer in the bag of tools available to the statistics practitioner. Although simulations do not necessarily represent reality, it is possible to gain knowledge about certain phenomena through the analysis of such simulations, and the role of the statistician is to design efficient experiments to explore the parameter region and to model the response with a reasonable degree of accuracy.
I intend to guide the talk through a series of examples derived from practice, ranging from the analysis of airplane blades to the sensitivity of parameters in a model for disease spread. The main example will be based on the analysis of a model of the evolution of rotavirus in a population.
When setting up a monitoring programme for conditions such as diabetes, hypertension, high blood pressure, kidney disease or HIV, one decision - the interval between monitoring tests - is often made by consensus rather than from evidence. The difficulty with randomized trials in this area is easily demonstrated. Oxford's Monitoring and Diagnosis group, and collaborators, have used longitudinal modelling to show that over-frequent monitoring leads to a kind of 'multiple testing' problem and hence to over-treatment. This talk will discuss the methods we use and illustrate them with a clinical example.
The analysis of variance (Anova) is one of the most popular statistical methods for analysing data. It is most powerful when applied to data from designed experiments. Statistics courses for biologists and other scientists usually explain the underlying theory for simple designs such as the completely randomized design, the randomized complete block design or two-factor factorial designs. However, in applications much more advanced designs are usually used, involving complicated crossing and nesting structures as well as random and fixed effects. Although when designing an experiment scientists can often rely on their intuition, analysing the collected data frequently represents a major challenge to the non-expert.
This talk presents the AutomaticAnova package, a user-friendly Mathematica package that enables researchers to analyse complicated Anova models without requiring much statistical background. It is based on R.A. Bailey's theory of orthogonal designs, which covers a wide range of models and in particular all designs that can be obtained by iterative crossing and nesting of factors. The theory distinguishes between block factors, which have random effects, and treatment factors, whose effects are fixed, so in general the models are mixed effects models. For these designs the Anova table can be derived in an elegant way by using Hasse diagrams.
The AutomaticAnova package provides a graphical user interface implemented using Mathematica's GUIKit. Input data are submitted in the form of a Microsoft Excel spreadsheet and essentially the user only has to specify which columns in the spreadsheet represent block and treatment factors respectively, and whether only main effects or main effects and interactions should be included in the analysis. Dynamic enabling/disabling of dialogs and controls minimizes the risk of providing incorrect input information. The most important feature of the AutomaticAnova package, however, is that the model for the analysis of variance is automatically inferred from the structure of the design in the Excel file. In particular, no model formula needs to be specified for the analysis. It is believed that this aspect of the package's functionality will be highly attractive to practitioners. The output includes the Anova table, estimated variance components and Hasse diagrams, and can be saved as a PDF file.
Another feature of the package is that it can be used at the planning stage of an experiment, when no response data are yet available, to see what the analysis would look like. That is, having only specified the design in the form of a spreadsheet, the package provides the so-called skeleton analysis of variance, which shows the breakdown of the sums of squares and corresponding degrees of freedom. Also, when no response data are available, the package uses Mathematica's symbolic capabilities to derive analytical formulae for the estimators of the variance components.
The presentation will demonstrate the use of the package and describe the principles underlying its implementation. Several examples will be used to illustrate how the AutomaticAnova package can help the non-statistician to analyse complicated Anova designs without having to worry too much about statistics.
Supersaturated designs (SSDs) are used for screening out the important factors from a large set of potentially active variables. The huge advantage of these designs is that they reduce the experimental cost drastically, but their critical disadvantage is the high degree of confounding among factorial effects. In this contribution, we focus on mixed-level factorial designs which have different numbers of levels for the factors. Such designs are often useful for experiments involving both qualitative and quantitative factors. When analyzing data from SSDs, as in any decision problem, errors of various types must be balanced against cost. In SSDs, there is a cost of declaring an inactive factor to be active (i.e. making a Type I error), and a cost of declaring an active effect to be inactive (i.e. making a Type II error). Type II errors are usually considered much more serious than Type I errors. We present a group screening method for analysing data from E(f_{NOD})-optimal mixed-level supersaturated designs possessing the equal occurrence property. Based on the idea of the group screening methods, the f factors are sub-divided into g “group-factors”. The “group-factors” are then studied using penalized likelihood methods involving a factorial design with orthogonal or near-orthogonal columns. The penalized likelihood methods indicate which “group-factors” have a large effect and need to be studied in a follow-up experiment. We will compare various methods in terms of Type I and Type II error rates using a simulation study. Keywords and phrases: Group screening method, Data analysis, Penalized least squares, Supersaturated design.
Insect evidence around a decomposing body can provide a biological clock by which the time of exposure can be estimated. As decomposition progresses, fly larvae grow and go through distinct developmental stages, and a succession of insect species visits the scene. Viewed broadly, the question of how long the body has been exposed fits into the framework of inverse prediction. However, insect evidence is both quantitative and categorical. Size data are multivariate, and their magnitudes, variances, and correlations change with age. Presence/absence of important species manifests categorically, but the number of distinct categories can number in the thousands. The statistical challenge is to devise an approach that can provide a credible, defensible estimate of the postmortem interval based on such data. In this talk I shall present the setting and describe joint work I have undertaken with Jeffrey D. Wells, a forensic entomologist, to address this question.
In this work, we deal with the question of detection of late effects in the setting of clinical trials. The most natural test for detecting this kind of effect was introduced by Fleming and Harrington. However, this test depends on a parameter which, in the context of clinical trials, must be chosen a priori.
We examine the reasons why this test is adapted to the detection of late effects by studying its optimality in terms of Pitman asymptotic relative efficiency. We give an explicit form of the function describing the alternatives for which the test is optimal. Moreover, we will show, by means of a simulation study, that this test is not very sensitive to the value of the parameter, which is very reassuring for its use in clinical trials.
After giving a brief overview of the different phases of drug development, I will present a case study that describes the planning of a dose-finding study for a compound that was in early clinical development at the time of the study. Data from a previous trial with the same primary endpoint were available for a marketed drug that had the same pharmacological mechanism, which provided strong prior information for some characteristics of the new compound, including the shape of the dose-response relationship. The design used for this trial included an adaptive element where the allocation of doses to the patients was changed after an interim analysis. In this talk I will compare the performance of different adaptive designs and compare them to a corresponding non-adaptive design. I will also compare the performance of Bayesian and model-based maximum likelihood estimation relative to the use of simple pairwise comparisons of treatment means.
Suppose that two treatments are being compared in a clinical trial in which response-adaptive randomisation is used. Upon termination of the trial, interest lies in estimating parameters of interest. Although the usual estimators will be approximately unbiased for trials with moderate to large numbers of patients, their biases may be appreciable for small to moderate-sized trials and the corresponding confidence intervals may also have coverage probabilities far from the nominal values. An adaptive two-parameter model is studied in which there is a parameter of interest and a nuisance parameter. Corrected confidence intervals based on the signed root transformation are constructed for the parameter of interest which have coverage probabilities close to the nominal values for trials with a small number of patients. The accuracy of the approximations is assessed by simulation for two examples. An extension of the approach to higher dimensions is discussed.
Hard-to-set factors lead to split-plot type designs and mixed models. Mixed models are used to analyze multi-stratum designs as each stratum may have random effects on the responses. It is usual to use residual maximum likelihood (REML) to estimate random effects and generalized least squares (GLS) to estimate fixed effects. However, a typical property of REML-GLS estimation is that it gives highly undesirable and misleading conclusions in non-orthogonal split-plot designs with few main plots. More specifically, the variance components are often estimated poorly using maximum likelihood (ML) methods when there are few main plots. To overcome the problem a Bayesian method considering informative priors for variance components and using Markov chain Monte Carlo (MCMC) sampling would be an alternative approach. In the current study we have implemented MCMC techniques in two industrial experiments. During binary data analysis, we have faced convergence problems frequently. Perhaps these are due to separation problems in the data. In future, we will define a design criterion that will minimize the problem of separation.
The estimation of population parameters using complex survey data requires careful statistical modelling to account for the design features. The analysis is further complicated by unit and item nonresponse, for which a number of methods have been developed in order to reduce estimation bias. In this talk I will address some issues that arise when the target of the inference is the conditional quantile of a continuous outcome. Survey design variables are duly included in the analysis and a bootstrap variance estimation approach is considered. A novel multiple imputation method based on sequential quantile regressions (QR) is developed. Such a method is able to preserve the distributional relationships in the data, including conditional skewness and kurtosis, and to successfully handle bounded outcomes. The motivating example concerns the analysis of birthweight determinants in a large cohort of British children.
The accelerated development of high-throughput technologies has enabled understanding of how biological systems function at a molecular level, for instance by unraveling the interaction structure of genes responsible for carrying out a given process. Systems biology has the potential to enhance knowledge acquisition and facilitate the reverse engineering of global regulatory networks using gene expression time course experiments. In this talk I will present some models we have developed for estimating a gene interaction network from time course experimental data. The basic structure of these models is governed by a dynamic Bayesian network, which allows us to include expert biological information as well. Given the complexity of model fitting, we resort to numerical methods for model estimation. I will exemplify gene network inference using experimental data from the metabolic change in Streptomyces coelicolor and the circadian clock in Arabidopsis thaliana.
In order to make sense of many biological processes, it is crucial to understand the dynamics of the underlying chemical reactions. Chemical reaction systems are known to exhibit some interesting and complex dynamics, such as multistability (a situation where two or more stable equilibria coexist) or oscillations. Here we take the deterministic approach and assume that the reactions obey the law of mass-action, so the systems are described by ODEs with specific polynomial structure. For such systems, this polynomial structure allows us to gain surprisingly deep insights into systems' dynamics. In my talk I will overview several methods for analysing these specific chemical reaction networks, encompassing algebraic geometry, bifurcation theory and graph theory.
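As a small illustration of the deterministic mass-action set-up, the reversible reaction A + B <-> C gives the polynomial ODEs integrated below with scipy; the multistable and oscillatory networks analysed in the talk share this polynomial structure but are not reproduced, and the rate constants are arbitrary.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Mass-action kinetics for the reversible reaction A + B <-> C:
# dA/dt = dB/dt = -k1*A*B + k2*C,  dC/dt = k1*A*B - k2*C  (polynomial ODEs).
k1, k2 = 2.0, 0.5

def rhs(t, x):
    a, b, c = x
    forward, backward = k1 * a * b, k2 * c
    return [-forward + backward, -forward + backward, forward - backward]

sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.8, 0.0], dense_output=True)
print("concentrations at the end of the run:", sol.y[:, -1])
```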
In this talk, we demonstrate a procedure for calibrating and emulating complex computer simulation models having uncertain inputs and internal parameters, with application to the NCAR Thermosphere-Ionosphere-Electrodynamics General Circulation Model (TIE-GCM), and illustrate preliminary findings for Computational Fluid Dynamics and tsunami wave modelling. In the case of TIE-GCM, we compare simulated magnetic perturbations with observations at two ground locations for various combinations of calibration parameters. These calibration parameters are: the amplitude of the semidiurnal tidal perturbation in the height of a constant-pressure surface at the TIE-GCM lower boundary, the local time at which this maximises and the minimum night-time electron density. A fully Bayesian approach, that describes correlations in time and in the calibration input space is implemented. A Markov Chain Monte Carlo (MCMC) approach leads to potential optimal values for the amplitude and phase (within the limitations of the selected data and calibration parameters) but not for the minimum night-time electron density. The procedure can be extended to include additional data types and calibration parameters.
A wide class of on-line estimation procedures will be proposed for the general statistical model. In particular, new procedures for estimating autoregressive parameters in AR(m) models will be considered. The proposed method allows for incorporation of auxiliary information into the estimation process, and is consistent and asymptotically efficient under certain regularity conditions. Also, these procedures are naturally on-line and do not require storing all the data. Two important special cases will be considered in detail: linear procedures and likelihood procedures with the LS truncations. A specific example will also be presented to briefly discuss some practical aspects of applications of the procedures of this type.
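As one standard example of an on-line scheme (not the general procedures proposed in the talk), the sketch below runs recursive least squares for AR(m) coefficients, updating the estimate one observation at a time without storing the full history; the simulated AR(2) series and its coefficients are invented.

```python
import numpy as np

def rls_ar(series, m, delta=1000.0):
    """Recursive least squares for AR(m): process the series one point at a
    time, keeping only the current estimate and its 'precision' matrix."""
    theta = np.zeros(m)
    P = delta * np.eye(m)
    for t in range(m, len(series)):
        phi = series[t - m:t][::-1]             # (y_{t-1}, ..., y_{t-m})
        K = P @ phi / (1.0 + phi @ P @ phi)
        theta = theta + K * (series[t] - phi @ theta)
        P = P - np.outer(K, phi) @ P
    return theta

# AR(2) example with known coefficients.
rng = np.random.default_rng(6)
true = np.array([0.6, -0.3])
y = np.zeros(5000)
for t in range(2, y.size):
    y[t] = true @ y[t - 2:t][::-1] + rng.normal()
print("online estimate:", rls_ar(y, m=2))
```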
Classical path analysis provides a simple way of expressing the observed association of two variables as the sum of two terms which can with good reason be described as the "direct effect" of one variable on the other and the "indirect effect" via a third, intervening variable. This result is used for linear models for continuous variables. It would often be of interest to have a similar effect decomposition for cases where some of the variables are discrete and modelled using non-linear models. One such problem occurs in the study of social mobility, where the aim is to decompose the association between a person's own and his/her parents' social classes into an indirect effect attributable to associations between education and class, and a direct effect not due to differences in education. Extending the idea of linear path analysis to non-linear models requires, first, an extended definition of what is meant by total, direct and indirect effects and, second, a way of calculating sample estimates of these effects and their standard errors. One solution to these questions is presented in this talk. The method is applied to data from the UK General Household Survey, illustrating the magnitude of the contribution of education to social mobility in Britain in recent decades.
[This is joint work with John Goldthorpe (Nuffield College, Oxford)]
In 2006 the TGN1412 clinical trial was suddenly aborted due to a very strong cytotoxic reaction in the six volunteers who were treated with the drug candidate. An Expert Scientific Group on Clinical Trials as well as the RSS Working Party wrote reports on what happened, how this could have been avoided and what to recommend for future trials of this kind. I will present some work related to designs of such trials, in particular my work on adaptive design of experiments. I will also present some recommendations of the RSS Working Group (Senn et al. 2007).
Senn, S., Amin, D., Bailey, R.A., Bird, S.M., Bogacka, B., Colman, P., Garrett, A., Grieve, A., Lachmann, P. (2007). Statistical issues in first-in-man studies. Journal of the Royal Statistical Society, Series A.
For estimation in exponential family models, Kosmidis & Firth (2009, Biometrika) show how the bias of the maximum likelihood estimator may be reduced by appropriate adjustments to the efficient score function. In this presentation the main results of that study are discussed, complemented by recent work on the easy implementation and the beneficial side-effects that bias reduction can have in the estimation of some well-used generalised linear models for categorical responses. The construction of confidence intervals to accompany the bias-reduced estimates is discussed.
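For the logistic regression case, bias reduction can be implemented as Jeffreys-prior penalised likelihood; the hedged sketch below maximises that penalised likelihood numerically and should be read as an illustration rather than as the adjusted-score implementation of the paper. The nearly separated sample is invented.

```python
import numpy as np
from scipy.optimize import minimize

def bias_reduced_logit(X, y):
    """Maximise l(beta) + 0.5 * log det(X' W X) with W = diag(p(1-p)),
    i.e. Jeffreys-prior penalised likelihood, which for logistic regression
    gives Firth-type bias-reduced estimates."""
    def neg_penalised_loglik(beta):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        p = np.clip(p, 1e-10, 1 - 1e-10)          # numerical safeguard
        loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        w = p * (1 - p)
        _, logdet = np.linalg.slogdet(X.T @ (w[:, None] * X))
        return -(loglik + 0.5 * logdet)
    return minimize(neg_penalised_loglik, np.zeros(X.shape[1]), method="BFGS").x

# Small, nearly separated sample, where plain maximum likelihood can diverge.
rng = np.random.default_rng(7)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = (X[:, 1] + 0.3 * rng.normal(size=20) > 0).astype(float)
print("bias-reduced coefficients:", bias_reduced_logit(X, y))
```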
Bayesian decision theory can be used not only to establish the optimal sample size and its allocation in a single clinical study, but also to identify an optimal portfolio of research combining different types of study design. Within a single study, the highest societal pay-off to proposed research is achieved when its sample sizes, and allocation between available treatment options, are chosen to maximise the Expected Net Benefit of Sampling (ENBS). Where a number of different types of study informing different parameters in the decision problem could be conducted, the simultaneous estimation of ENBS across all dimensions of the design space is required to identify the optimal sample sizes and allocations within such a research portfolio. This is illustrated through a simple example of a decision model of zanamivir for the treatment of influenza. The possible study designs include: (i) a single trial of all the parameters; (ii) a clinical trial providing evidence only on clinical endpoints; (iii) an epidemiological study of the natural history of disease; and (iv) a survey of quality of life. The possible combinations, sample sizes and allocation between trial arms are evaluated over a range of cost-effectiveness thresholds. The computational challenges are addressed by implementing optimisation algorithms to search the ENBS surface more efficiently over such large dimensions.
Usual Markov chain Monte Carlo (MCMC) methods use a single Markov chain to sample from the distribution of interest. If the target distribution has isolated modes then it may be difficult for these methods to jump between the modes and, for this reason, mixing is slow. Usually different starting positions are used to find isolated modes, but this is not always feasible, especially when the modes are difficult to find or there is a large number of them. In this talk, I avoid these problems by introducing a new population MCMC sampler, the tempered simplex sampler. The tempered simplex sampler uses a tempering ladder to promote mixing, while a population of Markov chains is maintained under each temperature. The sampler proceeds by first updating the Markov chains under each temperature using ideas from the Nelder-Mead simplex method and then exchanging populations of Markov chains between temperatures. The performance of the tempered simplex sampler is illustrated on several examples.
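The sketch below is not the tempered simplex sampler itself, but a minimal parallel-tempering illustration on an invented bimodal target, showing the temperature ladder and the between-temperature exchange move that population samplers of this kind exploit.

```python
import numpy as np

rng = np.random.default_rng(8)

def log_target(x):
    """Bimodal target (up to a constant): equal mixture of two normals."""
    return np.logaddexp(-0.5 * ((x + 3) / 0.5) ** 2, -0.5 * ((x - 3) / 0.5) ** 2)

betas = np.array([1.0, 0.3, 0.1])        # inverse-temperature ladder (1 = target)
chains = rng.normal(size=betas.size)
samples = []
for it in range(20000):
    # Within-temperature random-walk Metropolis updates on pi^beta.
    for k, b in enumerate(betas):
        prop = chains[k] + rng.normal(scale=1.0)
        if np.log(rng.random()) < b * (log_target(prop) - log_target(chains[k])):
            chains[k] = prop
    # Exchange move between adjacent temperatures.
    k = rng.integers(betas.size - 1)
    log_alpha = (betas[k] - betas[k + 1]) * (log_target(chains[k + 1]) - log_target(chains[k]))
    if np.log(rng.random()) < log_alpha:
        chains[k], chains[k + 1] = chains[k + 1], chains[k]
    samples.append(chains[0])            # keep only the untempered chain

samples = np.array(samples[5000:])
print("fraction of samples near each mode:", np.mean(samples < 0), np.mean(samples > 0))
```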
This talk follows up on a presentation given by Helene Muller from QM's School of Biological and Chemical Sciences at a statistics study group meeting in January 2009. The problem is to devise an appropriate analysis for investigating whether bumblebees behave in a consistent way. The dataset consists of N=729 observations which represent repeated measurements on 81 bees under various experimental conditions. A modelling strategy for these data is presented, which leads to fitting a nested linear mixed model to the Box-Cox transformed responses. The results from the corresponding analysis appear to be very satisfactory and allow a classification of bees into consistent and inconsistent ones. This is joint work with Helene Muller and Lars Chittka.
In Bayesian inference the choice of prior is very important to indicate our beliefs and knowledge. However, if these initial beliefs are not well elicited, then the data may not conform to our expectations. The degree of discordancy between the observed data and the proper prior is of interest. Pettit and Young (1996) suggested a Bayes factor to find the degree of discordancy. I have extended their work to further examples. I try to find explanations for Bayes factor behaviour. As an alternative I have looked at a mixture prior consisting of the elicited prior and another with the same mean but a larger variance. The posterior weight on the more diffuse prior can be used as a measure of the prior and data discordancy and also gives an automatic robust prior. I discuss various examples and show this new measure is well correlated with the Bayes factor approach.
We consider inference and optimal design problems for finite clusters from bond percolation on the integer lattice Z^d or, equivalently, for SIR epidemics evolving on a bounded or unbounded subset of Z^d with constant life times. The bond percolation probability p is considered to be unknown, possibly depending, through the experimental design, on other parameters. We consider inference under each of the following two scenarios:
This is joint work with Professor Gavin Gibson and Dr Stan Zachary, both of Heriot-Watt University, Edinburgh.
When modelling a computer experiment, the deviation between model and simulation data is due only to the bias (discrepancy) between the model for the computer experiment and the deterministic (albeit complicated) computer simulation. For this reason, replications in computer experiments add no extra information and the experimenter is more interested in efficiently exploring the design region.
I'll present a survey of designs useful for exploring the design region and for modelling computer simulations.
Sensitivity analysis in statistical science studies how scientifically relevant changes in the way we formulate problems affect answers to our questions of interest. New advances in statistical geometry allow us to build a rigorous framework in which to investigate these problems and develop insightful computational tools, including new diagnostic measures and plots.
This talk will be about statistical model elaboration using sensitivity analysis aided with geometry. Throughout we assume there is a working parametric model. The key idea here is to explore discretisations of the data, at which point multinomial distributions become universal (all possible models are covered). The resulting structure is well-suited to discussing practically important statistical topics, such as exponential families and generalised linear models. The theory of cuts in exponential families allows clean inferential separation between interest and nuisance parameters and provides a basis for appropriate model elaboration. Examples are given where the resulting sensitivity analyses indicate the need for specific model elaboration or data re-examination.
The quality of incomplete-block designs is commonly assessed by the A-, D-, and E-optimality criteria. If there exists a balanced incomplete-block design for the given parameters, then it is optimal on all these criteria. It is therefore natural to use the proxy criteria of (almost) equal replication and (almost) equal concurrences when choosing a block design.
However, work over the last decade for block size 2 has shown that when the number of blocks is near the lower limit for estimability of all treatment contrasts then the D-criterion favours very different designs from the A- and E-criteria. In fact, the A- and E-optimal designs are far from equi-replicate and are amongst the worst on the D-criterion.
I shall report on current work which extends these results to all block sizes. Thus the problem is not blocks of size 2; it is low replication.
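To make the criteria concrete, the hedged sketch below evaluates the A-, D- and E-criteria for two invented toy block designs from the treatment information matrix C = R - N K^{-1} N'; the designs are much smaller than those in the talk, and conventionally smaller A and larger D and E are better.

```python
import numpy as np

def block_design_criteria(blocks, n_treatments):
    """A-, D- and E-criteria from the treatment information matrix
    C = R - N K^{-1} N' of a binary incomplete-block design.
    A = sum of reciprocals of the non-zero eigenvalues (smaller is better),
    D = sum of their logs (larger is better), E = smallest (larger is better)."""
    n_blocks = len(blocks)
    N = np.zeros((n_treatments, n_blocks))
    for j, blk in enumerate(blocks):
        for t in blk:
            N[t, j] += 1
    R = np.diag(N.sum(axis=1))                  # replications
    K = np.diag(N.sum(axis=0))                  # block sizes
    C = R - N @ np.linalg.inv(K) @ N.T
    eig = np.sort(np.linalg.eigvalsh(C))[1:]    # drop the single zero eigenvalue
    return {"A": np.sum(1 / eig), "D": np.sum(np.log(eig)), "E": eig.min()}

# Two designs for 4 treatments in 6 blocks of size 2.
balanced = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # every pair once
unbalanced = [(0, 1), (0, 1), (0, 2), (1, 2), (2, 3), (2, 3)]
for name, d in [("balanced", balanced), ("unbalanced", unbalanced)]:
    print(name, block_design_criteria(d, 4))
```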
For 2024-2025, the talks are held on Mondays at 14:00-15:00 in room MB-503 on floor 5 of the School of Mathematical Sciences Building, Queen Mary University of London.
The seminar is organised in a hybrid fashion. Attendance can be either in person or via Zoom using the link provided.
The current seminar organisers are Eftychia Solea and Ingrid Amaranta Membrillo Solis