The SFU/UBC Joint Statistics Seminar is a semi-annual event hosted jointly by the SFU Department of Statistics and Actuarial Science and the UBC Department of Statistics. Over its 19-year history, the seminar has given graduate students in Statistics and Actuarial Science from both universities a chance to present their research to, and connect with, their peers. By fostering ties between the two departments, we hope to build a collegial and collaborative graduate student community in Vancouver. The seminar consists of six talks delivered by graduate students, three from each of the host universities, and concludes with a talk from a faculty member at one of the universities; this year it will be given by Dr. Owen Ward from SFU.
We propose a basket trial design to test the effectiveness of a new treatment for several types of cancer, with survival time as the endpoint. We use Bayesian subgroup analysis to classify the cancer types into clusters according to both the survival times and the longitudinal biomarker measurements of the patients. We then perform Bayesian inference to decide whether to stop recruiting patients for each cluster early, and draw conclusions about whether the treatment is effective for each cluster based on the estimated median survival time. A simulation study shows that our proposed method outperforms the independent approach and the BHM method in most scenarios.
Variational flows allow practitioners to learn complex continuous distributions, but approximating discrete distributions remains a challenge. Current methodologies typically embed the discrete target in a continuous space, usually via continuous relaxation or dequantization, and then apply a continuous flow. These approaches involve a surrogate target that may not capture the original discrete target, might have biased or unstable gradients, and can create a difficult optimization problem. In this work, we develop a variational flow family for discrete distributions without any continuous embedding. First, we develop a measure-preserving and discrete (MAD) invertible map that leaves the discrete target invariant, and then create a mixed variational flow (MAD Mix) based on that map. Our family provides access to i.i.d. sampling and density evaluation with virtually no tuning effort. We also develop an extension to MAD Mix that handles joint discrete and continuous models. Our experiments suggest that MAD Mix produces more reliable approximations than continuous-embedding flows while being significantly faster to train.
We apply an inverse ensemble forecasting approach to COVID-19 outbreaks in a collection of U.S. counties containing college towns. Modelling disease progression with an SIR model, we define time-dependent maps from the infection parameters to infection levels over time and assume an unobserved probability distribution on the parameter space induces an output distribution on the infection levels. We estimate the output distribution from data and recover a distribution on the parameter space by formulating a stochastic inverse problem solved via disintegration. This solution corresponds to a distribution over possible infection curves, from which we can forecast future infection levels in an ensemble forecasting framework. We verify the method in a simulation study, then apply it to observed data. Results suggest the method can provide accurate forecasts under certain population and modelling assumptions, but that the SIR model cannot adequately describe the disease dynamics in the population.
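The forward map and ensemble idea above can be illustrated with a minimal sketch: push a distribution on the SIR parameters through the model to get a distribution over infection curves. The prior ranges, step size, and horizon below are hypothetical choices for illustration, not the parameters used in the talk.

```python
import random

def simulate_sir(beta, gamma, s0=0.99, i0=0.01, days=60, dt=0.1):
    """Forward map: SIR parameters -> infected fraction over time (Euler steps)."""
    s, i = s0, i0
    curve = [i]
    for _ in range(days):
        for _ in range(int(round(1.0 / dt))):
            ds = -beta * s * i
            di = beta * s * i - gamma * i
            s, i = s + dt * ds, i + dt * di
        curve.append(i)
    return curve

# Push a (hypothetical) distribution on (beta, gamma) through the forward map
# to get an ensemble of infection curves; forecast with pointwise quantiles.
rng = random.Random(0)
ensemble = [simulate_sir(rng.uniform(0.2, 0.5), rng.uniform(0.05, 0.2))
            for _ in range(200)]
day30 = sorted(curve[30] for curve in ensemble)
median_forecast = day30[len(day30) // 2]
```

Pointwise quantiles of the ensemble give forecast bands; the stochastic inverse problem in the talk concerns how to choose the parameter distribution so that the induced ensemble matches the observed output distribution.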
The National Flood Insurance Program (NFIP), as the primary provider of residential flood insurance in the United States, has faced financial difficulties due to climate change and extreme weather events. In this study, we introduce a factor-vine copula framework for analyzing multivariate time series corresponding to the flood insurance losses incurred by the NFIP. Within this framework, we employ a flexible extension of the generalized Pareto distribution to accommodate the characteristics of the claim data, including seasonality, zero inflation, and heavy-tailedness. The factor-vine copula models the spatial dependence of the losses with an oblique factor copula while accounting for the time dynamics of the latent process with a multivariate vine copula. We carry out Bayesian inference using reversible jump Markov chain Monte Carlo and Hamiltonian Monte Carlo algorithms, implemented within the Stan framework. The proposed model reveals the hidden dynamics of the latent process, providing insights into the large insurance losses caused by extreme weather events, notably hurricanes and storms. Inspired by the ideas of systemic risk management in the banking industry, we conduct stress testing against extreme weather events and evaluate their impacts on aggregate flood insurance losses. This practice is vital in helping the NFIP manage the risks and financial consequences brought by climate change and climate-related hazards.
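To give a concrete sense of the kind of marginal involved, a plain zero-inflated generalized Pareto log-density can be written as follows. This is a generic textbook form, not the flexible extension developed in the talk, and the parameter names are ours.

```python
import math

def gpd_logpdf(y, shape, scale):
    """Generalized Pareto log-density for y > 0 (shape = xi, scale = sigma)."""
    if shape == 0.0:
        return -math.log(scale) - y / scale          # exponential limit
    z = 1.0 + shape * y / scale
    if z <= 0.0:                                     # outside support when shape < 0
        return float("-inf")
    return -math.log(scale) - (1.0 / shape + 1.0) * math.log(z)

def zi_gpd_logpdf(y, p0, shape, scale):
    """Zero-inflated GPD: probability mass p0 at zero, GPD on positive losses."""
    if y == 0.0:
        return math.log(p0)
    return math.log(1.0 - p0) + gpd_logpdf(y, shape, scale)
```

The point mass at zero handles claim-free periods, while the GPD tail accommodates heavy-tailed losses; seasonality would enter by letting the parameters vary over time.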
Multifactor stochastic volatility jump-diffusion models play a pivotal role in understanding complex financial markets by incorporating various risk factors into a single framework. Nevertheless, estimating the model parameters remains a significant challenge, and the inclusion of option prices in estimation methods introduces another layer of computational complexity. In this work, we develop a deterministic likelihood-based prediction-update method for estimating the model parameters and filtering the latent factors based on cumulants. The filtering procedure uses a marginalized discrete nonlinear filter with Kalman-like recursions to handle states that enter the model dynamics linearly. We rely on a series of simulation studies to validate the filtering and estimation performance of our approach and compare it with existing methods in the literature. Our method is faster than simulation-based methods and exhibits lower bias than quasi-maximum likelihood. Finally, we apply the new approach to observed cumulants computed from S&P 500 option prices to recover parameters and filtered volatility factors.
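The Kalman-like prediction-update recursion exploited for the linear states can be sketched in one dimension. This is the standard textbook filter, not the marginalized discrete nonlinear filter of the talk, and all constants below are illustrative.

```python
def kalman_step(m, P, y, a=0.95, q=0.1, b=1.0, r=0.5):
    """One predict/update cycle for x_t = a x_{t-1} + N(0, q), y_t = b x_t + N(0, r)."""
    # predict: propagate mean and variance through the linear dynamics
    m_pred = a * m
    P_pred = a * a * P + q
    # update: condition on the new observation y
    S = b * b * P_pred + r          # innovation variance
    K = P_pred * b / S              # Kalman gain
    m_new = m_pred + K * (y - b * m_pred)
    P_new = (1.0 - K * b) * P_pred
    return m_new, P_new

# Filter a short observation sequence
m, P = 0.0, 1.0
for y in [0.4, 0.1, -0.2, 0.3]:
    m, P = kalman_step(m, P, y)
```

Marginalizing the linearly-appearing states with such recursions, while treating the remaining states with a discrete nonlinear filter, is what keeps the overall procedure deterministic and fast relative to simulation-based filters.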
Non-reversible parallel tempering (NRPT) is an effective algorithm for sampling from target distributions with complex geometry, such as those arising from posterior distributions in weakly identifiable and high-dimensional Bayesian models. In this work we establish the geometric ergodicity of NRPT under an efficient local exploration hypothesis. The rates that we obtain are bounded in terms of an easily estimable divergence, the global communication barrier (GCB), that was recently introduced in the literature. We obtain analogous ergodicity results for classical reversible parallel tempering, providing new evidence that NRPT dominates its reversible counterpart. We also present some general properties of the GCB and bound it in terms of the total variation distance and the inclusive/exclusive Kullback-Leibler divergences. We conclude with a series of experiments that assess the tightness of our bounds, including a central limit theorem that follows as a consequence of geometric ergodicity, and that connect geometric ergodicity to Markov chain Monte Carlo diagnostics useful to practitioners.
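A minimal sketch of the non-reversible communication scheme (deterministic alternation of even and odd swap rounds) on a toy Gaussian target follows; the target, temperature ladder, and step size are illustrative only, and the local exploration here is a single Metropolis step per chain.

```python
import math
import random

def nrpt(logp, betas, n_iters, step=0.8, seed=1):
    """Toy NRPT: Metropolis local exploration plus deterministic
    even/odd swap rounds; betas ascend to 1 (the target chain)."""
    rng = random.Random(seed)
    K = len(betas)
    xs = [0.0] * K
    out = []
    for t in range(n_iters):
        # local exploration at each temperature
        for k in range(K):
            prop = xs[k] + rng.gauss(0.0, step)
            if math.log(rng.random()) < betas[k] * (logp(prop) - logp(xs[k])):
                xs[k] = prop
        # non-reversible communication: even swaps, then odd, alternating
        for k in range(t % 2, K - 1, 2):
            log_alpha = (betas[k] - betas[k + 1]) * (logp(xs[k + 1]) - logp(xs[k]))
            if math.log(rng.random()) < log_alpha:
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
        out.append(xs[-1])           # record the target-chain state
    return out

samples = nrpt(lambda x: -0.5 * x * x, [0.1, 0.4, 0.7, 1.0], n_iters=2000)
```

The deterministic even/odd alternation is what makes the index process non-reversible: a state that swaps upward tends to keep moving in the same direction, so round trips between the reference and target chains scale more favourably than under random swap proposals.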
The analysis of network data has been an extremely popular topic in statistics in recent years, with many theoretical results and real data applications. In this talk I'll provide an approachable overview of the key questions in statistical network analysis, describing common models and tools. I'll then mention a selection of recent problems in the field, including the role of modern machine learning in network data analysis and extensions to continuous-time events on networks. Finally, I'll talk about my experience going from graduate student to faculty member, including the job application process and life as a new assistant professor.