PAYMAN NICKCHI

Title: Linkage fine-mapping on sequences from case-control studies and Goodness-of-fit tests based on empirical distribution function for general likelihood model
Date: Tuesday, December 19th, 2023
Time: 10:00am
Location: LIB 2020 / Over Zoom
Supervised by: Richard Lockhart and Jinko Graham

Abstract: This thesis investigates two distinct projects: one in statistical genetics, focusing on identifying rare causal variants using a sequence-relatedness approach, and another in goodness-of-fit testing based on the empirical distribution function (EDF) for any general likelihood model. First, we investigate an association method based on sequence relatedness for identifying causal variants in a genomic region. We conduct linkage analysis using sequences, rather than individuals as in traditional methods, as the unit of observation. We introduce two sequence-relatedness approaches that associate similarity in genetic relatedness with similarity in trait values, and we compare them to two common genotypic-association methods. Based on a simulation study, we show the efficacy of sequence-relatedness methods in improving the localization and detection of rare causal variants in an allelically heterogeneous disease trait. In addition, we introduce a post-hoc labeling procedure, based on the idea of genealogical nearest neighbors, to identify potential carriers or non-carriers of causal variants among case sequences. Second, we introduce a goodness-of-fit test based on the EDF in the presence of parameter estimation, which can be applied to any general likelihood model. The computation of the P-value in EDF-based goodness-of-fit tests with parameter estimation depends on the limiting large-sample covariance function of a stochastic process. This function relies on key elements of the model, including the Fisher information matrix and the derivatives of the cumulative distribution function under the null hypothesis. Computing these elements is often not straightforward and can be computationally intensive or impractical in some cases. In this thesis, we review the theory and propose a new method that estimates the covariance function of the process directly from the sample instead of by analytical calculation. We consider two broad cases: when the sample is independent and identically distributed, and when the expected value of the response variable depends on some covariates (e.g., a linear model or a generalized linear model). Through simulations, we demonstrate the reliability of the estimation method. Finally, we provide computational tools as an R package for practical implementation.
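
For readers unfamiliar with EDF-based tests under parameter estimation, the following is a minimal illustrative sketch in Python. It is not the method proposed in the thesis and not the R package mentioned above: it computes the Cramér-von Mises statistic for a normal model with mean and standard deviation estimated by maximum likelihood, and approximates the P-value with a parametric bootstrap rather than with an estimated covariance function. All function names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_normal(x):
    """Maximum likelihood estimates of the mean and standard deviation."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return mu, sigma


def cvm_statistic(x):
    """Cramér-von Mises statistic with parameters estimated from x."""
    n = len(x)
    mu, sigma = fit_normal(x)
    # Probability integral transform under the fitted model, then sort.
    u = sorted(normal_cdf((v - mu) / sigma) for v in x)
    return 1.0 / (12 * n) + sum(
        (u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)
    )


def bootstrap_pvalue(x, n_boot=500, seed=0):
    """Parametric bootstrap P-value (an assumption of this sketch,
    swapped in for the covariance-based computation described above)."""
    rng = random.Random(seed)
    observed = cvm_statistic(x)
    mu, sigma = fit_normal(x)
    exceed = 0
    for _ in range(n_boot):
        # Simulate from the fitted null model and re-estimate parameters.
        sim = [rng.gauss(mu, sigma) for _ in range(len(x))]
        if cvm_statistic(sim) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

Re-estimating the parameters inside each bootstrap replicate is what accounts for the effect of parameter estimation on the null distribution of the statistic; the approach in the abstract instead works with the limiting covariance function of the underlying process.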

Keywords: linkage analysis; fine-mapping; sequence relatedness; goodness-of-fit test; empirical distribution function; general likelihood model