Winfield Chen

Title: Toward efficient population-level genome-wide association studies
Date: April 29, 2022
Time: 10:30 AM (PDT)
Location: Remote delivery

Abstract

Large genomic resources such as UK Biobank involve hundreds of thousands of subjects and are being established for prospective epidemiological cohort studies, with the goal of improving the screening and treatment of disease. Genome-wide association studies (GWAS) on these resources face time and space efficiency issues, which are amplified at the population level. We present two new methods that mitigate these issues. First, we present a new compressed file format and associated software that exploit properties of the statistical distribution of population genetics files and enable computationally faster GWAS with a smaller storage footprint, reducing the cost of GWAS research. We benchmark this method on UK Biobank data against the current state of the art and find a significant increase in space efficiency. Second, we present software implementing an efficient method for clustering the associations discovered by such studies. We apply the method to GWAS of nearly 4,000 brain imaging phenotypes from UK Biobank, yielding results associated with pathways involved in various diseases.

Keywords: statistical genetics; genome-wide association study; information theory; source coding; entropy compression; algorithms and data structures