Genome-Wide Modeling

Motivation
Interpreting the clinical significance of rare and novel genetic variants identified in whole-genome sequencing studies remains a critical challenge for patient care. Methods for predicting the clinical significance of variants that lie outside protein-coding sequences are particularly understudied. We aim to develop predictive algorithms for noncoding, intronic, and synonymous variants, complementing the approaches already available for protein-altering missense variants.

Approach
One important mechanism by which genetic variants become clinically significant is by altering the regulation of gene expression. This mechanism is likely to be particularly important for noncoding, intronic, and synonymous variants, which do not change the amino acid sequence and therefore do not directly affect protein structure. Our initial approach to modeling the effects of genome-wide genetic variants is therefore to develop machine learning algorithms that predict whether a given variant is likely to alter gene expression. We are currently training our predictive models on data from recently published studies of expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs).
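
To make the prediction task concrete, the following is a minimal sketch of one way it could be framed as binary classification: each variant is a feature vector, and the label records whether the variant was reported as an eQTL or sQTL. This is an illustration only, not our actual model. The two features (near_tss and conservation), the synthetic data generator, and the choice of logistic regression are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a variant alters gene expression."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def make_toy_variants(n, seed):
    """Generate SYNTHETIC variants; features and labels are hypothetical."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        is_qtl = rng.random() < 0.5  # stand-in for "reported as eQTL/sQTL"
        # Hypothetical standardized features, shifted upward for QTL variants.
        near_tss = rng.gauss(1.0 if is_qtl else -1.0, 1.0)
        conservation = rng.gauss(0.5 if is_qtl else -0.5, 1.0)
        X.append([near_tss, conservation])
        y.append(1 if is_qtl else 0)
    return X, y


X_train, y_train = make_toy_variants(200, seed=0)
X_test, y_test = make_toy_variants(100, seed=1)
w, b = train_logistic(X_train, y_train)

# Fraction of held-out toy variants classified correctly at a 0.5 threshold.
accuracy = sum(
    (predict_proba(w, b, x) >= 0.5) == bool(t) for x, t in zip(X_test, y_test)
) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In practice the labels would come from the published eQTL and sQTL studies rather than a random generator, and both the feature set and the model family would need to be chosen and validated against those data.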