Regulation of gene expression is fundamental to a wide range of biological processes. From cell fate determination during development to malignant transformation during tumorigenesis, precise control of gene expression forms the basis of these processes. Our current understanding of gene regulation is, however, far from complete. Most published studies that profile gene expression are transcript-centric (i.e., they focus on measuring mRNA levels and transcription factor binding). While these efforts have revealed intricate networks of cooperativity among transcription factors in shaping complex biological processes, much of post-transcriptional regulation is left unexplored. It remains unclear whether protein translation is regulated by a network of factors as complex as the one that governs transcription. We ask questions such as “Do sequence-specific RNA binding proteins (RBPs) cooperate in controlling translation?” and “Are there translational regulatory networks that orchestrate critical biological processes?” Our research program focuses on addressing these questions in biological contexts that are relevant to human health. Our immediate goals are to develop novel tools to systematically study RBP binding; to investigate the regulatory functions of upstream open reading frames (uORFs); and to integrate these functional genomics annotations with results from genetic studies, in order to fine-map regulatory variants and to provide a mechanistic understanding of disease-associated variants.

Current projects

Novel coding regions and uORF

Upstream open reading frames (uORFs) are in-frame pairs of start and stop codons present in the 5’ UTR of a coding transcript. Translation of a uORF is known to affect translation of the main ORF. Examples of uORF regulation include the translational regulation of HIF1-alpha under hypoxic conditions. Systematic investigation of uORF function has been challenging, mainly because of limited annotations. We recently developed a statistical model for genomic-data-driven annotation of coding sequences (riboHMM) (Raj, Wang and Shim et al., 2016). Using this method, we identified thousands of novel coding sequences (CDS) in the human genome, which include both translated uORFs and translated regions of noncoding RNAs. Unlike conventional CDS, these newly identified regions mostly encode short peptides with unknown functions. Armed with the new CDS annotation predicted by riboHMM, we are taking a proteogenomics approach to validate peptide production and using a pooled CRISPR knockout screen to identify the fitness impact of these novel coding regions.
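To make the sequence-level definition of a uORF concrete, the following is a minimal sketch that scans a 5’ UTR for in-frame start/stop codon pairs. It is a naive sequence scan for illustration only, not the riboHMM model, which infers translated regions from ribosome footprint data. The function name `find_uorfs`, the restriction to canonical ATG starts, and the requirement that the stop codon lie within the 5’ UTR are assumptions made for this example.

```python
# Illustrative only: a naive sequence scan, not the riboHMM model.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5, start_codon="ATG"):
    """Return (start, end) pairs, 0-based and half-open, for every
    in-frame start/stop codon pair fully contained in the 5' UTR.

    Assumption for this sketch: only the given start codon is considered,
    and a start codon with no in-frame stop inside the UTR is skipped.
    """
    seq = utr5.upper().replace("U", "T")  # accept RNA or DNA input
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != start_codon:
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


if __name__ == "__main__":
    # Hypothetical 5' UTR: ATG at index 2, in-frame TAG ending at index 11.
    print(find_uorfs("GGATGAAATAGCC"))
```

Note that nested start codons sharing the same stop codon are reported as separate candidates here; choosing among them is exactly where experimental evidence of translation becomes necessary.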

Genetics of translation regulation

Genetic variation in the human population that affects protein translation contains valuable information, both on how a variant could affect the well-being of an individual and, more generally, on how translation regulation is encoded in the genome. Finding causal variants, however, is a key challenge in the field of functional genomics. We use a cis-QTL mapping approach to identify genetic variants that affect translation regulation. Our previous work established that, in stable cell culture conditions, divergence in transcript levels across cell lines is often attenuated at the protein level (Battle, Khan and Wang et al., 2015). While we were able to describe general trends of gene regulation across the genome, we were unable to identify the causal genetic variants underlying such regulation because, in most cases, multiple variants are associated with the observed effect. Better annotations of RBP binding and uORFs could help us prioritize the associated variants: a variant in an RBP binding motif close to the gene of interest is more likely to affect translation of that gene than a distal variant with no clear functional annotation. To this end, we are developing genomic assays to survey RBP binding genome-wide. In addition, we are extending the riboHMM model to allow inference of multiple CDSs per transcript, in order to better annotate uORFs. Finally, we are extending our QTL mapping study to investigate interactions between environmental stimuli and genetic variants, currently focusing on mapping genetic variants associated with the translational stress response.
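As a rough illustration of the cis-QTL idea described above, the sketch below regresses a per-individual translation phenotype on genotype dosage for each variant within a window around a gene, then ranks variants by strength of association. Everything here is an assumption for the example: the function names `regress` and `cis_scan`, the 100 kb window, the use of simple least squares without covariates or permutation testing, and the toy data. It is not the pipeline used in the cited studies.

```python
from math import sqrt


def regress(dosages, phenotype):
    """Least-squares slope and Pearson r of phenotype on genotype dosage."""
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in phenotype)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # monomorphic variant or constant phenotype
    return sxy / sxx, sxy / sqrt(sxx * syy)


def cis_scan(gene_pos, phenotype, variants, window=100_000):
    """Test every variant within `window` bp of the gene position.

    variants: list of (variant_id, position, dosages), where dosages are
    0/1/2 alternate-allele counts per individual (hypothetical layout).
    Returns (variant_id, slope, r) tuples sorted by |r|, strongest first.
    """
    hits = []
    for vid, pos, dosages in variants:
        if abs(pos - gene_pos) > window:
            continue  # outside the cis window
        slope, r = regress(dosages, phenotype)
        hits.append((vid, slope, r))
    return sorted(hits, key=lambda h: abs(h[2]), reverse=True)


if __name__ == "__main__":
    # Toy data: four individuals, one gene, three variants.
    phenotype = [1.0, 2.0, 3.0, 2.0]
    variants = [
        ("v1", 1_000, [0, 1, 2, 1]),
        ("v2", 5_000, [0, 0, 1, 1]),
        ("v3", 900_000, [0, 1, 2, 1]),  # too distal, skipped
    ]
    for hit in cis_scan(2_000, phenotype, variants):
        print(hit)
```

In the toy data, two nearby variants are both associated with the phenotype. This mirrors the difficulty noted above: association alone rarely singles out the causal variant, which is why functional annotations such as RBP binding motifs and uORFs are useful for prioritization.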


Visit the PubMed profile page

Wang SH, Hsiao CJ, Khan Z, Pritchard JK. Post-translational buffering leads to convergent protein expression levels between primates. Genome Biol. 2018;19:83.

Raj A*, Wang SH*, Shim H*, Harpak A, Li YI, Engelmann B, et al. Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. eLife. 2016;5:e13328.

Battle A*, Khan Z*, Wang SH*, Mitrano A, Ford MJ, Pritchard JK, et al. Genomic variation. Impact of regulatory variation from RNA to protein. Science. 2015;347:664–7.

*equal contribution