Doris produces novel murine reference genome assembly


By Roman Petrowski, Office of Communications

Peter Doris, PhD

As the world’s leading resource for disseminating genomic information, the NIH’s National Center for Biotechnology Information places special emphasis on three mammalian species that are key to medical research.

A recent addition to these reference genomes was produced in a project led by Peter Doris, PhD, at the Brown Foundation Institute of Molecular Medicine. The assembly will be known as GRCr8, reflecting its adoption by the NCBI Genome Reference Consortium.

The genome assembly is the result of funding awarded by the National Human Genome Research Institute to Doris, professor and Mary Elizabeth Holdsworth Distinguished University Chair in Metabolic and Inflammatory Disease Research. The project is a collaboration between Doris and colleagues Ted Kalbfleisch, PhD, associate professor at the Maxwell H. Gluck Equine Research Center at the University of Kentucky, and Melissa Smith, PhD, assistant professor in the Department of Biochemistry and Molecular Genetics at the University of Louisville.

Doris stated, “Recently, genome assembly has been revolutionized by the use of new ‘long read’ sequencing methods that allow much more complete and accurate representation of the genome than previously possible. The GRCr8 assembly is based on PacBio HiFi sequencing technology using genomic DNA from a Brown Norway male murine model.

“Genome assembly resembles completing a jigsaw puzzle containing nearly 3 billion pieces. Long read sequencing reduces the number of pieces to about 300,000, but these must still be ordered and linked correctly until entire chromosomes are achieved. This involves a process called scaffolding in which technologies such as optical mapping and chromosome conformation capture were used to provide long-range information that helps correctly join and order the assembly. The result is a huge leap forward in genome quality.”

A variety of bioinformatic analyses applied to the genome assembly substantiate its quality. Genome size has increased by more than 7% as more of the genome is assembled, with notable increases in chromosomes containing duplicated regions and the incorporation of challenging repetitive regions such as centromeres and telomeres. The accuracy and completeness of the assembly have allowed an additional 1,100 protein-coding genes to be localized within it.