Barcelona, 29 April 2021. The Vertebrate Genomes Project (VGP) today announced its flagship study and associated publications, which focus on the quality of genome assemblies and on setting standards for the field of genomics. The international consortium is led by a research team from Rockefeller University. Its members include the Centro Nacional de Análisis Genómico (CNAG-CRG) and the Institute of Evolutionary Biology (IBE), a joint centre of the Spanish National Research Council (CSIC) and Pompeu Fabra University (UPF). As a proof of concept, the consortium is publishing complete, high-quality reference genomes for 16 vertebrate species. This approach will make it possible to study and conserve species at an unprecedented scale.
The VGP builds on ten years of work by the Genome 10K (G10K) community, which set out to sequence the genomes of 10,000 vertebrate species, and on other comparative genomics efforts around the world. It has also taken advantage of the dramatic improvements in sequencing technologies in recent years. With these foundations, the VGP has begun producing high-quality reference genome assemblies, with the goal of covering all of the more than 70,000 extant vertebrate species.
“This massive comparative genomics project represents a new era of innovation in genome science. It develops state-of-the-art sequencing, assembly and annotation techniques and uses them in new ways, with implications for addressing fundamental questions in comparative biology, genetics and biodiversity conservation”, says Tomàs Marquès-Bonet. He is principal investigator of the Comparative Genomics group at the IBE, is also affiliated with CNAG-CRG, and is a member of the VGP Steering Committee.
The consortium will also serve as a model for other coordinated genomics projects, such as the Catalan Initiative for the Earth Biogenome Project. These projects can draw on the VGP's extensive infrastructure and expertise. Since it began, the VGP has brought together hundreds of scientists from more than 50 institutions in 12 countries. Among them is Ivo Gut, director of CNAG-CRG and head of the Biomedical Genomics group.
The VGP describes numerous technological improvements in genome assembly in a special issue of Nature, with complementary articles published simultaneously in other scientific journals. The main article shows that it is feasible to set and meet high-quality standards for the reference genomes of almost all species. The international team's new approach combines long-range sequencing data with new, automated assembly algorithms. Using these tools, the team reconstructed the pieces of each genomic puzzle almost without errors.
“When I was asked to take on the leadership of G10K in 2015, I emphasized that we needed more partners and approaches that would produce the highest-quality data possible. The students and postdocs in my own group were spending months correcting the structure of each gene in the genome sequences they used for their experiments”, says Erich Jarvis, head of the VGP sequencing centre at Rockefeller University, coordinator of G10K and a researcher at the Howard Hughes Medical Institute. “For me, that was not just a practical mission but a moral one.”
The first genomes analysed have already led to new discoveries with implications for characterizing biodiversity, for conservation and for human health. In particular, the Bat 1K consortium generated the first high-quality reference genomes of six bat species. These genomes revealed selection on immunity-related genes as well as the loss of some of them. The findings are directly relevant to research on emerging infectious diseases such as COVID-19.
As an early large-scale project producing high-quality eukaryotic reference genomes, the VGP has also become a working model for other large consortia. These include the Earth Biogenome Project, the Darwin Tree of Life, the Catalan Initiative for the Earth Biogenome Project and the European Reference Genome Atlas, among others.
To date, the VGP consortium has generated more than 100 genomes, which are the most complete versions of these species' genomes produced so far. Most of the genomic data have been generated by three sequencing centres committed to the VGP's mission. The first is the Vertebrate Genome Lab at Rockefeller University (New York, USA), which is partly supported by the Howard Hughes Medical Institute. The other two are the Wellcome Sanger Institute (United Kingdom) and the Max Planck Institute (Germany).
As its next step, the VGP will keep working with networks around the world and with other consortia to complete the project's first phase. This phase covers approximately 260 species, with one representative species for each vertebrate order. Each of these species is separated from the others by at least 50 million years since their common ancestor. The VGP also plans to create genomic resources, including complete genomes, that will make it possible to compare these 260 species and to understand their evolutionary history in great detail. The second phase will focus on representative species from each vertebrate family. The project is currently identifying samples and raising funds for it.
Reference article: Towards complete and error-free genome assemblies of all vertebrate species. Nature (2021).