Benchmarking bioinformatics pipelines to improve accuracy and computing costs

Researchers from the CNAG have benchmarked six combinations of state-of-the-art read aligners and variant callers for WES and WGS

Sequencing a human genome today costs less than 1% of what it did in 2006, and takes hours instead of years. The computing cost of analyzing sequencing data, however, has decreased far less than the cost of whole exome and whole genome sequencing (WES and WGS), and today constitutes a non-negligible fraction of the cost of sequence analysis. As the clinical genetics community adopts WES and WGS as standard practice in research and diagnosis, it is essential to choose the most accurate and cost-efficient analysis pipeline.

Now a team of researchers from the CNAG and the Autonomous University of Barcelona have benchmarked six combinations of state-of-the-art read aligners and variant callers for WES and WGS. The study, published in Human Mutation, aims to evaluate the robustness of the variant detection process, while taking into account the computing resources required.

The results show that the six variant calling pipelines are consistent in 70% of the genome. The remaining 30% is not reliably callable: different pipelines detect different variants there, and these dark regions remain a challenge with current technology.
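To make the idea of agreement between pipelines concrete, here is a minimal Python sketch, not the method used in the study. It assumes each pipeline's output has already been reduced to a set of (chromosome, position, reference allele, alternate allele) tuples, and it measures agreement per variant rather than per genomic region as the study reports. The pipeline names, the `concordance` function and the toy variants are all hypothetical and invented for illustration.

```python
from typing import Dict, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def concordance(calls: Dict[str, Set[Variant]]) -> float:
    """Fraction of all reported variants that every pipeline agrees on."""
    call_sets = list(calls.values())
    if not call_sets:
        raise ValueError("need at least one pipeline")
    union = set().union(*call_sets)        # variants seen by any pipeline
    if not union:
        return 1.0                         # nothing called, nothing to disagree on
    shared = set.intersection(*call_sets)  # variants seen by all pipelines
    return len(shared) / len(union)


# Hypothetical toy data: three made-up pipelines, four made-up variants.
toy_calls = {
    "pipeline_a": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")},
    "pipeline_b": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "pipeline_c": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 75, "T", "C")},
}

toy_concordance = concordance(toy_calls)
print(f"Variants called by all pipelines: {toy_concordance:.0%}")
```

In this toy case two of the four distinct variants are reported by all three pipelines. A real comparison would also need to normalize variant representation before comparing sets, which this sketch leaves out.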

Regarding computing costs, the study found substantial differences between tools. Notably, GEM3, the alignment tool developed and used at CNAG, was found to be 4 times faster than the widely used BWA-MEM: while BWA-MEM required almost 300 CPU hours for WGS alignment, GEM3 used less than 60 CPU hours to complete the same task. Moreover, the combination of aligner and variant caller used at CNAG performed best overall.

Work of reference:

From Wet-lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing
