Researchers from the Functional Genomics Team at CNAG have identified critical limitations in standard quality control (QC) workflows for single-cell RNA sequencing (scRNA-seq), leading to the frequent inclusion of low-quality cells in single-cell transcriptomic atlases.

Published in BMC Genomics, the study demonstrated that commonly used filters for selecting high-quality cells in cellular maps are inadequate for complex organs such as the heart, liver, and kidney. 
The CNAG Team has demonstrated that utilising nuclear markers, such as intronic fraction and MALAT1 gene expression, significantly enhances the quality of single-cell atlases, thereby improving downstream analyses.
This innovative workflow has been successfully applied to existing datasets, including those from the Tabula Muris, Tabula Sapiens, and Human Cell Atlas, covering 500 samples across six reference cell atlases and more than 150 cell types from more than 20 tissues.
 
 
February 5th, 2025. Mapping every single cell in the human body is one of the great challenges in genomics today. Over the past decade, scientists from around the world have produced an impressive number of cell atlases, profiling more than 63 million cells. The advancement of single-cell RNA sequencing technologies has contributed to this goal, enhancing our understanding of gene expression, cellular diversity, and disease. In this fast-paced race to analyse all cells in the human body, and those of mice, researchers from the Functional Genomics Team at CNAG have uncovered a significant problem with the quality of these cellular atlases: they contain a substantial proportion of nuclei-free low-quality cells, in some cases exceeding 85%. Published in the journal BMC Genomics, the study reveals that reference data repositories like the Human Cell Atlas, Tabula Muris, and Tabula Sapiens often include many poor-quality cells, undermining their utility.

 

These poor-quality cells arise from technical artifacts such as damaged cells, incomplete cell lysis, and contamination from cytoplasmic debris. Yet standard QC measures, which focus on parameters such as the mitochondrial content and the number of detected genes or transcripts per cell, do not evaluate the presence of the nucleus. Moreover, these methods have proven insufficient for challenging tissues and organs such as the heart, liver, or kidney. Obtaining high-quality cells from these tissues is particularly difficult because the dissociation treatments are quite aggressive: they damage some cell types and release large amounts of RNA into solution, which then becomes ambient RNA and confounds the analysis.
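The conventional metrics mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy counts, and the thresholds (`max_mito`, `min_genes`) are hypothetical and are not taken from the study. Mitochondrial genes are identified by the usual `MT-` / `mt-` gene-symbol prefix.

```python
def standard_qc_metrics(gene_counts):
    """Compute conventional per-cell QC metrics from a {gene: count} mapping."""
    total = sum(gene_counts.values())
    # Mitochondrial genes are recognised by the "MT-" (human) / "mt-" (mouse) prefix.
    mito = sum(n for gene, n in gene_counts.items()
               if gene.upper().startswith("MT-"))
    detected = sum(1 for n in gene_counts.values() if n > 0)
    return {
        "total_counts": total,
        "n_genes": detected,
        "mito_fraction": mito / total if total else 0.0,
    }


def passes_standard_qc(metrics, max_mito=0.2, min_genes=3):
    """Apply illustrative (assumed) thresholds; real values depend on the tissue."""
    return (metrics["mito_fraction"] <= max_mito
            and metrics["n_genes"] >= min_genes)


# Toy cell: moderate mitochondrial content and no MALAT1 counts at all.
cell = {"ACTB": 50, "GAPDH": 40, "MT-CO1": 6, "MT-ND1": 4, "MALAT1": 0}
metrics = standard_qc_metrics(cell)
accepted = passes_standard_qc(metrics)
```

In this toy case the cell is accepted even though it carries no MALAT1 signal, because neither metric looks at nuclear content.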

 

The CNAG researchers advocate the routine inclusion of nuclear metrics, specifically the intronic fraction and MALAT1 expression. Null or very low levels of intronic reads or MALAT1 indicate that the nucleus has not been lysed, and only the cytosolic transcriptome has been sequenced. To demonstrate the high presence of poor-quality cells in publicly available cell atlases, the team reanalysed 500 samples from more than 150 cell types across six cell atlases, covering more than 20 human and mouse tissues.
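As a rough sketch of how the two nuclear metrics could be computed and used as a filter, the Python below works on per-cell counts. Everything specific here is an assumption for illustration: the `CellCounts` layout, the function names, the toy numbers, and the cut-offs `min_intronic` and `min_malat1` are hypothetical, and it is not the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class CellCounts:
    """Per-cell counts (hypothetical input layout)."""
    barcode: str
    exonic: int      # reads/UMIs mapped to exons
    intronic: int    # reads/UMIs mapped to introns
    malat1: int      # UMIs assigned to MALAT1
    total_umis: int  # all UMIs assigned to genes


def intronic_fraction(cell):
    """Share of gene-mapped reads that fall in introns."""
    mapped = cell.exonic + cell.intronic
    return cell.intronic / mapped if mapped else 0.0


def malat1_fraction(cell):
    """Share of the cell's UMIs that come from MALAT1."""
    return cell.malat1 / cell.total_umis if cell.total_umis else 0.0


def has_nuclear_signal(cell, min_intronic=0.05, min_malat1=0.001):
    """Keep a cell only if both markers are above the (assumed) cut-offs."""
    return (intronic_fraction(cell) >= min_intronic
            and malat1_fraction(cell) >= min_malat1)


cells = [
    CellCounts("AAAC", exonic=8000, intronic=2000, malat1=150, total_umis=5000),
    CellCounts("TTTG", exonic=9900, intronic=100, malat1=0, total_umis=5000),
]
kept = [c.barcode for c in cells if has_nuclear_signal(c)]
```

Requiring both markers reflects the statement that a very low level of either one points to missing nuclear content. Suitable cut-offs are likely to differ between tissues and protocols.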

 

The results reveal a significant presence of low-quality cells in most datasets, in some cases severely compromising the data for organs such as the kidney. The team demonstrated that filtering cells with absent nuclear markers effectively excludes poor-quality cells. “Our findings show that many datasets rely on permissive QC workflows that fail to detect nuclear content deficiencies,” explains Tomàs Montserrat, bioinformatician at CNAG and first author of the study. “This oversight is particularly critical for tissues that require aggressive dissociation protocols. We strongly recommend incorporating nuclear markers into the QC step as a best practice for single-cell analysis.”

 

The implications of these findings extend beyond data quality. “Reusing reference datasets with low-quality cells risks propagating errors into new analyses,” adds Anna Esteve-Codina, the study’s senior author. “Adopting stricter QC standards will enhance the reliability of cellular maps, benefiting the entire research community.”

 

Nuclear-content QC metrics can easily be applied in any single-cell RNA-seq study. Both metrics are compatible with existing bioinformatics tools, providing a scalable solution for improving new single-cell datasets and reviewing those already published. However, these two nuclear metrics are still underused in the single-cell research community. CNAG researchers are reaching out to leading institutions in the field to present the results of this study. Indeed, this advance not only paves the way for higher-quality cellular atlases but also underscores the importance of refining experimental and computational protocols to meet the growing demands of single-cell research.

 

Reference article

Montserrat-Ayuso, Tomàs, and Anna Esteve-Codina. ‘High Content of Nuclei-Free Low-Quality Cells in Reference Single-Cell Atlases: A Call for More Stringent Quality Control Using Nuclear Fraction’. BMC Genomics, vol. 25, no. 1, Nov. 2024, p. 1124. BioMed Central, https://doi.org/10.1186/s12864-024-11015-5