A Random Matrix Theory Approach to Denoise Single-Cell Data


Single-cell technologies provide the opportunity to identify new cellular states. However, a major obstacle to the identification of biological signals is noise in single-cell data. In addition, single-cell data are very sparse. We propose a new method based on random matrix theory to analyze and denoise single-cell sequencing data. The method uses the universal distributions predicted by random matrix theory for the eigenvalues and eigenvectors of random covariance/Wishart matrices to distinguish noise from signal. In addition, we explain how sparsity can cause spurious eigenvector localization, falsely identifying meaningful directions in the data. We show that roughly 95% of the information in single-cell data is compatible with the predictions of random matrix theory, about 3% is spurious signal induced by sparsity, and only the last 2% reflects true biological signal. We demonstrate the effectiveness of our approach by comparing with alternative techniques in a variety of examples with marked cell populations.
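The eigenvalue part of this idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a dense cells-by-genes matrix whose genes are standardized to zero mean and unit variance, and it uses the Marchenko-Pastur upper edge, (1 + sqrt(genes/cells))^2, as the cutoff between noise-compatible eigenvalues and candidate signal. The helper name `mp_split` and the synthetic data are hypothetical. The sketch does not cover the eigenvector localization test or the sparsity correction described above.

```python
import numpy as np


def mp_split(X):
    """Flag covariance eigenvalues above the Marchenko-Pastur upper edge.

    X is a (cells x genes) matrix. Hypothetical helper for illustration only.
    Returns (eigenvalues in ascending order, upper edge, boolean signal mask).
    """
    X = np.asarray(X, dtype=float)
    X = X[:, X.std(axis=0) > 0]                # drop constant genes (cannot be standardized)
    n_cells, n_genes = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # each gene: mean 0, variance 1
    cov = Z.T @ Z / n_cells                    # genes x genes sample covariance
    eigvals = np.linalg.eigvalsh(cov)          # symmetric matrix, ascending eigenvalues
    gamma = n_genes / n_cells                  # aspect ratio of the data matrix
    upper_edge = (1.0 + np.sqrt(gamma)) ** 2   # Marchenko-Pastur bulk upper edge
    is_signal = eigvals > upper_edge           # eigenvalues outside the noise bulk
    return eigvals, upper_edge, is_signal


# Synthetic example (assumed data): pure Gaussian noise versus the same noise
# with 20 genes shifted in half of the cells to mimic two cell populations.
rng = np.random.default_rng(0)
noise = rng.normal(size=(2000, 200))
planted = noise.copy()
planted[:1000, :20] += 3.0

for name, data in [("noise only", noise), ("two populations", planted)]:
    eigvals, edge, is_signal = mp_split(data)
    print(name, "| largest eigenvalue:", round(float(eigvals[-1]), 2),
          "| MP edge:", round(float(edge), 2),
          "| flagged:", int(is_signal.sum()))
```

In finite samples the largest noise eigenvalue fluctuates around the edge, so a hard threshold can occasionally flag a pure-noise eigenvalue. A practical analysis would add a tolerance or a statistical test at the edge.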


“A Random Matrix Theory Approach to Denoise Single-Cell Data”

AUTHORS: Luis Aparicio, Mykola Bordyuh, Andrew Blumberg, Raul Rabadan.

LINK TO PUBLICATION:
Patterns. 2020 June 12. doi: 10.1016/j.patter.2020.100035

