Scientists Figure Out How to Merge Single-Cell Datasets for Medical Research

Our cells express a myriad of different genes. So-called “single-cell datasets”, which profile gene expression in individual cells, are very useful in medical research because they reveal, in minute detail, what is going on inside each cell. But they can be massive, and because different labs use different techniques to create and publish them, they are hard to combine with one another.

Now researchers at MIT, using a method related to panoramic photography, have come up with a way to reliably merge very large, but very different, single-cell datasets and make them useful together. More than twenty datasets of different cell types can be combined into a digital “panorama” of gene expression. The algorithm is aptly named “Scanorama”, and it automatically identifies how and where to combine different datasets while preserving the integrity of the data.

Using the technique, scientists can now compare and contrast how cells function, identify therapeutic targets, and see where diseases gain a foothold. “Traditional methods force cells to align, regardless of what the cell types are. They create a blob with no structure, and you lose all interesting biological differences,” says Brian Hie, a PhD student at MIT and one of the team members on this research. “You can give Scanorama datasets that shouldn’t align together, and the algorithm will separate the datasets according to biological differences.”
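Hie's point about not forcing cells to align can be illustrated with a toy example. The sketch below is not Scanorama's code. It assumes a much simplified matching rule in the spirit of mutual nearest-neighbor matching: two cells from different datasets are paired only if each is the other's closest match. The function names, the two-gene profiles, and the numbers are all made up for illustration.

```python
import math

def nearest(point, others):
    """Index of the vector in `others` closest to `point` (Euclidean distance)."""
    return min(range(len(others)), key=lambda j: math.dist(point, others[j]))

def mutual_nearest_pairs(a, b):
    """Return (i, j) pairs where a[i] and b[j] are each other's nearest neighbor."""
    a_to_b = [nearest(p, b) for p in a]
    b_to_a = [nearest(q, a) for q in b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

# Hypothetical two-gene expression profiles from two labs.
lab_a = [[1.0, 0.1], [5.0, 5.2], [9.0, 0.3]]  # third cell type is absent from lab_b
lab_b = [[1.2, 0.0], [5.1, 4.9]]

pairs = mutual_nearest_pairs(lab_a, lab_b)
print(pairs)
```

In this toy case the third cell in `lab_a` is left unpaired, because its closest cell in `lab_b` already has a nearer partner. Under this assumed rule, a cell type that appears in only one dataset is not pulled into alignment with an unrelated one.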

As a proof of concept, the team merged 26 different datasets, together comprising more than 100,000 cells, into a single panorama. The computation took only 30 minutes, though the announcement doesn’t say what kind of computer was used.

Study in Nature Biotechnology: Efficient integration of heterogeneous single-cell transcriptomes using Scanorama…

Via: MIT…