Code and tutorials: https://satijalab.org/seurat/
Comprehensive Integration of Single-Cell Data
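I really did not expect the integration method in Seurat v3 to be published in Cell itself.
Sure enough: a leading lab + a frontier field = unlimited possibilities.
The preprint was posted on bioRxiv on November 02, 2018, and the paper was formally published in Cell on June 06, 2019.
The idea for the method probably existed by the end of 2017, when I had only just started working on single-cell data.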
Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters.
As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function.
Here, we develop a strategy to “anchor” diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities.
After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations.
Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns.
Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.
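Highlight 1: an anchoring approach integrates many kinds of data, across platforms and across modalities.
Highlight 2: it can also integrate scATAC-seq data.
Highlight 3: analysis of spatial gene expression patterns.
Major single-cell breakthroughs to date: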
- immunophenotype (Stoeckius et al., 2017; Peterson et al., 2017),
- genome sequence (Navin et al., 2011; Vitak et al., 2017),
- lineage origins (Raj et al., 2018; Spanjaard et al., 2018; Alemany et al., 2018),
- DNA methylation landscape (Luo et al., 2018; Kelsey et al., 2017),
- chromatin accessibility (Cao et al., 2018; Lake et al., 2018; Preissl et al., 2018),
- spatial positioning
Two major questions in single-cell data integration:
- how can disparate single-cell datasets, produced across individuals, technologies, and modalities be harmonized into a single reference?
- once a reference has been constructed, how can its data and meta-data improve the analysis of new experiments?
These questions are well suited to established fields in statistical learning.
The second question is analogous to reference assembly (Li et al., 2010) and mapping (Langmead et al., 2009) for genomic DNA sequences.
Existing methods that identify shared subpopulations across datasets:
- canonical correlation analysis (CCA)
- mutual nearest neighbors (MNNs)
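To make the MNN idea concrete, here is a minimal sketch in plain Python. It is a toy illustration on made-up 2D points, not the Seurat implementation: the function names `k_nearest` and `mutual_nearest_neighbors` and the coordinates are hypothetical, and plain Euclidean distance is assumed, with no normalization or dimensionality reduction beforehand. A pair of cells counts as mutual nearest neighbors only if each cell is among the other's k nearest neighbors in the opposite dataset.

```python
import math

def k_nearest(source, target, k):
    """For each point in source, return the set of indices of its k nearest points in target."""
    neighbors = []
    for p in source:
        order = sorted(range(len(target)), key=lambda j: math.dist(p, target[j]))
        neighbors.append(set(order[:k]))
    return neighbors

def mutual_nearest_neighbors(a, b, k=1):
    """Return (i, j) pairs where a[i] and b[j] are in each other's k-nearest sets."""
    a_to_b = k_nearest(a, b, k)
    b_to_a = k_nearest(b, a, k)
    return [
        (i, j)
        for i in range(len(a))
        for j in sorted(a_to_b[i])
        if i in b_to_a[j]
    ]

# Hypothetical toy data: two "datasets" of cells in a shared 2D space.
dataset_a = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0)]
dataset_b = [(0.2, 0.1), (5.1, 4.9), (9.0, 9.0)]

pairs = mutual_nearest_neighbors(dataset_a, dataset_b, k=1)
print(pairs)
```

In this toy example the cell at (9.0, 9.0) ends up in no pair, because its nearest neighbor in the other dataset prefers a different partner. This is the property that makes mutual matching attractive when a cell population exists in only one of the datasets.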
Problems with the second kind of integration:
- only a subset of cell types are shared across datasets
- significant technical variation masks shared biological signal.
This paper addresses three problems:
- reference assembly
- transfer learning for transcriptomic, epigenomic, and proteomic data
- spatially resolved single-cell data
Core idea
Through the identification of cell pairwise correspondences between single cells across datasets, termed “anchors,” we can transform datasets into a shared space, even in the presence of extensive technical and/or biological differences.
This enables the construction of harmonized atlases at the tissue or organismal scale, as well as effective transfer of discrete or continuous data from a reference onto a query dataset.
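The transfer step can be pictured with a small sketch. Everything below is an assumption made for illustration: the function `transfer_labels`, the Gaussian distance weighting, the `bandwidth` parameter, and the toy coordinates are hypothetical and much simpler than the weighting scheme in the paper. Given anchors as (query index, reference index) pairs, each query cell looks up its nearest anchors and takes a distance-weighted vote over the labels of the anchored reference cells.

```python
import math
from collections import defaultdict

def transfer_labels(query, anchors, ref_labels, k=2, bandwidth=1.0):
    """Predict a (label, confidence) pair for every query cell.

    query      : list of coordinate tuples, one per query cell
    anchors    : list of (query_index, reference_index) pairs
    ref_labels : list of labels, one per reference cell
    """
    predictions = []
    for cell in query:
        # Rank anchors by the distance from this cell to the anchor's query-side cell.
        ranked = sorted(anchors, key=lambda a: math.dist(cell, query[a[0]]))[:k]
        scores = defaultdict(float)
        for q_idx, r_idx in ranked:
            d = math.dist(cell, query[q_idx])
            # Gaussian kernel: closer anchors contribute more to the vote.
            scores[ref_labels[r_idx]] += math.exp(-((d / bandwidth) ** 2))
        total = sum(scores.values())
        best = max(scores, key=scores.get)
        predictions.append((best, scores[best] / total))
    return predictions

# Hypothetical toy data: four query cells, two anchors, two labelled reference cells.
query_cells = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (4.8, 5.1)]
anchor_pairs = [(0, 0), (2, 1)]
reference_labels = ["T cell", "B cell"]

predictions = transfer_labels(query_cells, anchor_pairs, reference_labels)
for label, confidence in predictions:
    print(label, round(confidence, 3))
```

Swapping the vote for a weighted average of numeric reference values would give the continuous case in the same way, for example when imputing a protein level or a gene's expression for each query cell.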
Some single-cell background knowledge
false negatives (“drop-outs”) due to transcript abundance and protocol-specific biases
expression derived from fluorescence in situ hybridization (FISH) exhibits probe-specific noise due to sequence specificity and background binding
Results
Identifying Anchor Correspondences across Single-Cell Datasets
Basic assumption: we assume that there are correspondences between datasets and that at least a subset of cells represent a shared biological state.
Constructing Integrated Atlases at the Scale of Organs and Organisms
Evaluates how accurately different tools integrate data from different platforms and different subtypes.
Leveraging Anchor Correspondences to Classify Cell States
Moves on to integrating case and control samples and classifying cell states.
Projecting Cellular States across Modalities
Integrating scATAC-seq.
Transferring Continuous and Multimodal Data across Experiments
Predicting Protein Expression in Human Bone Marrow Cells
CITE-seq: predicting protein expression.
Spatial Mapping of Single-Cell Sequencing Data in the Mouse Cortex
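Spatial mapping in the mouse cerebral cortex.
What's my problem?
I also realized long ago that this was an important and valuable problem, but I was working alone. I never truly distilled the problem, never thought about it deeply or understood it properly, and certainly never tried to solve it with statistical thinking.
The leading researchers clearly saw the value of this problem early on. They gathered people to discuss and think it through, systematically proposed their own solution using statistical methods, and in the end, on the strength of their ability and reputation, published the results in the very top journal.
What is holding me back and keeping me going in circles?
Original post: https://www.cnblogs.com/leezx/p/11244731.html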