In this demo, we will analyze 313-plex Xenium data on human breast cancer tissue (Janesick et al. 2023). Following very basic quality control and preprocessing, we will perform both (non-spatial) unsupervised clustering as well as fully supervised label-transfer based on scRNA-seq reference data; then compare the obtained cluster assignments to those provided by the authors.
# compute cell-level QC metricsspe<-addPerCellQCMetrics(spe)# identify low-quality cells by thresholding on# median absolute deviation (MAD) from the medianol<-perCellQCFilters(spe)# tabulate # and % of cells discarded # due to few counts/detected featuresdata.frame( check.names=FALSE, `#`=apply(ol, 2, sum), `%`=round(100*apply(ol, 2, mean), 2))
Before proceeding to exclude any cells from downstream analyses, let’s first visualize the cells deemed to be of low quality in space alongside the underlying quality control metrics (total counts and detected features):
Let’s visualize the expression of some genes in space; e.g., PIGR, IGHG3 and CEACAM6, which should mark epithelial, plasma and tumor cells, respectively:
# shared nearest-neighbor (SNN) graph based on # cell-to-cell Jaccard similarity in PC spaceg<-buildSNNGraph(sub, use.dimred="PCA", type="jaccard", BPPARAM=bp)# community detection using Leiden algorithmk<-cluster_leiden(g, objective_function="modularity", resolution=0.7)table(sub$Leiden<-factor(.<-k$membership, labels=letters[seq_along(unique(.))]))
##
## a b c d e f g h i j
## 5855 3538 30354 4101 7646 11538 7404 3883 4705 9839
22.4.2 Supervised
For comparison, we annotate the Xenium data using a label transfer approach, SingleR, which relies on labeled scRNA-seq data to compute references profiles and transfers labels based on the rank correlation between observed (here, Xenium) and reference (scRNA-seq) expression profiles.
First, we retrieve a matching (Chromium) scRNA-seq dataset, which includes low- (Level1) and high-resolution (Level2) annotations of cells into 9 and 31 subpopulations, respectively:
Here, we run SingleR using high-res. annotations (Level2) and with argument aggr.ref=TRUE, such that reference profiles will be aggregated (per cluster) prior to annotation. In this way, every single Xenium cell will be assigned a label based on which pseudo-bulk scRNA-seq profile represents the best match.
Note that we filter the reference data to contain only cells from the same patient. This is not strictly necessary, assuming that clusters are transcriptionally stable across patients, but is done here to reduce runtime.
Code
# exclude cells deemed to be of low-qualitysce<-sce[, sce$QCFilter=="Keep"]# subset cells from same patientsce<-sce[, grepl("P2", sce$Patient)]# realize count matrixassay(sce)<-as(assay(sce), "dgCMatrix")# log-library size normalizationsce<-logNormCounts(sce)# restrict to Xenium targetssce<-sce[rowData(sce)$ID%in%rowData(sub)$ID, ]# set gene symbols as feature namesrownames(sce)<-rowData(sce)$Symbol# perform label transfer at the single cell-level,# using pseudo-bulk Chromium profiles as referenceres<-SingleR( test=sub, ref=sce, labels=sce$Level2, aggr.ref=TRUE, BPPARAM=bp)sub$Level2<-factor(res$pruned.labels)
Based on these predictions, we can also propagate low-res. annotations (Level1):
Simplifying further, we can group cells into different compartments, namely, (malignant) tumor, immune, epithelial and stromal cells; we’ll see below that visualizing cells in this way nicely captures the general tissue structure.
Tabulating the cluster assignments between Leiden (unsupervised) and SingleR (supervised), we can observe overall high concordance; i.e., most clusters have a one-to-one mapping between both approaches. However, some subpopulations are split between clusters; e.g., cells labeled as cluster fibroblasts, endothelia and smooth muscle cells by SingleR tend to intermix in the Leiden clusters. This is not unexpected, given that these are all stromal subpopulations with comparatively similar transcriptional profiles. (Note that we are observing a mere fraction of the whole transcriptome with the Xenium panel employed here.)
Code
# contingency table & number of clustersround(100*prop.table(table(sub$Level1, sub$Leiden), 2), 1)
Below, we test for differential expression between Level1 clusters, and visualize selected markers as a heatmap of (z-scaled) average expression. It’s comforting to see that we pick up on many classics, e.g., endothelia are marked by VWF and PECAM1, T cells by CD2 and TRAC, etc.
Code
# test for differential expression between clustersmgs<-findMarkers(sub, groups=sub$Level1, direction="up")# select top-ranked genes for every clustertop<-unique(unlist(lapply(mgs, \(df)rownames(df)[df$Top<=3])))plotGroupedHeatmap(sub, features=top, group="Level1", scale=TRUE, center=TRUE, fontsize=6)
References
Janesick, Amanda, Robert Shelansky, Andrew D. Gottscho, Florian Wagner, Stephen R. Williams, Morgane Rouault, Ghezal Beliakoff, et al. 2023. “High Resolution Mapping of the Tumor Microenvironment Using Integrated Single-Cell, Spatial and in Situ Analysis.”Nature Communications 14 (1): 8353.