In this demo, we will continue processing Xenium data from a human breast cancer biopsy section from Janesick et al. (2023), which has already undergone quality control and minimal filtering; see Chapter 17. The processing steps carried out here (namely, normalization, dimension reduction, and clustering) lay the foundation for a variety of downstream analysis tasks that will be covered in the next chapters. Here, we will run standard non-spatial and spatially aware approaches; for a more comprehensive overview of methodology and tools, see Chapter 22 on dimension reduction and Chapter 23 on clustering.
Before getting started, we will retrieve the authors’ cell type labels, which were obtained in a supervised manner by transferring annotations from scFFPE-seq data; these comprise 20 subpopulations.
Code
# get annotations from 'BiocFileCache'
# (data has been retrieved already)
id <- "Xenium_HumanBreast1_Janesick"
pa <- OSTA.data_load(id)
# extract the annotation table into a temporary directory
dir.create(td <- tempfile())
unzip(pa, "annotation.csv", exdir=td)
df <- read.csv(list.files(td, full.names=TRUE))
# add annotations as cell metadata
# (cells without a matching barcode receive NA)
cs <- match(spe$cell_id, df$Barcode)
spe$Label <- df$Annotation[cs]
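The `match()`-based join above aligns the annotation table to the order of cells in the object. As a minimal sketch of this behavior, the toy example below uses made-up cell identifiers and labels (all values are hypothetical and unrelated to the actual dataset) to show that the labels are reordered to follow the cells, and that cells missing from the table receive `NA`:

```r
# hypothetical cell identifiers, in the order they appear in the object
cell_id <- c("c1", "c2", "c3", "c4")

# hypothetical annotation table, in a different order and lacking 'c4'
df <- data.frame(
    Barcode=c("c3", "c1", "c2"),
    Annotation=c("T", "B", "Stromal"),
    stringsAsFactors=FALSE)

# for each cell, find the row of its barcode in the table
cs <- match(cell_id, df$Barcode)
label <- df$Annotation[cs]

# tabulate labels, keeping track of unannotated cells
table(label, useNA="ifany")
```

Tabulating with `useNA="ifany"` is a quick way to check how many cells were left without a label after the join.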
18.2 Normalization
Library size-based normalization, as is typical for scRNA-seq data, has been shown to be problematic for ST data, especially for the targeted panels underlying current commercial imaging-based ST platforms (Atta et al. 2024; Bhuva et al. 2024). For lack of a better approach, we use standard log-library size normalization here. We caution readers, however, to watch the literature for better strategies.
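To make explicit what log-library size normalization computes, here is a minimal base R sketch on a made-up count matrix. It assumes size factors defined as library sizes scaled to unit mean, and a log2 transformation with a pseudocount of 1; in Bioconductor workflows, this step is commonly carried out with `logNormCounts()` from the scuttle package, and the sketch is not meant to reproduce that function exactly.

```r
# toy count matrix: 4 genes (rows) x 3 cells (columns); values are made up
counts <- matrix(
    c(10,  0,  5,  5,   # cell1
       2,  2,  0,  0,   # cell2
      30, 10, 10, 10),  # cell3
    nrow=4, dimnames=list(
        paste0("gene", 1:4),
        paste0("cell", 1:3)))

# library size = total counts per cell
lib <- colSums(counts)

# size factors = library sizes scaled to have unit mean
sf <- lib / mean(lib)

# divide each cell's counts by its size factor,
# then log2-transform with a pseudocount of 1
normcounts <- sweep(counts, 2, sf, "/")
logcounts <- log2(normcounts + 1)

round(logcounts, 2)
```

After scaling, every cell has the same total, which is exactly the assumption that the studies cited above call into question for targeted panels: genuine differences in total counts between cell types or tissue regions are removed along with technical ones.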
At this stage, we would typically perform selection of (e.g., highly variable) features; see Chapter 10. The dataset at hand, however, is targeted and relatively low-plex, so features have already been selected by design (e.g., immuno-oncology and neuroscience panels include different targets).
18.4 Dimension Reduction
As a baseline, we will perform principal component analysis (PCA), which underlies many standard scRNA-seq analysis pipelines, such as (spatially unaware) graph-based clustering based on a shared nearest neighbor (SNN) graph and the Leiden or Louvain algorithm for community detection.
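As a minimal sketch of the PCA baseline, the following base R example runs `prcomp()` on simulated values standing in for a log-normalized expression matrix. The matrix dimensions and the choice of 10 components are assumptions made for illustration only; in a real analysis, the resulting cell embedding would be stored in the data object under a name such as `PCA_tx`, which is the name the clustering code in this chapter expects.

```r
set.seed(1)

# simulated stand-in for log-normalized expression:
# 50 genes (rows) x 200 cells (columns)
x <- matrix(rnorm(50 * 200), nrow=50)

# prcomp() expects observations (cells) in rows, hence the transpose;
# genes are centered but not scaled, and 10 components are retained
pca <- prcomp(t(x), center=TRUE, scale.=FALSE, rank.=10)

# cell embedding: one row per cell, one column per PC
pcs <- pca$x
dim(pcs)

# proportion of variance explained by each component
var_expl <- pca$sdev^2 / sum(pca$sdev^2)
head(round(var_expl, 3))
```

Inspecting the variance explained per component is a common way to judge how many PCs to carry forward into graph construction.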
Here, we perform standard graph-based clustering by (i) constructing a shared nearest neighbor (SNN) graph and (ii) using the Leiden algorithm for community detection. By basing the SNN graph on standard and BANKSY PCs, respectively, we can obtain non-spatial as well as spatially aware assignments:
Code
pcs <- c(Leiden="PCA_tx", Banksy="PCA_sp")
for (. in names(pcs)) {
    # build cellular shared nearest-neighbor (SNN) graph
    g <- buildSNNGraph(spe, use.dimred=pcs[.], type="jaccard", k=20)
    # cluster using Leiden community detection algorithm
    k <- cluster_leiden(g, objective_function="modularity", resolution=0.8)
    spe[[.]] <- factor(k$membership)
}
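To illustrate what the `k` and `type="jaccard"` arguments control, the base R sketch below computes Jaccard-weighted SNN edges by hand for a small simulated embedding. This is a simplified illustration, not the implementation behind `buildSNNGraph()`; in particular, whether a cell counts as a member of its own neighbor set is an implementation detail that we settle arbitrarily here by including it.

```r
set.seed(1)

# simulated 2D embedding: two well-separated groups of 15 cells each
x <- rbind(
    matrix(rnorm(30, mean=0), ncol=2),
    matrix(rnorm(30, mean=20), ncol=2))
n <- nrow(x)
k <- 5

# indices of the k nearest neighbors of each cell (excluding itself)
d <- as.matrix(dist(x))
diag(d) <- Inf
nn <- t(apply(d, 1, function(v) order(v)[seq_len(k)]))

# neighbor set of each cell (assumption: the cell itself is included)
sets <- lapply(seq_len(n), function(i) c(i, nn[i, ]))

# edge weight = Jaccard similarity of the two cells' neighbor sets
w <- matrix(0, n, n)
for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    a <- sets[[i]]; b <- sets[[j]]
    w[i, j] <- w[j, i] <- length(intersect(a, b)) / length(union(a, b))
}

# cells in different groups share no neighbors, so they are unconnected
max(w[1:15, 16:30])
```

Community detection algorithms such as Leiden then partition this weighted graph; larger values of `k` yield denser graphs and tend to produce fewer, broader clusters.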
18.6 Visualization
Let’s visualize the assignments obtained from non-spatial and spatially aware clustering:
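As a minimal sketch of such a side-by-side view, the base R example below plots cells at their spatial coordinates, colored by cluster, with one panel per clustering. Both the coordinates and the cluster assignments are simulated placeholders that carry no biological meaning; the names `Leiden` and `Banksy` merely mirror the metadata columns created above, and dedicated plotting packages would typically be used in practice.

```r
set.seed(1)

# simulated spatial coordinates for 300 cells
n <- 300
xy <- cbind(x=runif(n), y=runif(n))

# placeholder cluster assignments (NOT actual clustering results)
clu <- list(
    Leiden=cut(xy[, "x"], breaks=3, labels=FALSE),
    Banksy=cut(xy[, "y"], breaks=3, labels=FALSE))

# one panel per clustering, written to a temporary PDF file
f <- tempfile(fileext=".pdf")
pdf(f, width=8, height=4)
par(mfrow=c(1, 2))
for (. in names(clu)) {
    # fixed aspect ratio so that tissue geometry is not distorted
    plot(xy, col=clu[[.]], pch=16, cex=0.5, asp=1, main=.)
}
dev.off()
```

Fixing the aspect ratio matters for spatial data, since stretching either axis would misrepresent distances between cells.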
Atta, Lyla, Kalen Clifton, Manjari Anant, Gohta Aihara, and Jean Fan. 2024. “Gene Count Normalization in Single-Cell Imaging-Based Spatially Resolved Transcriptomics.” Genome Biology 25 (153). https://doi.org/10.1186/s13059-024-03303-w.
Bhuva, Dharmesh D., Chin Wee Tan, Agus Salim, Claire Marceaux, Marie A. Pickering, Jinjin Chen, Malvika Kharbanda, et al. 2024. “Library Size Confounds Biology in Spatial Transcriptomics Data.” Genome Biology 25 (99). https://doi.org/10.1186/s13059-024-03241-7.
Janesick, Amanda, Robert Shelansky, Andrew D. Gottscho, Florian Wagner, Stephen R. Williams, Morgane Rouault, Ghezal Beliakoff, et al. 2023. “High Resolution Mapping of the Tumor Microenvironment Using Integrated Single-Cell, Spatial and in Situ Analysis.” Nature Communications 14 (8353). https://doi.org/10.1038/s41467-023-43458-x.
Singhal, Vipul, Nigel Chou, Joseph Lee, Yifei Yue, Jinyue Liu, Wan Kee Chock, Li Lin, et al. 2024. “BANKSY Unifies Cell Typing and Tissue Domain Segmentation for Scalable Spatial Omics Data Analysis.” Nature Genetics 56: 431–41. https://doi.org/10.1038/s41588-024-01664-3.