21  Intermediate processing

21.1 Preamble

21.1.1 Introduction

In this demo, we will continue processing the Xenium dataset from a human breast cancer biopsy section collected by Janesick et al. (2023), which has already undergone quality control and minimal filtering; see Chapter 20. The processing steps carried out here – namely, normalization, dimension reduction, and clustering – lay the foundation for a variety of downstream analysis tasks that will be covered in the next chapters. Here, we will run standard non-spatial and spatially aware approaches; for a more comprehensive overview of methodology and tools, see Chapter 28 on dimension reduction and Chapter 29 on clustering.

21.1.2 Dependencies

Code
# library(scran)
# library(scater)
# library(igraph)
library(Banksy)
library(scrapper)
library(ggplot2)
library(ggspavis)
library(patchwork)
library(OSTA.data)
library(SpatialExperiment)
# set seed for random number generation
# in order to make results reproducible
set.seed(20000229)
# load data from preceding 
# chapter (post quality control)
(spe <- readRDS("img-spe_qc.rds"))
##  class: SpatialExperiment 
##  dim: 313 140268 
##  metadata(1): qc
##  assays(1): counts
##  rownames(313): ABCC11 ACTA2 ... ZEB2 ZNF562
##  rowData names(3): ID Symbol Type
##  colnames(140268): 2 3 ... 167779 167780
##  colData names(12): cell_id transcript_counts ... detected keep
##  reducedDimNames(0):
##  mainExpName: NULL
##  altExpNames(0):
##  spatialCoords names(2) : x_centroid y_centroid
##  imgData names(1): sample_id

Before getting started, we will retrieve the authors’ cell type labels, which were obtained by transferring scFFPE-seq annotations (supervised); these comprise 20 subpopulations.

Code
# get annotations from 'BiocFileCache'
# (data has been retrieved already)
id <- "Xenium_HumanBreast1_Janesick"
pa <- OSTA.data_load(id, mol=FALSE)
dir.create(td <- tempfile())
unzip(pa, "annotation.csv", exdir=td)
df <- read.csv(list.files(td, full.names=TRUE))
# add annotations as cell metadata
cs <- match(spe$cell_id, df$Barcode)
spe$Label <- df$Annotation[cs]
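
The alignment above relies on match() returning, for each cell in the object, the row of the annotation table that carries the same barcode, or NA where a cell has no annotation. A minimal sketch on made-up toy data (the identifiers and labels below are hypothetical and not taken from the dataset):

```r
# toy cell identifiers, standing in for spe$cell_id
cell_id <- c(2, 3, 5, 8)
# toy annotation table in a different order,
# with one cell (5) missing
df <- data.frame(
    Barcode=c(8, 2, 3),
    Annotation=c("Stromal", "DCIS", "T cells"))
# for each cell, find the matching row of the table
cs <- match(cell_id, df$Barcode)
# index annotations; unmatched cells become NA
label <- df$Annotation[cs]
print(data.frame(cell_id, label))
```

Checking anyNA(spe$Label) afterwards is a cheap way to spot cells that were left without an annotation.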

21.2 Normalization

Library size-based normalization, as typically used for scRNA-seq data, has been shown to be problematic for ST data, especially so for targeted panels underlying current commercial imaging-based ST platforms (Atta et al. 2024; Bhuva et al. 2024). For lack of a better approach, we here use standard log-library size normalization. We caution readers, however, to keep an eye out in the literature for attempts to provide a better strategy.

See also Chapter 27.
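
To make the idea of log-library size normalization concrete, here is a minimal base R sketch on a made-up count matrix. It assumes a common convention (size factors proportional to library size and centered at 1, followed by a log2 transform with a pseudocount of 1); it is an illustration only and not necessarily the exact computation used for this chapter's results.

```r
# toy count matrix: 3 genes (rows) x 4 cells (columns);
# values are filled column by column, i.e., cell by cell
counts <- matrix(
    c(0, 2, 8,
      1, 1, 3,
      4, 0, 16,
      2, 6, 12),
    nrow=3, dimnames=list(
        paste0("gene", 1:3),
        paste0("cell", 1:4)))
# library size = total counts per cell
lib <- colSums(counts)
# size factors, centered to have mean 1 across cells
sf <- lib / mean(lib)
# divide each cell (column) by its size factor, then
# log2-transform with a pseudocount of 1
logcounts <- log2(sweep(counts, 2, sf, "/") + 1)
round(logcounts, 2)
```

After scaling, every cell has the same total, which is exactly the assumption that the studies cited above call into question for targeted panels.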

21.3 Feature selection

At this stage, we would typically perform selection of (e.g., highly variable) features; see Chapter 12. The dataset at hand, however, is targeted and relatively low-plex, so that ‘interesting’ features have already been selected by design (e.g., different targets will be included in immuno-oncology as opposed to neuroscience panels).

21.4 Dimension reduction

As a baseline, we will perform principal component analysis (PCA), which underlies many standard scRNA-seq analysis pipelines, such as (spatially unaware) graph-based clustering based on a shared nearest neighbor (SNN) graph and the Leiden or Louvain algorithm for community detection.

Code
spe <- runPca.se(spe, features=rownames(spe), number=20)

For comparison, we will also perform spatially aware dimension reduction with BANKSY (Singhal et al. 2024); see Chapter 28.

Code
spe <- computeBanksy(spe, assay_name="logcounts")
spe <- runBanksyPCA(spe, npcs=20, lambda=0.2)
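
As a rough intuition for what the two calls above do, the sketch below builds a neighbor-augmented matrix in base R: each cell's own expression is concatenated with the mean expression of its spatial neighbors, and lambda sets the relative weight of the two parts before PCA. This is a simplified illustration on simulated data under our own assumptions (unweighted mean over k nearest neighbors; all variable names are hypothetical). The Banksy package differs in its details, for example in how neighbors are weighted, so this is not its implementation.

```r
set.seed(1)
n <- 50; g <- 5; k <- 5; lambda <- 0.2
# toy spatial coordinates and log-expression (cells x genes)
xy <- cbind(x=runif(n), y=runif(n))
own <- matrix(rnorm(n * g), nrow=n)
# k nearest spatial neighbors of each cell (excluding itself)
d <- as.matrix(dist(xy))
diag(d) <- Inf
nn <- t(apply(d, 1, \(.) order(.)[seq_len(k)]))
# neighborhood expression: unweighted mean over neighbors
nbr <- t(apply(nn, 1, \(i) colMeans(own[i, , drop=FALSE])))
# concatenate own and neighborhood features, weighted
# so that lambda controls the spatial contribution
aug <- cbind(sqrt(1 - lambda) * own, sqrt(lambda) * nbr)
# PCA on the augmented matrix; keep the first 3 PCs
pcs <- prcomp(aug, center=TRUE)$x[, 1:3]
dim(aug); dim(pcs)
```

In this sketch, setting lambda to 0 zeroes out the neighborhood columns and recovers a non-spatial PCA, while larger values place more weight on the neighborhood.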

To avoid confusing the two sets of PCs, we rename the reducedDims so that their names end in _tx and _sp for spatially unaware and spatially aware results, respectively.

Code
reducedDimNames(spe) <- c("PCA_tx", "PCA_sp")

21.5 Clustering

Here, we perform standard graph-based clustering by (i) constructing a shared nearest neighbor (SNN) graph and (ii) using the Leiden algorithm for community detection. By basing the SNN graph on standard and BANKSY PCs, respectively, we can obtain non-spatial as well as spatially aware assignments:

Code
# PCA-based shared nearest-neighbor (SNN) graph;
# cluster via Leiden community detection algorithm
pcs <- c(Leiden="PCA_tx", Banksy="PCA_sp")
for (k in names(pcs)) {
    spe <- clusterGraph.se(spe, 
        method="leiden", resolution=0.7, 
        output.name=k, reddim.type=pcs[[k]], 
        more.build.args=list(weight.scheme="jaccard"))
}
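
To illustrate step (i), the following base R sketch computes Jaccard-weighted SNN edges on simulated PCs: the weight between two cells is the number of nearest neighbors they share, divided by the size of the union of their neighbor sets. We assume that this is what the "jaccard" weighting scheme refers to; conventions differ (for example, on whether a cell counts as its own neighbor, as it does here), so treat this as a sketch rather than a description of the function's internals. Community detection itself is not shown.

```r
set.seed(1)
n <- 30; k <- 5
# toy PC matrix (cells x PCs) with two well-separated groups
pc <- matrix(rnorm(n * 2), ncol=2)
pc[1:15, 1] <- pc[1:15, 1] + 50
# k nearest neighbors per cell; each set includes the cell itself
d <- as.matrix(dist(pc))
nn <- lapply(seq_len(n), \(i) order(d[i, ])[seq_len(k + 1)])
# Jaccard weight: shared neighbors / union of neighbors
w <- matrix(0, n, n)
for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    s <- length(intersect(nn[[i]], nn[[j]]))
    w[i, j] <- w[j, i] <- s / length(union(nn[[i]], nn[[j]]))
}
# largest weight between the two groups (expected to be 0
# here, because the groups are far apart)
max(w[1:15, 16:30])
```

A community detection algorithm such as Leiden then partitions the resulting weighted graph, with the resolution parameter influencing how many clusters are returned.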

21.6 Visualization

Let’s visualize the assignments obtained from non-spatial and spatially aware clustering, alongside the authors’ labels:

Code
spe$in_tissue <- 1; spe$x_centroid <- spe$y_centroid <- NULL
lapply(c("Label", "Leiden", "Banksy"), \(.) {
    plotCoords(spe, annotate=., point_size = 0.1)
}) |>
    wrap_plots(nrow=1) &
    theme(legend.key.size=unit(0, "lines")) &
    scale_color_manual(values=unname(pals::trubetskoy()))

21.7 Appendix

Further reading

For further methodological details, literature, and resources on the analyses performed in this chapter, we refer readers to:

Save data

Code
colLabels(spe) <- spe$Banksy
saveRDS(spe, "img-spe_cl.rds")

References

Atta, Lyla, Kalen Clifton, Manjari Anant, Gohta Aihara, and Jean Fan. 2024. “Gene Count Normalization in Single-Cell Imaging-Based Spatially Resolved Transcriptomics.” Genome Biology 25 (153). https://doi.org/10.1186/s13059-024-03303-w.
Bhuva, Dharmesh D., Chin Wee Tan, Agus Salim, et al. 2024. “Library Size Confounds Biology in Spatial Transcriptomics Data.” Genome Biology 25 (99). https://doi.org/10.1186/s13059-024-03241-7.
Janesick, Amanda, Robert Shelansky, Andrew D. Gottscho, et al. 2023. “High Resolution Mapping of the Tumor Microenvironment Using Integrated Single-Cell, Spatial and in Situ Analysis.” Nature Communications 14 (8353). https://doi.org/10.1038/s41467-023-43458-x.
Singhal, Vipul, Nigel Chou, Joseph Lee, et al. 2024. “BANKSY Unifies Cell Typing and Tissue Domain Segmentation for Scalable Spatial Omics Data Analysis.” Nature Genetics 56: 431–41. https://doi.org/10.1038/s41588-024-01664-3.