18 Processing

18.1 Preamble

18.1.1 Introduction

In this demo, we will continue processing Xenium data from a human breast cancer biopsy section from Janesick et al. (2023), which has already undergone quality control and minimal filtering; see Chapter 17. The processing steps carried out here – namely, normalization, dimension reduction, and clustering – lay the foundation for a variety of downstream analysis tasks that will be covered in the next chapters. Here, we will run standard non-spatial and spatially aware approaches; for a more comprehensive overview of methodology and tools, see Chapter 25 on dimension reduction and Chapter 26 on clustering.

18.1.2 Dependencies

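library(igraph)
library(scran)
library(scater)
library(Banksy)
library(ggplot2)
library(ggspavis)
library(OSTA.data)
library(patchwork)
library(SpatialExperiment)
# set seed for random number generation
# in order to make results reproducible
set.seed(1) # arbitrary value; any fixed seed works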
# load data from preceding 
# chapter (post quality control)
(spe <- readRDS("img-spe_qc.rds"))
##  class: SpatialExperiment 
##  dim: 313 160154 
##  metadata(5): experiment.xenium transcripts cell_boundaries
##    nucleus_boundaries technology
##  assays(1): counts
##  rownames(313): ABCC11 ACTA2 ... ZEB2 ZNF562
##  rowData names(3): ID Symbol Type
##  colnames(160154): 1 2 ... 167779 167780
##  colData names(41): cell_id transcript_counts ... is_fscore_outlier
##    filter_out
##  reducedDimNames(0):
##  mainExpName: NULL
##  altExpNames(4): NegControlProbe NegControlCodeword antisense BLANK
##  spatialCoords names(2) : x_centroid y_centroid
##  imgData names(1): sample_id

Before getting started, we will retrieve the authors’ cell type labels, which were obtained by supervised transfer of scFFPE-seq annotations and comprise 20 subpopulations.

# get annotations from 'BiocFileCache'
# (data has been retrieved already)
id <- "Xenium_HumanBreast1_Janesick"
pa <- OSTA.data_load(id)
dir.create(td <- tempfile())
unzip(pa, "annotation.csv", exdir=td)
df <- read.csv(list.files(td, full.names=TRUE))
# add annotations as cell metadata
cs <- match(spe$cell_id, df$Barcode)
spe$Label <- df$Annotation[cs]
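
Because `match()` returns `NA` for any cell without a matching barcode, it can be worth checking how many cells remain unlabeled. The following toy example (made-up identifiers and labels, base R only) sketches the alignment and such a check:

```r
# toy stand-ins for spe$cell_id and the annotation table (hypothetical values)
cell_id <- c(1, 2, 4, 5)
df <- data.frame(
    Barcode=c(5, 4, 3, 1),
    Annotation=c("T cell", "B cell", "Stromal", "Tumor"))
# position of each cell's barcode in the annotation table
cs <- match(cell_id, df$Barcode)
# reorder annotations to follow the order of cells
label <- df$Annotation[cs]
# cells lacking an annotation receive NA
n_missing <- sum(is.na(label))
```

Here, the second cell has no entry in the annotation table and is therefore left unlabeled.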

18.2 Normalization

Library size-based normalization, as is typical for scRNA-seq data, has been shown to be problematic for ST data, especially for the targeted panels underlying current commercial imaging-based ST platforms (Bhuva et al. 2024). For lack of a better approach, we here use standard log-library size normalization. We caution readers, however, to keep an eye on the literature for better strategies.

spe <- logNormCounts(spe)
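
For intuition, here is a minimal base R sketch of what log-library size normalization computes, assuming the defaults of `logNormCounts()` (size factors proportional to library size and scaled to a mean of 1, followed by a log2-transformation with a pseudocount of 1). The toy count matrix is made up for illustration.

```r
# toy count matrix: 3 genes (rows) x 4 cells (columns); values are made up
counts <- matrix(
    c(1, 1, 3,
      2, 2, 6,
      0, 2, 8,
      4, 0, 16),
    nrow=3, dimnames=list(paste0("gene", 1:3), paste0("cell", 1:4)))
# library size = total counts per cell
lib <- colSums(counts)
# size factors: library sizes scaled to have mean 1
sf <- lib / mean(lib)
# divide each cell (column) by its size factor, then log2-transform
logcounts <- log2(sweep(counts, 2, sf, "/") + 1)
```

In this example, the first two cells have the same composition but differ two-fold in total counts, so they end up with identical normalized values.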

18.3 Feature selection

At this stage, we would typically perform selection of (e.g., highly variable) features; see Chapter 9. The dataset at hand, however, is targeted and relatively low-plex, so that features have already been selected by design (e.g., different targets will be included in immuno-oncology and neuroscience panels).

18.4 Dimension reduction

As a baseline, we will perform principal component analysis (PCA), which underlies many standard scRNA-seq analysis pipelines, such as (spatially unaware) graph-based clustering based on a shared nearest neighbor (SNN) graph and the Leiden or Louvain algorithm for community detection.

spe <- runPCA(spe, ncomponents=20)
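
As a rough illustration of what this step does, PCA can be sketched in base R with `prcomp()` on a cells-by-genes matrix. Note that `runPCA()` may differ in details (e.g., feature subsetting and the SVD back end), so this is a conceptual sketch on simulated data rather than an equivalent.

```r
set.seed(1)
# simulated log-expression: 50 genes (rows) x 200 cells (columns)
x <- matrix(rnorm(50 * 200), nrow=50)
# prcomp() expects observations (cells) in rows, so transpose;
# genes are centered but not scaled
pca <- prcomp(t(x), center=TRUE, scale.=FALSE, rank.=20)
# cell embeddings: one row per cell, one column per PC
emb <- pca$x
# proportion of total variance captured by each retained PC
var_expl <- pca$sdev[1:20]^2 / sum(pca$sdev^2)
```

The resulting `emb` matrix plays the role of the `reducedDim` entry that downstream steps (such as graph construction) operate on.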

For comparison, we will also perform spatially aware dimension reduction with BANKSY (Singhal et al. 2024); see Chapter 25.

spe <- computeBanksy(spe, assay_name="logcounts")
spe <- runBanksyPCA(spe, npcs=20, lambda=0.2)
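
The core idea behind BANKSY can be sketched in base R: each cell's own expression is augmented with the average expression of its spatial neighbors, and the mixing parameter `lambda` controls the relative weight of the two parts. The sketch below is a simplification under stated assumptions (unweighted mean over the `k` nearest neighbors, no feature scaling, no azimuthal Gabor filter, simulated data); it is not the implementation in the Banksy package.

```r
set.seed(1)
n <- 100; k <- 5; lambda <- 0.2
# simulated spatial coordinates and log-expression (10 genes x n cells)
xy <- cbind(x=runif(n), y=runif(n))
own <- matrix(rnorm(10 * n), nrow=10)
# pairwise Euclidean distances between cells
d <- as.matrix(dist(xy))
# for each cell, average expression across its k nearest neighbors
nbr <- sapply(seq_len(n), \(i) {
    nn <- order(d[i, ])[2:(k + 1)] # skip the cell itself
    rowMeans(own[, nn, drop=FALSE])
})
# stack own and neighborhood expression, weighted by lambda
aug <- rbind(sqrt(1 - lambda) * own, sqrt(lambda) * nbr)
# PCA on the augmented matrix gives spatially aware PCs
emb <- prcomp(t(aug), rank.=5)$x
```

With `lambda=0`, the neighborhood part vanishes and the result reduces to standard (non-spatial) PCA; larger values give more weight to the spatial context.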

To avoid confusing the two types of PCs, we rename the reducedDims to end in _tx and _sp for spatially unaware and spatially aware results, respectively.

reducedDimNames(spe) <- c("PCA_tx", "PCA_sp")

18.5 Clustering

Here, we perform standard graph-based clustering by (i) constructing a shared nearest neighbor (SNN) graph and (ii) using the Leiden algorithm for community detection. By basing the SNN graph on standard and BANKSY PCs, respectively, we can obtain non-spatial as well as spatially aware assignments:

pcs <- c(Leiden="PCA_tx", Banksy="PCA_sp")
for (. in names(pcs)) {
    # build cellular shared nearest-neighbor (SNN) graph
    g <- buildSNNGraph(spe, use.dimred=pcs[.], type="jaccard", k=20)
    # cluster using Leiden community detection algorithm
    k <- cluster_leiden(g, objective_function="modularity", resolution=0.8)
    spe[[.]] <- factor(k$membership)
}
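
To make the graph construction more concrete, the following base R sketch computes Jaccard-type edge weights on simulated data: two cells are connected if their sets of `k` nearest neighbors overlap, with a weight equal to the size of the intersection divided by the size of the union. Whether each cell counts as a member of its own neighbor set is an assumption here, and `buildSNNGraph()` may differ in such details.

```r
set.seed(1)
n <- 30; k <- 5
# simulated low-dimensional embedding (cells x PCs)
emb <- matrix(rnorm(n * 3), nrow=n)
d <- as.matrix(dist(emb))
# neighbor sets: each cell's k nearest neighbors plus the cell itself (assumption)
nn <- lapply(seq_len(n), \(i) order(d[i, ])[seq_len(k + 1)])
# Jaccard index between the neighbor sets of every pair of cells
w <- matrix(0, n, n)
for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    w[i, j] <- length(intersect(nn[[i]], nn[[j]])) /
        length(union(nn[[i]], nn[[j]]))
}
```

The matrix `w` can be read as a weighted adjacency matrix; community detection algorithms such as Leiden then search for groups of cells that are densely connected under these weights.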

18.6 Visualization

Let’s visualize the authors’ labels alongside the assignments obtained from non-spatial and spatially aware clustering:

# TODO: 'ggspavis' is being a bit pedantic...
spe$in_tissue <- 1; spe$x_centroid <- spe$y_centroid <- NULL
lapply(c("Label", "Leiden", "Banksy"), \(.) {
    plt <- plotSpots(spe, annotate=.)
    # shrink points and drop their outline
    plt$layers[[1]]$aes_params$stroke <- 0
    plt$layers[[1]]$aes_params$size <- 0.2
    plt # return the modified plot
}) |>
    wrap_plots(nrow=1) &
    theme(legend.key.size=unit(0, "lines"))
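
Alongside the spatial plots, a cross-tabulation gives a quick numerical view of how cluster assignments relate to reference labels. The following is a minimal base R sketch with made-up assignments; for the data at hand, the two vectors would be cell metadata columns such as `spe$Label` and `spe$Leiden`.

```r
# made-up reference labels and cluster assignments for 8 cells
label <- c(
    "Tumor", "Tumor", "Tumor", "T cell",
    "T cell", "B cell", "B cell", "B cell")
clust <- factor(c(1, 1, 2, 2, 2, 3, 3, 3))
# rows = labels, columns = clusters
tab <- table(label, clust)
# fraction of each cluster's cells belonging to each label
frac <- prop.table(tab, margin=2)
```

Columns of `frac` dominated by a single label indicate clusters that agree well with the reference annotation.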

18.7 Appendix


# use spatially aware clusters as default labels
colLabels(spe) <- spe$Banksy
# save processed object
saveRDS(spe, "img-spe_cl.rds")


