30 Differential spatial patterns

30.1 Preamble

30.1.1 Introduction

Recent advances in spatial transcriptomics (ST) techniques have facilitated the generation of multi-sample datasets across various experimental conditions (e.g. healthy vs. diseased, or multiple treatments).

When multi-sample, multi-condition ST data are available, differential analysis can be performed to identify genes with distinct spatial expression patterns that change between conditions either across domains or within any domain; we call these differential spatial pattern (DSP) genes. A DSP gene may be spatially variable in one condition and display spatially uniform expression in another one, or it could be spatially variable in all conditions, but with distinct spatial expression patterns. Conversely, genes that show spatially uniform abundance, or that are spatially variable with the same spatial structure in all conditions, are not considered DSP, because their spatial distribution is unchanged. Another way to think about DSP is that it is the spatial analog of ‘differential state analysis’ for single cell analyses. That is, here we focus on differences in expression within any domain (or different across domains) between conditions.

Identifying DSP genes can provide valuable insights into how gene expression spatial structures change across experimental conditions, helping to understand the underlying biological processes.

30.1.2 Dependencies

Code

library(muSpaData)
library(ggspavis)
library(DESpace)
library(patchwork)

30.1.3 Load data

In this demo, we will analyze a Stereo-seq dataset from axolotl brain tissues collected at various stages of regeneration (Wei et al. 2022), which includes multiple samples (i.e. serial sections) measured under multiple experimental conditions (regeneration stages).

In the original dataset, which is publicly available through the Spatial Transcript Omics DataBase, there are 5 stages with multiple sections, totaling 16 samples (3 to 4 sections per stage).

In this chapter, for computational reasons, we use 3 stages – 2, 10, and 20 days post injury (DPI) – and only two sections per stage. The analysis framework uses pre-computed spatial domains generated with Banksy (Singhal et al. 2024). For performing clustering with Banksy, refer to Chapter 23. For preprocessing details of this dataset, see the muSpaData Bioconductor package.

Code

spe <- Wei22_example()

Code

colnames(spatialCoords(spe)) <- c("x", "y")
plotCoords(spe, 
    annotate = "Banksy_smooth", 
    in_tissue = NULL, 
    x_coord = "sdimx", 
    y_coord = "sdimy", 
    y_reverse = FALSE,
    sample_id = "sample_id") + 
    theme(legend.key.size = unit(0, "lines")) + 
    scale_color_manual(values=unname(pals::trubetskoy()))

30.2 DESpace

Here, we briefly demonstrate how to identify DSP genes using the DESpace (Cai, Robinson, and Tiberi 2024) Bioconductor package. For more details, see the package vignette. In brief, spot- or cell-level counts are aggregated into pseudobulk counts for each sample and domain, and these are compared across domains and conditions. DSP focuses on testing whether (condition x domain) interaction terms in the model are different from zero. In other words, does the expression of a gene change between conditions, but change differently for some domains compared to others?

Code

# run DESpace
dsp <- dsp_test(spe = spe, cluster_col = "Banksy_smooth", 
                sample_col = "sample_id", condition_col = "condition", 
                verbose = TRUE)

Code

head(dsp_global <- dsp$gene_results, n = 3)

##                        gene_id logFC.condition20DPI.cluster_id1
##  AMEX60DD014721 AMEX60DD014721                      -0.07170723
##  AMEX60DD045083 AMEX60DD045083                       0.08759059
##  AMEX60DD011151 AMEX60DD011151                      -0.97377297
##                 logFC.condition2DPI.cluster_id1
##  AMEX60DD014721                      -0.3898784
##  AMEX60DD045083                       0.5418637
##  AMEX60DD011151                      -2.1019048
##                 logFC.condition20DPI.cluster_id2
##  AMEX60DD014721                       -0.8510223
##  AMEX60DD045083                       -1.3377605
##  AMEX60DD011151                        0.1861520
##                 logFC.condition2DPI.cluster_id2
##  AMEX60DD014721                       0.8526301
##  AMEX60DD045083                       1.1226997
##  AMEX60DD011151                       0.4671347
##                 logFC.condition20DPI.cluster_id3
##  AMEX60DD014721                       -0.8328237
##  AMEX60DD045083                        0.2478193
##  AMEX60DD011151                       -0.4011555
##                 logFC.condition2DPI.cluster_id3
##  AMEX60DD014721                      -0.9450830
##  AMEX60DD045083                       1.2666344
##  AMEX60DD011151                       0.4086612
##                 logFC.condition20DPI.cluster_id4
##  AMEX60DD014721                       0.22287797
##  AMEX60DD045083                      -0.99648014
##  AMEX60DD011151                      -0.01634095
##                 logFC.condition2DPI.cluster_id4   logCPM        LR
##  AMEX60DD014721                       0.2085777 9.344907 100.69288
##  AMEX60DD045083                      -0.7969334 7.505402  97.72697
##  AMEX60DD011151                       0.1810959 7.959492  85.69149
##                       PValue          FDR
##  AMEX60DD014721 3.081052e-18 1.540526e-14
##  AMEX60DD045083 1.243325e-17 3.108313e-14
##  AMEX60DD011151 3.472538e-15 5.787563e-12

Code

# count significant DSP genes (at 5% FDR significance level)
table(dsp_global$FDR <= 0.05)

##  
##  FALSE  TRUE 
##   4809   191

In order to identify the key spatial cluster (domain) where expression changes across conditions, we use the individual_dsp() function, which focuses on domain-specific DSP.

Code

dsp_clu <- individual_dsp(spe, cluster_col = "Banksy_smooth", 
                          sample_col = "sample_id", condition_col = "condition")

Code

# results for cluster 2
dsp_clu2 <- dsp_clu$`2`
head(dsp_clu2, n = 3)

##                        gene_id logFC.condition20DPI.cluster_id2
##  AMEX60DD014721 AMEX60DD014721                       -0.5801351
##  AMEX60DD014991 AMEX60DD014991                        2.6944195
##  AMEX60DD055246 AMEX60DD055246                       -0.2660938
##                 logFC.condition2DPI.cluster_id2   logCPM       LR
##  AMEX60DD014721                        1.270051 9.582333 74.62550
##  AMEX60DD014991                        3.015258 7.330327 73.65886
##  AMEX60DD055246                       -2.298605 5.661946 61.49402
##                       PValue          FDR
##  AMEX60DD014721 6.241352e-17 2.530002e-13
##  AMEX60DD014991 1.012001e-16 2.530002e-13
##  AMEX60DD055246 4.433456e-14 7.389093e-11

Code

# top DSP gene for cluster 2
top_dsp <- dsp_clu2$gene_id[1]

# get gene symbols from Ensembl identifiers
idx <- match(top_dsp, rowData(spe)$gene_name)
(.gs <- rowData(spe)$gene_id[idx])

##  [1] "ECM1"

30.3 Visualization

DSP genes can be further investigated, for example, by plotting the spatial expression of genes across conditions, or the log2-transformed counts per million of genes across different clusters and conditions.

Code

.ids <- levels(spe$sample_id)
ps <- lapply(seq_along(.ids), \(.) {
  spe_j <- spe[, spe$sample_id == .ids[.]]
  p <- FeaturePlot(spe_j, 
                   feature = top_dsp,
                   platform = "Stereo-seq",
                   coordinates = c("sdimx", "sdimy"),
                   cluster_col = "Banksy_smooth", 
                   cluster = '2', 
                   diverging = TRUE,
                   low = "gray95", 
                   high = "blue") +
    ggtitle(.ids[.]) 
  # ignore alignment
  free(p[[1]]) 
})
wrap_plots(ps, ncol=3) & theme(
  plot.title = element_text(margin = margin(b = 100), hjust = 0.5))

Code

# calculate log CPM
cps <- edgeR::cpm(dsp$estimated_y, log = TRUE)
cps_name <- colnames(cps)
mdata <- data.frame(
    log_cpm = cps[top_dsp, ], 
    Banksy_smooth = factor(sub(".*_", "", cps_name)), 
    day = as.numeric(sub("([0-9]+)DPI.*", "\\1", cps_name)), 
    sample_id = sub("(_[0-9]+)$", "", cps_name)
)

ggplot(mdata, aes(x = factor(day), y = log_cpm)) + 
    geom_jitter(aes(color = Banksy_smooth), size = 2, width = 0.1) + 
    geom_boxplot(aes(fill = ifelse(Banksy_smooth == "2", 
                                   "cluster 2", "non-cluster 2"))) + 
    scale_x_discrete(breaks = c(2, 10, 20)) + 
    scale_fill_manual(values = c("#4DAF4A", "gray")) + 
    labs(title = .gs, x = "Days post injury", 
         y = "log2 counts per million (logCPM)", fill = "", 
         color = "Banksy cluster") + 
     theme(legend.position = "right")

The boxplots above show the average log-CPM for cluster 2 and for all other clusters (excluding cluster 2) across different time points. The largest difference between cluster 2 and the remaining clusters is observed at 2 DPI, with the difference decreasing over time. At 20 DPI, no significant log-CPM differences are observed between the clusters. This suggests that spatial patterns change across conditions (i.e. time stages).

30.4 Appendix

References

Cai, Peiying, Mark D Robinson, and Simone Tiberi. 2024. “DESpace: Spatially Variable Gene Detection via Differential Expression Testing of Spatial Clusters.” Bioinformatics 40 (2): btae027.

Singhal, Vipul, Nigel Chou, Joseph Lee, Yifei Yue, Jinyue Liu, Wan Kee Chock, Li Lin, et al. 2024. “BANKSY Unifies Cell Typing and Tissue Domain Segmentation for Scalable Spatial Omics Data Analysis.” Nature Genetics 56 (3): 431–41.

Wei, Xiaoyu, Sulei Fu, Hanbo Li, Yang Liu, Shuai Wang, Weimin Feng, Yunzhi Yang, et al. 2022. “Single-Cell Stereo-Seq Reveals Induced Progenitor Cells Involved in Axolotl Brain Regeneration.” Science 377: 1062. https://doi.org/10.1126/science.abp9444.