27  Spatially variable genes

27.1 Preamble

27.1.1 Introduction

Unsupervised clustering and label-transfer approaches yield (discrete, non-overlapping) grouping of cells by transcriptional – and presumably functional – similarity. Identifying differentially expressed genes (DEGs), i.e., genes that are up-/down-regulated in one or few subpopulation(s), helps characterize clusters (and, in an unsupervised setting, find biologically meaningful cluster labels).

By contrast, methods to identify spatially variable genes (SVGs) aim to find genes with spatially correlated patterns of expression. Feature selection in terms of SVGs can be used as a spatially-aware alternative to mean-variance relationship-based HVGs (c.f. Chapter 9), or to identify biologically informative genes as candidates for experimental follow-up.

Here, we demonstrate selected methods to identify SVGs de novo (using nnSVG), and based on pre-computed spatial clusters (using DESpace). We further compare these to DEGs as well as HVGs, and discuss the conceptual similarities and differences between genes identified through these different approaches.

27.1.2 Dependencies

Code
library(DESpace)
library(dplyr)
library(ggspavis)
library(nnSVG)
library(patchwork)
library(scater)
library(scran)
library(SpatialExperiment)
library(tidyr)
# set seed for random number generation
# in order to make results reproducible
set.seed(123)
# load data from previous chapters
# (post quality control & clustering)
(spe <- readRDS("seq-spe_cl.rds"))
##  class: SpatialExperiment 
##  dim: 33525 4992 
##  metadata(0):
##  assays(2): counts logcounts
##  rownames(33525): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
##    ENSG00000268674
##  rowData names(3): gene_id gene_name feature_type
##  colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
##    TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
##  colData names(11): barcode_id sample_id ... BayesSpace label
##  reducedDimNames(2): PCA UMAP
##  mainExpName: NULL
##  altExpNames(0):
##  spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
##  imgData names(1): sample_id

27.2 Non-spatial

We can also use methods originally developed for bulk and single-cell transcriptomics, such as those for detecting highly variable genes (HVGs) and differentially expressed genes (DEGs), on spatial omics data. Note that HVGs are defined only based on molecular features (i.e., gene expression), and do not incorporate spatial information. In the context of single-cell and ST, differentially expressed (DE) may refer to differences between groups of samples, or to differences between clusters (e.g., cell types, or spatial domains); below, we consider changes across spatial structures.

27.2.1 Highly variable genes (HVGs)

For a comprehensive discussion on HVG selection, we refer readers to OSCA. Below, we take a standard approach for selecting HVGs. For more details and examples, see Chapter 9.

Code
# fit mean-variance relationship
dec <- modelGeneVar(spe)

27.2.2 Differentially expressed genes (DEGs)

For details on identifying DE genes between groups of cells from single-cell RNA-seq data, we refer readers to OSCA. Below, we show a potential approach, using marker detection methods on ST data, to identify DE genes between spatial clusters.

Code
# differential gene expression analysis
mgs <- findMarkers(spe, groups=spe$BayesSpace, direction="up")
# select for a few markers per cluster
deg <- lapply(mgs, \(df) rownames(df)[df$Top <= 3])
length(deg <- unique(unlist(deg)))
##  [1] 67

27.3 Spatially-aware

Here, we demonstrate brief examples of how to identify a set of top SVGs using (nnSVG (Weber et al. 2023) and DESpace (Cai, Robinson, and Tiberi 2024)). These methods are available through Bioconductor and can be easily integrated into Bioconductor-based workflows.

27.3.1 Spatially-variable genes (SVGs)

SVGs are usually identified integrating gene expression measurements with spatial coordinates, either with or without predefined spatial domains. SVGs approaches can be broadly categorized into three types: overall SVGs, spatial domain-specific SVGs, and cell type-specific SVGs (Yan, Hua, and Li 2024).

The detection of overall SVGs is sometimes used as a feature selection step for further downstream analyses such as spatially-aware clustering (see Chapter 26). Several methods have been proposed for detecting overall SVGs; below, we report some few notable examples:

Spatial domain-specific SVGs target changes in gene expression between spatial domains, which are used to summarize the whole spatial information. Genes displaying expression changes across spatial clusters indicate SVGs. Spatial domains can be predefined based on morphology knowledge, or identified through spatially-aware clustering approaches such as BayesSpace (Zhao et al. 2021) and Banksy (Singhal et al. 2024). Methods that belong to this category include DESpace (Cai, Robinson, and Tiberi 2024) in R, and SpaGCN (Hu et al. 2021) in Python.

Cell type-specific SVG methods leverage external cell type annotations to identify SVGs within cell types. They analyze interaction effects between cell types and spatial coordinates (Yan, Hua, and Li 2024). Examplary methods in R include CTSV (J. Yu and Luo 2022), C-SIDE (implemented in spacexr; (Cable et al. 2022)), and spVC (S. Yu and Li 2024).

27.3.2 nnSVG

In this example, we use a small subset of the dataset for faster runtime. We select a subset of the data, by subsampling the set of spots and including stringent filtering for lowly expressed genes.

A full analysis with nnSVG using all spots for this dataset and default filtering parameters for an individual Visium sample from human brain tissue (available from spatialLIBD) takes around 45 minutes on a standard laptop.
Code
# # set random seed for number generation
# # in order to make results reproducible
# set.seed(123)
# # sample 100 spots to decrease runtime in this demo
# # (note: skip this step in full analysis)
# n <- 100
# sub <- spe[, sample(ncol(spe), 100)]
# # filter lowly expressed genes using stringent
# # criteria to decrease runtime in this demo
# # (note: use default criteria in full analysis)
# sub <- filter_genes(sub,     # filter for genes with...
#     filter_genes_ncounts=10, # at least 10 counts in
#     filter_genes_pcspots=3)  # at least 3% of spots
# # re-normalize counts post-filtering
# sub <- logNormCounts(sub)
# 
# # run nnSVG
# set.seed(123)
# sub <- nnSVG(sub)
# 
# # extract gene-level results
# res_nnSVG <- rowData(sub)
# # show results
# head(res_nnSVG, 3)
# # count significant SVGs (at 5% FDR significance level)
# table(res_nnSVG$padj <= 0.05)
# # identify the top-ranked SVGs
# res_nnSVG$gene_name[res_nnSVG$rank == 1]
# # store the top 7 SVGs
# top_nnSVG <- res_nnSVG$gene_name[res_nnSVG$rank < 7.5]

27.3.3 DESpace

DESapce relies on pre-computed spatial domains to summarize the primary spatial structures of the data; see Chapter 26 on clustering.

Code
plotSpots(spe, 
    annotate="BayesSpace") +
    theme(legend.key.size=unit(0, "lines")) +
    scale_color_manual(values=unname(pals::trubetskoy()))

Code
# run DESpace
res <- DESpace_test(spe, spatial_cluster="BayesSpace")
##  using 'DESpace_test' for spatial gene/pattern detection.
##  Filter low quality genes:
##  min_counts = 20; min_non_zero_spots = 10.
##  The number of genes that pass filtering is 15415.
##  single sample test
Code
head(res_DESpace <- res$gene_results)
##                          gene_id       LR   logCPM PValue FDR
##  ENSG00000162545 ENSG00000162545 3931.811 11.34354      0   0
##  ENSG00000115756 ENSG00000115756 1659.035 10.21759      0   0
##  ENSG00000173267 ENSG00000173267 1906.305 10.43305      0   0
##  ENSG00000176884 ENSG00000176884 1651.826 10.70112      0   0
##  ENSG00000111716 ENSG00000111716 2262.749 10.80120      0   0
##  ENSG00000149557 ENSG00000149557 1667.483 10.38032      0   0
Code
# count significant SVGs (at 5% FDR significance level)
table(res_DESpace$FDR <= 0.05)
##  
##  FALSE  TRUE 
##   2833 12582

27.4 Downstream analyses

The set of top SVGs may be further investigated, e.g. by plotting the spatial expression of several top genes and via gene pathway analyses (i.e., comparing significant SVGs with known gene sets associated with specific biological functions).

Code
top_HVGs <- getTopHVGs(dec, n=(n <- 6))
top_DEGs <- lapply(mgs, \(df) rownames(df)[df$Top == 1])
top_DEGs <- unique(unlist(top_DEGs))[seq_len(6)]
top_DESpace <- res_DESpace$gene_id[seq_len(6)]

27.4.1 Visualization

To visualize the expression levels of selected genes in spatial coordinates on the tissue slide, we can use plotting functions from the ggspavis package.

Code
# get top SVGs from each method
gs <- list(
    HVGs=top_HVGs, 
    DEGs=top_DEGs, 
    DESpace=top_DESpace)
# get gene symbols from ensembl identifiers
idx <- match(unlist(gs), rowData(spe)$gene_id)
.gs <- rowData(spe)$gene_name[idx]
# expression plots for each top gene
ps <- lapply(seq_along(.gs), \(.) {
    plotSpots(spe, 
        point_size=0,
        annotate=.gs[.], 
        assay_name="logcounts", 
        feature_names="gene_name") + 
        if (. %% 6 == 1) list(
            ylab(names(gs)[ceiling(./6)]), 
            theme(axis.title.y=element_text()))
}) 
wrap_plots(ps, nrow=3) & theme(
    legend.key.width=unit(0.4, "lines"),
    legend.key.height=unit(0.8, "lines")) &
    scale_color_gradientn(colors = pals::parula())

27.4.2 Comparison

We can compare the ranks of the genes detected by each method and compute pairwise correlations, highlighting two known cortical layer-associated SVGs: MOBP and SNAP25. While all four methods - HVGs, DEGs, SVGs identified through nnSVG and DESpace - yield similar gene ranks, each method captures distinct aspects of gene expression, with varying pairwise correlations between them.

Code
# subset data for the 113 genes nnSVG is based on
# sub_gene <- rowData(sub)$gene_id
# sub_DESpace <- res_DESpace[sub_gene, ] 
# sub_HVG <- as.data.frame(dec[sub_gene, ])
# 
# # aggregate DEGs for each spatial domain
# subset_mgs <- lapply(mgs, \(x) x[sub_gene, 1:3] |> as.data.frame())
# sub_DEG <- do.call(cbind, subset_mgs) 
# top_cols <- sub_DEG |> select(ends_with(".Top"))
# sub_DEG$Top <- do.call(pmin, c(top_cols, na.rm = TRUE))
# 
# # compute ranks 
# sub_DESpace$rank <- rank(sub_DESpace$FDR, ties.method = "first")
# sub_HVG$rank <- rank(-1 * sub_HVG$bio, ties.method = "first")
# sub_DEG$rank <- rank(sub_DEG$Top, ties.method = "first")
# 
# # combine 'rank' column for each method
# res_all <- list(
#     nnSVG = as.data.frame(res_nnSVG),
#     DESpace = sub_DESpace, HVGs = sub_HVG, DEGs = sub_DEG
# )
# rank_all <- do.call(cbind, lapply(res_all, `[[`, "rank"))
# rank_all <- data.frame(gene_id = sub_gene, rank_all)
Code
# known SVGs for this data
# known_genes <- c("MOBP", "SNAP25")
# # convert data structure
# rank_long <- rank_all |>
#     left_join(data.frame(rowData(sub)[, c("gene_id", "gene_name")]), by = "gene_id") |>
#     select(all_of(c("gene_name", method_names))) |>
#     pivot_longer(cols = c(nnSVG, DESpace, HVGs, DEGs), 
#                  names_to = "method", 
#                  values_to = "rank")
# # all pairwise comparisons
# df_pairs <- t(combn(method_names, 2)) |> as.data.frame()
# # function to plot each pairwise comparison
# plot_pairwise_comparison <- function(m1, m2, df) {
#   # filter the data for the two methods being compared
#   df <- df |>
#       filter(method %in% c(m1, m2)) |>
#       pivot_wider(names_from = method, values_from = rank)
#   # compute pearson correlation between methods
#   cor_val <- cor(df[[m1]], df[[m2]], method = "spearman")
#   # plot
#   ggplot(df, aes(x = .data[[m1]], y = .data[[m2]])) +
#       geom_point() + 
#       geom_text_repel(data = df %>% filter(gene_name %in% known_genes), 
#                       aes(label = gene_name), color = "red", size = 3.25, 
#                       nudge_x = 50, nudge_y = 10, box.padding = 0.5) +
#       labs(x = paste(m1, "rank"), y = paste(m2, "rank"),
#            title = paste(m2, "vs.", m1, ": Cor = ", round(cor_val, 2))) +
#       #scale_color_manual(values = c("darkorange", "firebrick3", "deepskyblue2")) +
#       theme_bw() + coord_fixed() + 
#       xlim(c(0, 120)) + ylim(c(0, 120))
# }
# 
# # generate and display all pairwise comparison plots
# plots <- lapply(seq_len(nrow(df_pairs)), \(i) {
#     plot_pairwise_comparison(df_pairs[i, 2], df_pairs[i, 1],rank_long)
# })
# 
# wrap_plots(plots, ncol = 3)

27.5 Appendix

Reviews

  • Das Adhikari et al. (2024)
  • Yan, Hua, and Li (2024)

Benchmarks

  • Li et al. (2023) compares 14 methods using 60 simulated datasets, generated using four different strategies, 12 experimental ST datasets, and 3 spatial ATAC-seq datasets.
  • Chen, Kim, and Yang (2024) compares 8 methods based on 22 experimental datasets (8 technologies, 6 tissue types) and 9 datasets simulated using scDesign3 (Song et al. 2024).

References

Cable, Dylan M, Evan Murray, Vignesh Shanmugam, Simon Zhang, Luli S Zou, Michael Diao, Haiqi Chen, Evan Z Macosko, Rafael A Irizarry, and Fei Chen. 2022. “Cell Type-Specific Inference of Differential Expression in Spatial Transcriptomics.” Nature Methods, 1–12.
Cai, Peiying, Mark D Robinson, and Simone Tiberi. 2024. DESpace: Spatially Variable Gene Detection via Differential Expression Testing of Spatial Clusters.” Bioinformatics 40 (2): btae027.
Chen, Carissa, Hani Jieun Kim, and Pengyi Yang. 2024. “Evaluating Spatially Variable Gene Detection Methods for Spatial Transcriptomics Data.” Genome Biology 25 (1): 18.
Das Adhikari, Sikta, Jiaxin Yang, Jianrong Wang, and Yuehua Cui. 2024. “Recent Advances in Spatially Variable Gene Detection in Spatial Transcriptomics.” Computational and Structural Biotechnology Journal 23: 883–91.
Hu, Jian, Xiangjie Li, Kyle Coleman, Amelia Schroeder, Nan Ma, David J. Irwin, Edward B. Lee, Russell T. Shinohara, and Mingyao Li. 2021. SpaGCN: Integrating Gene Expression, Spatial Location and Histology to Identify Spatial Domains and Spatially Variable Genes by Graph Convolutional Network.” Nature Methods 18: 1342–51. https://doi.org/10.1038/s41592-021-01255-8.
Li, Zhijian, Zain M Patel, Dongyuan Song, Guanao Yan, Jingyi Jessica Li, and Luca Pinello. 2023. “Benchmarking Computational Methods to Identify Spatially Variable Genes and Peaks.” bioRxiv, 2023.12.02.569717.
Singhal, Vipul, Nigel Chou, Joseph Lee, Yifei Yue, Jinyue Liu, Wan Kee Chock, Li Lin, et al. 2024. BANKSY Unifies Cell Typing and Tissue Domain Segmentation for Scalable Spatial Omics Data Analysis.” Nature Genetics 56 (3): 431–41.
Song, Dongyuan, Qingyang Wang, Guanao Yan, Tianyang Liu, Tianyi Sun, and Jingyi Jessica Li. 2024. scDesign3 Generates Realistic in Silico Data for Multimodal Single-Cell and Spatial Omics.” Nature Biotechnology 42 (2): 247–52.
Sun, Shiquan, Jiaqiang Zhu, and Xiang Zhou. 2020. “Statistical Analysis of Spatial Expression Patterns for Spatially Resolved Transcriptomic Studies.” Nature Methods 17: 193–200. https://doi.org/10.1038/s41592-019-0701-7.
Svensson, Valentine, Sarah A. Teichmann, and Oliver Stegle. 2018. SpatialDE: Identification of Spatially Variable Genes.” Nature Methods 15: 343–46. https://doi.org/10.1038/nmeth.4636.
Weber, Lukas M., Arkajyoti Saha, Abhirup Datta, Kasper D. Hansen, and Stephanie C. Hicks. 2023. nnSVG for the Scalable Identification of Spatially Variable Genes Using Nearest-Neighbor Gaussian Processes.” Nature Communications 14: 4059. https://doi.org/10.1038/s41467-023-39748-z.
Yan, Guanao, Shuo Harper Hua, and Jingyi Jessica Li. 2024. “Categorization of 33 Computational Methods to Detect Spatially Variable Genes from Spatially Resolved Transcriptomics Data.” arXiv, May.
Yu, Jinge, and Xiangyu Luo. 2022. “Identification of Cell-Type-Specific Spatially Variable Genes Accounting for Excess Zeros.” Bioinformatics 38 (17): 4135–44.
Yu, Shan, and Wei Vivian Li. 2024. spVC for the Detection and Interpretation of Spatial Gene Expression Variation.” Genome Biology 25 (1): 103.
Zhao, Edward, Matthew R. Stone, Xing Ren, Jamie Guenthoer, Kimberly S. Smythe, Thomas Pulliam, Stephen R. Williams, et al. 2021. “Spatial Transcriptomics at Subspot Resolution with BayesSpace.” Nature Biotechnology 39: 1375–84. https://doi.org/10.1038/s41587-021-00935-2.
Zhu, Jiaqiang, Shiquan Sun, and Xiang Zhou. 2021. SPARK-X: Non-Parametric Modeling Enables Scalable and Robust Detection of Spatial Expression Patterns for Large Spatial Transcriptomic Studies.” Genome Biology 22: 184. https://doi.org/10.1186/s13059-021-02404-0.
Back to top