29 Clustering & annotation

Code

knitr::opts_chunk$set(eval=FALSE)

29.1 Preamble

29.1.1 Introduction

In spatial transcriptomics data, we can apply clustering algorithms to identify spatial domains, which represent spatially defined regions consisting of relatively consistent gene expression profiles. For example, spatial domains may consist of regions containing cells from a single cell type or a consistent mixture of cell types.

Several alternative approaches exist for identifying spatial domains. One strategy involves applying standard clustering algorithms from single-cell workflows without incorporating spatial information. Alternatively, spatial information can be integrated at various stages of the workflow. For example, spatial data can inform preprocessing steps, such as selecting spatially variable genes (SVGs), followed by the application of either non-spatial or spatial clustering algorithms. Another widely used strategy involves generating a latent space that incorporates spatial coordinates, often leveraging Bayesian frameworks or neural networks.

As advancements in spatial transcriptomics techniques enhance tissue resolution, computational tradeoffs between various approaches and parameters become increasingly relevant.

It is also important to keep in mind that when we use clustering to define cell types and/or states, these can be defined at various resolutions (or even on a continuum). There is no “true” number of clusters: the appropriate number depends on the biological context (e.g., whether we are comparing major cell populations or rare subtypes), so choosing the number of clusters or the resolution requires some judgment and biological interpretation.

Once we have identified spatial domains, these can then be further investigated in additional downstream analyses.

Spatial domain identification is one component of a broader analysis workflow. Other related tasks include classifying cellular neighborhoods or niches based on local cell-type composition (see Chapter 22), identifying spatially variable genes (see Section 30.3.1), estimating cell-type mixtures in spots (see Chapter 13), and modeling spatially constrained cell-cell communication (see Chapter 23). In this chapter, we focus on clustering cells or spots into spatial domains, while pointing readers to complementary chapters for other related analyses.

29.1.2 Dependencies

Code

library(Banksy)
library(BayesSpace)
library(basilisk)
library(ggspavis)
library(OSTA.data)
library(patchwork)
library(pheatmap)
library(reticulate)
library(scrapper)
library(SpatialExperiment)
# set seed for random number generation
# in order to make results reproducible
set.seed(2025)
# load processed Xenium dataset
spe <- readRDS("img-spe_cl.rds")

29.2 Clustering

29.2.1 Non-spatial

As a baseline, we will perform a standard graph-based clustering approach developed for scRNA-seq data, using molecular features (gene expression) only. Specifically, we construct a shared nearest neighbor (SNN) graph based on non-spatial PCs (reducedDim slot "PCA_tx"), and use the Leiden algorithm for community detection (Traag et al. 2019).

The resolution parameter used here was selected in order to obtain a number of clusters similar to the number of subpopulations annotated by the authors.

Code

# build cellular shared nearest-neighbor (SNN) graph;
# cluster using Leiden community detection algorithm
spe <- clusterGraph.se(spe,
    reddim.type="PCA_tx", 
    output.name="Leiden",
    method="leiden", 
    resolution=1.2, 
    num.neighbors=20,
    more.build.args=list(weight.scheme="jaccard"))
table(spe$Leiden)
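To make the graph construction step above more concrete, the following is a minimal, self-contained Python sketch of a shared nearest neighbor graph with Jaccard edge weights on toy coordinates. It is an illustration only: the helper names (`knn`, `snn_jaccard`) and the toy data are hypothetical, each cell is counted as its own neighbor by assumption, and the exact weighting used by scrapper may differ in such details.

```python
from math import dist

def knn(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors

def snn_jaccard(points, k):
    """Weight each pair of cells by the Jaccard index of their neighbor sets."""
    # assumption: every cell is included in its own neighbor set
    nn = [s | {i} for i, s in enumerate(knn(points, k))]
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nn[i] & nn[j])
            if shared > 0:  # only connect cells that share neighbors
                edges[(i, j)] = shared / len(nn[i] | nn[j])
    return edges

# toy 'PC' coordinates: two well-separated groups of three cells each
pcs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
edges = snn_jaccard(pcs, k=2)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

In this toy example, cells within a group share all of their neighbors and receive the maximum weight, while cells from different groups share none and are left unconnected. A community detection algorithm such as Leiden would then operate on the weighted graph.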

29.2.2 Spatially-aware

Several methods to identify spatial domains in ST data have recently been developed, which each have various methodological and computational tradeoffs. A few examples include BayesSpace (Zhao et al. 2021), Banksy (Singhal et al. 2024) and PRECAST (Liu et al. 2023) in R, as well as CellCharter (Varrone et al. 2024) and SpaceFlow (Ren et al. 2022) in Python.

Here, we demonstrate tools representative of different methodologies, namely, (probabilistic) BayesSpace for sequencing-based Visium data, as well as (neighborhood-based) Banksy and (encoder-based) CellCharter for imaging-based Xenium data.

29.2.2.1 Probabilistic

Underlying BayesSpace (Zhao et al. 2021) is a Bayesian model that encourages neighboring spots to belong to the same cluster, and utilizes Markov Chain Monte Carlo (MCMC) for estimating model parameters.

By default, BayesSpace runs nrep=50,000 MCMC iterations with a burn-in period of burn.in=1,000; the authors recommend at least 10,000 iterations. Here, we use fewer iterations (nrep=1,000 and burn.in=100) for the sake of runtime, which will likely compromise performance. Note that, since MCMC is stochastic, a seed for random number generation should be set in order to make results reproducible.

Code

# load processed Visium dataset
vis <- readRDS("seq-spe_cl.rds")

# prepare data for 'BayesSpace'
# skipping PCA (already computed)
.vis <- spatialPreprocess(vis, skip.PCA=TRUE)

# perform spatial clustering with 'BayesSpace'
# using 'd=20' PCs and targeting 'q=10' clusters
.vis <- spatialCluster(.vis, q=10, d=20, nrep=1e3, burn.in=100) 
table(vis$BayesSpace <- factor(.vis$spatial.cluster))
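The idea of encouraging neighboring spots to share a cluster can be illustrated with a deliberately simplified sketch. The code below is not BayesSpace: instead of MCMC sampling it uses iterated conditional modes (a greedy, deterministic update), the "tissue" is a 1D chain of spots with a single expression value each, cluster means are fixed, and the smoothing strength `gamma` as well as the function name `icm_smooth` are hypothetical.

```python
def icm_smooth(values, means, gamma, n_iter=10):
    """Iterated conditional modes on a 1D chain of spots.

    Each spot receives the label k that minimizes its squared distance to
    means[k], minus a reward of gamma per neighbor currently labeled k.
    """
    n = len(values)
    # initialize from expression alone (no spatial information)
    labels = [min(range(len(means)), key=lambda k: (v - means[k]) ** 2)
              for v in values]
    for _ in range(n_iter):
        changed = False
        for i in range(n):
            nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]

            def cost(k):
                agree = sum(labels[j] == k for j in nbrs)
                return (values[i] - means[k]) ** 2 - gamma * agree

            best = min(range(len(means)), key=cost)
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:  # stop once no label changes anymore
            break
    return labels

# toy chain: low values on the left, high on the right, one noisy spot
values = [0.1, 0.0, 0.6, 0.2, 0.9, 1.0, 1.1, 0.8]
means = [0.0, 1.0]
print(icm_smooth(values, means, gamma=0.0))  # expression only
print(icm_smooth(values, means, gamma=0.2))  # with spatial smoothing
```

Without smoothing, the noisy third spot is assigned to the "high" cluster on its own; with a modest neighbor reward it is absorbed into the surrounding domain. In a full Bayesian treatment, the labels, cluster parameters, and their uncertainty would be estimated jointly rather than fixed as done here.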

29.2.2.2 Neighborhood-based

Banksy (Singhal et al. 2024) embeds cells in a product space of their own and their local neighborhood’s (average) transcriptome, representing cell state and microenvironment, respectively. We have already computed Banksy PCs in Chapter 28 (reducedDim slot "PCA_sp"), so here we perform SNN graph-based Leiden clustering on these PCs in order to obtain spatially-aware cluster assignments.

Note that we are using a lower resolution here, as Banksy PCs tend to yield more clusters on the data at hand.

Code

# perform SNN graph-based clustering on 'Banksy' PCs using
# similar parameters as for 'non-spatial' clustering above
spe <- clusterGraph.se(spe,
    reddim.type="PCA_sp", 
    output.name="Banksy",
    method="leiden", 
    resolution=0.8, 
    num.neighbors=20,
    more.build.args=list(weight.scheme="jaccard"))
table(spe$Banksy)
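As a rough illustration of the neighbor-augmented feature space described above, the following Python sketch concatenates each cell's own expression with the unweighted mean expression of its spatially nearest neighbors, scaled by sqrt(1 - lambda) and sqrt(lambda), respectively. This is a simplification under stated assumptions: the function name `neighbor_augmented` and the toy data are hypothetical, and Banksy itself involves additional steps (e.g., feature scaling, distance-weighted neighborhoods, and gradient-like features) that are omitted here.

```python
from math import dist, sqrt

def neighbor_augmented(expr, coords, k, lam):
    """Concatenate own expression with the mean expression of the k
    spatially nearest neighbors, weighted by sqrt(1-lam) and sqrt(lam)."""
    out = []
    for i, (x, xy) in enumerate(zip(expr, coords)):
        nbrs = sorted((j for j in range(len(expr)) if j != i),
                      key=lambda j: dist(xy, coords[j]))[:k]
        nbr_mean = [sum(expr[j][g] for j in nbrs) / len(nbrs)
                    for g in range(len(x))]
        own = [sqrt(1 - lam) * v for v in x]     # cell state
        env = [sqrt(lam) * v for v in nbr_mean]  # microenvironment
        out.append(own + env)
    return out

# toy data: 4 cells along a line, 2 genes
expr = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
coords = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (3.5, 0.0)]

print(neighbor_augmented(expr, coords, k=1, lam=0.0)[1])
print(neighbor_augmented(expr, coords, k=1, lam=0.5)[1])
```

With `lam=0`, the neighborhood block is zero and the features reduce to the cell's own expression, so downstream clustering is effectively non-spatial; larger values give the microenvironment more weight.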

29.2.2.3 Encoder-based

Another category of spatially-aware clustering methods uses encoder architectures to generate a latent embedding incorporating both spatial and transcriptomic information, which is then clustered using standard algorithms. One typical example is CellCharter (Varrone et al. 2024), which uses a variational autoencoder to generate a latent feature space and then aggregates the features of each cell or spot across its neighbors to preserve spatial context.

Recent machine-learning approaches for spatial domain identification often construct a graph whose nodes represent spots or cells and whose edges encode spatial proximity, transcriptional similarity, histological similarity, or combinations thereof. Graph neural networks, graph attention autoencoders, variational autoencoders, contrastive learning, and foundation model embeddings can then be used to learn spatially informed representations before clustering. These methods can be powerful when tissue structure depends jointly on expression, local neighborhood composition, and histology, but they also introduce additional dependencies, hyperparameters, hardware requirements, and reproducibility considerations.

Other methods

Several other published methods have also adopted encoder architectures in different ways to perform spatially-aware clustering. STAGATE (Dong and Zhang 2022) uses a graph attention autoencoder to integrate spatial and transcriptional data by modeling spatial neighborhoods as graphs, whereas PROST (Liang et al. 2024) employs a self-attention transformer-based encoder to learn spatial patterns by jointly modeling gene expression and spatial coordinates. Graph-based encoder methods, such as STAGATE, may be particularly well-suited for imaging-based ST data, as their ability to model spatial neighborhoods as graphs aligns naturally with the continuous and fine-grained spatial organization of tissues captured through imaging. Other examples include SpaGCN (Hu et al. 2021), DeepST (Xu et al. 2022), GraphST (Long et al. 2023), SpaceFlow (Ren et al. 2022), NicheCompass (Birk et al. 2025), and Novae (Blampey et al. 2025). We demonstrate CellCharter here as one representative encoder-based workflow, and recommend benchmark and review papers for method selection (Cheng et al. 2022; Hu et al. 2024; Liu et al. 2024; Wang et al. 2024; Chen et al. 2025; Sun et al. 2025; Xiong et al. 2025).

Code

# initialize 'basilisk' environment
env <- BasiliskEnvironment(
  envname="CellCharter", pkgname="OSTA",
  channels=c("conda-forge", "pytorch"),
  packages=c(
    "python=3.10.15",
    "pytorch=1.12.1",
    "torchvision=0.13.1",
    "torchaudio=0.12.1"),
  pip=c(
    "scvi-tools==0.20.3",
    "cellcharter==0.2.0",
    "anndata==0.10.9",
    "scanpy==1.10.4"))
# activate underlying conda environment
use_condaenv(obtainEnvironmentPath(env))
counts <- r_to_py(t(counts(spe)))
coords <- r_to_py(spatialCoords(spe))

Code

import cellcharter as cc
import anndata as ad
import squidpy as sq
import pandas as pd
import numpy as np
import random
import scvi

adata = ad.AnnData(X = r.counts,
                   obsm = {"spatial":r.coords},
                   layers = {"counts": r.counts})

seed = 2025
random.seed(seed)
scvi.settings.seed = seed

# variational autoencoder for feature extraction
scvi.model.SCVI.setup_anndata(adata, layer="counts")
model = scvi.model.SCVI(adata, n_hidden=64)
model.train(early_stopping=True,
    # the parameters below aim to reduce runtime; 
    # in reality, it'd be better to use defaults
    max_epochs=70, batch_size=512, train_size=0.5, validation_size=0.2)

adata.obsm["X_scVI"] = model.get_latent_representation(adata).astype(np.float32)

# Getting neighborhood aggregation
sq.gr.spatial_neighbors(adata, coord_type="generic", delaunay=True, spatial_key="spatial", percentile=99)
cc.gr.aggregate_neighbors(adata, n_layers=3, use_rep="X_scVI", out_key="X_cellcharter")

# cluster the aggregated features into a fixed number of clusters
mod = cc.tl.Cluster(n_clusters=14, random_state=seed)
mod.fit(adata, use_rep="X_cellcharter")

label_df = pd.DataFrame({"label": mod.predict(adata, use_rep="X_cellcharter")}) 
label_df[["label"]].value_counts()

Code

# pull 'CellCharter' assignments from Python into R
spe$CellCharter <- unlist(py$label_df)
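The neighborhood aggregation step used above (`n_layers=3`) can be sketched in plain Python as follows. This is our simplified reading of the idea, not CellCharter's implementation: for each cell, latent features are averaged over the cells found at exactly 1, 2, ... hops in the spatial graph, and the per-layer means are concatenated with the cell's own features. The helper names, the use of the mean as aggregation function, and the zero-filling of empty layers are assumptions.

```python
def hop_layers(adj, start, n_layers):
    """Breadth-first search returning the nodes at exactly 1..n_layers hops."""
    seen, frontier, layers = {start}, {start}, []
    for _ in range(n_layers):
        frontier = {j for i in frontier for j in adj[i]} - seen
        seen |= frontier
        layers.append(sorted(frontier))
    return layers

def aggregate(features, adj, n_layers):
    """Concatenate own features with the per-layer mean of neighbor features."""
    dim = len(features[0])
    out = []
    for i in range(len(features)):
        row = list(features[i])
        for layer in hop_layers(adj, i, n_layers):
            if layer:
                row += [sum(features[j][d] for j in layer) / len(layer)
                        for d in range(dim)]
            else:
                row += [0.0] * dim  # assumption: no cells at this distance
        out.append(row)
    return out

# toy chain graph 0 - 1 - 2 - 3 with a 1D latent feature per cell
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = [[0.0], [1.0], [2.0], [3.0]]
agg = aggregate(features, adj, n_layers=2)
for row in agg:
    print(row)
```

The resulting matrix has `(n_layers + 1)` times as many columns as the input embedding, which is why clustering on it reflects a cell's spatial context in addition to its own state.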

29.3 Downstream

29.3.1 Visualization

Let’s visualize the assignments obtained from non-spatial and spatially-aware clustering:

Code

ks <- c("Label", "Leiden", "Banksy")
lapply(ks, \(.) {
    plt <- plotCoords(spe, annotate=.)
    plt$layers[[1]]$aes_params$stroke <- 0
    plt$layers[[1]]$aes_params$size <- 0.2
    plt
}) |>
    wrap_plots(nrow=2) &
    scale_color_manual(values=unname(pals::trubetskoy())) &
    theme(legend.key.size=unit(0, "lines"), legend.justification="left")

Especially in large tissues, and when there are many subpopulations, the above type of plot makes it difficult to spot rare subpopulations, and points may overlap in regions with high cellular density. This can be misleading, as we will tend to see only highly abundant subpopulations, or the cells plotted last and on top (i.e., later columns in the object).

To better distinguish between different subpopulations, we can instead generate separate spatial plots with one subpopulation highlighted at a time:

Code

# plot selected clusters in order of frequency,
# highlighting cells assigned to cluster 'k'
lapply(tail(names(sort(table(spe$Banksy))), 12), \(k) {
    spe$foo <- spe$Banksy == k
    spe <- spe[, order(spe$foo)]
    plt <- plotCoords(spe, annotate="foo")
    plt$layers[[1]]$aes_params$stroke <- 0
    plt$layers[[1]]$aes_params$size <- 0.2
    plt + ggtitle(k)
}) |>
    wrap_plots(nrow=3) &
    scale_color_manual(values=c("lavender", "purple")) &
    theme(plot.title=element_text(hjust=0.5), legend.position="none")

29.3.2 PC regression

For any single-cell analysis where downstream tasks rely on PCs, it is useful to regress each PC on (continuous or categorical) covariates of interest. The resulting coefficient of determination quantifies the fraction of variance in a PC that is explained by the covariate, and can help assess the extent of unwanted variation (due to, e.g., cell area) as opposed to subpopulations driving transcriptional differences. Here, we regress each PC on total counts, cell area, and cluster assignments from different methods, in turn:

Code

pcs <- reducedDim(spe, "PCA_tx")
ids <- c("total_counts", "cell_area", ks)
pcr <- lapply(ids, \(id) {
    fit <- summary(lm(pcs ~ spe[[id]]))
    r2 <- sapply(fit, \(.) .$adj.r.squared)
    data.frame(id, pc=seq_along(r2), r2)
}) |> do.call(what=rbind)

Here, Leiden (transcription-only) and Banksy (spatially-aware) clusterings perform similarly in terms of capturing (spatially unaware) PCs; as expected, Leiden clusters do slightly better. Had we set a higher value for \(\lambda\) when running Banksy, results would diverge more; vice versa, using spatial PCs for regression would have Leiden clusters perform worse. Also, note that we would expect results to converge for \(\lambda=0\).

Code

pcr$id <- factor(pcr$id, ids)
pal <- c("red", "magenta", "gold", "cyan", "blue", "black")
ggplot(pcr, aes(pc, r2, col=id)) +
    geom_line(show.legend=FALSE) + geom_point() +
    scale_color_manual("predictor", values=pal) +
    scale_x_continuous(breaks=c(1, seq(5, 20, 5))) +
    scale_y_continuous(limits=c(0, 1), breaks=seq(0, 1, 0.2)) +
    labs(x="principal component", y="coeff. of determination") +
    guides(col=guide_legend(override.aes=list(size=2))) +
    coord_cartesian(xlim=c(1, 20)) +
    theme_minimal() + theme(
        panel.grid.minor=element_blank(),
        legend.key.size=unit(0, "lines"))
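For a categorical covariate such as cluster assignments, the coefficient of determination used in PC regression has a simple interpretation: it is the share of a PC's variance that lies between, rather than within, groups. The Python sketch below computes it by hand for a single toy PC; the function name `r_squared` and the data are hypothetical, and the adjusted value follows the standard formula with one coefficient per non-reference group.

```python
def r_squared(y, groups):
    """R^2 and adjusted R^2 from regressing y on a categorical covariate."""
    n = len(y)
    grand = sum(y) / n
    by_group = {}
    for value, g in zip(y, groups):
        by_group.setdefault(g, []).append(value)
    # residual sum of squares: deviations from each group's own mean
    ss_res = sum((v - sum(vs) / len(vs)) ** 2
                 for vs in by_group.values() for v in vs)
    # total sum of squares: deviations from the grand mean
    ss_tot = sum((v - grand) ** 2 for v in y)
    r2 = 1 - ss_res / ss_tot
    p = len(by_group) - 1  # number of non-intercept coefficients
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

# toy PC scores for six cells assigned to two clusters
pc1 = [-2.0, -1.0, -3.0, 2.0, 1.0, 3.0]
clusters = ["a", "a", "a", "b", "b", "b"]
r2, adj = r_squared(pc1, clusters)
print(round(r2, 3), round(adj, 3))
```

Here the two clusters separate cleanly along the toy PC, so most of its variance is explained by cluster membership; shuffling the labels would drive the value toward zero.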

29.3.3 DGE analysis

To help characterize (unsupervised) clusters, we want to identify ‘markers’ for each subpopulation, i.e., features whose expression is positively or negatively restricted to (a) specific subpopulation(s). For details on identifying genes that are differentially expressed (DE) between groups of cells, we refer readers to OSCA; a standard approach is given below, visualizing the average expression of exemplary markers across clusters:

Code

# differential gene expression analysis
de <- scoreMarkers.se(spe, groups=spe$Leiden)

# select for a few markers per cluster
gs <- lapply(de, \(df) head(rownames(df), 7))
length(gs <- unique(unlist(gs)))

Code

# average expression by clusters
pbs <- aggregateAcrossCells.se(spe[gs, ], spe$Leiden, assay.type="logcounts")
mtx <- sweep(assay(pbs, "sums"), 2, pbs$counts, `/`)

# visualize averages z-scaled across clusters
pheatmap(mat=t(mtx), scale="column", breaks=seq(-2, 2, length.out=101))
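The two steps behind the heatmap, averaging expression by cluster and z-scaling each gene's averages across clusters, can be written out for a single gene as follows. This is a minimal Python sketch with hypothetical helper names and toy values; it uses the sample standard deviation for scaling, which we assume matches the heatmap's scaling.

```python
from statistics import mean, stdev

def cluster_means(values, clusters):
    """Average one gene's expression values within each cluster."""
    groups = {}
    for v, k in zip(values, clusters):
        groups.setdefault(k, []).append(v)
    return {k: mean(vs) for k, vs in sorted(groups.items())}

def z_scale(means):
    """Center and scale a gene's cluster averages across clusters."""
    vals = list(means.values())
    mu, sd = mean(vals), stdev(vals)  # sample standard deviation
    return {k: (m - mu) / sd for k, m in means.items()}

# toy log-expression of one gene in six cells from three clusters
logcounts = [2.0, 4.0, 0.0, 0.0, 1.0, 2.0]
clusters = ["1", "1", "2", "2", "3", "3"]
avg = cluster_means(logcounts, clusters)
z = z_scale(avg)
print(avg)
print(z)
```

Scaling per gene puts lowly and highly expressed markers on a common color scale, at the cost of hiding absolute expression levels.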

29.4 Annotation

Following clustering and marker gene identification, the next step is the biological interpretation of identified subpopulations, typically by assigning them cell type or, in a spatial context, tissue domain labels. Depending on data characteristics (e.g., spatial resolution, molecular coverage) and the availability of a suitable single-cell reference dataset, annotation can be performed using approaches that range from manual and automated marker-based annotation to reference mapping, deep learning, and foundation models. Several common annotation strategies are outlined below.

Deconvolution: For spot-based ST (e.g., Visium) where spots contain multiple cells, it is typically required to estimate the proportions of different cell types within each spot through reference-based deconvolution (see Chapter 13).
Manual: Clusters can be labeled based on the expression of canonical marker genes and results from DE analysis (as shown above). While highly accurate when performed by experts, this process is labor-intensive and arguably subjective.
Marker-based: To reduce manual effort and improve reproducibility, clusters can be scored against predefined marker gene signatures (see Chapter 31) using tools such as CellAssign (Zhang et al. 2019) and scType (Ianevski et al. 2022).
Reference-based: For imaging-based ST data in particular (e.g., Xenium, MERFISH), labels from a high-quality, annotated scRNA-seq reference atlas can be “projected” onto the spatial data. Popular R-based tools include SingleR (Aran et al. 2019) (Bioconductor), and Azimuth (Hao et al. 2021) (implemented as part of Seurat v5+).
Machine & deep learning: These methods leverage either high-capacity statistical models or multi-layered neural networks (e.g., VAEs, GNNs) to learn complex cellular representations. CellTypist (Domínguez Conde et al. 2022), for instance, uses a logistic regression framework and provides a variety of pre-trained models; scvi-tools’s scANVI (Xu et al. 2021) employs semi-supervised deep learning for label prediction.
Foundation models: More recently, foundation models (FMs) pretrained on millions of single cells have emerged. Such models (e.g., scFoundation (Hao et al. 2024), scGPT (Cui et al. 2024), and Geneformer (Theodoris et al. 2023)) can be used zero-shot or fine-tuned to classify cell types across diverse tissues and platforms.

See also Chapter 34 for more details on deep learning-based approaches and foundation models beyond cell type annotation.
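As a minimal illustration of the marker-based strategy listed above, the Python sketch below scores each cluster against predefined marker signatures by averaging the cluster-level expression of each signature's genes, and assigns the best-scoring label. It does not reproduce the statistical models of CellAssign or scType; the function name `annotate`, the two small signatures, and all expression values are hypothetical.

```python
def annotate(cluster_avg, signatures):
    """Assign each cluster the signature with the highest mean marker expression."""
    labels = {}
    for cluster, avg in cluster_avg.items():
        scores = {
            cell_type: sum(avg.get(g, 0.0) for g in genes) / len(genes)
            for cell_type, genes in signatures.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels

# hypothetical marker signatures and cluster-level average expression
signatures = {"T cell": ["CD3E", "CD2"], "B cell": ["MS4A1", "CD79A"]}
cluster_avg = {
    "1": {"CD3E": 2.1, "CD2": 1.8, "MS4A1": 0.1, "CD79A": 0.0},
    "2": {"CD3E": 0.2, "CD2": 0.1, "MS4A1": 1.9, "CD79A": 2.4},
}
labels = annotate(cluster_avg, signatures)
print(labels)
```

A practical workflow would additionally report the score margin between the best and second-best label, so that ambiguous clusters can be flagged for manual review instead of being labeled automatically.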

29.4.1 Reference datasets

Some annotation approaches rely on high-quality single-cell reference datasets. In this context, so-called atlas efforts aim to generate expert-curated, standardized resources that serve as biological anchors across diverse species, tissues, developmental stages, and disease states. These resources typically provide large collections of molecular profiles, either through dedicated publications or via centralized data portals that harmonize datasets from multiple studies. Below, we highlight several widely known resources, including ongoing efforts towards spatial and multi-modal ones.

CELLxGENE (Chan Zuckerberg Initiative) is a data portal and explorer that provides standardized access to thousands of datasets (Megill et al. 2021). These can be interactively queried and filtered for downstream tasks including, e.g., reference-based annotation, benchmarking, and cross-study integration.
The Human Cell Atlas (HCA) is a global consortium that aims to map every cell type in the human body across health and disease (Regev et al. 2017). As part of this effort, multi-organ atlases such as Tabula Muris (Tabula Muris Consortium et al. 2018), Tabula Muris Senis (Tabula Muris Consortium 2020) and Tabula Sapiens (Tabula Sapiens Consortium et al. 2022) provide high-resolution references for mouse (Tabula Muris, Tabula Muris Senis) and human (Tabula Sapiens).
The Human Protein Atlas (HPA) integrates transcriptomics and proteomics data to map their spatial distributions in human tissues and cells (Thul and Lindskog 2018).
Still work in progress, the Spatial Atlas of Human Anatomy (SAHA) is an effort to generate a multi-modal, subcellular-resolution spatial reference map for millions of cells in their native 3D tissue context (Park et al. 2025).

The Bioconductor package cellxgenedp provides an alternative, R-based interface to CELLxGENE, allowing programmatic discovery and retrieval of more than 2,000 datasets from hundreds of studies as H5AD files, which can be read as SingleCellExperiment objects using anndataR (Deconinck et al. 2026); see Chapter 8.

There are also several R/Bioconductor ExperimentHub packages that provide programmatic access to these atlases, e.g., TabulaMurisData and TabulaMurisSenisData. The ExperimentHub::query() command (R) or the BiocViews browser (online) can be used to explore current datasets, for annotation or otherwise.

29.4.2 Evaluation

While automated annotation can greatly accelerate analysis, the quality of labels obtained through such methods depends heavily on the alignment between query and reference. Misalignment can arise from technical effects, biological differences (e.g., healthy vs. diseased), or the presence of “out-of-reference” cell types. Evaluating reference suitability and prediction reliability is crucial. The scDiagnostics (Christidis et al. 2026) R/Bioconductor package provides a systematic framework for this task, e.g.:

Assessing distributional congruence by projecting query data into the reference’s subspace to check global alignment between datasets.
Marker gene validation by quantitatively comparing the expression profiles of canonical markers between query and reference clusters to detect gene-level discrepancies.
Detection of annotation anomalies by identifying cells that do not fit the reference; these may represent novel cell states or technical artifacts.

29.5 Appendix

When choosing among spatial clustering methods, we recommend starting from the biological question and data modality rather than from model complexity alone. Simpler spatially aware approaches may be preferable when interpretability, runtime, and reproducibility are priorities. Deep learning and graph-based methods may be more appropriate when histology, multi-modal features, or complex local tissue structure are central to the analysis. In all cases, spatial domains should be evaluated by marker genes, SVGs, histological concordance, parameter stability, and consistency with known tissue architecture where available.

References

Aran, Dvir, Agnieszka P Looney, Leqian Liu, et al. 2019. “Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage.” Nature Immunology 20 (2): 163–72. https://doi.org/10.1038/s41590-018-0276-y.

Birk, Sebastian, Irene Bonafonte-Pardàs, Adib Miraki Feriz, et al. 2025. “Quantitative characterization of cell niches in spatially resolved omics data.” Nature Genetics 57 (4): 897–909. https://doi.org/10.1038/s41588-025-02120-6.

Blampey, Quentin, Hakim Benkirane, Nadège Bercovici, et al. 2025. “Novae: a graph-based foundation model for spatial transcriptomics data.” Nature Methods 22 (12): 2539–50. https://doi.org/10.1038/s41592-025-02899-6.

Chen, Renjie, Yue Yao, Jingyang Qian, Xin Peng, Xin Shao, and Xiaohui Fan. 2025. “A comprehensive benchmarking for spatially resolved transcriptomics clustering methods across variable technologies, organs, and replicates.” iMeta 4 (6): e70084. https://doi.org/10.1002/imt2.70084.

Cheng, Andrew, Guanyu Hu, and Wei Vivian Li. 2022. “Benchmarking cell-type clustering methods for spatially resolved transcriptomics data.” Briefings in Bioinformatics 24 (1): bbac475. https://doi.org/10.1093/bib/bbac475.

Christidis, Anthony, Andrew R Ghazi, Smriti Chawla, Nitesh Turaga, Robert Gentleman, and Ludwig Geistlinger. 2026. “scDiagnostics: systematic assessment of cell type annotation in single-cell transcriptomics data.” bioRxiv, 2026.01.29.701618. https://doi.org/10.64898/2026.01.29.701618.

Cui, Haotian, Chloe Wang, Hassaan Maan, et al. 2024. “scGPT: toward building a foundation model for single-cell multi-omics using generative AI.” Nature Methods 21 (8): 1470–80. https://doi.org/10.1038/s41592-024-02201-0.

Deconinck, Louise, Luke Zappia, Robrecht Cannoodt, et al. 2026. “anndataR improves interoperability between R and Python in single-cell transcriptomics.” Bioinformatics 42 (6): btag288. https://doi.org/10.1093/bioinformatics/btag288.

Domínguez Conde, C, C Xu, L B Jarvis, et al. 2022. “Cross-tissue immune cell analysis reveals tissue-specific features in humans.” Science (New York, N.Y.) 376 (6594): eabl5197. https://doi.org/10.1126/science.abl5197.

Dong, Kangning, and Shihua Zhang. 2022. “Deciphering Spatial Domains from Spatially Resolved Transcriptomics with an Adaptive Graph Attention Auto-Encoder.” Nature Communications 13 (1739). https://doi.org/10.1038/s41467-022-29439-6.

Hao, Minsheng, Jing Gong, Xin Zeng, et al. 2024. “Large-scale foundation model on single-cell transcriptomics.” Nature Methods 21 (8): 1481–91. https://doi.org/10.1038/s41592-024-02305-7.

Hao, Yuhan, Stephanie Hao, Erica Andersen-Nissen, et al. 2021. “Integrated analysis of multimodal single-cell data.” Cell 184 (13): 3573–3587.e29. https://doi.org/10.1016/j.cell.2021.04.048.

Hu, Jian, Xiangjie Li, Kyle Coleman, et al. 2021. “SpaGCN: Integrating Gene Expression, Spatial Location and Histology to Identify Spatial Domains and Spatially Variable Genes by Graph Convolutional Network.” Nature Methods 18: 1342–51. https://doi.org/10.1038/s41592-021-01255-8.

Hu, Yunfei, Manfei Xie, Yikang Li, et al. 2024. “Benchmarking Clustering, Alignment, and Integration Methods for Spatial Transcriptomics.” Genome Biology 25 (212). https://doi.org/10.1186/s13059-024-03361-0.

Ianevski, Aleksandr, Anil K Giri, and Tero Aittokallio. 2022. “Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data.” Nature Communications 13 (1): 1246. https://doi.org/10.1038/s41467-022-28803-w.

Liang, Yuchen, Guowei Shi, Runlin Cai, et al. 2024. “PROST: Quantitative Identification of Spatially Variable Genes and Domain Detection in Spatial Transcriptomics.” Nature Communications 15 (600). https://doi.org/10.1038/s41467-024-44835-w.

Liu, Teng, Zhao-Yu Fang, Zongbo Zhang, Yongxiang Yu, Min Li, and Ming-Zhu Yin. 2024. “A Comprehensive Overview of Graph Neural Network-Based Approaches to Clustering for Spatial Transcriptomics.” Computational and Structural Biotechnology Journal 23: 106–28. https://doi.org/10.1016/j.csbj.2023.11.055.

Liu, Wei, Xu Liao, Ziye Luo, et al. 2023. “Probabilistic Embedding, Clustering, and Alignment for Integrating Spatial Transcriptomics Data with PRECAST.” Nature Communications 14 (296). https://doi.org/10.1038/s41467-023-35947-w.

Long, Yahui, Kok Siong Ang, Mengwei Li, et al. 2023. “Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST.” Nature Communications 14 (1): 1–19. https://doi.org/10.1038/s41467-023-36796-3.

Megill, Colin, Bruce Martin, Charlotte Weaver, et al. 2021. “Cellxgene: A performant, scalable exploration platform for high dimensional sparse matrices.” bioRxiv, 2021.04.05.438318. https://doi.org/10.1101/2021.04.05.438318.

Park, Jiwoon, Roberto De Gregorio, Erika Hissong, et al. 2025. “The Spatial Atlas of Human Anatomy (SAHA): A multimodal subcellular-resolution reference across human organs.” bioRxiv, 2025.06.16.658716. https://doi.org/10.1101/2025.06.16.658716.

Regev, Aviv, Sarah A Teichmann, Eric S Lander, et al. 2017. “The Human Cell Atlas.” Elife 6: e27041. https://doi.org/10.7554/eLife.27041.

Ren, Honglei, Benjamin L Walker, Zixuan Cang, and Qing Nie. 2022. “Identifying multicellular spatiotemporal organization of cells with SpaceFlow.” Nature Communications 13 (1): 4076. https://doi.org/10.1038/s41467-022-31739-w.

Singhal, Vipul, Nigel Chou, Joseph Lee, et al. 2024. “BANKSY Unifies Cell Typing and Tissue Domain Segmentation for Scalable Spatial Omics Data Analysis.” Nature Genetics 56: 431–41. https://doi.org/10.1038/s41588-024-01664-3.

Sun, Jieran, Kirti Biharie, Peiying Cai, et al. 2025. “Beyond benchmarking: an expert-guided consensus approach to spatially aware clustering.” bioRxiv, ahead of print. https://doi.org/10.1101/2025.06.23.660861.

Tabula Muris Consortium. 2020. “A single-cell transcriptomic atlas characterizes ageing tissues in the mouse.” Nature 583 (7817): 590–95. https://doi.org/10.1038/s41586-020-2496-1.

Tabula Muris Consortium et al. 2018. “Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris.” Nature 562 (7727): 367–72. https://doi.org/10.1038/s41586-018-0590-4.

Tabula Sapiens Consortium, Robert C Jones, Jim Karkanias, et al. 2022. “The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans.” Science (New York, N.Y.) 376 (6594): eabl4896. https://doi.org/10.1126/science.abl4896.

Theodoris, Christina V, Ling Xiao, Anant Chopra, et al. 2023. “Transfer learning enables predictions in network biology.” Nature 618 (7965): 616–24. https://doi.org/10.1038/s41586-023-06139-9.

Thul, Peter J, and Cecilia Lindskog. 2018. “The human protein atlas: A spatial map of the human proteome: The Human Protein Atlas.” Protein Science 27 (1): 233–44. https://doi.org/10.1002/pro.3307.

Traag, V. A., L. Waltman, and N. J. van Eck. 2019. “From Louvain to Leiden: Guaranteeing Well-Connected Communities.” Scientific Reports 9 (5233). https://doi.org/10.1038/s41598-019-41695-z.

Varrone, Marco, Daniele Tavernari, Albert Santamaria-Martínez, Logan A. Walsh, and Giovanni Ciriello. 2024. “CellCharter Reveals Spatial Cell Niches Associated with Tissue Remodeling and Cell Plasticity.” Nature Genetics 56: 74–84. https://doi.org/10.1038/s41588-023-01588-4.

Wang, Ziyi, Aoyun Geng, Hao Duan, Feifei Cui, Quan Zou, and Zilong Zhang. 2024. “A Comprehensive Review of Approaches for Spatial Domain Recognition of Spatial Transcriptomes.” Briefings in Functional Genomics 23: 702–12. https://doi.org/10.1093/bfgp/elae040.

Xiong, Caiwei, Huang Shuai, Muqing Zhou, et al. 2025. “A comprehensive comparison on clustering methods for multi-slide spatially resolved transcriptomics data analysis.” bioRxiv, ahead of print. https://doi.org/10.1101/2025.01.19.633631.

Xu, Chang, Xiyun Jin, Songren Wei, et al. 2022. “DeepST: identifying spatial domains in spatial transcriptomics by deep learning.” Nucleic Acids Research 50 (22): e131. https://doi.org/10.1093/nar/gkac901.

Xu, Chenling, Romain Lopez, Edouard Mehlman, Jeffrey Regier, Michael I Jordan, and Nir Yosef. 2021. “Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models.” Molecular Systems Biology 17 (1): e9620. https://doi.org/10.15252/msb.20209620.

Zhang, Allen W, Ciara O’Flanagan, Elizabeth A Chavez, et al. 2019. “Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling.” Nature Methods 16 (10): 1007–15. https://doi.org/10.1038/s41592-019-0529-1.

Zhao, Edward, Matthew R. Stone, Xing Ren, et al. 2021. “Spatial Transcriptomics at Subspot Resolution with BayesSpace.” Nature Biotechnology 39: 1375–84. https://doi.org/10.1038/s41587-021-00935-2.