29  Clustering & annotation

29.1 Preamble

29.1.1 Introduction

In spatial transcriptomics data, we can apply clustering algorithms to identify spatial domains, which represent spatially defined regions consisting of relatively consistent gene expression profiles. For example, spatial domains may consist of regions containing cells from a single cell type or a consistent mixture of cell types.

Several alternative approaches exist for identifying spatial domains. One strategy involves applying standard clustering algorithms from single-cell workflows without incorporating spatial information. Alternatively, spatial information can be integrated at various stages of the workflow. For example, spatial data can inform preprocessing steps, such as selecting spatially variable genes (SVGs), followed by the application of either non-spatial or spatial clustering algorithms. Another widely used strategy involves generating a latent space that incorporates spatial coordinates, often leveraging Bayesian frameworks or neural networks.

As advancements in spatial transcriptomics techniques enhance tissue resolution, computational tradeoffs between various approaches and parameters become increasingly relevant.

It is also important to keep in mind that when we use clustering to define cell types and/or states, these can be defined at various resolutions (or even on a continuum). There is no “true” number of clusters: the optimal number depends on the biological context (e.g., whether we are comparing major cell populations or rare subtypes), so choosing the number of clusters or the resolution requires some judgment and biological interpretation.

Once we have identified spatial domains, these can then be further investigated in additional downstream analyses.

29.1.2 Dependencies

Code
library(Banksy)
library(BayesSpace)
library(basilisk)
library(ggspavis)
library(OSTA.data)
library(patchwork)
library(pheatmap)
library(reticulate)
library(scrapper)
library(SpatialExperiment)
# set seed for random number generation
# in order to make results reproducible
set.seed(2025)
# load processed Xenium datasets
spe <- readRDS("img-spe_cl.rds")

29.2 Clustering

29.2.1 Non-spatial

As a baseline, we will perform a standard graph-based clustering approach developed for scRNA-seq data, using molecular features (gene expression) only. Specifically, we construct a shared nearest neighbor (SNN) graph based on non-spatial PCs (reducedDim slot "PCA_tx"), and use the Leiden algorithm for community detection (Traag et al. 2019).

The resolution parameter used here was selected to obtain a number of clusters similar to the number annotated by the authors.
Code
# build cellular shared nearest-neighbor (SNN) graph;
# cluster using Leiden community detection algorithm
spe <- clusterGraph.se(spe,
    reddim.type="PCA_tx", 
    output.name="Leiden",
    method="leiden", 
    resolution=1.2, 
    num.neighbors=20,
    more.build.args=list(weight.scheme="jaccard"))
table(spe$Leiden)
##  
##      1     2     3     4     5     6     7     8     9    10    11    12 
##  11357  3586  3284  5341 11089 25237  7085 12739 10625  6573 13346  8120 
##     13    14    15    16 
##  11085  4079  4124  2598
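
To make the graph construction step more concrete, the following Python sketch builds a k-nearest-neighbor graph on toy 2D points and weights each pair of points by the Jaccard index of their neighbor sets. It is a minimal illustration under simplifying assumptions (brute-force neighbor search, hypothetical helper names `knn` and `jaccard_snn`), not the scrapper implementation, whose details may differ.

```python
from math import dist

def knn(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    out = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist(p, points[j]))
        out.append(set(order[:k]))
    return out

def jaccard_snn(points, k):
    """Weight each pair of points by the Jaccard index of their neighbor sets."""
    # assumption: each point counts as a member of its own neighborhood
    nbrs = [s | {i} for i, s in enumerate(knn(points, k))]
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nbrs[i] & nbrs[j])
            if shared > 0:
                edges[(i, j)] = shared / len(nbrs[i] | nbrs[j])
    return edges

# two well-separated toy groups of three points each
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
snn = jaccard_snn(pts, k=2)
```

In this toy example, edges only connect points within the same group, so a community detection algorithm such as Leiden would have two obvious communities to recover. On real data, the number of neighbors and the resolution jointly control how finely the graph is partitioned.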

29.2.2 Spatially-aware

Several methods to identify spatial domains in ST data have recently been developed, each with its own methodological and computational tradeoffs. A few examples include BayesSpace (Zhao et al. 2021), Banksy (Singhal et al. 2024) and PRECAST (Liu et al. 2023) in R, as well as CellCharter (Varrone et al. 2024) and SpaceFlow (Ren et al. 2022) in Python.

Here, we demonstrate tools representative of different methodologies, namely, (probabilistic) BayesSpace for seq-based Visium, as well as (neighborhood-based) Banksy and (encoder-based) CellCharter for img-based Xenium data.

29.2.2.1 Probabilistic

Underlying BayesSpace (Zhao et al. 2021) is a Bayesian model that encourages neighboring spots to belong to the same cluster, and utilizes Markov Chain Monte Carlo (MCMC) for estimating model parameters.

Default parameters are nrep=50,000 MCMC iterations and a burn-in period of burn.in=1,000; the authors recommend running with at least 10,000 iterations. Here, we use fewer iterations for the sake of runtime, which will arguably compromise performance. Note that, since MCMC is stochastic, a seed for random number generation should be set in order to make results reproducible.
Code
# load processed Visium dataset
vis <- readRDS("seq-spe_cl.rds")

# prepare data for 'BayesSpace'
# skipping PCA (already computed)
.vis <- spatialPreprocess(vis, skip.PCA=TRUE)

# perform spatial clustering with 'BayesSpace'
# using 'd=20' PCs and targeting 'q=10' clusters
.vis <- spatialCluster(.vis, q=10, d=20, nrep=1e3, burn.in=100) 
table(vis$BayesSpace <- factor(.vis$spatial.cluster))
##  
##    1   2   3   4   5   6   7   8   9  10 
##  499 557 161 500 323 308 354 375 181 381
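
The role of the burn-in period can be illustrated with a small Python sketch. It assumes that final cluster labels are summarized as the per-spot mode of the sampled labels after discarding burn-in iterations, which is our reading of how BayesSpace reports its assignments; the function `summarize_chain` and the toy chain are hypothetical.

```python
from collections import Counter

def summarize_chain(chain, burn_in):
    """Per-spot modal label over MCMC iterations after discarding burn-in.

    chain: list of iterations, each a list with one cluster label per spot.
    """
    kept = chain[burn_in:]
    n_spots = len(kept[0])
    return [Counter(it[s] for it in kept).most_common(1)[0][0]
            for s in range(n_spots)]

# toy chain: 6 iterations x 3 spots; early iterations are unsettled
chain = [
    [2, 2, 1],
    [1, 2, 1],
    [1, 1, 2],
    [1, 1, 2],
    [1, 2, 2],
    [1, 1, 2],
]
labels = summarize_chain(chain, burn_in=2)
```

With too few iterations, or too short a burn-in, early unsettled samples carry more weight in this summary, which is one reason short chains can yield less stable assignments.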

29.2.2.2 Neighborhood-based

BANKSY (Singhal et al. 2024) embeds cells in a product space of their own and their local neighborhood’s (average) transcriptome, representing cell state and microenvironment. We have already computed Banksy PCs in Chapter 28 (reducedDim slot "PCA_sp"), so here we perform SNN graph-based Leiden clustering on these PCs to obtain spatially-aware cluster assignments.

Note that we are using a lower resolution here, as Banksy PCs tend to yield more clusters on the data at hand.
Code
# perform SNN graph-based clustering on 'Banksy' PCs using
# similar parameters as for 'non-spatial' clustering above
spe <- clusterGraph.se(spe,
    reddim.type="PCA_sp", 
    output.name="Banksy",
    method="leiden", 
    resolution=0.8, 
    num.neighbors=20,
    more.build.args=list(weight.scheme="jaccard"))
table(spe$Banksy)
##  
##      1     2     3     4     5     6     7     8     9    10    11    12 
##  10987  5616  3793 11305 21021  8396  7901  5696 17201 13005 10499  4138 
##     13    14    15    16    17 
##    568  3472 12858   633  3179
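
The idea of a neighborhood-augmented feature space can be sketched in a few lines of Python. The example below concatenates each cell's own expression with the unweighted mean over its k nearest spatial neighbors, scaled by sqrt(1 - lambda) and sqrt(lambda), respectively. This is a simplification under stated assumptions: the published method additionally uses distance-weighted neighborhoods and gradient-like features, and `banksy_like` is a hypothetical name, not part of the Banksy package.

```python
from math import dist, sqrt

def banksy_like(expr, coords, k, lam):
    """Concatenate each cell's own expression with its neighborhood average.

    expr:   list of per-cell expression vectors (already normalized)
    coords: list of per-cell (x, y) positions
    lam:    mixing weight in [0, 1]; 0 ignores the neighborhood entirely
    """
    n, out = len(expr), []
    for i in range(n):
        # k nearest cells in physical space (brute force)
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: dist(coords[i], coords[j]))[:k]
        # unweighted neighborhood mean per gene
        avg = [sum(expr[j][g] for j in nbrs) / k for g in range(len(expr[i]))]
        own = [sqrt(1 - lam) * v for v in expr[i]]
        env = [sqrt(lam) * v for v in avg]
        out.append(own + env)
    return out

expr = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
coords = [(0, 0), (1, 0), (5, 0)]
aug = banksy_like(expr, coords, k=1, lam=0.2)
```

Setting `lam=0` zeroes out the neighborhood block, so any downstream PCA and clustering would only see each cell's own expression; larger values give the microenvironment more influence.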

29.2.2.3 Encoder-based

Another category of spatially-aware clustering methods uses encoder architectures to generate a latent embedding incorporating both spatial and transcriptomic information, which is then clustered using standard algorithms. One typical example is CellCharter (Varrone et al. 2024), which uses a variational autoencoder to generate a latent feature space and aggregates the features of each spot across its neighbors to preserve their spatial context.

Several other published methods have also adopted encoder architectures in different ways to perform spatially-aware clustering. STAGATE (Dong and Zhang 2022) uses a graph attention autoencoder to integrate spatial and transcriptional data by modeling spatial neighborhoods as graphs, whereas PROST (Liang et al. 2024) employs a self-attention transformer-based encoder to learn spatial patterns by jointly modeling gene expression and spatial coordinates. Graph-based encoder methods, such as STAGATE, may be particularly well-suited for imaging-based ST data, as their ability to model spatial neighborhoods as graphs aligns naturally with the continuous and fine-grained spatial organization of tissues captured through imaging.

Code
# initialize 'basilisk' environment
env <- BasiliskEnvironment(
  envname="CellCharter", pkgname="OSTA",
  channels=c("conda-forge", "pytorch"),
  packages=c(
    "python=3.10.15",
    "pytorch=1.12.1",
    "torchvision=0.13.1",
    "torchaudio=0.12.1"),
  pip=c(
    "scvi-tools==0.20.3",
    "cellcharter==0.2.0",
    "anndata==0.10.9",
    "scanpy==1.10.4"))
# activate underlying conda environment
use_condaenv(obtainEnvironmentPath(env))
counts <- r_to_py(t(counts(spe)))
coords <- r_to_py(spatialCoords(spe))
Code
import cellcharter as cc
import anndata as ad
import squidpy as sq
import pandas as pd
import numpy as np
import random
import scvi

adata = ad.AnnData(X = r.counts,
                   obsm = {"spatial":r.coords},
                   layers = {"counts": r.counts})

seed = 2025
random.seed(seed)
scvi.settings.seed = seed

# variational autoencoder for feature extraction
scvi.model.SCVI.setup_anndata(adata, layer="counts")
model = scvi.model.SCVI(adata, n_hidden=64)
model.train(early_stopping=True,
    # the parameters below aim to reduce runtime; 
    # in reality, it'd be better to use defaults
    max_epochs=70, batch_size=512, train_size=0.5, validation_size=0.2)

adata.obsm["X_scVI"] = model.get_latent_representation(adata).astype(np.float32)

# Getting neighborhood aggregation
sq.gr.spatial_neighbors(adata, coord_type="generic", delaunay=True, spatial_key="spatial", percentile=99)
cc.gr.aggregate_neighbors(adata, n_layers=3, use_rep="X_scVI", out_key="X_cellcharter")

# cluster the aggregated embedding into a fixed number of clusters
mod = cc.tl.Cluster(n_clusters=14, random_state=seed)
mod.fit(adata, use_rep="X_cellcharter")

label_df = pd.DataFrame({"label": mod.predict(adata, use_rep="X_cellcharter")}) 
label_df[["label"]].value_counts()
Code
# pull 'CellCharter' assignments from Python into R
spe$CellCharter <- unlist(py$label_df)
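
The neighborhood aggregation step can be illustrated with a standalone Python sketch. It assumes that, for each cell, the latent features are concatenated with the mean features of cells first reached at hop distances 1 to `n_layers` in the spatial graph; this mirrors our understanding of the approach but is not the CellCharter implementation, and `aggregate_by_hop` is a hypothetical helper.

```python
def aggregate_by_hop(features, adjacency, n_layers):
    """Concatenate each cell's features with per-hop neighborhood means.

    features:  list of per-cell feature vectors
    adjacency: dict mapping each cell index to the set of adjacent cells
    """
    dim = len(features[0])
    out = []
    for i in range(len(features)):
        row = list(features[i])
        seen, frontier = {i}, {i}
        for _ in range(n_layers):
            # cells first reached at this hop distance
            frontier = {j for f in frontier for j in adjacency[f]} - seen
            seen |= frontier
            if frontier:
                row += [sum(features[j][d] for j in frontier) / len(frontier)
                        for d in range(dim)]
            else:
                row += [0.0] * dim  # no cells at this distance
        out.append(row)
    return out

# chain graph 0 - 1 - 2 - 3 with a single feature per cell
feats = [[1.0], [2.0], [4.0], [8.0]]
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
emb = aggregate_by_hop(feats, adj, n_layers=2)
```

Each cell's embedding grows by one block of features per layer, so clustering the result groups cells by both their own state and the composition of their surroundings.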

29.3 Downstream

29.3.1 Visualization

Let’s visualize the assignments obtained from non-spatial and spatially-aware clustering:

Code
ks <- c("Label", "Leiden", "Banksy")
lapply(ks, \(.) {
    plt <- plotCoords(spe, annotate=.)
    plt$layers[[1]]$aes_params$stroke <- 0
    plt$layers[[1]]$aes_params$size <- 0.2
    plt
}) |>
    wrap_plots(nrow=2) &
    scale_color_manual(values=unname(pals::trubetskoy())) &
    theme(legend.key.size=unit(0, "lines"), legend.justification="left")

Especially in large tissues, and when there are many subpopulations, the above type of plot makes it difficult to spot rare subpopulations, and might cause cells to overlap in regions with high cellular density. This can be misleading, as we will tend to see only highly abundant subpopulations, or the cells plotted last and on top (i.e., later columns in the object).

To better distinguish between different subpopulations, we can instead generate separate spatial plots with one subpopulation highlighted at a time:

Code
# plot selected clusters in order of frequency,
# highlighting cells assigned to cluster 'k'
lapply(tail(names(sort(table(spe$Banksy))), 12), \(k) {
    spe$foo <- spe$Banksy == k
    spe <- spe[, order(spe$foo)]
    plt <- plotCoords(spe, annotate="foo")
    plt$layers[[1]]$aes_params$stroke <- 0
    plt$layers[[1]]$aes_params$size <- 0.2
    plt + ggtitle(k)
}) |>
    wrap_plots(nrow=3) &
    scale_color_manual(values=c("lavender", "purple")) &
    theme(plot.title=element_text(hjust=0.5), legend.position="none")

29.3.2 PC regression

For any single-cell analysis where downstream tasks rely on PCs, it is useful to regress each PC on (continuous or categorical) covariates of interest. This quantifies the variance explained by each covariate and can help assess the extent of unwanted variation (due to, e.g., cell area) as opposed to subpopulations driving transcriptional differences. Here, we regress each PC on total counts, cell area, and cluster assignments from different methods in turn:

Code
pcs <- reducedDim(spe, "PCA_tx")
ids <- c("total_counts", "cell_area", ks)
pcr <- lapply(ids, \(id) {
    fit <- summary(lm(pcs ~ spe[[id]]))
    r2 <- sapply(fit, \(.) .$adj.r.squared)
    data.frame(id, pc=seq_along(r2), r2)
}) |> do.call(what=rbind)

Here, Leiden (transcription-only) and Banksy (spatially-aware) clusterings perform similarly in terms of capturing (spatially unaware) PCs; as expected, Leiden clusters do slightly better. Had we set a higher value for \(\lambda\) when running Banksy, results would diverge more; conversely, using spatial PCs for regression would have Leiden clusters perform worse. Also, note that we would expect results to converge for \(\lambda=0\).

Code
pcr$id <- factor(pcr$id, ids)
pal <- c("red", "magenta", "gold", "cyan", "blue", "black")
ggplot(pcr, aes(pc, r2, col=id)) +
    geom_line(show.legend=FALSE) + geom_point() +
    scale_color_manual("predictor", values=pal) +
    scale_x_continuous(breaks=c(1, seq(5, 20, 5))) +
    scale_y_continuous(limits=c(0, 1), breaks=seq(0, 1, 0.2)) +
    labs(x="principal component", y="coeff. of determination") +
    guides(col=guide_legend(override.aes=list(size=2))) +
    coord_cartesian(xlim=c(1, 20)) +
    theme_minimal() + theme(
        panel.grid.minor=element_blank(),
        legend.key.size=unit(0, "lines"))
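
For a categorical covariate such as a cluster assignment, the adjusted R² of a linear model reduces to a comparison of within-group and total sums of squares. The Python sketch below works this through for a single toy PC; the function name `adj_r2_categorical` and the data are illustrative only.

```python
def adj_r2_categorical(y, groups):
    """Adjusted R^2 from regressing a numeric vector on a categorical covariate."""
    n = len(y)
    grand = sum(y) / n
    # group means are the fitted values of a one-way linear model
    members = {}
    for v, g in zip(y, groups):
        members.setdefault(g, []).append(v)
    means = {g: sum(v) / len(v) for g, v in members.items()}
    ss_tot = sum((v - grand) ** 2 for v in y)
    ss_res = sum((v - means[g]) ** 2 for v, g in zip(y, groups))
    p = len(members) - 1  # number of non-intercept coefficients
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

pc1 = [-2.0, -1.0, -3.0, 2.0, 3.0, 1.0]
good = ["a", "a", "a", "b", "b", "b"]  # clusters aligned with the PC
poor = ["a", "b", "a", "b", "a", "b"]  # clusters unrelated to the PC
r2_good = adj_r2_categorical(pc1, good)
r2_poor = adj_r2_categorical(pc1, poor)
```

A clustering that separates cells along the PC yields a value close to 1, whereas an unrelated grouping yields a value near (or, after adjustment, slightly below) zero.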

29.3.3 DGE analysis

To help characterize (unsupervised) clusters, we want to identify ‘markers’ for each subpopulation, i.e., features whose expression is positively or negatively restricted to (a) specific subpopulation(s). For details on identifying genes that are differentially expressed (DE) between groups of cells, we refer readers to OSCA; a standard approach is given below, visualizing the average expression of exemplary markers across clusters:

Code
# differential gene expression analysis
de <- scoreMarkers.se(spe, groups=spe$Leiden)

# select for a few markers per cluster
gs <- lapply(de, \(df) head(rownames(df), 7))
length(gs <- unique(unlist(gs)))
##  [1] 75
Code
# average expression by clusters
pbs <- aggregateAcrossCells.se(spe[gs, ], spe$Leiden, assay.type="logcounts")
mtx <- sweep(assay(pbs, "sums"), 2, pbs$counts, `/`)

# visualize averages z-scaled across clusters
pheatmap(mat=t(mtx), scale="column", breaks=seq(-2, 2, length=101))
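
The averaging and scaling behind such a heatmap can be written out explicitly. The following Python sketch computes per-cluster means for a single gene and z-scales them across clusters using the sample standard deviation; the helper names and toy values are hypothetical.

```python
from statistics import mean, stdev

def cluster_means(values, clusters):
    """Average a gene's (log-)expression within each cluster."""
    groups = {}
    for v, c in zip(values, clusters):
        groups.setdefault(c, []).append(v)
    return {c: mean(v) for c, v in groups.items()}

def zscale(means):
    """Center and scale cluster averages (sample standard deviation)."""
    vals = list(means.values())
    m, s = mean(vals), stdev(vals)
    return {c: (v - m) / s for c, v in means.items()}

# one gene measured in seven cells from three clusters
expr = [0.0, 0.2, 0.1, 2.0, 2.2, 1.0, 1.2]
clus = ["c1", "c1", "c1", "c2", "c2", "c3", "c3"]
z = zscale(cluster_means(expr, clus))
```

Because scaling is done per gene, the heatmap emphasizes in which clusters a gene is relatively high or low, rather than differences in absolute expression between genes.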

29.4 Annotation

Following clustering and marker gene identification, the next step is the biological interpretation of identified subpopulations, typically by assigning them cell type or (in a spatial context) tissue domain labels. Depending on data characteristics (e.g., spatial resolution, molecular coverage) and the availability of a suitable single-cell reference dataset, annotation can be performed using a variety of approaches, ranging from manual to automated marker-based annotation, reference mapping, deep learning, or foundation models. Several common annotation strategies are outlined below.

  • Deconvolution: For spot-based ST (e.g., Visium) where spots contain multiple cells, it is typically required to estimate the proportions of different cell types within each spot through reference-based deconvolution (see Chapter 13).

  • Manual: Clusters can be labeled based on the expression of canonical marker genes and results from DE analysis (as shown above). While highly accurate when performed by experts, this process is labor-intensive and arguably subjective.

  • Marker-based: To reduce manual effort and improve reproducibility, clusters can be scored against predefined marker gene signatures (see Chapter 31) using tools such as CellAssign (Zhang et al. 2019) and scType (Ianevski et al. 2022).

  • Reference-based: For imaging-based ST data in particular (e.g., Xenium, MERFISH), labels from a high-quality, annotated scRNA-seq reference atlas can be “projected” onto the spatial data. Popular R-based tools include SingleR (Aran et al. 2019) (Bioconductor), and Azimuth (Hao et al. 2021) (implemented as part of Seurat v5+).

  • Machine & deep learning: These methods leverage either high-capacity statistical models or multi-layered neural networks (e.g., VAEs, GNNs) to learn complex cellular representations. CellTypist (Domínguez Conde et al. 2022), for instance, uses a logistic regression framework and provides a variety of pre-trained models; scvi-tools’s scANVI (Xu et al. 2021) employs semi-supervised deep learning for label prediction.

  • Foundation models: More recently, foundation models (FMs) pretrained on very large numbers of single cells have emerged. Such models (e.g., scFoundation (Hao et al. 2024), scGPT (Cui et al. 2024), and Geneformer (Theodoris et al. 2023)) can be used zero-shot or fine-tuned to classify cell types across diverse tissues and platforms with high accuracy.
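
As a minimal illustration of the marker-based strategy, the Python sketch below scores each cluster's average expression against predefined signatures and assigns the best-scoring label. It is a deliberately simple stand-in (mean expression over signature genes), not the model used by CellAssign or scType; the expression values are made up, and `annotate` is a hypothetical helper.

```python
def annotate(cluster_means, signatures):
    """Assign each cluster the signature with the highest mean expression."""
    labels = {}
    for cl, expr in cluster_means.items():
        scores = {ct: sum(expr.get(g, 0.0) for g in genes) / len(genes)
                  for ct, genes in signatures.items()}
        labels[cl] = max(scores, key=scores.get)
    return labels

# canonical markers; cluster averages below are toy values
signatures = {
    "T cell": ["CD3E", "CD3D"],
    "B cell": ["MS4A1", "CD79A"],
    "Epithelial": ["EPCAM", "KRT8"],
}
cluster_means = {
    "1": {"CD3E": 2.1, "CD3D": 1.8, "MS4A1": 0.1, "CD79A": 0.0, "EPCAM": 0.0, "KRT8": 0.1},
    "2": {"CD3E": 0.2, "CD3D": 0.1, "MS4A1": 1.9, "CD79A": 2.3, "EPCAM": 0.0, "KRT8": 0.0},
    "3": {"CD3E": 0.0, "CD3D": 0.0, "MS4A1": 0.0, "CD79A": 0.1, "EPCAM": 2.5, "KRT8": 3.0},
}
labels = annotate(cluster_means, signatures)
```

In practice, it is worth also inspecting the margin between the best and second-best score, since clusters with near-tied scores may represent mixtures or cell types missing from the signature set.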

29.4.1 References

Some annotation approaches rely on high-quality single-cell reference datasets. In this context, so-called atlas efforts aim to generate expert-curated, standardized resources that serve as biological anchors across diverse species, tissues, developmental stages, and disease states. These resources typically provide large collections of molecular profiles, either through dedicated publications or via centralized data portals that harmonize datasets from multiple studies. Below, we highlight several widely known resources, including ongoing efforts towards spatial and multi-modal ones.

  • CELLxGENE (Chan Zuckerberg Initiative) is a data portal and explorer that provides standardized access to thousands of datasets (Megill et al. 2021). These can be interactively queried and filtered for downstream tasks including, e.g., reference-based annotation, benchmarking, and cross-study integration.

  • The Human Cell Atlas (HCA) is a global consortium that is mapping every cell type in the human body across health and disease (Regev et al. 2017). As part of this effort, multi-organ atlases such as Tabula Muris (Tabula Muris Consortium et al. 2018), Tabula Muris Senis (Tabula Muris Consortium 2020) and Tabula Sapiens (Tabula Sapiens Consortium et al. 2022) provide high-resolution references for mouse and human, respectively.

  • The Human Protein Atlas (HPA) integrates transcriptomics and proteomics data to map their spatial distributions in human tissues and cells (Thul and Lindskog 2018).

  • Still work in progress, the Spatial Atlas of Human Anatomy (SAHA) is an effort to generate a multi-modal, subcellular-resolution spatial reference map for millions of cells in their native 3D tissue context (Park et al. 2025).

The Bioconductor package cellxgenedp provides an alternative, R-based interface to the CELLxGENE data portal, allowing programmatic discovery and retrieval of >2000 datasets from 100s of studies as H5AD files, which can be read as SingleCellExperiment objects using anndataR (Deconinck et al. 2026); see Chapter 8. There are also several R/Bioconductor ExperimentHub packages that provide programmatic access to these atlases, e.g., TabulaMurisData and TabulaMurisSenisData. The ExperimentHub::query() command (R) or the BiocViews browser (online) can be used to explore current datasets, for annotation or otherwise.

29.4.2 Evaluation

While automated annotation can greatly accelerate analysis, the quality of labels obtained through such methods depends heavily on the alignment between query and reference. Misalignment can arise from technical effects, biological differences (e.g., healthy vs. diseased), or the presence of “out-of-reference” cell types. Evaluating reference suitability and prediction reliability is crucial. The scDiagnostics (Christidis et al. 2026) R/Bioconductor package provides a systematic framework for this task, e.g.:

  • Assessing distributional congruence by projecting query data into the reference’s subspace to check global alignment between datasets.

  • Marker gene validation by quantitatively comparing the expression profiles of canonical markers between query and reference clusters to detect gene-level discrepancies.

  • Detection of annotation anomalies by identifying cells that do not fit the reference; these may represent novel cell states or technical artifacts.

29.5 Appendix

Further reading
  • The single-cell best practices chapter on spatial domains divides methods into two broad categories: (i) methods that identify spatial domains using gene expression together with spatial coordinates, and (ii) methods that additionally incorporate features extracted from histological images. Previous chapters also cover single-cell clustering and annotation using Python/scverse.

  • The OSCA chapter on clustering provides an overview of graph-based, hierarchical, and centroid-based clustering methods for scRNA-seq data. Together with the preceding chapter on dimensionality reduction, these concepts underpin many spatial clustering approaches. Marker gene detection and cell type annotation are covered in the following couple of chapters.

  • Wang et al. (2024) categorize ST clustering approaches into probability statistics-, graph neural network- and contrastive learning-based approaches, and review their advantages and limitations for clustering of data from seq- and img-based assays.

  • Sun et al. (2025) compare 22 methods across 15 datasets spanning 9 technologies and diverse tissue types. Notably, the authors also propose a “consensus-guided workflow […] to generate consensus representations”, moving beyond traditional scoring and comparison of methods based on ground truths.

  • Chen et al. (2025) compare 14 methods across ~600 datasets from 10 technologies. Several other benchmarking studies have also compared different subsets of methods across different subsets of platforms and tissue types (Cheng et al. 2022; Hu et al. 2024; Liu et al. 2024; Xiong et al. 2025).

References

Aran, Dvir, Agnieszka P Looney, Leqian Liu, et al. 2019. “Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage.” Nature Immunology 20 (2): 163–72. https://doi.org/10.1038/s41590-018-0276-y.
Chen, Renjie, Yue Yao, Jingyang Qian, Xin Peng, Xin Shao, and Xiaohui Fan. 2025. “A comprehensive benchmarking for spatially resolved transcriptomics clustering methods across variable technologies, organs, and replicates.” iMeta 4 (6): e70084. https://doi.org/10.1002/imt2.70084.
Cheng, Andrew, Guanyu Hu, and Wei Vivian Li. 2022. “Benchmarking cell-type clustering methods for spatially resolved transcriptomics data.” Briefings in Bioinformatics 24 (1): bbac475. https://doi.org/10.1093/bib/bbac475.
Christidis, Anthony, Andrew R Ghazi, Smriti Chawla, Nitesh Turaga, Robert Gentleman, and Ludwig Geistlinger. 2026. “scDiagnostics: systematic assessment of cell type annotation in single-cell transcriptomics data.” bioRxiv, 2026.01.29.701618. https://doi.org/10.64898/2026.01.29.701618.
Cui, Haotian, Chloe Wang, Hassaan Maan, et al. 2024. “scGPT: toward building a foundation model for single-cell multi-omics using generative AI.” Nature Methods 21 (8): 1470–80. https://doi.org/10.1038/s41592-024-02201-0.
Deconinck, Louise, Luke Zappia, Robrecht Cannoodt, et al. 2026. “anndataR improves interoperability between R and Python in single-cell transcriptomics.” Bioinformatics 42 (6): btag288. https://doi.org/10.1093/bioinformatics/btag288.
Domínguez Conde, C, C Xu, L B Jarvis, et al. 2022. “Cross-tissue immune cell analysis reveals tissue-specific features in humans.” Science (New York, N.Y.) 376 (6594): eabl5197. https://doi.org/10.1126/science.abl5197.
Dong, Kangning, and Shihua Zhang. 2022. “Deciphering Spatial Domains from Spatially Resolved Transcriptomics with an Adaptive Graph Attention Auto-Encoder.” Nature Communications 13 (1739). https://doi.org/10.1038/s41467-022-29439-6.
Hao, Minsheng, Jing Gong, Xin Zeng, et al. 2024. “Large-scale foundation model on single-cell transcriptomics.” Nature Methods 21 (8): 1481–91. https://doi.org/10.1038/s41592-024-02305-7.
Hao, Yuhan, Stephanie Hao, Erica Andersen-Nissen, et al. 2021. “Integrated analysis of multimodal single-cell data.” Cell 184 (13): 3573–3587.e29. https://doi.org/10.1016/j.cell.2021.04.048.
Hu, Yunfei, Manfei Xie, Yikang Li, et al. 2024. “Benchmarking Clustering, Alignment, and Integration Methods for Spatial Transcriptomics.” Genome Biology 25 (212). https://doi.org/10.1186/s13059-024-03361-0.
Ianevski, Aleksandr, Anil K Giri, and Tero Aittokallio. 2022. “Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data.” Nature Communications 13 (1): 1246. https://doi.org/10.1038/s41467-022-28803-w.
Liang, Yuchen, Guowei Shi, Runlin Cai, et al. 2024. “PROST: Quantitative Identification of Spatially Variable Genes and Domain Detection in Spatial Transcriptomics.” Nature Communications 15 (600). https://doi.org/10.1038/s41467-024-44835-w.
Liu, Teng, Zhao-Yu Fang, Zongbo Zhang, Yongxiang Yu, Min Li, and Ming-Zhu Yin. 2024. “A Comprehensive Overview of Graph Neural Network-Based Approaches to Clustering for Spatial Transcriptomics.” Computational and Structural Biotechnology Journal 23: 106–28. https://doi.org/10.1016/j.csbj.2023.11.055.
Liu, Wei, Xu Liao, Ziye Luo, et al. 2023. “Probabilistic Embedding, Clustering, and Alignment for Integrating Spatial Transcriptomics Data with PRECAST.” Nature Communications 14 (296). https://doi.org/10.1038/s41467-023-35947-w.
Megill, Colin, Bruce Martin, Charlotte Weaver, et al. 2021. “Cellxgene: A performant, scalable exploration platform for high dimensional sparse matrices.” bioRxiv, 2021.04.05.438318. https://doi.org/10.1101/2021.04.05.438318.
Park, Jiwoon, Roberto De Gregorio, Erika Hissong, et al. 2025. “The Spatial Atlas of Human Anatomy (SAHA): A multimodal subcellular-resolution reference across human organs.” bioRxiv, 2025.06.16.658716. https://doi.org/10.1101/2025.06.16.658716.
Regev, Aviv, Sarah A Teichmann, Eric S Lander, et al. 2017. “The Human Cell Atlas.” Elife 6: e27041. https://doi.org/10.7554/eLife.27041.
Ren, Honglei, Benjamin L Walker, Zixuan Cang, and Qing Nie. 2022. “Identifying multicellular spatiotemporal organization of cells with SpaceFlow.” Nature Communications 13 (1): 4076. https://doi.org/10.1038/s41467-022-31739-w.
Singhal, Vipul, Nigel Chou, Joseph Lee, et al. 2024. “BANKSY Unifies Cell Typing and Tissue Domain Segmentation for Scalable Spatial Omics Data Analysis.” Nature Genetics 56: 431–41. https://doi.org/10.1038/s41588-024-01664-3.
Sun, Jieran, Kirti Biharie, Peiying Cai, et al. 2025. “Beyond benchmarking: an expert-guided consensus approach to spatially aware clustering.” bioRxiv, ahead of print. https://doi.org/10.1101/2025.06.23.660861.
Tabula Muris Consortium. 2020. “A single-cell transcriptomic atlas characterizes ageing tissues in the mouse.” Nature 583 (7817): 590–95. https://doi.org/10.1038/s41586-020-2496-1.
Tabula Muris Consortium et al. 2018. “Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris.” Nature 562 (7727): 367–72. https://doi.org/10.1038/s41586-018-0590-4.
Tabula Sapiens Consortium, Robert C Jones, Jim Karkanias, et al. 2022. “The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans.” Science (New York, N.Y.) 376 (6594): eabl4896. https://doi.org/10.1126/science.abl4896.
Theodoris, Christina V, Ling Xiao, Anant Chopra, et al. 2023. “Transfer learning enables predictions in network biology.” Nature 618 (7965): 616–24. https://doi.org/10.1038/s41586-023-06139-9.
Thul, Peter J, and Cecilia Lindskog. 2018. “The human protein atlas: A spatial map of the human proteome: The Human Protein Atlas.” Protein Science 27 (1): 233–44. https://doi.org/10.1002/pro.3307.
Traag, V. A., L. Waltman, and N. J. van Eck. 2019. “From Louvain to Leiden: Guaranteeing Well-Connected Communities.” Scientific Reports 9 (5233). https://doi.org/10.1038/s41598-019-41695-z.
Varrone, Marco, Daniele Tavernari, Albert Santamaria-Martínez, Logan A. Walsh, and Giovanni Ciriello. 2024. “CellCharter Reveals Spatial Cell Niches Associated with Tissue Remodeling and Cell Plasticity.” Nature Genetics 56: 74–84. https://doi.org/10.1038/s41588-023-01588-4.
Wang, Ziyi, Aoyun Geng, Hao Duan, Feifei Cui, Quan Zou, and Zilong Zhang. 2024. “A Comprehensive Review of Approaches for Spatial Domain Recognition of Spatial Transcriptomes.” Briefings in Functional Genomics 23: 702–12. https://doi.org/10.1093/bfgp/elae040.
Xiong, Caiwei, Huang Shuai, Muqing Zhou, et al. 2025. “A comprehensive comparison on clustering methods for multi-slide spatially resolved transcriptomics data analysis.” bioRxiv, ahead of print. https://doi.org/10.1101/2025.01.19.633631.
Xu, Chenling, Romain Lopez, Edouard Mehlman, Jeffrey Regier, Michael I Jordan, and Nir Yosef. 2021. “Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models.” Molecular Systems Biology 17 (1): e9620. https://doi.org/10.15252/msb.20209620.
Zhang, Allen W, Ciara O’Flanagan, Elizabeth A Chavez, et al. 2019. “Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling.” Nature Methods 16 (10): 1007–15. https://doi.org/10.1038/s41592-019-0529-1.
Zhao, Edward, Matthew R. Stone, Xing Ren, et al. 2021. “Spatial Transcriptomics at Subspot Resolution with BayesSpace.” Nature Biotechnology 39: 1375–84. https://doi.org/10.1038/s41587-021-00935-2.