Beyond this book
The analytical landscape of spatial omics is vast and continuously developing. Here, we outline additional analysis tasks and topics that have not been covered elsewhere.
Throughout this book, we have emphasized the reproducibility standards provided by the R/Bioconductor ecosystem. However, certain tasks – particularly those involving machine and deep learning or intensive image analysis – frequently leverage the strengths of existing Python infrastructure. We consider these ecosystems complementary, and hope to address omissions in the future.
Finally, we view this e-book as a living resource and a community effort. If you are a keen developer or researcher with expertise in these (or other) analytical tasks, we welcome contributions to expand these sections into full chapters, whether to provide deeper theory or to include code examples. Please refer to our contribution guidelines.
Integration
Reconciling molecular measurements across multiple tissue sections or diverse platforms is essential for atlas building and comparative studies. In R/Bioconductor, linear methods like harmony (Korsunsky et al. 2019) have been shown to perform well for scRNA-seq data (Luecken et al. 2022). In addition, the CellMixS (Lütge et al. 2021) package implements several metrics to evaluate batch effects and their correction.
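To make the idea of a batch-mixing metric concrete, the following is a minimal Python sketch of a neighborhood-based score: for each cell, we compute the normalized entropy of batch labels among its k nearest neighbors in a low-dimensional embedding. This is an illustrative stand-in, not one of the specific metrics implemented in CellMixS; the function name `knn_batch_entropy`, the choice of `k`, and the toy data are assumptions made for this example.

```python
import numpy as np

def knn_batch_entropy(embedding, batches, k=10):
    """Mean normalized Shannon entropy of batch labels among k nearest neighbors.

    Values near 1 suggest well-mixed batches; values near 0 suggest that
    neighborhoods are dominated by a single batch.
    """
    embedding = np.asarray(embedding, dtype=float)
    labels, codes = np.unique(np.asarray(batches), return_inverse=True)
    n_batches = len(labels)
    if n_batches < 2:
        raise ValueError("need at least two batches")
    # Pairwise squared Euclidean distances (dense; fine for small toy data)
    sq = np.sum(embedding**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * embedding @ embedding.T
    np.fill_diagonal(d2, np.inf)  # exclude each cell from its own neighborhood
    nn = np.argsort(d2, axis=1)[:, :k]
    entropies = np.empty(len(embedding))
    for i, idx in enumerate(nn):
        p = np.bincount(codes[idx], minlength=n_batches) / k
        p = p[p > 0]
        entropies[i] = -np.sum(p * np.log(p)) / np.log(n_batches)
    return float(entropies.mean())

# Toy data (hypothetical): two batches, with and without a batch shift
rng = np.random.default_rng(0)
batches = np.repeat(["A", "B"], 100)
mixed = rng.normal(size=(200, 5))
shifted = mixed.copy()
shifted[batches == "B"] += 10.0  # strong artificial batch effect

score_mixed = knn_batch_entropy(mixed, batches)
score_shifted = knn_batch_entropy(shifted, batches)
```

Comparing such a score before and after correction gives a rough indication of whether batches have become better mixed, although it says nothing about whether biological signal was preserved.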
In R/CRAN, Seurat implements different options, including canonical correlation analysis (CCA) and harmony (note that parameter defaults have changed between major releases). The variational autoencoder-based Python tool scvi-tools is another popular choice. Many of these methods can also be adapted to multi-modal data, e.g., by combining low-dimensional embeddings across modalities.
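As a rough illustration of combining low-dimensional embeddings across modalities, the sketch below computes a PCA embedding per modality, rescales each so that no modality dominates, and concatenates them. This is a deliberately naive baseline under our own assumptions (the helper names, the number of components, and the equal-variance rescaling are all choices made for this example), not the procedure used by any of the tools named above.

```python
import numpy as np

def pca_embed(x, n_components):
    """PCA scores via SVD of the column-centered matrix (cells x features)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def combine_embeddings(modalities, n_components=5):
    """Concatenate per-modality PCA embeddings after equalizing their scale."""
    blocks = []
    for x in modalities:
        emb = pca_embed(x, n_components)
        # Rescale so that the mean squared norm per cell is 1 in every modality
        emb = emb / np.sqrt((emb**2).sum() / emb.shape[0])
        blocks.append(emb)
    return np.hstack(blocks)

# Toy data (hypothetical): the same 50 cells measured in two modalities
rng = np.random.default_rng(1)
rna = rng.normal(size=(50, 100))
protein = rng.normal(scale=20.0, size=(50, 20))  # very different scale

joint = combine_embeddings([rna, protein], n_components=5)
```

The resulting joint embedding could then be passed to a batch correction or clustering method; more principled approaches learn the relative weighting of modalities rather than fixing it.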
For ST data, BayesSpace (Zhao et al. 2021) integrates sections through joint spatial clustering. More recently, spatially aware integration frameworks such as PRECAST (W. Liu et al. 2023) have been developed to explicitly model spatial autocorrelation across slices, trading off batch correction against preservation of tissue architecture.
Trajectory inference
Trajectory inference (TI) aims to reconstruct dynamic biological processes by ordering cells or spots along paths of minimal transcriptional change, inferring a continuous progression known as pseudotime.
Foundational R/Bioconductor methods like monocle (Trapnell et al. 2014; Qiu et al. 2017) and slingshot (Street et al. 2018) remain popular for spatially resolved data as well. Alternatively, spatially aware DR (see 28 Dimensionality reduction) can be used to obtain smoothed embeddings for downstream TI.
By contrast, ST data offer a means to reconstruct pseudo-space-time, which reconciles transcriptional similarity with physical proximity. In Python, spatially aware frameworks include SpaceFlow (Ren et al. 2022), which uses spatially regularized graph networks to learn spatially coherent expression patterns for TI from ST data; stlearn (Pham et al. 2023), which penalizes transitions between physically distant points; and spaTrack (Shen et al. 2025), an optimal transport-based approach to identify plausible paths.
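The idea of penalizing transitions between physically distant points can be sketched in a few lines of Python. Here, the cost of moving between two spots is their transcriptional distance plus a weighted spatial distance, moves beyond a given radius are forbidden, and pseudo-space-time is the shortest-path cost from a chosen root. This is a toy illustration under our own assumptions (the function names, the additive cost, and the parameters `lam` and `radius` are hypothetical), not the algorithm implemented by stlearn or any other tool named above.

```python
import heapq
import numpy as np

def pairwise_dist(x):
    """Dense Euclidean distance matrix between rows of x."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

def spatial_pseudotime(expr_embedding, coords, root, lam=1.0, radius=1.5):
    """Shortest-path cost from `root`, mixing expression and spatial distance."""
    d_expr = pairwise_dist(expr_embedding)
    d_space = pairwise_dist(coords)
    cost = d_expr + lam * d_space      # penalize physically distant transitions
    cost[d_space > radius] = np.inf    # forbid jumps beyond the radius
    n = cost.shape[0]
    dist = np.full(n, np.inf)
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:  # Dijkstra's algorithm on the dense cost matrix
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue
        for j in range(n):
            nd = d + cost[i, j]
            if nd < dist[j]:
                dist[j] = nd
                heapq.heappush(heap, (nd, j))
    return dist

# Toy data (hypothetical): five spots on a line; the last spot resembles
# the root transcriptionally but is physically far away from it
coords = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]], dtype=float)
expr = np.array([[0.0], [1.0], [2.0], [3.0], [0.1]])

pt = spatial_pseudotime(expr, coords, root=0, lam=1.0, radius=1.5)
expr_only_order = np.argsort(np.abs(expr[:, 0] - expr[0, 0]))
```

Ordering by expression alone would place the last spot directly after the root, whereas the spatially constrained ordering places it at the end of the path.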
CNV inference
CNV inference aims to computationally identify, from transcriptomics data, genomic segments that have been duplicated or deleted, i.e., copy number variations (CNVs) or alterations (CNAs). This strategy is particularly interesting in cancer, where it can help identify malignant cells and, in spatially resolved data, map subclonal architecture.
In R, the Broad Institute’s inferCNV is perhaps the most widely known tool for this task; while discontinued, the Python package infercnvpy represents a replacement with improved scalability and scanpy interoperability. In R/Bioconductor, infercnv provides a reticulate-based interface.
Briefly, inferCNV’s approach is to compute, for every cell, the average gene expression over moving chromosomal windows, and to compare these averages to a set of ‘normal’ reference cells. Results are typically visualized as a heatmap where rows = cells and columns = genomic regions; the global loss/gain patterns can be compared with known driver mutations from the scientific literature.
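The moving-window idea described above can be sketched as follows. This minimal Python example centers each gene on its mean in the reference cells and then averages over sliding windows of neighboring genes along one chromosome; it omits the additional normalization, denoising, and segmentation steps that real tools perform. The function name `moving_average_cnv`, the window size, and the toy data are assumptions made for this example.

```python
import numpy as np

def moving_average_cnv(expr, is_reference, window=5):
    """Windowed expression relative to reference cells.

    expr: cells x genes matrix of log-expression values, with genes
          ordered by genomic position along a single chromosome.
    is_reference: boolean vector marking 'normal' reference cells.
    Returns a cells x (genes - window + 1) matrix; positive values hint
    at gains and negative values at losses.
    """
    expr = np.asarray(expr, dtype=float)
    is_reference = np.asarray(is_reference, dtype=bool)
    # Center each gene on its average across reference cells
    centered = expr - expr[is_reference].mean(axis=0)
    # Average over sliding windows of adjacent genes, cell by cell
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, centered
    )

# Toy data (hypothetical): 3 reference and 3 tumor cells, 20 ordered genes;
# tumor cells carry a simulated gain over the last 10 genes
rng = np.random.default_rng(2)
expr = rng.normal(scale=0.1, size=(6, 20))
expr[3:, 10:] += 1.0
is_reference = np.array([True, True, True, False, False, False])

cnv = moving_average_cnv(expr, is_reference, window=5)
```

Because both steps are linear, centering before smoothing is equivalent to smoothing first and then subtracting the smoothed reference profile. Plotting `cnv` as a heatmap (rows = cells, columns = windows) would show the simulated gain as a block of elevated values in the tumor cells.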
Foundation models
Foundation models (FMs) represent a modern paradigm where massive deep learning architectures (e.g., transformers) are pre-trained on millions of single-cell profiles, biological images, etc., to learn generalizable representations. All of these models are developed and implemented in Python, although R infrastructure to interface with pre-trained models is underway, i.e., to make use of model weights and embeddings for downstream analysis.
Examples (mostly) mentioned across different chapters include Prov-GigaPath (Xu et al. 2024) (histopathology; 33 Image analysis); Geneformer (Theodoris et al. 2023), scGPT (Cui et al. 2024), scFoundation (Hao et al. 2024), and Novae (Blampey et al. 2025) (omics; 29 Clustering & annotation). Not an FM but worth mentioning: CellPLM (Wen et al. 2023) is a pre-trained “cell language model” with cells = tokens and tissues = sentences (an idea inspired by large language models); the model supports several downstream tasks such as denoising and annotation of scRNA-seq data, imputation of ST data, and perturbation prediction.
Another popular task not mentioned in previous chapters is the prediction of molecular changes upon perturbation (e.g., disease, treatment response); however, deep learning-based approaches for this task have been shown to not (yet) outperform simple linear baselines (Ahlmann-Eltze et al. 2025). Simpler methods have also been shown to come out ahead in the context of cell type annotation (Kedzierska et al. 2025).
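To give a sense of how simple such a baseline can be, the Python sketch below predicts the expression profile of a double perturbation as the control profile plus the sum of the two single-perturbation shifts on the log scale. We present this additive model as one plausible example of a simple baseline; the function name and toy values are hypothetical, and we do not claim it reproduces the exact baselines evaluated in the cited study.

```python
import numpy as np

def additive_baseline(ctrl, pert_a, pert_b):
    """Predict the A+B profile as control plus both single-perturbation shifts.

    All inputs are mean log-expression vectors over the same genes.
    """
    ctrl, pert_a, pert_b = (
        np.asarray(x, dtype=float) for x in (ctrl, pert_a, pert_b)
    )
    return ctrl + (pert_a - ctrl) + (pert_b - ctrl)

# Toy data (hypothetical): three genes; A raises gene 1, B raises gene 3
ctrl = np.array([1.0, 2.0, 3.0])
pert_a = np.array([2.0, 2.0, 3.0])
pert_b = np.array([1.0, 2.0, 5.0])

pred_ab = additive_baseline(ctrl, pert_a, pert_b)
```

A model that claims to capture genetic interactions should predict observed double-perturbation profiles more accurately than this kind of interaction-free prediction.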
Multi-modality
Paralleling past technological developments in single-cell omics, spatial multi-omics approaches are now emerging, including spatial co-profiling of RNA with proteins (e.g., spatial-CITE-seq (Y. Liu et al. 2023)) and with the epigenome (e.g., spatial-ATAC-RNA-seq (Zhang et al. 2023)).
Simultaneous capture is now also supported by commercial in situ platforms, such as 10x Genomics’ Xenium and Bruker’s CosMx, which enable co-detection of RNA and a curated set of protein targets. Microfluidic-based methods like DBiT-seq (Liu et al. 2020) and SPOTS (Ben-Chetrit et al. 2023) have further expanded these capabilities to high-throughput sequencing.
Computationally, the challenge lies in the joint latent representation of disparate data types. In R/Bioconductor, MOFA2 (Velten et al. 2022) provides a factor analysis framework for multi-modal integration. The MultiAssayExperiment class offers foundational infrastructure to manage synchronicity between linked data layers. In Python, tools such as scvi-tools’s MultiVI (Ashuach et al. 2023) and SpatialGlue (Long et al. 2024) use deep learning (e.g., graph neural networks) to reconcile these layers while preserving spatial context.