2 Spatial omics
2.1 Introduction
Spatial omics (or spatially resolved omics) refers to a set of recently developed technologies that enable molecular measurements with spatial resolution. Spatial transcriptomics was named Method of the Year 2020 (Editorial 2021) and spatial proteomics was named Method of the Year 2024 (Editorial 2024) by the journal Nature Methods, and each has become widely applied in a range of biological contexts. A wide variety of modalities can now be measured (e.g. antibody-based protein expression, gene expression, chromatin accessibility, and histone modifications) using diverse readouts (e.g. imaging, mass spectrometry, and high-throughput sequencing), all of which give molecular measurements in a spatial context for an appropriately handled section of tissue.
Platforms differ drastically in terms of the experimental readout (sequence or ion counts versus fluorescence intensities), the feature space (from tens of proteins in imaging mass cytometry, to the full transcriptome in Visium / Visium HD, to genome-wide assessments of chromatin accessibility), and spatial resolution (e.g. single-cell resolution or multiple cells per measurement location). In general, this also means there are tradeoffs between the number of features, spatial resolution, and sensitivity of the assays.
Platforms may be broadly grouped into “sequencing-based” and “imaging-based” technologies; some of the latter can be further classified into “molecule-based” or not. The main platforms are described in more detail below. Sequencing-based platforms tend to provide higher gene coverage (e.g. full-transcriptome), while imaging-based platforms tend to provide higher spatial resolution (e.g. single-cell or subcellular resolution).
In this book, we focus on commercially available platforms, since these are the most widely used and accessible. However, the data representations are often similar for other related platforms. The main sections of the book are split into separate parts for sequencing-based and imaging-based platforms, since several analysis techniques are specific to each of these, followed by a part for non-platform-specific downstream analyses.
In the subsections below, we give a brief overview of some commercially available platforms. For more in-depth background, several recent reviews are available (Bressan, Battistoni, and Hannon 2023; Moses and Pachter 2022; Tian, Chen, and Macosko 2022; Lundberg and Borner 2019; Gulati et al. 2024; Paul et al. 2021; Mund, Brunner, and Mann 2022; Palla et al. 2022; Moffitt, Lundberg, and Heyn 2022; Rao et al. 2021; Cheng et al. 2023), covering available platforms, analysis methods, outstanding challenges, and additional topics.
2.2 Sequencing-based platforms
Sequencing-based platforms capture DNA fragments (which could represent gene expression, DNA binding, antibody-conjugated tags, etc.) at a set of spatial measurement locations for a tissue section placed on a slide. Each measurement location is tagged with a unique barcode, and reads are typically summarized (e.g. as counts) according to genes or bins.
The advantage of sequencing is that the features typically represent an untargeted set of molecular entities, so no panel selection or optimization is required. In practice, some spatial assays still require panels (e.g. spatial variants of CITE-seq; Liu et al. 2023), and assays such as Visium and Visium HD use genome-wide gene capture panels and thus cannot always be applied to non-model organisms.
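As a rough illustration of the barcode-based summarization described above, the following minimal Python sketch tallies reads into a gene-by-spot count table. The function name `count_matrix`, the barcodes, and the gene names are hypothetical, and real pipelines additionally perform steps such as alignment and duplicate removal that are omitted here.

```python
from collections import Counter


def count_matrix(reads, barcode_to_xy):
    """Summarize (barcode, gene) read records into per-spot gene counts.

    Reads whose barcode is not among the known spot barcodes are dropped.
    Returns the sorted gene names, the sorted spot barcodes, and a nested
    list with one row per gene and one column per spot.
    """
    counts = Counter(
        (barcode, gene) for barcode, gene in reads if barcode in barcode_to_xy
    )
    genes = sorted({gene for _, gene in counts})
    spots = sorted(barcode_to_xy)
    matrix = [[counts[(spot, gene)] for spot in spots] for gene in genes]
    return genes, spots, matrix


# Hypothetical toy data: three barcoded spots with array coordinates,
# and five reads, one of which carries an unknown barcode.
barcode_to_xy = {"AAAC": (0, 0), "AAAG": (0, 1), "AAAT": (1, 0)}
reads = [
    ("AAAC", "GeneA"),
    ("AAAC", "GeneA"),
    ("AAAG", "GeneB"),
    ("AAAC", "GeneB"),
    ("TTTT", "GeneA"),  # barcode not on the slide, so this read is ignored
]

genes, spots, matrix = count_matrix(reads, barcode_to_xy)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

The spatial coordinates stay attached to the barcodes, so each column of the table can be placed back onto the tissue section.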
Spatial resolution varies between platforms, and depends on the size and spacing between the spatial measurement locations. Depending on the spatial resolution and tissue cell density in a given biological sample, each spatial measurement location may contain zero, one, or multiple cells. For these platforms, the spatial measurement locations are often referred to as “spots” or “beads”. In this book, we will frequently use the terminology “spots”.
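To make the relationship between spot size and cell content concrete, here is a small Python sketch that counts how many cell centroids fall inside each circular spot. The spot diameter, spot spacing, and cell coordinates are illustrative assumptions rather than values taken from any specific platform, and `cells_per_spot` is a hypothetical helper.

```python
import math


def cells_per_spot(spot_centers, cell_centroids, spot_diameter):
    """Count the cell centroids lying within each circular spot."""
    radius = spot_diameter / 2
    return [
        sum(math.dist(spot, cell) <= radius for cell in cell_centroids)
        for spot in spot_centers
    ]


# Assumed toy layout (arbitrary length units): three spots in a row
# and four cell centroids scattered around them.
spot_centers = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
cell_centroids = [(5.0, 5.0), (-10.0, 12.0), (98.0, 3.0), (150.0, 0.0)]

n_cells = cells_per_spot(spot_centers, cell_centroids, spot_diameter=55.0)
print(n_cells)  # [2, 1, 0]
```

In this toy layout the three spots contain two, one, and zero cells, and one cell falls in the gap between spots and is not measured at all.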
2.3 Imaging-based platforms
Imaging-based platforms (or molecule-based platforms) identify the spatial locations of individual RNA molecules by sequential in situ hybridization (ISH) or in situ sequencing (ISS), for targeted panels of up to hundreds or thousands of genes. Since transcripts are individually identified, the resulting datasets have subcellular spatial resolution.
Image segmentation is used to identify the boundaries of individual cells or nuclei, and assign RNA molecules to cells or nuclei during preprocessing. Segmentation into cells is challenging, especially due to overlapping cells (i.e. cells have 3-dimensional organization and the plane that a tissue section represents may have material from multiple cells at a given x/y location). After segmentation, gene counts may be aggregated to the cell level, or analyses may be performed directly at the molecule level. Cell-level analyses may re-use methods developed for spot-level spatial transcriptomics data or single-cell data.
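The assignment and aggregation step described above can be sketched in a few lines of Python. This is a simplified illustration that assumes segmentation has already produced a 2D label mask (0 for background, positive integers for cell IDs) and that molecule coordinates are given in pixel units. The function `aggregate_to_cells` and all data values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def aggregate_to_cells(molecules, label_mask):
    """Assign molecules to segmented cells and count genes per cell.

    molecules: iterable of (x, y, gene) with coordinates in pixel units.
    label_mask: 2D list indexed as [row][col]; 0 marks background.
    Returns a dict mapping cell ID to a Counter of genes, plus the number
    of molecules that fell on background or outside the image.
    """
    n_rows, n_cols = len(label_mask), len(label_mask[0])
    counts = defaultdict(Counter)
    unassigned = 0
    for x, y, gene in molecules:
        row, col = math.floor(y), math.floor(x)
        in_bounds = 0 <= row < n_rows and 0 <= col < n_cols
        cell_id = label_mask[row][col] if in_bounds else 0
        if cell_id == 0:
            unassigned += 1
        else:
            counts[cell_id][gene] += 1
    return dict(counts), unassigned


# Hypothetical 3 x 4 label mask containing two cells (IDs 1 and 2).
label_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 2, 2],
]
molecules = [
    (0.4, 0.2, "GeneA"),
    (1.7, 1.1, "GeneA"),
    (3.2, 1.5, "GeneB"),
    (2.5, 0.5, "GeneB"),  # lands on background
    (2.1, 2.9, "GeneA"),
    (9.0, 9.0, "GeneA"),  # outside the image
]

cell_counts, n_unassigned = aggregate_to_cells(molecules, label_mask)
print(cell_counts, n_unassigned)
```

Molecules that land on background or outside the image are counted separately rather than discarded silently, since the unassigned fraction is one rough indicator of how well the segmentation covers the tissue.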
The selection of targeted sets of biologically informative genes for an experiment, referred to as panel design, is a key consideration during experimental design (Y. Zhang et al. 2024). Several commercial targeted gene panels suited to particular biological contexts are available.
2.4 Other variants of spatial omics data
Imaging-based proteomics, more commonly called multiplexed imaging, represents a broad array of spatial detection technologies, the vast majority of which are antibody-based. Semba and Ishimoto (2024) categorize these antibody-based technologies into either “single-shot” (e.g. imaging mass cytometry; IMC) or “multicycle” (e.g. Lunaphore) imaging approaches. Single-shot approaches detect the full set of reporters, for example heavy metal ions (representing protein presence), in a single acquisition by laser ablation of a pre-stained sample. Multicycle approaches sequentially stain and strip sets of antibodies, with an imaging step at each cycle. Although many technology variants are available, we mention a few specific approaches here. The two dominant single-shot spatial proteomics platforms are IMC and MIBIscope, with maximum pixel (or shot) resolution of 1 µm and 0.4 µm, respectively (Semba and Ishimoto 2024). Each platform can measure upwards of 40 channels (proteins).
Although the data analyses in this book focus primarily on gene and protein expression measurements in a spatial context, it is worth mentioning other modalities or data structures that are adjacent or emerging. We do not include data examples for these in the book, but some of the steps discussed in the chapters here may apply to these other contexts.
Tissues are three-dimensional entities that are represented as 2D slices for many of the analyses conducted here. However, there are various emerging datasets that have measurements made along a third dimension (Schott et al. 2024; Vickovic et al. 2022), whether this is directly measured (e.g. 3D imaging) or indirectly reconstructed from multiple per-slice measurements (Schott et al. 2025). In some cases, the analyses from this book will either still apply, or can be applied successively to multiple slices.
Multi-omics datasets (e.g. RNA expression + protein expression, or RNA expression + chromatin accessibility) that are collected in a spatial context are now emerging (e.g. Liu et al. 2023; D. Zhang et al. 2023). Epigenomic modalities, in particular, may require alternative preprocessing steps, but some of the analyses mentioned in the chapters here could be reused (e.g. clustering given a low-dimensional embedding) or adapted.
Another emerging modality within a spatial context is the measurement of metabolites, lipids, or proteins via mass spectrometry imaging (MSI), such as MALDI-MSI (H. Zhang et al. 2024); these assays are sometimes known as imaging mass spectrometry. MSI typically involves coating tissues with a matrix layer that promotes the ionization of analytes of interest (e.g. glycans; Palomino and Muddiman 2024). Integration of MSI with other spatial modalities (e.g. to reveal cell types) may be required.