23 Sub-cellular analysis

23.1 Preamble

23.1.1 Introduction

Sub-cellular analysis aims to identify the intra-cellular compartmentalization of transcripts, e.g., in the nucleus vs the cytoplasm of the cell. Sub-cellular data can capture real biology, and analysis of sub-cellular data could help identify the biological mechanisms involved in transcript localisation (Cassella and Ephrussi 2022). Variation in the intra-cellular localisation of transcripts generates gene expression gradients within the cell that can affect biological processes, such as cell communication and post-transcriptional regulation.

23.1.2 Data structure

In spatially-resolved transcriptomics (SRT), sub-cellular analysis can be performed with molecule-resolved data generated by imaging-based SRT technologies, such as MERFISH and Xenium, where targeted probes are used to capture specific transcripts and their locations. High-resolution sequencing-based SRT data, such as VisiumHD, have also been employed for sub-cellular analysis (Novoselsky et al. 2025).

Cell segmentation is required, prior to sub-cellular analysis, to generate cell boundaries within which the transcripts can be compartmentalized and assessed. MoleculeExperiment (Peters Couto et al. 2023) was built for storing transcript locations and cell boundaries, and is compatible with raw data generated by most existing molecule-resolved SRT technologies. It contains additional functions, such as countMolecules(), for conversion of molecule-resolved information to cell-level expression by counting transcripts within a cell boundary.
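The idea of converting molecule-resolved data to cell-level expression can be sketched in a few lines of Python. This is a minimal illustration only, not the MoleculeExperiment implementation: the function names, the toy polygons, and the assumption that cells do not overlap are all hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon,
    given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_molecules(molecules, boundaries):
    """molecules: list of (gene, x, y); boundaries: dict cell_id -> polygon.
    Returns dict cell_id -> Counter(gene -> count)."""
    counts = {cell_id: Counter() for cell_id in boundaries}
    for gene, x, y in molecules:
        for cell_id, polygon in boundaries.items():
            if point_in_polygon(x, y, polygon):
                counts[cell_id][gene] += 1
                break  # assumption: cell boundaries do not overlap
    return counts


# Hypothetical toy data: two square cells and four detected transcripts
boundaries = {
    "cell_1": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "cell_2": [(5, 0), (9, 0), (9, 4), (5, 4)],
}
molecules = [
    ("GeneA", 1.0, 1.0),
    ("GeneA", 2.5, 3.0),
    ("GeneB", 6.0, 2.0),
    ("GeneB", 4.5, 2.0),  # falls between the two cells and is not counted
]
counts = count_molecules(molecules, boundaries)
print(counts)
```

The same test can be run against nucleus boundaries instead of cell boundaries, which is the basis for the compartment-level analyses described in the following sections.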

23.2 Quantifying cell compartments

23.2.1 Nucleus versus cell

Many molecule-resolved SRT technologies, such as MERFISH, SeqFISH, Stereo-seq, etc., provide nucleus and cell level data, generally using DAPI images to generate nuclear masks. These masks can be used to identify transcripts present in the nucleus vs the cytoplasm of a cell, and the resulting sub-cellular data can be used to address a number of research questions. Examples include the transport dynamics (from nucleus to cytoplasm) and localisation of transcripts within cells, the impact of sub-cellular transcript localisation on cell function, and the comparison of sub-cellular transcript locations between cells from different conditions.

23.2.2 Bento

Bento (Mah et al. 2024) is a Python-based toolkit for sub-cellular analysis of molecule-resolved SRT data that takes transcript locations and boundaries (cell, nucleus, region of interest) as input and stores them in AnnData format for downstream analyses. Bento includes 3 main approaches for processing molecule-resolved data:

It generates spatial summary statistics for each gene-cell pair and feeds these into the RNAforest model to predict the transcript localisation pattern (cytoplasmic, nuclear, nuclear edge, cell edge, or none) in each cell.
It identifies the compartment (nucleus or cytoplasm) of each transcript from the nucleus and cell boundaries, generating a cell x gene x compartment tensor. The RNAcoloc approach then uses this tensor to assess co-localisation of gene pairs in each compartment.
It generates RNAflux embeddings of local neighborhoods in each cell, which are used for unsupervised segmentation of sub-cellular domains.

Figure 23.1: Bento approaches for sub-cellular analysis include: A, RNAforest; B, RNAcoloc; C, RNAflux.

These approaches supplement other downstream analyses, such as differential expression analysis between compartments or sub-cellular domains or enrichment analysis of these compartments/domains.
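To make the cell x gene x compartment tensor from the second approach concrete, the following sketch builds one with NumPy. It does not use the Bento API. The toy arrays are hypothetical, and the sketch assumes each transcript has already been assigned to a cell and flagged as inside or outside the nucleus boundary.

```python
import numpy as np

genes = ["GeneA", "GeneB"]
compartments = ["nucleus", "cytoplasm"]
n_cells = 2

# Hypothetical toy data, one entry per detected transcript
cell_idx = np.array([0, 0, 0, 1, 1, 1, 1])      # cell of each transcript
gene_idx = np.array([0, 0, 1, 0, 1, 1, 1])      # gene of each transcript
in_nucleus = np.array([True, False, False, True, True, True, False])

# Compartment index: 0 = nucleus, 1 = cytoplasm
comp_idx = np.where(in_nucleus, 0, 1)

# Accumulate transcript counts into a cell x gene x compartment tensor
tensor = np.zeros((n_cells, len(genes), len(compartments)), dtype=int)
np.add.at(tensor, (cell_idx, gene_idx, comp_idx), 1)

# Nuclear fraction per cell and gene (NaN where a gene is absent from a cell)
totals = tensor.sum(axis=2)
nuclear_fraction = np.divide(
    tensor[:, :, 0], totals,
    out=np.full(totals.shape, np.nan), where=totals > 0,
)
print(tensor)
print(nuclear_fraction)
```

A tensor of this shape can then be sliced by compartment, for example to compare nuclear and cytoplasmic counts of a gene across cells.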

23.2.3 SpatialFeatures

SpatialFeatures is an R package that uses MoleculeExperiment to store transcript locations and boundaries (cells, nuclei, and/or regions of interest), and to perform sub-cellular and extra-cellular analyses. A SpatialFeatures run involves 3 main steps:

Generate new sub-cellular (sub-concentric, sub-sector) and extra-cellular (super-concentric, super-sector) boundaries (loadBoundaries()).

Figure 23.2: The boundary feature types in SpatialFeatures can be sub-cellular (A, sub-concentric; B, sub-sector) or extra-cellular (C, super-concentric; D, super-sector).

Calculate entropy-based metrics (EntropyMatrix()) for each cell across each boundary feature type.
Combine this information into a SingleCellExperiment object (EntropySingleCellExperiment()) containing a cell x feature matrix assay for each boundary feature type (sub-concentric, sub-sector, super-concentric, super-sector).

These features can be used to cluster and identify cells with similar sub-cellular and/or extra-cellular expression patterns.
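As an illustration of what an entropy-based metric over sub-concentric boundaries can capture, the sketch below computes the Shannon entropy of one gene's transcript counts across concentric rings of a single cell. This is a simplified Python illustration, not the SpatialFeatures implementation: the function name is hypothetical, and the cell is assumed to be circular with a known centroid and radius, whereas real boundaries are polygons.

```python
import numpy as np


def ring_entropy(xy, centroid, radius, n_rings=4):
    """Shannon entropy (in bits) of transcript counts across n_rings
    concentric rings of equal width around the centroid."""
    xy = np.asarray(xy, dtype=float)
    if len(xy) == 0:
        return float("nan")
    dist = np.linalg.norm(xy - np.asarray(centroid, dtype=float), axis=1)
    edges = np.linspace(0.0, radius, n_rings + 1)
    # Ring index per transcript; points at or beyond the radius
    # are placed in the outermost ring
    ring = np.clip(np.searchsorted(edges, dist, side="right") - 1,
                   0, n_rings - 1)
    counts = np.bincount(ring, minlength=n_rings)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


centroid, radius = (0.0, 0.0), 4.0

# Hypothetical gene 1: all transcripts close to the centroid
concentrated = [(0.1, 0.0), (0.0, 0.2), (-0.1, 0.1), (0.2, -0.1)]
# Hypothetical gene 2: one transcript in each of the four rings
spread = [(0.5, 0.0), (0.0, 1.5), (-2.5, 0.0), (0.0, -3.5)]

print(ring_entropy(concentrated, centroid, radius))
print(ring_entropy(spread, centroid, radius))
```

Low entropy indicates that transcripts are concentrated in a few rings, while the maximum (log2 of the number of rings) indicates an even spread across rings.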

23.3 Factor modelling

23.3.1 FISHFactor

FISHFactor (Walter, Stegle, and Velten 2023) is another Python-based method for sub-cellular data analysis. It uses spatial Poisson point processes to model the location of each transcript within each cell, and spatially-aware Gaussian processes to identify sub-cellular localisation patterns.

Figure 23.3: FISHFactor method overview.

This approach focuses on transcript sub-cellular patterns or domains, which can be investigated further to gain biological insights (Walter, Stegle, and Velten 2023).
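The generative idea behind a factor model of this kind can be illustrated with a small simulation. The sketch below is not FISHFactor code and performs no inference. It assumes, for illustration only, that each gene's transcripts follow an inhomogeneous Poisson point process on a unit-square "cell" whose intensity is a non-negative weighted sum of two spatial factors. The factors are fixed, hand-written functions rather than Gaussian-process draws, and all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def factors(xy):
    """Two hypothetical non-negative spatial factors on the unit square,
    each bounded above by 1."""
    d_center = np.linalg.norm(xy - 0.5, axis=1)
    central = np.exp(-(d_center / 0.15) ** 2)          # peaks at the centre
    peripheral = np.clip(d_center / 0.5, 0, 1) ** 4    # grows towards the edge
    return np.column_stack([central, peripheral])


# Non-negative gene x factor weights (hypothetical)
weights = np.array([[200.0, 0.0],    # gene 0 loads on the central factor
                    [0.0, 200.0]])   # gene 1 loads on the peripheral factor


def simulate_gene(w, rng):
    """Sample transcript locations for one gene by thinning."""
    lam_max = w.sum()  # upper bound on intensity, since each factor is <= 1
    n = rng.poisson(lam_max)  # homogeneous process on a region of area 1
    candidates = rng.uniform(0, 1, size=(n, 2))
    intensity = factors(candidates) @ w
    # Keep each candidate with probability intensity / lam_max
    keep = rng.uniform(0, lam_max, size=n) < intensity
    return candidates[keep]


points = [simulate_gene(w, rng) for w in weights]
mean_dist = [np.linalg.norm(p - 0.5, axis=1).mean() for p in points]
print(mean_dist)
```

Fitting such a model runs in the opposite direction: given observed transcript coordinates, the weights and factors are inferred, and the inferred factors are interpreted as sub-cellular localisation patterns.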

23.4 Subcellular clustering

23.4.1 ClusterMap

ClusterMap (He et al. 2021) is a Python-based method for segmentation-free spatial clustering of transcripts at multiple scales, using only transcript locations. Depending on the radius used to define the neighborhood around each transcript, it can identify clusters at the sub-cellular level (sub-cellular domains or compartments that might represent sub-cellular structures), the cell level (cell bodies, which can be grouped into cell types), or the tissue level (domains). It can therefore also be used for cell segmentation and sub-cellular compartmentalization.

Figure 23.4: Overview (a) and workflow (b) of ClusterMap method.
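The effect of the neighborhood radius on the scale of the resulting clusters can be illustrated with a toy example. Note that the sketch below uses simple radius linkage (connected components of transcripts lying within a given distance of each other) as a stand-in. ClusterMap itself is based on density peak clustering, so this is not its algorithm, and the function name and coordinates are hypothetical.

```python
import numpy as np


def radius_clusters(xy, radius):
    """Label points so that any two points within `radius` of each other
    end up in the same cluster (connected components via union-find)."""
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Merge every pair of points closer than the radius
    for i, j in zip(*np.where(np.triu(dist <= radius, k=1))):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels


# Hypothetical toy data: two "cells", each containing two transcript clumps
xy = np.array([
    (0, 0), (0.4, 0), (0, 0.4),        # cell A, clump 1
    (3, 0), (3.4, 0), (3, 0.4),        # cell A, clump 2
    (20, 0), (20.4, 0), (20, 0.4),     # cell B, clump 1
    (23, 0), (23.4, 0), (23, 0.4),     # cell B, clump 2
])

for radius in (1.0, 5.0, 30.0):
    n_clusters = len(set(radius_clusters(xy, radius)))
    print(radius, n_clusters)
```

With a small radius the clumps are recovered as separate sub-cellular clusters, a medium radius merges them into cell-level clusters, and a large radius groups everything into a single tissue-level cluster.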

23.5 Testing for subcellular localisation and co-localisation

23.5.1 CellSP

CellSP (Aggarwal and Sinha 2025) is a Python-based workflow for identifying sub-cellular patterns of transcripts that can be further used to identify and characterize gene-cell modules. It provides tools for visualizing the gene-cell modules and examining their functional significance.

Figure 23.5: Overview of CellSP applicability to sub-cellular data (a). CellSP identifies sub-cellular patterns (b) for gene-cell module discovery (c). It also provides tools for visualization of gene-cell modules (d).

CellSP uses AnnData format for single cell and spatial transcriptomics analysis. Internally, CellSP uses other tools, such as MAGIC (Dijk et al. 2018) for denoising, Tangram (Biancalani et al. 2021) for imputation, InSTAnT (Kumar et al. 2024) for identifying gene-pair sub-cellular co-localisation patterns, and SPRAWL (Bierman et al. 2024) for identifying gene sub-cellular localisation patterns.

CellSP outputs a set of gene-cell modules for each sub-cellular pattern type (peripheral, radial, punctate, central, co-localisation). A gene-cell module represents a set of genes or gene pairs that have the same sub-cellular pattern across the same set of cells. The statistical significance of grouping is estimated using a Bonferroni-based score. In each module, genes and cells are characterized using gene ontology (GO) enrichment tests and cell type composition (if available), respectively.

Some statistical tests for gene(s) localisation/co-localisation that can be performed using CellSP include:

Testing per cell - does a cell have significant sub-cellular localisation of a set of genes or gene-pairs?
Testing across multiple cells - do cells belonging to a cell type/cluster have significant sub-cellular localisation of a gene or gene pair?
Testing across multiple groups/samples - are specific genes differentially localized/co-localized in two cell types/clusters from one sample or a cell type/cluster from two different conditions?

23.5.2 SpaGNN

SpaGNN (Fang et al. 2023) is another Python-based pipeline for sub-cellular analyses, including:

Sub-cellular clustering of transcript locations into sub-cellular patches with high transcript density using Leiden graph clustering.
Sub-cellular patch analysis, where transcripts of each gene in each patch are summed to generate a patch x gene count matrix. This matrix is used to calculate Pearson’s correlation between genes across all patches, to identify gene pairs that often co-localize. The statistical significance of these correlation values can be assessed using a t-test.
Sub-cellular local neighborhoods are detected within a patch by identifying 9 nearest neighbors of each transcript in the patch. Further analysis involves summing transcript counts in the local neighborhoods and calculating Pearson’s correlation between genes. Through permutation analysis, a proximity score is calculated for each gene pair in the patch.
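The patch-level correlation step can be sketched as follows. This is a generic NumPy illustration rather than SpaGNN code: the patch x gene counts are simulated, the gene names are hypothetical, and the t statistic is the standard one for testing whether a Pearson correlation differs from zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patches = 50
genes = ["GeneA", "GeneB", "GeneC"]

# Hypothetical patch x gene counts: GeneA and GeneB share a common
# component (they co-localize), GeneC is independent of both
shared = rng.poisson(5, size=n_patches)
counts = np.column_stack([
    shared + rng.poisson(1, size=n_patches),   # GeneA
    shared + rng.poisson(1, size=n_patches),   # GeneB
    rng.poisson(5, size=n_patches),            # GeneC
])

# Gene x gene Pearson correlation across patches
r = np.corrcoef(counts, rowvar=False)

# t statistic for each off-diagonal correlation:
# t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom
df = n_patches - 2
off_diag = ~np.eye(len(genes), dtype=bool)
t_stat = np.zeros_like(r)
t_stat[off_diag] = r[off_diag] * np.sqrt(df / (1.0 - r[off_diag] ** 2))

print(np.round(r, 2))
print(np.round(t_stat, 2))
```

If SciPy is available, two-sided p-values can be obtained from the t statistics with `scipy.stats.t.sf(abs(t_stat), df) * 2`. With many gene pairs, some form of multiple-testing correction would normally be applied.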

Depending on availability of cell type data, additional questions can be asked. For example, how similar are the patch correlations between same/different cell types, or how consistent is the sub-cellular co-localisation of a gene pair in same/different cell types?

23.6 Considerations

23.6.1 2D versus 3D

The majority of molecule-resolved SRT data are 2D projections of 3D structures, such as cells and organelles, or even of processes like RNA transport. A few imaging-based SRT technologies are capable of measuring tissue depth as a z-axis, generating 3D coordinates with a sparse z-axis. However, working with 2D vs 3D data is not so different, since spatial relations are often captured as neighborhoods defined by Euclidean or other distance measures between locations, irrespective of the coordinate dimension. Therefore, many tools are able to use n-dimensional coordinates for measuring spatial relationships. For example, ClusterMap can use 3D coordinates by using z_radius for z-axis data, if it is available, whereas for 2D coordinates it sets z_radius = 0 (He et al. 2021).
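The point that distance-based neighborhoods are agnostic to the coordinate dimension can be shown with a short sketch. The function below is a generic illustration with hypothetical names and toy coordinates. It does not reproduce how ClusterMap handles z_radius, and it uses a single radius for all axes.

```python
import numpy as np


def neighbors_within(coords, radius):
    """Number of other points within `radius` (Euclidean distance) of each
    point. Works unchanged for 2D, 3D, or n-dimensional coordinates."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    within = dist <= radius
    np.fill_diagonal(within, False)  # a point is not its own neighbor
    return within.sum(axis=1)


# Hypothetical transcript coordinates: the same points in 2D and in 3D
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
z = np.array([0.0, 0.0, 3.0, 0.0])   # the third transcript lies deeper
xyz = np.column_stack([xy, z])

print(neighbors_within(xy, 1.5))    # → [2 2 2 0]
print(neighbors_within(xyz, 1.5))   # → [1 1 0 0]
```

In the 2D projection the third transcript appears to neighbor the first two, whereas in 3D it is separated from them along the z-axis. This is one way in which a projection can suggest co-localisation that is not present in 3D.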

References

Aggarwal, Bhavay, and Saurabh Sinha. 2025. “CellSP: Module Discovery and Visualization for Subcellular Spatial Transcriptomics Data.” bioRxiv, January. https://doi.org/10.1101/2025.01.12.632553.

Biancalani, Tommaso, Gabriele Scalia, Lorenzo Buffoni, Raghav Avasthi, Ziqing Lu, Aman Sanger, Neriman Tokcan, et al. 2021. “Deep Learning and Alignment of Spatially Resolved Single-Cell Transcriptomes with Tangram.” Nature Methods 18: 1352–62. https://doi.org/10.1038/s41592-021-01264-7.

Bierman, Rob, Jui M Dave, Daniel M Greif, and Julia Salzman. 2024. “Statistical Analysis Supports Pervasive RNA Subcellular Localization and Alternative 3’ UTR Regulation.” eLife 12 (December). https://doi.org/10.7554/elife.87517.

Cassella, Lucia, and Anne Ephrussi. 2022. “Subcellular Spatial Transcriptomics Identifies Three Mechanistically Different Classes of Localizing RNAs.” Nature Communications 13 (1). https://doi.org/10.1038/s41467-022-34004-2.

Dijk, David van, Roshan Sharma, Juozas Nainys, Kristina Yim, Pooja Kathail, Ambrose J. Carr, Cassandra Burdziak, et al. 2018. “Recovering Gene Interactions from Single-Cell Data Using Data Diffusion.” Cell 174 (3): 716–729.e27. https://doi.org/10.1016/j.cell.2018.05.061.

Fang, Zhou, Adam J. Ford, Thomas Hu, Nicholas Zhang, Athanasios Mantalaris, and Ahmet F. Coskun. 2023. “Subcellular Spatially Resolved Gene Neighborhood Networks in Single Cells.” Cell Reports Methods 3 (5): 100476. https://doi.org/10.1016/j.crmeth.2023.100476.

He, Yichun, Xin Tang, Jiahao Huang, Jingyi Ren, Haowen Zhou, Kevin Chen, Albert Liu, et al. 2021. “ClusterMap for Multi-Scale Clustering Analysis of Spatial Gene Expression.” Nature Communications 12 (1). https://doi.org/10.1038/s41467-021-26044-x.

Kumar, Anurendra, Alex W. Schrader, Bhavay Aggarwal, Ali Ebrahimpour Boroojeny, Marisa Asadian, JuYeon Lee, You Jin Song, Sihai Dave Zhao, Hee-Sun Han, and Saurabh Sinha. 2024. “Intracellular Spatial Transcriptomic Analysis Toolkit (InSTAnT).” Nature Communications 15 (1). https://doi.org/10.1038/s41467-024-49457-w.

Mah, Clarence K., Noorsher Ahmed, Nicole A. Lopez, Dylan C. Lam, Avery Pong, Alexander Monell, Colin Kern, et al. 2024. “Bento: A Toolkit for Subcellular Analysis of Spatial Transcriptomics Data.” Genome Biology 25 (1). https://doi.org/10.1186/s13059-024-03217-7.

Novoselsky, Roy, Ofra Golani, Tal Barkai, Merav Kedmi, Inna Goliand, Michal Fine, Ilan Kent, Ido Nachmany, and Shalev Itzkovitz. 2025. “Subcellular mRNA Localization Patterns Across Tissues Resolved with Spatial Transcriptomics.” bioRxiv, September. https://doi.org/10.1101/2025.09.07.674688.

Peters Couto, Bárbara Zita, Nicholas Robertson, Ellis Patrick, and Shila Ghazanfar. 2023. “MoleculeExperiment Enables Consistent Infrastructure for Molecule-Resolved Spatial Omics Data in Bioconductor.” Bioinformatics 39 (9): btad550. https://doi.org/10.1093/bioinformatics/btad550.

Walter, Florin C, Oliver Stegle, and Britta Velten. 2023. “FISHFactor: A Probabilistic Factor Model for Spatial Transcriptomics Data with Subcellular Resolution.” Edited by Christina Kendziorski. Bioinformatics 39 (5). https://doi.org/10.1093/bioinformatics/btad183.