Spatial statistics builds around the first law of geography of Tobler (1970) that states “everything is related to everything else, but near things are more related than distant things”. This dependence of spatial observations has been studied in the field of (geo)spatial statistics. In general, spatial dependence is estimated by comparing the values at one location with the values at another location that is a given distance (‘spatial lag’) away (Dale and Fortin 2014; Baddeley, Rubak, and Turner 2015).
The two technological streams, imaging- and sequencing-based assays, are very different in terms of data modalities:
In imaging-based assays (e.g., CosMx and Xenium) the observations of interest (e.g., mRNAs) are recorded where they occur natively. This means the locations of the observations are governed by a stochastic process and can be approximated as a point process.
In sequencing-based assays (e.g., Visium), however, measurements are made along a defined grid, or lattice. This lattice is not created by a stochastic process and therefore cannot be approximated as a point process. Here, lattice data analysis methods have to be used. Notably, there are methods where cells can be segmented from very fine bins (e.g., Visium HD and IMC); these technologies can be approximated as being generated by a point process after segmentation (Emons et al. 2025; Baddeley, Rubak, and Turner 2015; Pebesma and Bivand 2023).
In the following vignette, the two main exploratory spatial statistics streams, point pattern analysis and lattice data analysis will be introduced.
Lattice based spatial methods rely on the concept of neighborhood. The neighborhood defines the spatial dependency between locations. Therefore, the first step of lattice based spatial analysis is the construction of a spatial weight matrix. Here we will use a nearest neighbor-based approach.
Spatial weight matrix
Different methods for the construction of the weight matrix exist, such as
contiguity-based neighbors (neighbors in direct contact),
A detailed overview can be found in the documentation of the spdep package.
As different weight matrices influence downstream results, analysts should justify their choice of the weight matrix. A more detailed overview can be found in Pebesma and Bivand (2023).
Code
# construct (K=6)NN-graph using cell centroidsknn6<-findSpatialNeighbors(sfe, type="centroids", method="knearneigh", k=6)colGraph(sfe, "knn6")<-knn6# visualize across tissue sectionplotColGraph(sfe, colGraphName="knn6", colGeometryName="centroids", segment_size=0.1, geometry_size=0.1)+theme_void()
26.2.3 Spatial autocorrelation
Spatial autocorrelation measures similarity between spatial units (e.g., cells) while recognize that the units are not independent due to their spatial context. Spatial autocorrelation metrics can be global (summarizing the entire study area) of view or local (provide statistic for each unit). In addition, there exist methods for univariate and multivariate comparisons of continuous and categorical data (Pebesma and Bivand 2023).
26.2.3.1 Univariate measures - Moran’s \(I\)
Moran’s \(I\) can be interpreted as the Pearson correlation between the value at a certain location and the average values of its neighbors. The global value is a weighted average of the respective local values (Moran 1950).
Code
# get gene probes & compute Moran's I for themidx<-rowData(sfe)$Type=="Gene Expression"length(geneProbes<-rowData(sfe)[idx, "Symbol"])
We can further visualize this using when calculating and plotting local Moran’s \(I\) values (Anselin 1995). The interpretation is analogue to the global counterpart. The higher the value, the more similar the expression among a cell’s neighbors. Negative values indicate local dissimilarity in expression.
During the interpretation of local autocorrelation measures, both the effect size (the value of the statistic) and the significance level should be considered. Because we calculate significance on each spot individually, values should be corrected for multiple testing.
Lee’s \(L\) combines the Pearson correlation coefficient and Moran’s \(I\)(Lee 2001). This unified metric allows to evaluate how two continuous variables are related while accounting for their spatial dependencies. We will calculate the global metric on the 20 most highly (non-spatial) variable features.
We will identify the gene pairs with highest Lee’s \(L\) value and then calculate and visualize the respective local measure.
Code
# select the 20 highest values in the res matrixval<-tail(sort(res), 20)[1]genePairs<-which(res>=val, arr.ind=TRUE)# keep only pairs containing different genesgenePairs<-genePairs[genePairs[,1]!=genePairs[,2], ]data.frame( val=res[genePairs], i=rownames(res)[genePairs[,1]], j=colnames(res)[genePairs[,2]])
Join count statistics can be used to quantify the arrangement of categorical marks on lattices. The join count statistic calculates how often a categorical mark appears next to itself or to another category within the pre-defined neighborhood. This values can then be compared against a theoretical or a permutation-based value test of the null hypothesis of random spatial allocation of the marks (Getis 2009).
Here, we will use the this to quantify the tendency of clusters (non-spatial) to co-localize in their relative neighborhood. Note that the output contains z-scores and no p-values such that a user-defined threshold can be applied.
Code
# code adapted from # https://robinsonlabuzh.github.io/# pasta/04-imaging-multivar-latSOD.html# dependencieslibrary(BiocNeighbors)library(BiocSingular)library(bluster)library(scater)# log-library size normalizationsfe<-logNormCounts(sfe)# set seed for random number generation# in order to make results reproducibleset.seed(123)# run PCA on the samplesfe<-runPCA(sfe, exprs_values="logcounts", ncomponents=50)# cluster based on first 10 PC's # using Leiden community detectionpcs<-reducedDim(sfe, "PCA")[, 1:10]params<-KNNGraphParam( k=20, cluster.fun="leiden", cluster.args=list( resolution=0.3, objective_function="modularity"))colData(sfe)$cluster<-clusterRows(pcs, BLUSPARAM=params)# visualize cluster assignmentsplotSpatialFeature(sfe, features="cluster", colGeometryName="centroids")+guides(col=guide_legend(override.aes=list(size=2)))
We note that the non-spatial cluster labels are of course most likely found next to each other. Apart from this obvious result, the first top non-self interaction is \(5:3\). Looking at the plot of the spatial distributions of the clusters these are two clusters in the ductal carcinoma regions of the tissue, the inner (blue) and the outer (green) lining.
26.3 Point Pattern Analysis
Point pattern analysis is a subfield of spatial statistics that focuses on the representation of spatial locations of events or objects as points (Baddeley, Rubak, and Turner 2015) (p. 3). There are two main ways how to summarize cells as points (Emons et al. 2025):
approximate the features (mRNAs) as points.
segment the cells and represent the centroids as points.
The central R package to perform point pattern analysis is called spatstat(Baddeley and Turner 2005). The package spatialFDA creates an interface between SpatialExperiment objects and the spatstat library for easy integration into analysis workflows.
In the following we will work with a Xenium dataset of breast cancer (Janesick et al. 2023). The representation of cells as points is done via cell centroids.
In the plot below, the centroids of the cells are attributed with a discrete cell type mark.
The first property to assess in a point pattern is the intensity. For a window \(W\) and points \(x\), the average intensity \(\bar{\lambda}\) is defined as the number of points \(n(x)\) divided by the area of the window \(|W|\):
\[
\bar{\lambda}=\frac{n(x)}{|W|}
\]
The intensity of the points can be uniform in space which is called homogeneous. If the intensity is not uniform in space, it is called inhomogeneous. This distinction has important implications for the choice of the spatial metrics. Most metrics have a correction for inhomogeneous intensity of points (Baddeley, Rubak, and Turner 2015) (p. 157 onwards).
In the plot above we see that the intensity of all points is not uniform. This inhomogeneity of points has to be taken into accounts when interpreting spatial statistics metrics. For example, an indication of clustering in a point pattern can be due solely to an inhomogeneity of points. This is called the confounding between intensity and interaction (Baddeley, Rubak, and Turner 2015) (p. 151 onwards).
26.3.2 Global Analysis
Global analyses summarize a statistic across the entire field of view. This means it reflects an average statistic and might not be reflective of local heterogeneities (Emons et al. 2025).
26.3.2.1 Correlation
One option in point pattern analysis is to analyse the correlation of marks. Like this, e.g., a clustering or spacing of cells can be determined relative to a completely spatially random (CSR) process.
Complete spatial randomness
Complete spatial randomness (CSR) is the null scenario for a point pattern. It is characterized by two key properties:
Homogeneity: The intensity of points is homogeneous in space.
Independence: The points in one region do not influence the distribution of points in another region.
Ripley’s \(K\) is a well established function to assess correlation in a point pattern. It can be calculated within a mark or across marks. In essence, Ripley’s \(K\) quantifies the average number of points that fall in a \(r\)-neighborhood of a chosen mark (Ripley 1976; Baddeley, Rubak, and Turner 2015) (p. 132 onwards).
We can plot Ripley’s \(K\) function not corrected for inhomogeneities of the chosen marks. Here, the diagonal is Ripley’s \(K\) function among the three cell types themselves and the off diagonal plots show the cross type combinations.
We note that all cell types, ductal carcinoma in situ 1 (DCIS 1), ductal carcinoma in situ 2 (DCIS 2), and invasive tumor cells show a clear interaction among themselves. DCIS 1 and invasive tumor cells show spacing, meaning there are fewer invasive tumor cells found in DCIS 1 than expect under complete spatial randomness (CSR). However, DCIS 2 and invasive tumor cells show a distribution that is completely spatially random in a \(500 µm\)\(r\)-neighborhood.
26.3.2.2 Spacing
A complementary approach to correlation analysis is spacing. There are three main distance types (Baddeley, Rubak, and Turner 2015) (p. 255):
pairwise distances: distances between all pairs of points
nearest-neighbor distances: distance to the nearest point of the query point
empty-space distances: distance from a reference location to the nearest point
26.3.2.2.1 Nearest-neighbor distance function \(G\)
The nearest neighbor function \(G\) quantifies the average nearest-neighbor distance over a radius range \(r\)(Baddeley, Rubak, and Turner 2015) (p. 262).
## [1] "Calculating Gcross from DCIS_1 to DCIS_1"
## [1] "Calculating Gcross from DCIS_2 to DCIS_1"
## [1] "Calculating Gcross from Invasive_Tumor to DCIS_1"
## [1] "Calculating Gcross from DCIS_1 to DCIS_2"
## [1] "Calculating Gcross from DCIS_2 to DCIS_2"
## [1] "Calculating Gcross from Invasive_Tumor to DCIS_2"
## [1] "Calculating Gcross from DCIS_1 to Invasive_Tumor"
## [1] "Calculating Gcross from DCIS_2 to Invasive_Tumor"
## [1] "Calculating Gcross from Invasive_Tumor to Invasive_Tumor"
In terms of spacing, the interpretation is a bit different. Still, the diagonal shows that all cell types are more clustered than expected if the patterns were completely spatially random. Comparing now DCIS 1 and DCIS 2 with invasive tumor cells, we see that both are more spaced than expect at random. However, the spacing is stronger for DCIS 1 and invasive tumor cells. At small radii, DCIS 2 and invasive tumor are distributed close to random.
The analysis with Ripley’s \(K\) and the \(G\) function are not contradictory. \(G\) functions summarize shorter scale interactions than \(K\) functions (Baddeley, Rubak, and Turner 2015) (p. 295).
26.3.3 Local Analysis
26.3.3.1 Local indicators of spatial association
The metrics shown before are an average over the entire window \(W\). Anselin (1995) proposed an alternative approach which is termed local indicators of spatial association (LISA). Instead of a global average LISA shows the local contributions of each point to the overall metric (Baddeley, Rubak, and Turner 2015) (p. 247). This is a general concept not only for point pattern analysis but for lattice data analysis as well (Anselin 1995, 2019).
Code
# subset to only Invasive tumor cells ppSub<-subset(pp, marks%in%"Invasive_Tumor")# restrict to a smaller window for computational reasonsWindow(ppSub)<-owin(c(3500, 7524.087), c(1200, 5475.691))# plot the point patternplot(ppSub)
The LISA Ripley’s \(K\) are colored by their value at \(r = 700\). We note that the LISA Ripley’s \(K\) gives two populations of curves. Those curves that increase at radii \(r<500 µm\) above the CSR line indicated in gray and the other curves that remain either below the gray CSR line or increase after \(r>500µm\).
If the values at \(r=700\) of the curves are projected back into the physical space, we note that the curves above the gray CSR line are the highly clustered cells in the bottom left. The other curves are the more spaced cells.
———. 2019. “A Local Indicator of Multivariate Spatial Association: Extending Geary’s c.”Geographical Analysis 51: 133–50. https://doi.org/10.1111/gean.12164.
Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns: Methodology and Applications with r. 1st Edition. Chapman; Hall/CRC. https://doi.org/10.1201/b19708.
Baddeley, Adrian, and Rolf Turner. 2005. “Spatstat: An r Package for Analyzing Spatial Point Patterns.”Journal of Statistical Software 12: 1–42. https://doi.org/10.18637/jss.v012.i06.
Dale, Mark R. T., and Marie-Josée Fortin. 2014. Spatial Analysis: A Guide for Ecologists. 2nd Edition. Cambridge University Press. https://doi.org/10.1017/CBO9780511978913.
Emons, Martin, Samuel Gunz, Helena L. Crowell, Izaskun Mallona, Reinhard Furrer, and Mark D. Robinson. 2025. “Harnessing the Potential of Spatial Statistics for Spatial Omics Data with Pasta.”Nucleic Acids Research 53 (17): gkaf870. https://doi.org/10.1038/s41467-022-28020-5.
Janesick, Amanda, Robert Shelansky, Andrew D. Gottscho, Florian Wagner, Stephen R. Williams, Morgane Rouault, Ghezal Beliakoff, et al. 2023. “High Resolution Mapping of the Tumor Microenvironment Using Integrated Single-Cell, Spatial and in Situ Analysis.”Nature Communications 14 (8353). https://doi.org/10.1038/s41467-023-43458-x.
Lee, Sang-Il. 2001. “Developing a Bivariate Spatial Association Measure: An Integration of Pearson’s r and Moran’s i.”Journal of Geographical Systems 3: 369–85. https://doi.org/10.1007/s101090100064.
Moses, Lambda, Pétur Helgi Einarsson, Kayla Jackson, Laura Luebbert, A. Sina Booeshaghi, Sindri Antonsson, Nicolas Bray, Páll Melsted, and Lior Pachter. 2023. “Voyager: Exploratory Single-Cell Genomics Data Analysis with Geospatial Statistics.”bioRxiv. https://doi.org/10.1101/2023.07.20.549945.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in r. 1st Edition. Chapman; Hall/CRC. https://doi.org/10.1201/9780429459016.
Righelli, Dario, Lukas M Weber, Helena L Crowell, Brenda Pardo, Leonardo Collado-Torres, Shila Ghazanfar, Aaron T L Lun, Stephanie C Hicks, and Davide Risso. 2022. “SpatialExperiment: Infrastructure for Spatially-Resolved Transcriptomics Data in r Using Bioconductor.”Bioinformatics 38: 3128–31. https://doi.org/10.1093/bioinformatics/btac299.
Ripley, B. D. 1976. “The Second-Order Analysis of Stationary Point Processes.”Journal of Applied Probability 13: 255–66. https://doi.org/10.2307/3212829.
Tobler, W. R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.”Economic Geography 46: 234–40. https://doi.org/10.2307/143141.