Custom spatial analyses may rely on identifying nearest neighbors (NNs) of cells. We recommend RANN for this purpose, which finds NNs in \(O(N\log N)\) time for \(N\) cells (cf. the \(O(N^2)\) time of a conventional all-pairs approach) by relying on the Approximate Near Neighbor (ANN) C++ library. It supports exact, approximate, and fixed-radius searches. The last of these is of particular interest in biology; e.g., one might require \(k\)NNs to lie within a biologically sensible distance so as to avoid considering cells that are far off, especially in sparse regions or at tissue borders.
As a toy example, we compute the \(k\)NNs between a pair of subpopulations, first without and then with thresholding on NN distances (the latter via searchtype="radius").
In the first approach, each cell receives exactly \(k\) neighbors, but these may lie at an arbitrary distance.
In the second approach, each cell receives \(\leq k\) neighbors, depending on how many cells lie within a radius \(r\).
Code
library(RANN)
k <- 10 # num. neighbors
r <- 50 # dist. threshold
i <- spe$k == 1 # source
j <- spe$k == 4 # target
xy <- spatialCoords(spe)
# k-NN search: all cells have k neighbors
ns_k <- nn2(xy[j, ], xy[i, ], k=k)
is_k <- ns_k$nn.idx
all(rowSums(is_k > 0) == k)
# w/ fixed-radius: cells have 0-k neighbors
ns_r <- nn2(xy[j, ], xy[i, ], k=k, searchtype="radius", radius=r)
is_r <- ns_r$nn.idx
range(rowSums(is_r > 0))
## [1] TRUE
## [1] 0 10
The neighbors obtained via fixed-radius search (right) are less scattered than those obtained for unlimited distances (left); the former are arguably more meaningful in a biological context:
Results of kNN searches. Left: basic kNN search, highlighting source (pink) and target cells (gold). Right: kNN search with same k, but thresholding on neighbor distances.
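To make the two return conventions concrete, the following base-R sketch reproduces them by brute force on simulated coordinates. It is an illustration only: `knn_brute()` is a hypothetical helper (not part of RANN), it uses the \(O(N^2)\) all-pairs approach rather than a tree-based search, and the simulated data and parameter values are arbitrary assumptions.

```r
# Brute-force kNN with an optional distance threshold (base R only).
# NOTE: 'knn_brute' is a hypothetical helper for illustration; it is not
# part of RANN, and it computes all pairwise distances (quadratic cost).
knn_brute <- function(tgt, src, k, r=Inf) {
    # Euclidean distances: rows = source cells, columns = target cells
    d <- sqrt(
        outer(src[, 1], tgt[, 1], "-")^2 +
        outer(src[, 2], tgt[, 2], "-")^2)
    # for each source cell, indices of its k closest target cells
    idx <- t(apply(d, 1, function(.) order(.)[seq_len(k)]))
    # distances matching these indices
    dst <- matrix(d[cbind(c(row(idx)), c(idx))], nrow(idx))
    # mimic the convention described above: neighbors beyond r get index 0
    idx[dst > r] <- 0
    list(nn.idx=idx, nn.dists=dst)
}

# simulated coordinates (arbitrary; stand-ins for source and target cells)
set.seed(1)
src <- cbind(x=runif(50, 0, 100), y=runif(50, 0, 100))
tgt <- cbind(x=runif(200, 0, 100), y=runif(200, 0, 100))

nn_k <- knn_brute(tgt, src, k=5)       # exactly k neighbors per cell
nn_r <- knn_brute(tgt, src, k=5, r=5)  # at most k neighbors per cell

all(rowSums(nn_k$nn.idx > 0) == 5)
range(rowSums(nn_r$nn.idx > 0))
```

Counting non-zero indices per row, as done in the last two lines, is the same check we apply to the nn2() results above.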
Exhaustive fixed-radius search
Note that we could also set a very large k in order to identify all neighbors within a radius r. To prevent unnecessarily costly searches, it is sensible to estimate how many neighbors to expect, and to set k accordingly; nn2() will otherwise find each cell's \(k\)NNs, and set the indices of those with a distance \(>r\) to 0.
As an example, we here sample 1,000 source cells to estimate the highest number of NNs obtained, considering half of all target cells as potential NNs:
Code
# test search
.i <- sample(which(i), 1e3)
ns <- nn2(xy[j, ], xy[.i, ], k=round(sum(j)/2), searchtype="radius", radius=r)
(.k <- max(rowSums(ns$nn.idx > 0)))
## [1] 14
For our actual search, we then set k to be twice our estimate. As a final spot-check, we make sure that all cells have fewer than k NNs, since we might otherwise be missing some.
Code
# real search
k <- ceiling(2*.k)
ns <- nn2(xy[j, ], xy[i, ], k=k, searchtype="radius", radius=r)
max(rowSums(ns$nn.idx > 0)) < k
## [1] TRUE
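Should the spot-check fail (i.e., some cell uses up all k slots), one option is to enlarge k and search again. The following is a minimal sketch of such a loop, assuming simulated coordinates in place of `xy[j, ]` and `xy[i, ]`; the doubling rule and the deliberately small starting value are our own illustrative choices, not a RANN recommendation.

```r
library(RANN)

# simulated coordinates standing in for targets (xy[j, ]) and sources (xy[i, ])
set.seed(1)
tgt <- cbind(runif(2000, 0, 1000), runif(2000, 0, 1000))
src <- cbind(runif(500, 0, 1000), runif(500, 0, 1000))

r <- 50  # distance threshold
k <- 5   # deliberately small initial guess
repeat {
    ns <- nn2(tgt, src, k=k, searchtype="radius", radius=r)
    n <- rowSums(ns$nn.idx > 0)  # neighbors found per source cell
    # stop once no cell has used up all k slots, or k cannot grow further
    if (max(n) < k || k == nrow(tgt)) break
    k <- min(2*k, nrow(tgt))
}
c(k=k, max=max(n))
```

Each failed attempt repeats the search, so a reasonable pilot estimate (as above) remains preferable to starting from a very small k.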
19.3 Spatial contexts
Spatial niche analysis aims at identifying regions of homogeneous composition by grouping cells based on their microenvironment. To this end, methods such as imcRtools (Windhager et al. 2023) build a \(k\)-nearest-neighbor (\(k\)NN) graph (based on Euclidean cell-to-cell distances), and then cluster cells according to their neighborhood’s subpopulation frequencies using common clustering algorithms.
Here, we demonstrate how to identify spatial contexts based on \(k\)-means clustering on cluster frequencies among (Euclidean) \(k\)NNs. We recommend readers consult imcRtools’ documentation for a much wider range of visualizations and downstream analyses in this context.
Code
library(imcRtools)
# construct kNN-graph based on Euclidean distances
sqe <- buildSpatialGraph(spe,
    coords=spatialCoordsNames(spe),
    img_id="sample_id", type="knn", k=10)
# compute cluster frequencies among each cell's kNNs
sqe <- aggregateNeighbors(sqe,
    colPairName="knn_interaction_graph",
    aggregate_by="metadata", count_by="k")
# view composition of 1st cell's kNNs
unlist(sqe$aggregatedNeighbors[1, ])
Tissue plot with cells colored by cluster (left) and spatial context (right) based on \(k\)-means clustering of cluster frequencies among each cell’s (Euclidean) \(k\)NNs.
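The clustering step itself can be sketched in base R. The example below is an illustration under stated assumptions: the neighbor labels are simulated (three hypothetical clusters A, B, and C), the frequency matrix `fq` stands in for the neighborhood compositions computed above, and the number of centers is an arbitrary choice that would need tuning on real data.

```r
set.seed(1)
n <- 300  # number of cells
k <- 10   # neighbors per cell

# simulated cluster labels of each cell's k neighbors (rows = cells);
# three groups of cells, each with a differently composed neighborhood
ids <- c("A", "B", "C")
pr <- rbind(c(0.8, 0.1, 0.1), c(0.1, 0.8, 0.1), c(0.1, 0.1, 0.8))
grp <- rep(1:3, each=n/3)
nn <- t(sapply(grp, function(g) sample(ids, k, replace=TRUE, prob=pr[g, ])))

# cluster frequencies among each cell's neighbors (rows sum to 1)
fq <- t(apply(nn, 1, function(.) table(factor(., levels=ids))/k))

# k-means on the frequencies; each resulting cluster is one spatial context
km <- kmeans(fq, centers=3, nstart=10)
table(context=km$cluster, simulated=grp)
```

Cells are thus grouped by the composition of their surroundings rather than by their own cluster label, which is what distinguishes a spatial context from a subpopulation.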
19.4 Co-localization
hoodscanR (Liu et al. 2024) also relies on a (Euclidean) \(k\)NN graph to estimate the probability of each cell associating with its NNs. The resulting probability matrix (rows=cells, columns=NNs) can, in turn, be used to assess co-occurrence of subpopulations.
To perform neighborhood co-localization analysis, plotColocal() computes the Pearson correlation of the probability distributions between cells. Here, high/low values indicate attraction/repulsion between clusters:
Downstream, calcMetrics() can be used to calculate cell-level entropy and perplexity, which both measure the mixing of cellular neighborhoods. Here, low/high values indicate homogeneity/heterogeneity of a cell’s local neighborhood:
Tissue plots with cells colored by entropy and perplexity, z-scaled across cells (capped at 2 SDs).
Stratifying these values by subpopulation, we can observe that clusters forming distinct aggregates in space are lowest in entropy/perplexity (i.e., the most homogeneous locally):
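To build intuition for these two metrics, the sketch below computes them for three hypothetical neighborhood probability vectors. We assume the standard definitions, namely Shannon entropy with base-2 logarithms and perplexity as 2 raised to the entropy; hoodscanR's exact conventions (e.g., the logarithm base) may differ, and the helper functions are our own rather than part of the package.

```r
# Shannon entropy (base 2) of a probability vector; 0*log(0) is taken as 0
entropy <- function(p) {
    p <- p[p > 0]
    -sum(p*log2(p))
}
# perplexity: the "effective number" of equally likely categories
perplexity <- function(p) 2^entropy(p)

# hypothetical neighborhood probabilities for three cells (rows sum to 1)
pm <- rbind(
    homogeneous=c(1, 0, 0, 0),
    intermediate=c(0.7, 0.1, 0.1, 0.1),
    mixed=c(0.25, 0.25, 0.25, 0.25))

cbind(
    entropy=apply(pm, 1, entropy),
    perplexity=apply(pm, 1, perplexity))
```

Under these definitions, a neighborhood dominated by a single category has entropy 0 and perplexity 1, whereas one split evenly across four categories has entropy 2 and perplexity 4; intermediate mixtures fall in between.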