Non-spatial
As a baseline, we will perform principal component analysis (PCA), which underlies many standard scRNA-seq analysis pipelines, such as (spatially unaware) graph-based clustering based on a shared nearest neighbor (SNN) graph and the Leiden or Louvain algorithm for community detection.
A standard approach is to apply PCA to the set of top HVGs, and retain a subset of PCs for subsequent steps. This is done for two main reasons: (i) to reduce noise due to random variation in expression of biologically uninformative genes, which are assumed to have expression patterns independent of each other, and (ii) to improve computational efficiency. Because our data are relatively low-plex in this example, we use all features instead.
Here, we use the implementation of PCA provided in the scater package (McCarthy et al. 2017):
For large-scale datasets (100,000s of cells), the argument BSPARAM=RandomParam() can be used to decrease runtime by approximating the singular value decomposition (SVD). Because this implementation uses randomization, a seed for random number generation should be set to make results reproducible.
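As a minimal sketch of the call described above, the following runs PCA on all features of a simulated dataset with the randomized SVD backend. The mock object from scuttle::mockSCE(), the seed value, and the choice of 10 components are assumptions for illustration only, not the settings used in this chapter.

```r
library(scater)        # runPCA(); also attaches scuttle and SingleCellExperiment
library(BiocSingular)  # RandomParam()

# simulated stand-in for the real data (assumption: 200 cells, 500 genes)
set.seed(1)
sce <- mockSCE(ncells = 200, ngenes = 500)
sce <- logNormCounts(sce)

# randomized SVD draws random numbers, so fix the seed right before the call
set.seed(123)
sce <- runPCA(sce,
    ncomponents = 10,            # number of PCs to retain (illustrative)
    subset_row = rownames(sce),  # use all features rather than top HVGs
    BSPARAM = RandomParam())     # approximate SVD for speed

# cells-by-PCs matrix stored under the default name "PCA"
dim(reducedDim(sce, "PCA"))
```

Re-running from the second set.seed() call onward should then reproduce the same PCs, whereas omitting it may give slightly different values between runs.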
Spatially aware
BANKSY (Singhal et al. 2024) computes PCs on a spatial-neighborhood augmented matrix, thereby embedding cells in a product space of their own and their local neighborhood’s (average) transcriptome, representing cell state and microenvironment. A key parameter for this method is \(\lambda\in[0,1]\) (argument lambda in runBanksyPCA()), which controls the weight of the spatial component; notably, when \(\lambda=0\), BANKSY reduces to non-spatial clustering. A second parameter, k_geom, determines the number of neighbors used to compute local transcriptomic neighborhoods.
On the hexagonal Visium grid, \(k=6\) corresponds to a spot's first-order neighbors, and \(k=18\) to its first- and second-order neighbors (6 + 12).
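To make the idea of a neighborhood-augmented matrix concrete, here is a simplified base R sketch on simulated data. It is not the Banksy implementation: the unweighted neighbor mean, the cells-as-rows layout, the square-root weighting, and the helper name augment() are all assumptions chosen for illustration. With weights \(\sqrt{1-\lambda}\) and \(\sqrt{\lambda}\), squared distances in the augmented space become a \(\lambda\)-weighted mix of own-expression and neighborhood distances.

```r
set.seed(1)
n <- 100; g <- 8       # simulated cells and genes
k <- 6                 # number of spatial neighbors (plays the role of k_geom)
lambda <- 0.2          # weight of the spatial component

xy   <- cbind(x = runif(n), y = runif(n))  # simulated coordinates
expr <- matrix(rnorm(n * g), nrow = n)     # cells x genes expression

# indices of each cell's k nearest spatial neighbors (self excluded)
d <- as.matrix(dist(xy))
diag(d) <- Inf
nn <- t(apply(d, 1, function(v) order(v)[seq_len(k)]))  # n x k

# unweighted mean expression across each cell's neighborhood
nbr <- t(apply(nn, 1, function(i) colMeans(expr[i, , drop = FALSE])))  # n x g

# hypothetical helper: concatenate own and neighborhood blocks
augment <- function(own, nbr, lambda) {
    cbind(sqrt(1 - lambda) * own, sqrt(lambda) * nbr)
}

aug_sp <- augment(expr, nbr, lambda)   # spatially aware input
aug_tx <- augment(expr, nbr, 0)        # lambda = 0: neighbor block is all zero

pcs_sp <- prcomp(aug_sp)$x[, 1:5]
pcs_tx <- prcomp(aug_tx)$x[, 1:5]
```

In this sketch, setting lambda to 0 zeroes out the neighborhood block, so pcs_tx matches an ordinary PCA of expr up to the sign of each component, mirroring how BANKSY reduces to the non-spatial case.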
To avoid confusing the different types of PCs, we rename the corresponding reducedDims to end in _sp and _tx for spatially aware and unaware results, respectively.
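The renaming step might look roughly like the following sketch, which uses a small simulated SingleCellExperiment. The original entry names (PCA_lam0 and PCA_lam0.8) and the resulting names (PCA_tx and PCA_sp) are placeholders assumed for illustration; the names actually produced by the PCA step may differ.

```r
library(SingleCellExperiment)

# toy object: 10 genes x 20 cells with two placeholder embeddings
set.seed(1)
counts <- matrix(rpois(200, 5), nrow = 10)
sce <- SingleCellExperiment(assays = list(counts = counts))
reducedDim(sce, "PCA_lam0")   <- matrix(rnorm(40), nrow = 20)  # spatially unaware
reducedDim(sce, "PCA_lam0.8") <- matrix(rnorm(40), nrow = 20)  # spatially aware

# map old names to new ones so that suffixes indicate the type of PCs
new <- c("PCA_lam0" = "PCA_tx", "PCA_lam0.8" = "PCA_sp")
old <- reducedDimNames(sce)
reducedDimNames(sce) <- unname(new[old])

reducedDimNames(sce)
```

Using a named lookup vector rather than positional assignment keeps the mapping explicit, which helps if the entries are stored in a different order.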