9  Dimensionality reduction

9.1 Overview

In this chapter, we apply dimensionality reduction methods to visualize the data and to generate inputs for further downstream analyses.

9.2 Load data from previous steps

We start by loading the data object(s) saved after running the analysis steps from the previous chapters. Code to re-run the previous steps is shown in condensed form in Chapter 4.

library(SpatialExperiment)
library(here)
spe <- readRDS(here("outputs/spe_hvgs.rds"))
top_hvgs <- readRDS(here("outputs/top_hvgs.rds"))

9.3 Principal component analysis (PCA)

Apply principal component analysis (PCA) to the set of top highly variable genes (HVGs) to reduce the dimensionality of the dataset, and retain the top 50 principal components (PCs) for further downstream analyses.

This is done for two reasons: (i) to reduce noise due to random variation in expression of biologically uninteresting genes, which are assumed to have expression patterns that are independent of each other, and (ii) to improve computational efficiency during downstream analyses.

We use the computationally efficient implementation of PCA provided in the scater package (McCarthy et al. 2017). This implementation uses randomization, and therefore requires setting a random seed for reproducibility.

library(scater)
# compute PCA
set.seed(123)
spe <- runPCA(spe, subset_row = top_hvgs)

reducedDimNames(spe)
[1] "PCA"
dim(reducedDim(spe, "PCA"))
[1] 3524   50

9.4 Uniform Manifold Approximation and Projection (UMAP)

We also run UMAP on the set of top 50 PCs and retain the top 2 UMAP components, which will be used for visualization purposes.

# compute UMAP on top 50 PCs
set.seed(123)
spe <- runUMAP(spe, dimred = "PCA")

reducedDimNames(spe)
[1] "PCA"  "UMAP"
dim(reducedDim(spe, "UMAP"))
[1] 3524    2
# update column names for easier plotting
colnames(reducedDim(spe, "UMAP")) <- paste0("UMAP", 1:2)

9.5 Visualizations

Generate plots using plotting functions from the ggspavis package. In the next chapter on clustering, we will add cluster labels to these reduced dimension plots.

library(ggspavis)
# plot top 2 PCA dimensions
plotDimRed(spe, plot_type = "PCA")

# plot top 2 UMAP dimensions
plotDimRed(spe, plot_type = "UMAP")