Analysis steps

This part contains chapters describing the individual steps of a computational analysis pipeline for spatially resolved transcriptomics (SRT) data: quality control (QC), normalization, feature selection, dimensionality reduction, clustering, marker gene identification, and spot-level deconvolution.

These steps require that the raw data has already been loaded into R. The previous part provides instructions and examples showing how to do this for the 10x Genomics Visium platform.

Throughout these chapters, we will rely on [Bioconductor data classes], especially SpatialExperiment, to connect the steps in the pipeline. Following the Bioconductor principle of modularity, you can substitute alternative methods for individual steps if you prefer, as long as these can interface with the SpatialExperiment structure.

Load data

In the following analysis chapters, we use a dataset that has already been preprocessed (using tools outside R and Bioconductor) and saved as a SpatialExperiment object. This object is available from the STexampleData package.

The dataset consists of a single sample of human brain tissue from the dorsolateral prefrontal cortex (DLPFC), measured with the 10x Genomics Visium platform and originally published by Maynard et al. (2021). It is described in more detail in [Visium human DLPFC workflow].

Here, we show how to load the data from the STexampleData package.

library(SpatialExperiment)
library(STexampleData)

# load object
spe <- Visium_humanDLPFC()

SpatialExperiment object

Next, we inspect the SpatialExperiment object. For more details, see [Bioconductor data classes].

# check object
spe

class: SpatialExperiment 
dim: 33538 4992 
metadata(0):
assays(1): counts
rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
  ENSG00000268674
rowData names(3): gene_id gene_name feature_type
colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
  TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
colData names(7): barcode_id sample_id ... ground_truth cell_count
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
imgData names(4): sample_id image_id data scaleFactor
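The printed summary lists the main slots of a SpatialExperiment: assays, rowData, colData, spatialCoords, and imgData. As a minimal sketch of how these slots fit together, the code below builds a tiny object from scratch with the SpatialExperiment() constructor. It assumes the SpatialExperiment package is installed from Bioconductor; the gene names, spot names, counts, coordinates, and the sample ID "toy_sample" are all hypothetical toy values unrelated to the DLPFC dataset, and no image is attached, so imgData stays empty.

```r
library(SpatialExperiment)
library(S4Vectors)  # provides DataFrame()

# hypothetical toy counts: 5 genes (rows) x 4 spots (columns)
set.seed(1)
counts_toy <- matrix(rpois(20, lambda = 5), nrow = 5,
                     dimnames = list(paste0("gene", 1:5), paste0("spot", 1:4)))

# feature-level metadata: one row per gene (becomes rowData)
row_data <- DataFrame(gene_name = paste0("GENE", 1:5))

# spot-level metadata: one row per spot (becomes colData)
col_data <- DataFrame(in_tissue = c(1L, 1L, 0L, 1L))

# hypothetical spatial coordinates: one row per spot, columns x and y
coords <- matrix(c(10, 20, 30, 40, 15, 25, 35, 45), ncol = 2,
                 dimnames = list(colnames(counts_toy), c("x", "y")))

# assemble the object; sample_id is stored as a colData column
spe_toy <- SpatialExperiment(
  assays = list(counts = counts_toy),
  rowData = row_data,
  colData = col_data,
  spatialCoords = coords,
  sample_id = "toy_sample")

spe_toy
dim(spe_toy)            # genes x spots
spatialCoords(spe_toy)  # the x/y matrix supplied above
```

The same accessors used on the DLPFC object in this chapter (dim(), assayNames(), rowData(), colData(), spatialCoords()) work identically on this toy object.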

# number of features (rows) and spots (columns)
dim(spe)

[1] 33538  4992

# names of 'assay' tables
assayNames(spe)

[1] "counts"

# features metadata
head(rowData(spe))

DataFrame with 6 rows and 3 columns
                        gene_id   gene_name    feature_type
                    <character> <character>     <character>
ENSG00000243485 ENSG00000243485 MIR1302-2HG Gene Expression
ENSG00000237613 ENSG00000237613     FAM138A Gene Expression
ENSG00000186092 ENSG00000186092       OR4F5 Gene Expression
ENSG00000238009 ENSG00000238009  AL627309.1 Gene Expression
ENSG00000239945 ENSG00000239945  AL627309.3 Gene Expression
ENSG00000239906 ENSG00000239906  AL627309.2 Gene Expression
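The row names of this object are Ensembl gene IDs, while gene symbols are stored in the gene_name column of rowData. The sketch below shows one way to translate between the two using base R's match(). To keep it self-contained, it uses a small data frame rebuilt from the six rows printed above as a stand-in for rowData(spe); on the full object, the same pattern applies to rowData(spe)$gene_name and rownames(spe).

```r
# stand-in for rowData(spe), rebuilt from the six rows printed above
rd <- data.frame(
  gene_id = c("ENSG00000243485", "ENSG00000237613", "ENSG00000186092",
              "ENSG00000238009", "ENSG00000239945", "ENSG00000239906"),
  gene_name = c("MIR1302-2HG", "FAM138A", "OR4F5",
                "AL627309.1", "AL627309.3", "AL627309.2"),
  stringsAsFactors = FALSE)
rownames(rd) <- rd$gene_id

# gene symbol -> Ensembl ID (row name); match() returns only the first hit,
# so duplicated symbols would need which() instead
ensembl_id <- rownames(rd)[match("OR4F5", rd$gene_name)]
ensembl_id  # "ENSG00000186092"

# Ensembl ID -> gene symbol
rd["ENSG00000237613", "gene_name"]  # "FAM138A"
```

Looking genes up by ID rather than by symbol avoids ambiguity, since gene symbols are not guaranteed to be unique across the rows of rowData.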

# spot-level metadata
head(colData(spe))

DataFrame with 6 rows and 7 columns
                           barcode_id     sample_id in_tissue array_row
                          <character>   <character> <integer> <integer>
AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673         0         0
AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673         1        50
AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673         1         3
AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673         1        59
AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673         1        14
AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 sample_151673         1        43
                   array_col ground_truth cell_count
                   <integer>  <character>  <integer>
AAACAACGAATAGTTC-1        16           NA         NA
AAACAAGTATCTCCCA-1       102       Layer3          6
AAACAATCTACTAGCA-1        43       Layer1         16
AAACACCAATAACTGC-1        19           WM          5
AAACAGAGCGACTCCT-1        94       Layer3          2
AAACAGCTTTCAGAAG-1         9       Layer5          4

# spatial coordinates
head(spatialCoords(spe))

                   pxl_col_in_fullres pxl_row_in_fullres
AAACAACGAATAGTTC-1               3913               2435
AAACAAGTATCTCCCA-1               9791               8468
AAACAATCTACTAGCA-1               5769               2807
AAACACCAATAACTGC-1               4068               9505
AAACAGAGCGACTCCT-1               9271               4151
AAACAGCTTTCAGAAG-1               3393               7583

# image metadata
imgData(spe)

DataFrame with 2 rows and 4 columns
      sample_id    image_id   data scaleFactor
    <character> <character> <list>   <numeric>
1 sample_151673      lowres   ####   0.0450045
2 sample_151673       hires   ####   0.1500150
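The spatialCoords are given in pixels of the full-resolution image, whereas the stored images are the downsampled lowres and hires versions. The scaleFactor column converts between the two: multiplying a full-resolution coordinate by an image's scale factor gives the corresponding pixel position in that image, which is what is needed to overlay spots on it. The sketch below does this arithmetic in base R using the first two spots and the scale factors printed above; on the full object, the values can be taken from spatialCoords(spe) and imgData(spe)$scaleFactor (or the scaleFactors() accessor in SpatialExperiment).

```r
# full-resolution pixel coordinates of the first two spots (from spatialCoords(spe) above)
xy_full <- matrix(c(3913, 9791, 2435, 8468), ncol = 2,
                  dimnames = list(c("AAACAACGAATAGTTC-1", "AAACAAGTATCTCCCA-1"),
                                  c("pxl_col_in_fullres", "pxl_row_in_fullres")))

# scale factors for the two stored images (from imgData(spe) above)
sf <- c(lowres = 0.0450045, hires = 0.1500150)

# rescale full-resolution coordinates to each image's pixel grid
xy_lowres <- xy_full * sf[["lowres"]]
xy_hires  <- xy_full * sf[["hires"]]

round(xy_lowres, 1)
round(xy_hires, 1)
```

Because the conversion is a single multiplication, the same full-resolution coordinates can be reused for any image resolution without storing separate coordinate sets.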