library(SpatialExperiment)
library(STexampleData)
# load object
spe <- Visium_humanDLPFC()
Analysis steps
This part consists of chapters covering the steps in a computational analysis pipeline for spatially-resolved transcriptomics (SRT) data: quality control (QC), normalization, feature selection, dimensionality reduction, clustering, identifying marker genes, and spot-level deconvolution.
These steps require that the raw data has been loaded into R. In the previous part, we provide instructions and examples showing how to do this for the 10x Genomics Visium platform.
Throughout these chapters, we will rely on [Bioconductor data classes], especially SpatialExperiment, to connect the steps in the pipeline. Following the Bioconductor principle of modularity, you can substitute alternative methods for individual steps if you prefer, as long as these can interface with the SpatialExperiment structure.
Load data
In the following analysis chapters, we use a pre-prepared dataset where we have previously applied data preprocessing procedures (using tools outside R and Bioconductor) and saved the object in the SpatialExperiment format. This is available from the STexampleData package.
The dataset consists of a single sample of human brain from the dorsolateral prefrontal cortex (DLPFC) region, measured using the 10x Genomics Visium platform, sourced from Maynard et al. (2021). The dataset is also described in more detail in [Visium human DLPFC workflow].
Here, we show how to load the data from the STexampleData package.
SpatialExperiment object
Next, we inspect the SpatialExperiment object. For more details, see [Bioconductor data classes].
# check object
spe
class: SpatialExperiment
dim: 33538 4992
metadata(0):
assays(1): counts
rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
ENSG00000268674
rowData names(3): gene_id gene_name feature_type
colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
colData names(7): barcode_id sample_id ... ground_truth cell_count
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
imgData names(4): sample_id image_id data scaleFactor
# number of features (rows) and spots (columns)
dim(spe)
[1] 33538 4992
# names of 'assay' tables
assayNames(spe)
[1] "counts"
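As a minimal sketch of how an assay table can be retrieved, the example below builds a small toy object with made-up values (the names `toy` and `counts_mat` are hypothetical and stand in for `spe` so that the block runs without downloading data). The same accessor calls should apply to `spe`.

```r
library(SpatialExperiment)

# Toy stand-in for `spe`: 3 genes x 4 spots (values are made up for illustration)
counts_mat <- matrix(
  c(0, 2, 1, 0,
    5, 0, 3, 1,
    0, 0, 4, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("spot", 1:4))
)
# Made-up spot coordinates, one row per spot
xy <- matrix(
  c(10, 20, 30, 40,
    15, 25, 35, 45),
  ncol = 2,
  dimnames = list(paste0("spot", 1:4), c("x", "y"))
)
toy <- SpatialExperiment(
  assays = list(counts = counts_mat),
  spatialCoords = xy
)

# Two equivalent ways to retrieve the 'counts' assay
m1 <- assay(toy, "counts")
m2 <- counts(toy)
identical(m1, m2)

# Total counts per spot (column sums)
spot_totals <- colSums(m1)
spot_totals
```

Here `assay()` retrieves a table by name, while `counts()` is a convenience accessor for the assay named "counts".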
# features metadata
head(rowData(spe))
DataFrame with 6 rows and 3 columns
                        gene_id   gene_name    feature_type
                    <character> <character>     <character>
ENSG00000243485 ENSG00000243485 MIR1302-2HG Gene Expression
ENSG00000237613 ENSG00000237613     FAM138A Gene Expression
ENSG00000186092 ENSG00000186092       OR4F5 Gene Expression
ENSG00000238009 ENSG00000238009  AL627309.1 Gene Expression
ENSG00000239945 ENSG00000239945  AL627309.3 Gene Expression
ENSG00000239906 ENSG00000239906  AL627309.2 Gene Expression
# spot-level metadata
head(colData(spe))
DataFrame with 6 rows and 7 columns
                           barcode_id     sample_id in_tissue array_row
                          <character>   <character> <integer> <integer>
AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673         0         0
AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673         1        50
AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673         1         3
AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673         1        59
AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673         1        14
AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 sample_151673         1        43
                   array_col ground_truth cell_count
                   <integer>  <character>  <integer>
AAACAACGAATAGTTC-1        16           NA         NA
AAACAAGTATCTCCCA-1       102       Layer3          6
AAACAATCTACTAGCA-1        43       Layer1         16
AAACACCAATAACTGC-1        19           WM          5
AAACAGAGCGACTCCT-1        94       Layer3          2
AAACAGCTTTCAGAAG-1         9       Layer5          4
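The spot-level metadata can be used directly to summarize or subset spots. The sketch below is illustrative only: it uses a toy object with made-up values (the names `toy` and `toy_tissue` are hypothetical), reusing the column names `in_tissue` and `ground_truth` shown above, and assumes that `in_tissue` is coded as 0 (off tissue) and 1 (over tissue).

```r
library(SpatialExperiment)

spots <- paste0("spot", 1:4)

# Toy stand-in for `spe`: 2 genes x 4 spots with made-up spot metadata
toy <- SpatialExperiment(
  assays = list(counts = matrix(
    1:8, nrow = 2,
    dimnames = list(c("gene1", "gene2"), spots)
  )),
  colData = DataFrame(
    in_tissue = c(0L, 1L, 1L, 1L),
    ground_truth = c(NA, "Layer3", "Layer1", "Layer3"),
    row.names = spots
  ),
  spatialCoords = matrix(
    c(10, 20, 30, 40,
      15, 25, 35, 45),
    ncol = 2,
    dimnames = list(spots, c("x", "y"))
  )
)

# Number of spots off tissue (0) and over tissue (1)
table(toy$in_tissue)

# Keep only spots over tissue (columns correspond to spots)
toy_tissue <- toy[, toy$in_tissue == 1]
dim(toy_tissue)

# Spots per annotated layer, counting missing labels if any remain
layer_counts <- table(toy_tissue$ground_truth, useNA = "ifany")
layer_counts
```

Subsetting by column keeps the assays, spot metadata, and spatial coordinates in sync, which is one reason to store them together in a single object.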
# spatial coordinates
head(spatialCoords(spe))
                   pxl_col_in_fullres pxl_row_in_fullres
AAACAACGAATAGTTC-1               3913               2435
AAACAAGTATCTCCCA-1               9791               8468
AAACAATCTACTAGCA-1               5769               2807
AAACACCAATAACTGC-1               4068               9505
AAACAGAGCGACTCCT-1               9271               4151
AAACAGCTTTCAGAAG-1               3393               7583
# image metadata
imgData(spe)
DataFrame with 2 rows and 4 columns
      sample_id    image_id   data scaleFactor
    <character> <character> <list>   <numeric>
1 sample_151673      lowres   ####   0.0450045
2 sample_151673       hires   ####   0.1500150
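To illustrate how the scaleFactor column relates to the spatial coordinates, the sketch below assumes that the scale factor converts pixel positions in the full-resolution image to pixel positions in the corresponding stored image. It uses base R only, with the first two spots and the "lowres" scale factor copied from the outputs above; the names `xy_fullres`, `sf_lowres`, and `xy_lowres` are hypothetical.

```r
# Full-resolution pixel coordinates for the first two spots (copied from above)
xy_fullres <- matrix(
  c(3913, 2435,
    9791, 8468),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("AAACAACGAATAGTTC-1", "AAACAAGTATCTCCCA-1"),
    c("pxl_col_in_fullres", "pxl_row_in_fullres")
  )
)

# Scale factor of the 'lowres' image (copied from the imgData output above)
sf_lowres <- 0.0450045

# Approximate pixel positions of the same spots on the 'lowres' image
xy_lowres <- xy_fullres * sf_lowres
# Column names are updated by hand to reflect the new coordinate system
colnames(xy_lowres) <- c("pxl_col_in_lowres", "pxl_row_in_lowres")
round(xy_lowres, 1)
```

The "hires" image would use its own scale factor (0.1500150) in the same way.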