5 Load data
5.1 Introduction
The following chapters provide examples demonstrating methods for individual analysis steps for spatial transcriptomics data from sequencing-based platforms.
In these chapters, we assume the datasets are formatted as SpatialExperiment objects (see Chapter 3).
Here, we load a 10x Genomics Visium dataset that will be used in several of the following chapters. This dataset was previously preprocessed with tools outside R and saved in SpatialExperiment format. (For more details on data preprocessing procedures for the Visium platform, see the related online book Visium Data Preprocessing, also listed in Section B.2.) The dataset is available for download in SpatialExperiment format from the STexampleData Bioconductor package.
5.2 Dataset
This dataset consists of one sample (Visium capture area) from one donor: postmortem human brain tissue from the dorsolateral prefrontal cortex (DLPFC) region, measured with the 10x Genomics Visium platform. The dataset is described in Maynard et al. (2021).
More details on the dataset are also included in Chapter 15.
5.3 Load data
Download and load the dataset in SpatialExperiment format from the STexampleData Bioconductor package.
# load packages
library(SpatialExperiment)
library(STexampleData)
# load object
spe <- Visium_humanDLPFC()
## see ?STexampleData and browseVignettes('STexampleData') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
5.4 Save objects for later chapters
We also save the object(s) in .rds format for re-use within later chapters to speed up the build time of the book.
# save object(s)
saveRDS(spe, file = "spe_load.rds")
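As a minimal sketch of the round trip, a saved object can be read back with base R's readRDS(). The example below uses a small stand-in list and a temporary file path (both are illustrative assumptions, not part of the book's workflow) so that it runs without downloading the dataset. In a later chapter, the call would instead point to the spe_load.rds file saved above.

```r
# stand-in object (assumption for illustration); in practice this would be `spe`
obj <- list(counts = matrix(0, nrow = 2, ncol = 3), sample_id = "sample01")

# temporary file path (assumption); the book saves to "spe_load.rds"
rds_path <- file.path(tempdir(), "spe_load.rds")

# save the object, then reload it into a new variable
saveRDS(obj, file = rds_path)
obj_reloaded <- readRDS(rds_path)

# check that the reloaded object matches the original
identical(obj, obj_reloaded)
```

Because readRDS() returns the object rather than restoring it under its original name, the result can be assigned to any variable name.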
5.5 SpatialExperiment object
Check the structure of the SpatialExperiment object. For more details on the SpatialExperiment structure, see Chapter 3.
# check object
spe
## class: SpatialExperiment
## dim: 33538 4992
## metadata(0):
## assays(1): counts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
## ENSG00000268674
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
## TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(8): barcode_id sample_id ... reference cell_count
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
# number of genes (rows) and spots (columns)
dim(spe)
## [1] 33538 4992
# names of 'assays'
assayNames(spe)
## [1] "counts"
# row (gene) data
head(rowData(spe))
## DataFrame with 6 rows and 3 columns
##                         gene_id   gene_name    feature_type
##                     <character> <character>     <character>
## ENSG00000243485 ENSG00000243485 MIR1302-2HG Gene Expression
## ENSG00000237613 ENSG00000237613     FAM138A Gene Expression
## ENSG00000186092 ENSG00000186092       OR4F5 Gene Expression
## ENSG00000238009 ENSG00000238009  AL627309.1 Gene Expression
## ENSG00000239945 ENSG00000239945  AL627309.3 Gene Expression
## ENSG00000239906 ENSG00000239906  AL627309.2 Gene Expression
# column (spot) data
head(colData(spe))
## DataFrame with 6 rows and 8 columns
##                            barcode_id     sample_id in_tissue array_row
##                           <character>   <character> <integer> <integer>
## AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673         0         0
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673         1        50
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673         1         3
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673         1        59
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673         1        14
## AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 sample_151673         1        43
##                    array_col ground_truth   reference cell_count
##                    <integer>  <character> <character>  <integer>
## AAACAACGAATAGTTC-1        16           NA          NA         NA
## AAACAAGTATCTCCCA-1       102       Layer3      Layer3          6
## AAACAATCTACTAGCA-1        43       Layer1      Layer1         16
## AAACACCAATAACTGC-1        19           WM          WM          5
## AAACAGAGCGACTCCT-1        94       Layer3      Layer3          2
## AAACAGCTTTCAGAAG-1         9       Layer5      Layer5          4
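The in_tissue column shown above flags whether each spot overlaps tissue (1) or not (0). As a rough sketch of how this column can be used, the example below tabulates the flag and keeps only the spots over tissue. The object name spe_tissue is introduced here purely for illustration, and the sketch assumes the SpatialExperiment and STexampleData packages are installed and that the dataset can be downloaded.

```r
library(SpatialExperiment)
library(STexampleData)

# load the example dataset (downloaded on first use, then cached)
spe <- Visium_humanDLPFC()

# tabulate spots by tissue overlap (1 = over tissue, 0 = not over tissue)
table(colData(spe)$in_tissue)

# keep only spots over tissue; stored in a new object (hypothetical name)
# so that the original `spe` is left unchanged
spe_tissue <- spe[, colData(spe)$in_tissue == 1]

# the number of genes (rows) is unchanged; only spots (columns) are dropped
dim(spe_tissue)
```

Subsetting columns in this way also subsets the matching rows of colData and spatialCoords, so the spot-level information stays aligned.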
# spatial coordinates
head(spatialCoords(spe))
##                    pxl_col_in_fullres pxl_row_in_fullres
## AAACAACGAATAGTTC-1               3913               2435
## AAACAAGTATCTCCCA-1               9791               8468
## AAACAATCTACTAGCA-1               5769               2807
## AAACACCAATAACTGC-1               4068               9505
## AAACAGAGCGACTCCT-1               9271               4151
## AAACAGCTTTCAGAAG-1               3393               7583
# image data
imgData(spe)
## DataFrame with 2 rows and 4 columns
##       sample_id    image_id   data scaleFactor
##     <character> <character> <list>   <numeric>
## 1 sample_151673      lowres   ####   0.0450045
## 2 sample_151673       hires   ####   0.1500150
5.6 Build object
Alternatively, we can build a SpatialExperiment object directly from raw data.
Here, we provide a short example using a placeholder dataset in which all counts and spatial coordinates are zero.
For more details, including how to load raw data from the 10x Genomics Space Ranger output files to build an object, or how to add image data to the object, see the SpatialExperiment documentation.
# note: requires the SpatialExperiment package to be attached;
# DataFrame() comes from S4Vectors, which is attached along with it
# create data
n_genes <- 200
n_spots <- 100
# placeholder counts matrix (genes x spots), all zeros
counts <- matrix(0, nrow = n_genes, ncol = n_spots)
row_data <- DataFrame(
  gene_name = paste0("gene", sprintf("%03d", seq_len(n_genes)))
)
col_data <- DataFrame(
  sample_id = rep("sample01", n_spots)
)
# placeholder spatial coordinates (one row per spot), all zeros
spatial_coords <- matrix(0, nrow = n_spots, ncol = 2)
colnames(spatial_coords) <- c("x", "y")
# create SpatialExperiment object
spe <- SpatialExperiment(
  assays = list(counts = counts),
  colData = col_data,
  rowData = row_data,
  spatialCoords = spatial_coords
)
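For the Space Ranger case mentioned above, a minimal sketch is shown below. It assumes that the installed version of SpatialExperiment provides read10xVisium() and ships the small example Space Ranger output under extdata/10xVisium, as used in the package's own documentation. The object name spe_10x is introduced here only for illustration. For your own data, samples would instead point to your Space Ranger outs directories.

```r
library(SpatialExperiment)

# example Space Ranger output bundled with the SpatialExperiment package
# (assumption: this example data is present in the installed version)
dir <- system.file(
  file.path("extdata", "10xVisium"),
  package = "SpatialExperiment"
)

# one 'outs' directory per sample
sample_ids <- c("section1", "section2")
samples <- file.path(dir, sample_ids, "outs")

# build a SpatialExperiment object from the raw (unfiltered) count matrices;
# load = FALSE defers loading the low-resolution images into memory
spe_10x <- read10xVisium(
  samples, sample_ids,
  type = "sparse", data = "raw",
  images = "lowres", load = FALSE
)

# check object
spe_10x
```

With data = "raw" all spots are kept, whereas data = "filtered" would read only the spots that Space Ranger identified as overlapping tissue.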