5 Load data

5.1 Overview

In the following chapters, we apply analysis methods to spatial transcriptomics datasets that are formatted as SpatialExperiment objects or objects from other Bioconductor data classes (see Chapter 3).

Here, we load a 10x Genomics Visium dataset that will be used in several of the following chapters.

This dataset was previously preprocessed with tools outside R and saved in SpatialExperiment format. For more details on preprocessing for the 10x Genomics Visium platform, see the related online book Visium Data Preprocessing.

This dataset is available for download in SpatialExperiment format from the STexampleData Bioconductor package.

5.2 Dataset

This dataset consists of one sample (Visium capture area) from one donor: postmortem human brain tissue from the dorsolateral prefrontal cortex (DLPFC), measured with the 10x Genomics Visium platform. The dataset is described in the original publication by Maynard et al. (2021).

More details on the dataset are also included in Chapter 17.

5.3 Load data

Download and load the dataset in SpatialExperiment format from the STexampleData Bioconductor package.


# load packages
library(SpatialExperiment)
library(STexampleData)

# load object
spe <- Visium_humanDLPFC()
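
If the packages are not yet installed, a minimal sketch of the installation step is shown below. It assumes a standard Bioconductor setup managed with BiocManager; the exact steps may differ on your system (for example, when using a container or a pinned Bioconductor release).

```r
# install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# install the two Bioconductor packages used in this chapter
BiocManager::install(c("SpatialExperiment", "STexampleData"))
```

The dataset itself is downloaded the first time Visium_humanDLPFC() is called, so an internet connection is needed at that point.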

5.4 SpatialExperiment object

Check the structure of the SpatialExperiment object. For more details on the SpatialExperiment structure, see Chapter 3.

# check object
spe
class: SpatialExperiment 
dim: 33538 4992 
assays(1): counts
rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
rowData names(3): gene_id gene_name feature_type
colData names(7): barcode_id sample_id ... ground_truth cell_count
mainExpName: NULL
spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
imgData names(4): sample_id image_id data scaleFactor
# number of genes (rows) and spots (columns)
dim(spe)
[1] 33538  4992
# names of 'assays'
assayNames(spe)
[1] "counts"
# row (gene) data
head(rowData(spe))
DataFrame with 6 rows and 3 columns
                        gene_id   gene_name    feature_type
                    <character> <character>     <character>
ENSG00000243485 ENSG00000243485 MIR1302-2HG Gene Expression
ENSG00000237613 ENSG00000237613     FAM138A Gene Expression
ENSG00000186092 ENSG00000186092       OR4F5 Gene Expression
ENSG00000238009 ENSG00000238009  AL627309.1 Gene Expression
ENSG00000239945 ENSG00000239945  AL627309.3 Gene Expression
ENSG00000239906 ENSG00000239906  AL627309.2 Gene Expression
# column (spot) data
head(colData(spe))
DataFrame with 6 rows and 7 columns
                           barcode_id     sample_id in_tissue array_row
                          <character>   <character> <integer> <integer>
AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673         0         0
AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673         1        50
AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673         1         3
AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673         1        59
AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673         1        14
AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 sample_151673         1        43
                   array_col ground_truth cell_count
                   <integer>  <character>  <integer>
AAACAACGAATAGTTC-1        16           NA         NA
AAACAAGTATCTCCCA-1       102       Layer3          6
AAACAATCTACTAGCA-1        43       Layer1         16
AAACACCAATAACTGC-1        19           WM          5
AAACAGAGCGACTCCT-1        94       Layer3          2
AAACAGCTTTCAGAAG-1         9       Layer5          4
# spatial coordinates
head(spatialCoords(spe))
                   pxl_col_in_fullres pxl_row_in_fullres
AAACAACGAATAGTTC-1               3913               2435
AAACAAGTATCTCCCA-1               9791               8468
AAACAATCTACTAGCA-1               5769               2807
AAACACCAATAACTGC-1               4068               9505
AAACAGAGCGACTCCT-1               9271               4151
AAACAGCTTTCAGAAG-1               3393               7583
# image data
imgData(spe)
DataFrame with 2 rows and 4 columns
      sample_id    image_id   data scaleFactor
    <character> <character> <list>   <numeric>
1 sample_151673      lowres   ####   0.0450045
2 sample_151673       hires   ####   0.1500150
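
As an illustration of how these components fit together, the sketch below uses the columns shown in the output above (in_tissue, ground_truth, scaleFactor) to subset the object and rescale coordinates. The object name spe_tissue and the interpretation of the rescaled coordinates as low-resolution image pixels are assumptions made for this example, not steps prescribed by this chapter.

```r
library(SpatialExperiment)
library(STexampleData)

# load the example object (downloaded on first use)
spe <- Visium_humanDLPFC()

# number of spots outside (0) and over (1) tissue
table(colData(spe)$in_tissue)

# keep only spots over tissue; columns of the object are spots
spe_tissue <- spe[, colData(spe)$in_tissue == 1]
dim(spe_tissue)

# manually annotated labels for the remaining spots
table(colData(spe_tissue)$ground_truth, useNA = "ifany")

# scale factor stored for the low-resolution image
sf <- imgData(spe)$scaleFactor[imgData(spe)$image_id == "lowres"]

# full-resolution pixel coordinates multiplied by the scale factor
head(spatialCoords(spe_tissue) * sf)
```

Subsetting columns keeps colData, spatialCoords, and the assays in sync, which is the main reason to work with the object as a whole rather than with separate tables.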

5.5 Build object

Alternatively, we can build a SpatialExperiment object directly from raw data.

Here, we provide a short example with an empty dataset.

For more details, including how to load raw data from the 10x Genomics Space Ranger output files to build an object, or how to add image data to the object, see the SpatialExperiment documentation.

# create data
n_genes <- 200
n_spots <- 100

counts <- matrix(0, nrow = n_genes, ncol = n_spots)

row_data <- DataFrame(
  gene_name = paste0("gene", sprintf("%03d", seq_len(n_genes)))
)

col_data <- DataFrame(
  sample_id = rep("sample01", n_spots)
)

spatial_coords <- matrix(0, nrow = n_spots, ncol = 2)
colnames(spatial_coords) <- c("x", "y")

# create SpatialExperiment object
spe <- SpatialExperiment(
  assays = list(counts = counts),
  colData = col_data,
  rowData = row_data,
  spatialCoords = spatial_coords
)

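The same constructor call also works with non-empty inputs. The following is a minimal sketch that fills the object with simulated values: Poisson-distributed counts and spots placed on a regular 10 x 10 grid. The simulation settings, the grid layout, and the name spe_sim are hypothetical choices for illustration only and do not represent real Visium data.

```r
library(SpatialExperiment)

set.seed(1)
n_genes <- 200
n_spots <- 100

# simulated counts (Poisson noise, purely illustrative)
counts <- matrix(
  rpois(n_genes * n_spots, lambda = 2),
  nrow = n_genes, ncol = n_spots
)
rownames(counts) <- paste0("gene", sprintf("%03d", seq_len(n_genes)))
colnames(counts) <- paste0("spot", sprintf("%03d", seq_len(n_spots)))

# place the 100 spots on a 10 x 10 grid
spatial_coords <- as.matrix(expand.grid(x = seq_len(10), y = seq_len(10)))

# build the object from the simulated components
spe_sim <- SpatialExperiment(
  assays = list(counts = counts),
  rowData = DataFrame(gene_name = rownames(counts)),
  colData = DataFrame(sample_id = rep("sample01", n_spots)),
  spatialCoords = spatial_coords
)

# check dimensions and coordinate names
dim(spe_sim)
spatialCoordsNames(spe_sim)
```

The number of rows in spatialCoords must match the number of columns (spots) in the assay, so it is worth checking the dimensions after construction.
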
5.6 Molecule-based data

For more details on data classes for molecule-based platforms, e.g. 10x Genomics Xenium or Vizgen MERSCOPE, see Chapter 3.