Introduction
For the examples in this book, we will rely on a set of publicly available datasets that cover different sequencing-based and imaging-based platforms, namely: Visium, Visium HD, Xenium (10x Genomics), and CosMx (NanoString).
This chapter provides an overview of the example datasets used in the code examples in the later chapters.
Distribution
OSF repository and OSTA.data
These datasets have been deposited in an Open Science Framework (OSF) repository, and can be queried and downloaded using functions from the osfr package. For convenience, we have implemented the OSTA.data package to:
- list and retrieve datasets available through our OSF node
- cache data as a .zip archive using BiocFileCache
- expose logical scalars pol (polygons) and mol (molecules) to skip these data
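As a minimal sketch of this workflow, the snippet below lists the available dataset identifiers and retrieves one of them as a cached .zip archive. It assumes that OSTA.data is installed, that the relevant functions are named OSTA.data_list() and OSTA.data_load(), and that pol and mol are arguments of the latter; these names are assumptions here, so please check the package documentation if your version differs.

```r
# Assumption: OSTA.data is installed, e.g. via BiocManager::install("OSTA.data")
library(OSTA.data)

# list identifiers of the datasets available through the OSF node
ids <- OSTA.data_list()

# retrieve one dataset as a .zip archive (cached via BiocFileCache);
# 'mol = FALSE' is assumed to skip the (large) molecule-level data
id <- "Xenium_HumanColon_Oliveira"
stopifnot(id %in% ids)
pa <- OSTA.data_load(id, mol = FALSE)

# unpack the archive into a temporary directory and inspect its contents
td <- tempfile()
dir.create(td)
unzip(pa, exdir = td)
list.files(td)
```

Because the archive is cached, repeated calls with the same identifier should not trigger a new download.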
The following datasets are currently available:
## [1] "Chromium_HumanBreast_Janesick" "Chromium_HumanColon_Oliveira"
## [3] "CosMx1k_MouseBrain1" "CosMx1k_MouseBrain2"
## [5] "CosMx6k_HumanBrain" "VisiumHD_HumanColon_Oliveira"
## [7] "Visium_HumanBreast_Janesick" "Visium_HumanColon_Oliveira"
## [9] "Xenium_HumanBreast1_Janesick" "Xenium_HumanColon_Oliveira"
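The identifiers above appear to follow a Platform_Sample[_FirstAuthor] pattern. This convention is inferred from the names themselves rather than documented, so the following is only a sketch of how the identifiers could be split into their components, for example to find all platforms available for a given study.

```r
# dataset identifiers, copied from the listing above
ids <- c(
  "Chromium_HumanBreast_Janesick", "Chromium_HumanColon_Oliveira",
  "CosMx1k_MouseBrain1", "CosMx1k_MouseBrain2",
  "CosMx6k_HumanBrain", "VisiumHD_HumanColon_Oliveira",
  "Visium_HumanBreast_Janesick", "Visium_HumanColon_Oliveira",
  "Xenium_HumanBreast1_Janesick", "Xenium_HumanColon_Oliveira")

# split on underscores; the 3rd field (study) is NA when absent
# (assumed naming convention: Platform_Sample[_FirstAuthor])
parts <- strsplit(ids, "_", fixed = TRUE)
info <- data.frame(
  id = ids,
  platform = vapply(parts, `[`, character(1), 1),
  sample = vapply(parts, `[`, character(1), 2),
  study = vapply(parts, `[`, character(1), 3))

# platforms available for each study
split(info$platform, info$study)
```

Note that split() drops entries with a missing study, so the CosMx datasets do not appear in the grouped result.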
STexampleData
In addition, several datasets are available from the STexampleData package as pre-formatted SpatialExperiment and SingleCellExperiment objects. These objects are stored on Bioconductor’s ExperimentHub resource, and can be loaded in R either by querying ExperimentHub or by using the loader functions provided in the STexampleData package.
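Both routes can be sketched as follows, assuming the STexampleData and ExperimentHub packages are installed and an internet connection is available. Visium_humanDLPFC() is used here only as one example of a loader function; it is not necessarily one of the datasets used in this book.

```r
# Assumption: both packages are installed, e.g. via
# BiocManager::install(c("STexampleData", "ExperimentHub"))
library(STexampleData)
library(ExperimentHub)

# option 1: loader function; downloads (and caches) the object
spe <- Visium_humanDLPFC()
spe

# option 2: query ExperimentHub for records provided by STexampleData
eh <- ExperimentHub()
res <- query(eh, "STexampleData")
res

# individual records can then be retrieved by position
# or by their 'EH' identifier, e.g. res[[1]]
```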
Datasets
Below, we briefly summarize the characteristics of each dataset, and note in which parts of the book each dataset is used.
HumanBreast_Janesick
In the underlying paper (Janesick et al. 2023), the Xenium data (2 replicates) were accompanied by consecutive slices of Chromium and Visium data. Because the slices are consecutive, the replicates are expected to yield nearly identical biological findings. By transferring Chromium cell type labels to the spatial data, such as Visium (with full transcriptome) and Xenium (at single-cell resolution), we can combine analytical insights from different platforms.
HumanColon_Oliveira
The underlying paper includes both normal adjacent tissue (NAT) and colorectal carcinoma (CRC) samples from 5 patients. The Visium HD data (P2 CRC) were accompanied by consecutive slices of Chromium, Visium, and Xenium data, so these modalities can be analyzed jointly.
- Oliveira et al. (2024)
- source: 10x Genomics
- annotations: repository (Chromium & VisiumHD deconvolution)
- Chromium
  - 18,082 genes x 279,609 cells
- P2 CRC:
  - Visium
    - 18,085 genes x 4,269 spots
  - VisiumHD
    - 18,085 genes x
      - 8,731,400 bins (2 µm)
      - 545,913 bins (8 µm)
      - 137,051 bins (16 µm)
  - Xenium
    - 422 RNA targets x 340,837 cells
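The three Visium HD bin counts describe the same tissue section at different bin widths, so the number of bins should scale roughly with the inverse square of the bin width. The short check below, using only the numbers listed above, compares the expected and observed ratios. The small deviations are presumably due to which bins are retained at each resolution; that explanation is an assumption, not something stated in the source.

```r
# number of Visium HD bins at each bin width (values from the list above)
bins <- c("2um" = 8731400, "8um" = 545913, "16um" = 137051)
width <- c("2um" = 2, "8um" = 8, "16um" = 16)

# expected fold-change in bin count relative to 16 um bins: (16 / width)^2
expected <- (width["16um"] / width)^2

# observed fold-change relative to 16 um bins
observed <- bins / bins["16um"]

# side-by-side comparison
round(cbind(expected, observed), 2)
```

The observed ratios agree with the expected values of 64 and 16 to within about 1%.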
CosMx1k_MouseBrain1/2
There are two sections from the CosMx mouse brain sample, namely “coronal hemisphere” (sample 1) and “coronal hippocampus and cortex” (sample 2).
- source: NanoString
- 950 RNA targets x
  - 48,556 cells (coronal hemisphere)
  - 38,996 cells (coronal hippocampus and cortex)
CosMx6k_HumanBrain
The CosMx human prefrontal cortex sample has a larger gene panel of ~6,000 RNA targets.
- source: NanoString
- 6,278 RNA targets x 188,686 cells
Appendix
References
Janesick, Amanda, Robert Shelansky, Andrew D. Gottscho, Florian Wagner, Stephen R. Williams, Morgane Rouault, Ghezal Beliakoff, et al. 2023. “High Resolution Mapping of the Tumor Microenvironment Using Integrated Single-Cell, Spatial and in Situ Analysis.” Nature Communications 14 (1): 8353.
Oliveira, Michelli F., Juan P. Romero, Meii Chung, Stephen Williams, Andrew D. Gottscho, Anushka Gupta, Susan E. Pilipauskas, et al. 2024. “Characterization of Immune Cell Populations in the Tumor Microenvironment of Colorectal Cancer Using High Definition Spatial Profiling.” bioRxiv. https://doi.org/10.1101/2024.06.04.597233.