6  Datasets

6.1 Introduction

Throughout this book, we will rely on a set of publicly available datasets that cover different sequencing- and imaging-based platforms, namely: Visium, Visium HD, Xenium (10x Genomics), and CosMx (NanoString).

6.2 Distribution

These datasets have been deposited in an Open Science Framework (OSF) repository, and can be queried and downloaded using functions from the osfr package. For convenience, we have implemented the OSTA.data package to:

  • list and retrieve datasets available through our OSF node
  • cache data as a .zip archive using BiocFileCache
  • expose the logical scalars polygons and molecules, which can be used to skip retrieval of these data
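The OSTA.data package can be installed from GitHub using BiocManager:

# install BiocManager from CRAN if it is not yet available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# install OSTA.data from its GitHub repository
BiocManager::install("estellad/OSTA.data")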

The following datasets are currently available:

##   [1] "Chromium_HumanBreast_Janesick" "Chromium_HumanColon_Oliveira" 
##   [3] "CosMx1k_MouseBrain1"           "CosMx1k_MouseBrain2"          
##   [5] "CosMx6k_HumanBrain"            "VisiumHD_HumanColon_Oliveira" 
##   [7] "Visium_HumanBreast_Janesick"   "Visium_HumanColon_Oliveira"   
##   [9] "Xenium_HumanBreast1_Janesick"  "Xenium_HumanColon_Oliveira"
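
The identifiers above appear to follow a platform_tissue_study naming pattern, with the study suffix absent for the CosMx datasets. As a minimal sketch (the pattern is inferred from the listing rather than documented, and the helper `field` is hypothetical), the identifiers can be split into their components using base R, e.g., to find all datasets that stem from the same study:

```r
# dataset identifiers, copied from the listing above
ids <- c(
    "Chromium_HumanBreast_Janesick", "Chromium_HumanColon_Oliveira",
    "CosMx1k_MouseBrain1", "CosMx1k_MouseBrain2",
    "CosMx6k_HumanBrain", "VisiumHD_HumanColon_Oliveira",
    "Visium_HumanBreast_Janesick", "Visium_HumanColon_Oliveira",
    "Xenium_HumanBreast1_Janesick", "Xenium_HumanColon_Oliveira")

# split each identifier at underscores (assumed naming convention)
parts <- strsplit(ids, "_", fixed = TRUE)

# hypothetical helper: return the i-th field, or NA when it is absent
field <- function(x, i) if (length(x) >= i) x[i] else NA_character_

datasets <- data.frame(
    id = ids,
    platform = vapply(parts, field, character(1), i = 1),
    tissue = vapply(parts, field, character(1), i = 2),
    study = vapply(parts, field, character(1), i = 3))

# identifiers that share a study, e.g., for cross-platform analyses
oliveira <- datasets$id[which(datasets$study == "Oliveira")]
print(oliveira)
```

Grouping by the study field in this way gives the sets of datasets that were generated alongside one another and can thus be analyzed jointly.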

6.3 Description

Below, we briefly summarize the characteristics of each dataset and note the parts of the book in which it is used.

6.3.1 HumanBreast_Janesick

In the underlying paper (Janesick et al. 2023), the Xenium data (2 replicates) were accompanied by Chromium and Visium data from consecutive slices. The replicates are therefore expected to yield nearly identical biological findings. By transferring Chromium cell type labels to data from spatial technologies, such as Visium (full transcriptome) and Xenium (single-cell resolution), we can combine analytical insights from different platforms.

6.3.2 HumanColon_Oliveira

In the underlying paper, there are both normal adjacent tissue (NAT) and colorectal carcinoma (CRC) samples from 5 patients. The Visium HD data (P2 CRC) were accompanied by consecutive slices of Chromium, Visium, and Xenium data. Therefore, we can jointly analyze these modalities.

  • Oliveira et al. (2024)
  • source: 10x Genomics
  • annotations: repository (Chromium & VisiumHD deconvolution)
  • Chromium
    • 18,082 genes x 279,609 cells
      • P2, P3, P5 NAT
      • P1-5 CRC
  • P2 CRC:
    • Visium
      • 18,085 genes x 4,269 spots
    • VisiumHD
      • 18,085 genes x
        • 8,731,400 bins (2 µm)
        • 545,913 bins (8 µm)
        • 137,051 bins (16 µm)
    • Xenium
      • 422 RNA targets x 340,837 cells
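
As a quick sanity check on the VisiumHD bin counts listed above: if bins of different widths tile the same area, their number should scale with the inverse square of the bin width. The sketch below uses only the figures from the list; that all three resolutions cover the same area is our assumption, not something stated by the data source.

```r
# VisiumHD bin counts from the list above, named by bin width (in um)
bins <- c("2" = 8731400, "8" = 545913, "16" = 137051)
width <- as.numeric(names(bins))

# expected counts if coarser bins tiled exactly the area of the 2 um bins
# (assumption: identical area at all three resolutions)
expected <- bins[["2"]] * (2 / width)^2

# ratio of reported to expected counts; values near 1 support the assumption
ratio <- bins / expected
print(round(ratio, 3))
```

The reported counts agree with this expectation to within 1%, which is consistent with the 8 and 16 µm bins being aggregates of the 2 µm bins.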

6.3.3 CosMx1k_MouseBrain1/2

There are two sections from the CosMx mouse brain sample, namely “coronal hemisphere” (sample 1) and “coronal hippocampus and cortex” (sample 2).

  • source: NanoString
  • 950 RNA targets x
    • 48,556 cells (coronal hemisphere)
    • 38,996 cells (coronal hippocampus and cortex)

6.3.4 CosMx6k_HumanBrain

The CosMx human prefrontal cortex sample has a larger gene panel of ~6,000 RNA targets.

  • source: NanoString
  • 6,278 RNA targets x 188,686 cells

6.4 Appendix

References

Janesick, Amanda, Robert Shelansky, Andrew D. Gottscho, Florian Wagner, Stephen R. Williams, Morgane Rouault, Ghezal Beliakoff, et al. 2023. “High Resolution Mapping of the Tumor Microenvironment Using Integrated Single-Cell, Spatial and in Situ Analysis.” Nature Communications 14 (1): 8353.
Oliveira, Michelli F., Juan P. Romero, Meii Chung, Stephen Williams, Andrew D. Gottscho, Anushka Gupta, Susan E. Pilipauskas, et al. 2024. “Characterization of Immune Cell Populations in the Tumor Microenvironment of Colorectal Cancer Using High Definition Spatial Profiling.” bioRxiv. https://doi.org/10.1101/2024.06.04.597233.