1 Introduction
1.1 Introduction
This book provides interactive examples and discussion on key principles of computational analysis workflows for spatial transcriptomics data using Bioconductor in R. The chapters describe individual analysis steps and extended workflows, including runnable examples with R code and datasets.
1.2 Contents
The chapters are organized into several parts:
Introduction: introduction, background on spatial transcriptomics, and R/Bioconductor data classes
Sequencing-based platforms: chapters on individual analysis steps and extended workflows for data from sequencing-based spatial transcriptomics platforms
Imaging-based platforms: chapters on individual analysis steps and extended workflows for data from imaging-based spatial transcriptomics platforms
Appendix: acknowledgments, related resources
1.3 Scope and who this book is for
The aim of this book is to demonstrate key principles of computational analysis workflows for spatial transcriptomics data through interactive examples and discussion, including example R code and datasets. We assume some familiarity with R programming and an understanding of the types of biological questions that single-cell and spatial transcriptomics can be used to answer. Previous experience with Bioconductor is not required.
The book focuses on downstream analyses, which start with a spot-level or cell-level gene expression count table and set of spatial coordinates as the main inputs. Preprocessing procedures to generate these inputs from the raw data for the various technological platforms are described in other resources (e.g. in our related book Visium Data Preprocessing for the 10x Genomics Visium platform) and tutorials provided by the platform manufacturers.
For most analysis steps, a number of methods are available to choose from. In general, we will showcase methods that we have found to work well and are computationally scalable, with a preference for methods available through Bioconductor. The book is not intended to provide a comprehensive listing of all available methods.
In the code examples, we will only include methods that are available through Bioconductor or CRAN (in R) or PyPI (in Python). This restriction helps ensure long-term stability and maintainability, enables regular testing via the Bioconductor build system, and makes it easier for readers to adapt the examples to integrate new methods or build extended Bioconductor-based workflows. Some methods that are available from GitHub or other sources will also be mentioned in the text.
1.4 Bioconductor
Bioconductor is an “open source and open development” project providing a cohesive and flexible framework for rigorous and reproducible analyses of high-throughput genomic data in R (Huber et al. 2015). Bioconductor provides access to more than 2000 contributed R packages, as well as infrastructure maintained by the Bioconductor Core Team, providing a rich analysis environment for users.
A key strength of the Bioconductor framework is the modularity and open development philosophy. Packages are contributed by research groups around the world, with the Bioconductor Core Team coordinating the overall project and maintaining infrastructure, build testing, and development guidelines. Contributed packages use consistent data structures, enabling users to easily connect packages developed by different research groups to build analysis workflows that include the latest state-of-the-art methods. Bioconductor packages also include comprehensive documentation, including extended tutorials and package vignettes.
1.5 Additional introductory resources
For readers who are new to R and/or Bioconductor, additional useful resources include:
The Orchestrating Single-Cell Analysis with Bioconductor (OSCA) online book (Amezquita et al. 2020), which contains comprehensive materials on analysis workflows for non-spatial single-cell data, as well as further introductory materials on R and Bioconductor.
The R for Data Science online book provides an excellent introduction to R.
Data Carpentry and Software Carpentry provide online lesson materials on R programming, the Unix shell, and version control.
The R/Bioconductor Data Science Team at the Lieber Institute for Brain Development has a detailed guide of free resources and videos to learn more about R and Bioconductor, as well as YouTube videos, including some on the basics of Bioconductor and infrastructure for storing gene expression data.
1.6 Feedback and contributions
We welcome feedback, suggestions, and contributions from readers in the research community. These may be provided as GitHub issues for further discussion with the developers.
Note that all methods used within code examples must be available as packages from either Bioconductor or CRAN (in R) or PyPI (in Python) to ensure long-term stability and maintainability (see Section 1.3).