1 Introduction
1.1 Introduction
This book provides interactive examples and discussion of key principles of computational analysis workflows for spatial omics data using Bioconductor in R. It contains chapters describing individual analysis steps as well as extended workflows, each with examples that include R code and datasets. In a few cases, R code is integrated with Python tools.
1.2 Contents
The chapters are organized into several parts:
Introduction: introduction and background on spatial omics, data representations, and related R/Bioconductor infrastructure
Sequencing-based platforms: chapters on analysis steps and workflows for data from sequencing-based platforms
Imaging-based platforms: chapters on analysis steps and workflows for data from imaging-based platforms
Non-platform-specific analyses: chapters on analyses and workflows that are non-platform-specific, e.g. downstream analyses applicable to data from both types of platforms, or integrating data across platforms
Appendix: acknowledgments, related resources, and session information
1.3 Scope and who this book is for
The aim of this book is to demonstrate key principles of computational analysis workflows for spatial omics data through interactive examples and discussion, including example R code and datasets. We assume some familiarity with R programming and an understanding of the types of biological questions that single-cell and spatial omics can be used to answer. Previous experience with Bioconductor is not required.
The book covers both preprocessing and downstream analyses, so the starting point can be either raw spatial omics data or processed spot-level or cell-level expression matrices together with the corresponding spatial coordinates. Preprocessing procedures vary from platform to platform, and we aim to cover some of the main pipelines for multiple platforms.
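As a rough illustration of the processed inputs described above, the following base R sketch builds a toy expression matrix and a matching table of spatial coordinates. All object names, dimensions, and values here are hypothetical and chosen only for illustration. Real datasets are far larger and are typically stored in dedicated Bioconductor data structures rather than plain matrices and data frames.

```r
# Toy example using base R only; all names and values are hypothetical.
set.seed(1)
n_genes <- 5
n_spots <- 4

# Expression matrix: rows are genes, columns are spots (or cells)
counts <- matrix(
  rpois(n_genes * n_spots, lambda = 3),
  nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes), paste0("spot", 1:n_spots))
)

# Spatial coordinates: one row per spot, in the same order as the columns of counts
coords <- data.frame(
  x = c(0, 1, 0, 1),
  y = c(0, 0, 1, 1),
  row.names = colnames(counts)
)

# Basic consistency check: every column of counts has a matching coordinate row
stopifnot(identical(colnames(counts), rownames(coords)))
```

Keeping the columns of the expression matrix and the rows of the coordinate table in the same order, and checking this explicitly, avoids silently mismatching expression values and spatial locations in later steps.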
For most analysis steps, multiple methods are available to choose from. In general, we will showcase methods that we have found to work well and are computationally scalable, with a preference for methods available through Bioconductor. However, the book is not intended to provide a comprehensive listing of all available methods.
In the code examples, we will only include methods that are available through Bioconductor or CRAN (in R) or PyPI (in Python). This restriction helps ensure long-term stability and maintainability, enables regular testing via the Bioconductor build system, and makes it easier for readers to adapt the examples to integrate new methods or build extended Bioconductor-based workflows. Some methods that are currently available only from GitHub may also be included if they are in the process of submission to Bioconductor, CRAN, or PyPI.
1.4 Bioconductor
Bioconductor is an “open source and open development” project providing a cohesive and flexible framework for rigorous and reproducible analyses of high-throughput genomic data in R (Huber et al. 2015). Bioconductor gives access to more than 2000 contributed R packages, as well as infrastructure maintained by the Bioconductor Core Team, which together offer a rich analysis environment for users.
A key strength of the Bioconductor framework is its modularity and open development philosophy. Packages are contributed by research groups around the world, with the Bioconductor Core Team coordinating the overall project and maintaining infrastructure, build testing, and development guidelines. Contributed packages use consistent data structures, so users can easily connect packages developed by different research groups to build analysis workflows that include the latest state-of-the-art methods. Bioconductor packages also come with comprehensive documentation, including extended tutorials and package vignettes.
1.5 Additional introductory resources
For readers who are new to R and/or Bioconductor, additional useful resources include:
The Orchestrating Single-Cell Analysis with Bioconductor (OSCA) online book (Amezquita et al. 2020), which contains comprehensive materials on analysis workflows for non-spatial single-cell data, as well as further introductory materials on R and Bioconductor.
The R for Data Science online book, written by Hadley Wickham, provides an excellent introduction to R. Wickham has also made other books freely available, including ggplot2 and Advanced R.
Data Carpentry and Software Carpentry provide online lesson materials on R programming, the Unix shell, and version control.
The R/Bioconductor Data Science Team at the Lieber Institute for Brain Development maintains a detailed guide to free resources and videos for learning more about R and Bioconductor. The team also publishes YouTube videos covering some of the basics of Bioconductor and its infrastructure for storing gene expression data, as well as a wide range of topics in genomics.
1.6 Feedback and contributions
We welcome feedback, suggestions, and contributions from readers in the research community. These may be provided as GitHub issues for further discussion with the developers.
Note that all methods used within code examples should be available as packages from either Bioconductor or CRAN (in R) or PyPI (in Python) to ensure long-term stability and maintainability, as discussed above.