1  Introduction

1.1 Overview

This book provides discussion and interactive examples on best practices for computational analysis workflows for spatial transcriptomics data, using the Bioconductor framework within R.

1.2 Contents

The chapters are organized into several parts:

  • Introduction: introduction, background on spatial transcriptomics, and R/Bioconductor data classes

  • Analysis steps: individual analysis steps

  • Workflows: examples of complete analysis workflows

  • Appendix: related resources, acknowledgments, and references

1.3 Scope and who this book is for

The aim of this book is to demonstrate best practices for computational analysis workflows for spatial transcriptomics data through interactive examples and discussion using R code and example datasets. We assume some familiarity with R programming and an understanding of the types of biological questions that single-cell and spatial transcriptomics can be used to answer. Previous experience with Bioconductor is not required. By working through the examples in this book, readers will be able to adapt or extend the example workflows to analyze their own data.

In general, we focus on downstream analysis methods, which start with a gene expression count table and spatial coordinates as the main inputs. Preprocessing procedures to generate gene expression count tables are covered in other resources (e.g. our related book Visium Data Preprocessing for data from the 10x Genomics Visium platform) and tutorials provided by the manufacturers of the technological platforms. For molecule-based platforms, we will usually use data that has been aggregated to the cell level.

For analysis steps where a number of different methods are available, we will showcase examples of methods that we have found to work well and are computationally scalable, with a preference for methods available through Bioconductor.

All methods used in this book are available through Bioconductor or CRAN (in R) or PyPI (in Python) to ensure installability and long-term maintenance. We also mention a number of additional methods that are available as packages from GitHub, but these are not run within the code examples.

1.4 Bioconductor

Bioconductor is an ‘open source and open development’ project providing a cohesive and flexible framework for rigorous and reproducible analyses of high-throughput genomic data in R (Huber et al. 2015). Bioconductor provides access to more than 2,000 contributed R packages, as well as infrastructure maintained by the Bioconductor Core Team, providing a rich analysis environment for users.

A key strength of the Bioconductor framework is the modularity and open development philosophy. R packages are contributed by numerous research groups around the world, with the Bioconductor Core Team coordinating the overall project and maintaining infrastructure, build testing, and development guidelines. Contributed packages use consistent data structures, enabling users to connect packages developed by different groups to build analysis workflows that include the latest state-of-the-art methods. Bioconductor packages also include comprehensive documentation, including long-form tutorials or package vignettes.

1.5 Additional resources

For readers who are new to R and Bioconductor, additional useful resources include:

  • The Orchestrating Single-Cell Analysis with Bioconductor (OSCA) online book (Amezquita et al. 2020), which contains comprehensive materials on analysis workflows for non-spatial single-cell data as well as further introductory materials on R and Bioconductor.

  • The R for Data Science online book provides an excellent introduction to R.

  • Data Carpentry and Software Carpentry provide online lesson materials on R programming, the Unix shell, and version control.

  • The R/Bioconductor Data Science Team at the Lieber Institute for Brain Development has a detailed guide of free resources and videos to learn more about R and Bioconductor, as well as YouTube videos, including some on the basics of Bioconductor and infrastructure for storing gene expression data.

  • The Visium Data Preprocessing online book provides details on data preprocessing procedures for spatial transcriptomics data from the 10x Genomics Visium platform.

1.6 Contributions

We welcome suggestions for updates and additions to the book. Suggestions may be provided as GitHub issues for further discussion with the maintainers.

Note that all methods to be run within code examples must be available as either R packages from Bioconductor or CRAN, or Python packages from PyPI. This restriction provides readers with guarantees regarding installability, long-term availability, maintenance, and compatibility. Packages that are only available from GitHub or other repositories may be discussed within the text, but will not be included in the code examples. In general, our philosophy is to showcase examples of analysis methods that we have found to work well, are computationally scalable, user-friendly, and which can be integrated into existing Bioconductor-based workflows.