3 Interoperability

3.1 Introduction

Ideally, users would be able to leverage the full range of available tools for their analysis. Methods should be selected based on their scientific merit (as demonstrated by third-party benchmarks), regardless of the programming language (e.g. R or Python) or framework (e.g. Bioconductor or Seurat) in which they happen to be implemented.

For single-cell and spatial omics data analysis, being able to leverage different ecosystems is especially powerful. On the one hand, Python offers superb infrastructure for image analysis and machine learning-based approaches. On the other hand, R has historically been dedicated to statistical computing; as a result, many modern R methods for spatial omics data build on a solid foundation of tools for spatial statistics and statistical modeling in general.

R’s application to spatial data dates back decades, primarily in epidemiological and geospatial research. As a result, various tools for spatial analyses have been established; for example, sp provides a “coherent set of classes and methods for […] points, lines, polygons, and grids”; spatstat and, more recently, sf provide tools for spatial point pattern and vector data, respectively.

Different data structures, although standardized within a given framework, make switching between languages and tools especially cumbersome. In the realm of single-cell and spatial omics, Bioconductor tools are built around SummarizedExperiment-derived classes, while Seurat, Giotto and VoltRon rely on their own in-house objects; Python's scanpy (Wolf, Angerer, and Theis 2018) and squidpy (Palla et al. 2022), in turn, use AnnData. Efforts to alleviate this problem are underway; e.g. zellkonverter converts between Python's AnnData and R/Bioconductor's SingleCellExperiment, and Seurat provides functions to convert its objects to and from SingleCellExperiment.

On a higher level, tools that enable interoperability between programming languages have become available. For example, the reticulate package provides an R interface to Python, including support for translating between objects of both languages; basilisk facilitates Python environment management within the Bioconductor ecosystem and can be used in conjunction with reticulate; and Quarto can generate dynamic reports from code in different languages.
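To illustrate how basilisk and zellkonverter can work together, here is a minimal sketch that round-trips a small SingleCellExperiment through AnnData inside the basilisk-managed Python environment exported by zellkonverter (zellkonverterAnnDataEnv()). The toy count matrix is simulated purely for illustration, and the first run may take a while because basilisk creates the environment on first use.

```r
library(basilisk)
library(SingleCellExperiment)
library(zellkonverter)

# Simulated toy counts (10 genes x 5 cells), for illustration only
set.seed(1)
y <- matrix(rpois(50, lambda = 5), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("cell", 1:5)))
sce <- SingleCellExperiment(assays = list(counts = y))

# basiliskRun() evaluates 'fun' within the given Python environment;
# additional arguments (here 'sce') are passed on to 'fun'
rt <- basiliskRun(
  env = zellkonverterAnnDataEnv(),
  fun = function(sce) {
    ad <- SCE2AnnData(sce, X_name = "counts")  # R -> Python (AnnData)
    AnnData2SCE(ad)                            # Python -> R (SingleCellExperiment)
  },
  sce = sce)
rt
```

Wrapping both conversions in basiliskRun() means no manual environment setup is needed; the trade-off is that Python objects do not persist outside of 'fun'.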

Quarto, developed by Posit (formerly RStudio), is the successor to R Markdown. Like .Rmd files, .qmd files can include scientific content (e.g. cross-references, LaTeX-based equations) and can be rendered to multiple output formats (HTML, PDF, etc.). This book uses Quarto.

3.2 Example

3.2.1 Calling Python

Say we have created a conda environment; e.g.:

# from the command line
conda create -n scvi python=3.12
conda activate scvi
conda install scvi-tools -c conda-forge

We can then point reticulate to said environment via:

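library(reticulate)
# path to the conda binary (adjust to your own installation)
bin <- "~/software/mambaforge/bin/conda"
options(reticulate.conda_binary=bin)
# bind R to the 'scvi' environment; this must happen
# before Python is initialized in the current session
use_condaenv("scvi")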
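Before running any heavy computation, it can help to confirm which Python interpreter reticulate has actually bound to, and whether the required module is importable from it. A minimal sketch follows; note that both calls initialize Python, after which reticulate can no longer switch to a different environment within the same R session.

```r
library(reticulate)

# TRUE if 'scvi' can be imported from the active Python, FALSE otherwise
has_scvi <- py_module_available("scvi")

# Details on the active Python binding
cfg <- py_config()
cfg$python   # path to the Python executable in use
cfg$version  # Python version
has_scvi
```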

Python modules will now be imported from the above environment; e.g. we can load scvi-tools and use it to download the cortex example dataset:

scvi <- import("scvi")
# download the dataset into a temporary directory
dir.create(td <- tempfile())
(ad <- scvi$data$cortex(save_path=td))
# AnnData object with n_obs × n_vars = 3005 × 19972
#     obs: 'labels', 'precise_labels', 'cell_type'

Note that running code in this way comes with the overhead of initializing a Python session. However, this overhead is typically negligible compared to the runtime of computationally intensive methods, or when analyzing large-scale single-cell and spatial data (hundreds of thousands of cells).

3.2.2 Continuing in R

Any of the Python objects created above can be accessed from R. For basic types, conversion to R works out of the box:

unique(ad$obs$cell_type)
# [1] "interneurons"         "pyramidal SS" "pyramidal CA1"       
# [4] "oligodendrocytes"     "microglia"    "endothelial-mural"   
# [7] "astrocytes_ependymal"

However, reticulate supports only a limited set of direct type conversions (e.g. Python dictionary ↔ R named list); an AnnData object, for instance, remains a Python object on the R side. In the example demonstrated here, we can use zellkonverter to go from AnnData to SingleCellExperiment:

library(zellkonverter)
(sce <- AnnData2SCE(ad))
# class: SingleCellExperiment 
# dim: 19972 3005 
# metadata(0):
# assays(1): X
# rownames(19972): Tspan12 Tshz1 ... Gm21943_loc3 Gm20738_loc3
# rowData names(0):
# colnames(3005): 0 1 ... 3003 3004
# colData names(3): labels precise_labels cell_type
# reducedDimNames(0):
# mainExpName: NULL
# altExpNames(0):
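For comparison, the dictionary ↔ named list conversion that reticulate does handle natively can be checked with a toy object (the list contents below are made up for illustration):

```r
library(reticulate)

# A named R list is converted into a Python dict ...
d <- r_to_py(list(dataset = "cortex", n_cells = 3005L))
class(d)  # includes "python.builtin.dict"

# ... and a Python dict is converted back into a named R list
l <- py_to_r(d)
str(l)
```

Objects without such a built-in mapping, like AnnData, need a dedicated converter such as zellkonverter.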

3.2.3 Back to Python

We can also do the reverse, i.e. go from R’s SingleCellExperiment to Python’s AnnData:

(ad <- SCE2AnnData(sce, X_name="X"))
# AnnData object with n_obs × n_vars = 3005 × 19972
#     obs: 'labels', 'precise_labels', 'cell_type'
#     uns: 'X_name'

3.3 Appendix

References

Palla, Giovanni, Hannah Spitzer, Michal Klein, David Fischer, Anna Christina Schaar, Louis Benedikt Kuemmerle, Sergei Rybakov, et al. 2022. “Squidpy: A Scalable Framework for Spatial Omics Analysis.” Nature Methods 19 (2): 171–78.
Wolf, F Alexander, Philipp Angerer, and Fabian J Theis. 2018. “SCANPY: Large-Scale Single-Cell Gene Expression Data Analysis.” Genome Biology 19 (1): 15.