28 Differential colocalization

28.1 Preamble

28.1.1 Introduction

In this section, we will explore tools to assess differential colocalization between conditions with the two packages spicyR and spatialFDA. Both packages are based the comparison of known spatial statistics functions implemented in the package spatstat.

28.1.2 Dependencies

Code

library(SpatialExperiment)
library(spatialFDA)
library(spicyR)
library(refund)
library(dplyr)
library(stringr)
library(tidyr)
library(osfr)
library(ggplot2)

28.1.3 Load data

For this example, we use a dataset on type I diabetes (T1D) (Damond et al. 2019). This dataset contains three conditions (non-diabetic, onset, and long-duration diabetes), several patients per condition, and several fields-of-view per patient. This hierarchical setup requires the modeler to account for correlation between fields-of-view, for example if they stem from the same patient. Such a setup can be modeled with mixed models (see below).

An assay-free version of the dataset is available through our OSF repository (used here to reduce runtime as the following analyses do not require assay data):

Code

# load data from OSF repository (for smaller download)
osf_repo <- osf_retrieve_node("https://osf.io/5n4q3/")
osf_files <- osf_ls_files(osf_repo, 
    path = "zzz", n_max = Inf,
    pattern = "Damond_noAssays.rds")
dir.create(td <- tempfile())
osf_download(osf_files, path = td)

##  # A tibble: 1 × 4
##    name                           id              local_path      meta        
##    <chr>                          <chr>           <chr>           <list>      
##  1 IMC_HumanPancreas_Damond_noAs… 682e2ea1abc2a7… /tmp/RtmpbUS0F… <named list>

Code

spe <- readRDS(file.path(td, list.files(td, ".rds")))
# subset patient IDs
(spe <- spe[, spe$patient_id %in% c(6089, 6180, 6126, 6134, 6228, 6414)])

##  class: SpatialExperiment 
##  dim: 38 881544 
##  metadata(0):
##  assays(0):
##  rownames(38): H3 SMA ... DNA1 DNA2
##  rowData names(6): channel metal ... antibody_clone full_name
##  colnames(881544): 138_1 138_2 ... 566_1171 566_1172
##  colData names(29): cell_id image_name ... patient_BMI sample_id
##  reducedDimNames(0):
##  mainExpName: Damond_2019_Pancreas_FULL_v1
##  altExpNames(0):
##  spatialCoords names(2) : x y
##  imgData names(0):

28.2 spicyR

First, we show the functionality of spicyR. This package condenses the information of a spatial statistics curve to a scalar value and models this continuous value as the response in a linear (mixed) effects model (Canete et al. 2022).
By default, spicyR chooses the first level of the condition factor as the base and the second as the comparison condition.

Code

stages <- c("Non-diabetic", "Onset", "Long-duration")
spe$patient_stage <- factor(spe$patient_stage, stages)
spe$cell_type <- factor(spe$cell_type)

Code

# note runtime: ~5 min
spicyTestPair <- spicy(
  spe, 
  condition = "patient_stage", 
  imageID = "image_name", 
  cellType = "cell_type",
  window = "square", 
  cores = 4
)

##  Dropping unused levels. Using patient_stage = Non-diabetic as base
##  comparison group. If this is not the desired base group, please convert
##  cells$patient_stage into a factor and change the order of
##  levels(cells$patient_stage) so that the base group is at index 1.

##  Warning: The `BPPARAM` argument of `getPairwise()` is deprecated as of spicyR 1.18.0.
##  ℹ Please use the `cores` argument instead.
##  ℹ The deprecated feature was likely used in the spicyR package.
##    Please report the issue at <https://github.com/SydneyBioX/spicyR/issues>.

Code

out <- topPairs(spicyTestPair, n = 27)
head(out)

##                intercept coefficient      p.value   adj.pvalue  from     to
##  beta__Tc   -18.57376350   25.785860 1.197303e-24 1.767853e-22  beta     Tc
##  Tc__beta   -17.80888505   25.752245 1.381135e-24 1.767853e-22    Tc   beta
##  Tc__alpha  -15.18135600   18.175035 1.192449e-19 1.017557e-17    Tc  alpha
##  alpha__Tc  -15.96670406   18.023449 4.128542e-19 2.642267e-17 alpha     Tc
##  Tc__delta  -17.78429189   20.140818 2.671465e-17 1.310140e-15    Tc  delta
##  Tc__acinar   0.05078239   -3.476235 3.070640e-17 1.310140e-15    Tc acinar

The results can be represented as a bubble plot using the signifPlot function.

Code

signifPlot(spicyTestPair)

Here, we can observe that the most significant relationship occurs between cytotoxic T cells (Tc) and the beta cells of the pancreas. Tc and beta cells appear to be significantly more co-localized in patients with onset diabetes compared to non-diabetic patients, indicated by the positive coefficient.

We can use spicyBoxPlot to examine a cell type-cell type relationship closely.

Code

spicyBoxPlot(spicyTestPair, from = "Tc", to = "beta")

We can investigate individual images to ensure our results reflect biologically meaningful relationships. We can extract the individual co-localization metrics from the spicyTest object using the bind function.

Code

# extract co-localization metrics
cd <- colData(spe) |>
    as.data.frame() |> 
    distinct(patient_id, patient_stage, image_name) |>
    dplyr::rename(imageID=image_name)
bind(spicyTest) |> 
  # merge with metadata
  merge(cd, by = "imageID", all = FALSE) |>
  # select columns of interest
  dplyr::select(c(imageID, Tc__beta, patient_id, patient_stage)) |>
  dplyr::filter(imageID %in% c("L29", "O02", "E17", "G17"))

##    imageID   Tc__beta patient_id patient_stage
##  1     E17 -86.217317       6126  Non-diabetic
##  2     G17   5.450118       6414         Onset
##  3     L29 -87.837132       6134  Non-diabetic
##  4     O02  18.710160       6228         Onset

We can then visualize the spatial relationship between a pair of cell types in the original image using the plotImage function.

The image below (L29) comes from a non-diabetic patient. We can see that beta and Tc cells appear to be dispersed, which is reflected in the co-localization metric.

Code

# bug? setting 'cellType' doesn't work
spe$cellType <- spe$cell_type 
plotImage(spe,
          imageToPlot = "L29",
          from = "Tc",
          to = "beta",
          imageID = "image_name")

Code

          #cellType = "cell_type")

The image below (O02) comes from a patient with early onset diabetes. We can see that beta and Tc cells are far more co-localized, with Tc cells even infiltrating the pancreatic islets in some cases.

Code

plotImage(spe,
          imageToPlot = "O02",
          from = "Tc",
          to = "beta",
          imageID = "image_name")

Code

          #cellType = "cell_type")

spicyR can also be used with custom distance metrics and to perform survival analysis, which are outlined in the spicyR vignette.

28.3 spatialFDA

The spatialFDA package computes spatial summaries as functions (e.g., Ripley’s K function) and compares them using tools from functional data analysis (FDA). We will consider the colocalization (spacing) of alpha cells in the islet and cytoxic T-cells, which are important in the destruction of islet cells in T1D (Damond et al. 2019).

This example is adapted from the spatialFDA vignette.

Code

metricRes <- calcMetricPerFov(
  spe = spe, 
  selection = c("alpha", "Tc"), 
  subsetby = "image_name", 
  fun = "Gcross", 
  marks = "cell_type", 
  rSeq = seq(0, 50, length.out = 50), 
  by = c("patient_stage", "patient_id", "image_name"), 
  ncores = 1
)

##  [1] "Calculating Gcross from alpha to Tc"

First, we will plot the functional box plot of the square-root transformed spatial statistics functions (Sun and Genton 2011).

Code

metricRes$ID <- with(metricRes, paste(sep="x", 
    patient_stage, patient_id, image_name))

# removing field of views that have as a curve only zeros;
# these are cases where there are no cells of one type
metricRes <- metricRes %>% 
    dplyr::group_by(ID) %>% 
    dplyr::filter(sum(rs) >= 1)

# sqrt transform response
metricRes$rs <- sqrt(metricRes$rs)

collector <- plotFbPlot(metricRes, "r", "rs", "patient_stage")

The black line indicates the median curve, the purple region is the 50% central region, and the red lines indicate outliers.

From these plots, we note that the functions from ‘Onset’ patients seem to qualitatively have a higher median curve than those from ‘Non-diabetic’ or ‘Long-duration’ patients.

Another exploratory technique is functional PCA (FPCA). FPCA is similar to standard PCA in that it decomposes the functional data into the main modes of variation. Here, we decompose into eigenfunctions (Goldsmith et al. 2024).

Code

# filter out all rows that have a constant zero part - all r < 10
metricRes <- metricRes %>% filter(r > 10)

# prepare data frame from calcMetricRes to be in the correct format for pffr
dat <- prepData(metricRes, "r", "rs")

# create meta info of the IDs
splitData <- dat$ID %>% 
  str_replace("-","_") %>% 
  str_split_fixed("x", 3) %>% 
  data.frame(stringsAsFactors = TRUE) %>% 
  setNames(c("condition", "patient_id", "imageId")) %>% 
  mutate(condition = relevel(condition,"Non_diabetic"))

dat <- cbind(dat, splitData)

# drop rows with NA
dat <- dat |> drop_na()

# calculate the fPCA
pca <- functionalPCA(dat = dat, r = unique(metricRes$r), pve = 0.995)
evalues <- pca$evalues
efunctions <- pca$efunctions

plotFpca(dat = dat, res = pca, colourby = "condition") + 
  labs(color = "condition")

From the FPCA, we note that the variability in curves from ‘Onset’ patients is larger than the other two groups.

Next, we will show how FDA techniques can be used to make inferences from the functional spatial summaries. We use a generalized additive mixed model with a functional response (Scheipl, Staicu, and Greven 2015; Scheipl, Gertheiss, and Greven 2016; Goldsmith et al. 2024). In such models, some parameters are functional and some are constants (see below).

Code

# create design matrix
mm <- model.matrix(~ condition, data = dat)
colnames(mm)[1] <- "(Intercept)"
head(mm)

##    (Intercept) conditionLong_duration conditionOnset
##  1           1                      1              0
##  2           1                      1              0
##  3           1                      1              0
##  4           1                      1              0
##  5           1                      1              0
##  6           1                      1              0

Code

r <- unique(metricRes$r)

# fit the model
mdl <- functionalGam(
    data = dat, x = r, 
    designmat = mm, weights = dat$npoints, 
    formula = formula(Y ~ 1 + conditionLong_duration + 
            conditionOnset + s(patient_id, bs = "re")), 
    family = mgcv::scat(link = "log"), 
    algorithm = "gam"
)

summary(mdl)

##  
##  Family: Scaled t(3.692,0.098) 
##  Link function: log 
##  
##  Formula:
##  Y ~ 1 + conditionLong_duration + conditionOnset + s(patient_id, 
##      bs = "re")
##  
##  Constant coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
##  (Intercept)   -1.523      0.169  -9.013   <2e-16 ***
##  ---
##  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##  
##  Smooth terms & functional coefficients:
##                              edf Ref.df  Chi.sq  p-value    
##  Intercept(x)              15.78 19.000 611.095  < 2e-16 ***
##  conditionLong_duration(x)  1.00  1.000   0.067 0.796154    
##  conditionOnset(x)          2.76  2.926  20.457 0.000316 ***
##  s(patient_id)             12.64 27.000 844.721  < 2e-16 ***
##  ---
##  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##  
##  R-sq.(adj) =  0.675   Deviance explained = 57.9%
##  -REML score =  -7849  Scale est. = 1         n = 12720 (318 x 40)

Code

plotLs <- lapply(colnames(mm), plotMdl, mdl = mdl, 
                 shift = mdl$coefficients[["(Intercept)"]])

##  using seWithMean for  s(x.vec) .
##  using seWithMean for  s(x.vec) .
##  using seWithMean for  s(x.vec) .

Code

patchwork::wrap_plots(plotLs, nrow = 3, axes = "collect")

With the functional GAM, we see not only the effect (e.g., increased colocalization of alpha and cytotoxic T cells in ‘Onset’ patients compared to ‘Non-diabetic’ paitents) but also the strength of the effect at each scale.

The Q-Q plot indicates a reasonable model fit, although not perfect.

Code

qqnorm(resid(mdl), pch = 16)
qqline(resid(mdl))

28.4 Appendix

References

Canete, Nicolas P, Sourish S Iyengar, John T Ormerod, Heeva Baharlou, Andrew N Harman, and Ellis Patrick. 2022. “spicyR: Spatial Analysis of in Situ Cytometry Data in r.” Bioinformatics 38: 3099–3105. https://doi.org/10.1093/bioinformatics/btac268.

Damond, Nicolas, Stefanie Engler, Vito R. T. Zanotelli, Denis Schapiro, Clive H. Wasserfall, Irina Kusmartseva, Harry S. Nick, et al. 2019. “A Map of Human Type 1 Diabetes Progression by Imaging Mass Cytometry.” Cell Metabolism 29: 755–68. https://doi.org/10.1016/j.cmet.2018.11.014.

Goldsmith, Jeff, Fabian Scheipl, Lei Huang, Julia Wrobel, Chongzhi Di, Jonathan Gellar, Jaroslaw Harezlak, et al. 2024. “Refund: Regression with Functional Data.” R Package. https://doi.org/10.32614/CRAN.package.refund.

Scheipl, Fabian, Jan Gertheiss, and Sonja Greven. 2016. “Generalized Functional Additive Mixed Models.” Electronic Journal of Statistics 10: 1455–92. https://doi.org/10.1214/16-EJS1145.

Scheipl, Fabian, Ana-Maria Staicu, and Sonja Greven. 2015. “Functional Additive Mixed Models.” Journal of Computational and Graphical Statistics 24: 477–501. https://doi.org/10.1080/10618600.2014.901914.

Sun, Ying, and Marc G. Genton. 2011. “Functional Boxplots.” Journal of Computational and Graphical Statistics 20: 316–34. https://doi.org/10.1198/jcgs.2011.09224.