31 Differential colocalization
31.1 Preamble
31.1.1 Introduction
In this section, we will explore tools to assess differential colocalization between conditions with the two packages spicyR and spatialFDA. Both packages are based the comparison of known spatial statistics functions implemented in the package spatstat.
31.1.2 Dependencies
31.1.3 Load data
For this example, we use a dataset on type I diabetes (T1D) (Damond et al. 2019). This dataset contains three conditions (non-diabetic, onset, and long-duration diabetes), several patients per condition, and several fields-of-view per patient. This hierarchical setup requires the modeler to account for correlation between fields-of-view, for example if they stem from the same patient. Such a setup can be modeled with mixed models (see below).
An assay-free version of the dataset is available through our OSF repository (used here to reduce runtime as the following analyses do not require assay data):
Code
# load data from OSF repository (for smaller download)
osf_repo <- osf_retrieve_node("https://osf.io/5n4q3/")
osf_files <- osf_ls_files(osf_repo,
path = "zzz", n_max = Inf,
pattern = "Damond_noAssays.rds")
dir.create(td <- tempfile())
osf_download(osf_files, path = td)
## # A tibble: 1 × 4
## name id local_path meta
## <chr> <chr> <chr> <list>
## 1 IMC_HumanPancreas_Damond_noAs… 682e2ea1abc2a7… /tmp/RtmpXk4Sn… <named list>
Code
spe <- readRDS(file.path(td, list.files(td, ".rds")))
# subset patient IDs
(spe <- spe[, spe$patient_id %in% c(6089, 6180, 6126, 6134, 6228, 6414)])
## class: SpatialExperiment
## dim: 38 881544
## metadata(0):
## assays(0):
## rownames(38): H3 SMA ... DNA1 DNA2
## rowData names(6): channel metal ... antibody_clone full_name
## colnames(881544): 138_1 138_2 ... 566_1171 566_1172
## colData names(29): cell_id image_name ... patient_BMI sample_id
## reducedDimNames(0):
## mainExpName: Damond_2019_Pancreas_FULL_v1
## altExpNames(0):
## spatialCoords names(2) : x y
## imgData names(0):
31.2 spicyR
First, we show the functionality of spicyR. This package condenses the information of a spatial statistics curve to a scalar value and models this continuous value as the response in a linear (mixed) effects model (Canete et al. 2022).
By default, spicyR chooses the first level of the condition
factor as the base and the second as the comparison condition.
Code
## intercept coefficient p.value adj.pvalue from to
## beta__Tc -18.57376350 25.785860 1.197303e-24 1.767853e-22 beta Tc
## Tc__beta -17.80888505 25.752245 1.381135e-24 1.767853e-22 Tc beta
## Tc__alpha -15.18135600 18.175035 1.192449e-19 1.017557e-17 Tc alpha
## alpha__Tc -15.96670406 18.023449 4.128542e-19 2.642267e-17 alpha Tc
## Tc__delta -17.78429189 20.140818 2.671465e-17 1.310140e-15 Tc delta
## Tc__acinar 0.05078239 -3.476235 3.070640e-17 1.310140e-15 Tc acinar
The results can be represented as a bubble plot using the signifPlot
function.
Code
signifPlot(spicyTestPair)
Here, we can observe that the most significant relationship occurs between cytotoxic T cells (Tc) and the beta cells of the pancreas. Tc and beta cells appear to be significantly more co-localized in patients with onset diabetes compared to non-diabetic patients, indicated by the positive coefficient.
We can use spicyBoxPlot
to examine a cell type-cell type relationship closely.
Code
spicyBoxPlot(spicyTestPair, from = "Tc", to = "beta")
We can investigate individual images to ensure our results reflect biologically meaningful relationships. We can extract the individual co-localization metrics from the spicyTest
object using the bind
function.
Code
# extract co-localization metrics
cd <- colData(spe) |>
as.data.frame() |>
distinct(patient_id, patient_stage, image_name) |>
dplyr::rename(imageID=image_name)
bind(spicyTest) |>
# merge with metadata
merge(cd, by = "imageID", all = FALSE) |>
# select columns of interest
dplyr::select(c(imageID, Tc__beta, patient_id, patient_stage)) |>
dplyr::filter(imageID %in% c("L29", "O02", "E17", "G17"))
## imageID Tc__beta patient_id patient_stage
## 1 E17 -86.217317 6126 Non-diabetic
## 2 G17 5.450118 6414 Onset
## 3 L29 -87.837132 6134 Non-diabetic
## 4 O02 18.710160 6228 Onset
We can then visualize the spatial relationship between a pair of cell types in the original image using the plotImage
function.
The image below (L29) comes from a non-diabetic patient. We can see that beta and Tc cells appear to be dispersed, which is reflected in the co-localization metric.
Code
# bug? setting 'cellType' doesn't work
spe$cellType <- spe$cell_type
plotImage(spe,
imageToPlot = "L29",
from = "Tc",
to = "beta",
imageID = "image_name")
Code
#cellType = "cell_type")
The image below (O02) comes from a patient with early onset diabetes. We can see that beta and Tc cells are far more co-localized, with Tc cells even infiltrating the pancreatic islets in some cases.
Code
plotImage(spe,
imageToPlot = "O02",
from = "Tc",
to = "beta",
imageID = "image_name")
Code
#cellType = "cell_type")
spicyR can also be used with custom distance metrics and to perform survival analysis, which are outlined in the spicyR vignette.
31.3 spatialFDA
The spatialFDA package computes spatial summaries as functions (e.g., Ripley’s K function) and compares them using tools from functional data analysis (FDA). We will consider the colocalization (spacing) of alpha cells in the islet and cytoxic T-cells, which are important in the destruction of islet cells in T1D (Damond et al. 2019).
This example is adapted from the spatialFDA vignette.
Code
metricRes <- calcMetricPerFov(
spe = spe,
selection = c("alpha", "Tc"),
subsetby = "image_name",
fun = "Gcross",
marks = "cell_type",
rSeq = seq(0, 50, length.out = 50),
by = c("patient_stage", "patient_id", "image_name"),
ncores = 1
)
## [1] "Calculating Gcross from alpha to Tc"
First, we will plot the functional box plot of the square-root transformed spatial statistics functions (Sun and Genton 2011).
Code
metricRes$ID <- with(metricRes, paste(sep="x",
patient_stage, patient_id, image_name))
# removing field of views that have as a curve only zeros;
# these are cases where there are no cells of one type
metricRes <- metricRes %>%
dplyr::group_by(ID) %>%
dplyr::filter(sum(rs) >= 1)
# sqrt transform response
metricRes$rs <- sqrt(metricRes$rs)
collector <- plotFbPlot(metricRes, "r", "rs", "patient_stage")
The black line indicates the median curve, the purple region is the 50% central region, and the red lines indicate outliers.
From these plots, we note that the functions from ‘Onset’ patients seem to qualitatively have a higher median curve than those from ‘Non-diabetic’ or ‘Long-duration’ patients.
Another exploratory technique is functional PCA (FPCA). FPCA is similar to standard PCA in that it decomposes the functional data into the main modes of variation. Here, we decompose into eigenfunctions (Goldsmith et al. 2024).
Code
# filter out all rows that have a constant zero part - all r < 10
metricRes <- metricRes %>% filter(r > 10)
# prepare data frame from calcMetricRes to be in the correct format for pffr
dat <- prepData(metricRes, "r", "rs")
# create meta info of the IDs
splitData <- dat$ID %>%
str_replace("-","_") %>%
str_split_fixed("x", 3) %>%
data.frame(stringsAsFactors = TRUE) %>%
setNames(c("condition", "patient_id", "imageId")) %>%
mutate(condition = relevel(condition,"Non_diabetic"))
dat <- cbind(dat, splitData)
# drop rows with NA
dat <- dat |> drop_na()
# calculate the fPCA
pca <- functionalPCA(dat = dat, r = unique(metricRes$r), pve = 0.995)
evalues <- pca$evalues
efunctions <- pca$efunctions
plotFpca(dat = dat, res = pca, colourby = "condition") +
labs(color = "condition")
From the FPCA, we note that the variability in curves from ‘Onset’ patients is larger than the other two groups.
Next, we will show how FDA techniques can be used to make inferences from the functional spatial summaries. We use a generalized additive mixed model with a functional response (Scheipl, Staicu, and Greven 2015; Scheipl, Gertheiss, and Greven 2016; Goldsmith et al. 2024). In such models, some parameters are functional and some are constants (see below).
Code
# create design matrix
mm <- model.matrix(~ condition, data = dat)
colnames(mm)[1] <- "Intercept"
head(mm)
## Intercept conditionLong_duration conditionOnset
## 1 1 1 0
## 2 1 1 0
## 3 1 1 0
## 4 1 1 0
## 5 1 1 0
## 6 1 1 0
Code
r <- unique(metricRes$r)
# fit the model
mdl <- functionalGam(
data = dat, x = r,
designmat = mm, weights = dat$npoints,
formula = formula(Y ~ 1 + conditionLong_duration +
conditionOnset + s(patient_id, bs = "re")),
family = mgcv::scat(link = "log"),
algorithm = "gam"
)
summary(mdl)
##
## Family: Scaled t(3.692,0.098)
## Link function: log
##
## Formula:
## Y ~ 1 + conditionLong_duration + conditionOnset + s(patient_id,
## bs = "re")
##
## Constant coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.523 0.169 -9.013 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Smooth terms & functional coefficients:
## edf Ref.df Chi.sq p-value
## Intercept(x) 15.78 19.000 611.095 < 2e-16 ***
## conditionLong_duration(x) 1.00 1.000 0.067 0.796154
## conditionOnset(x) 2.76 2.926 20.457 0.000316 ***
## s(patient_id) 12.64 27.000 844.721 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.675 Deviance explained = 57.9%
## -REML score = -7849 Scale est. = 1 n = 12720 (318 x 40)
## using seWithMean for s(x.vec) .
## using seWithMean for s(x.vec) .
## using seWithMean for s(x.vec) .
Code
patchwork::wrap_plots(plotLs, nrow = 3, axes = "collect")
With the functional GAM, we see not only the effect (e.g., increased colocalization of alpha and cytotoxic T cells in ‘Onset’ patients compared to ‘Non-diabetic’ paitents) but also the strength of the effect at each scale.
The Q-Q plot indicates a reasonable model fit, although not perfect.