32 Structure-based analysis

32.1 Preamble

32.1.1 Introduction

Spatial omics data further allow us to quantify various features that are related to tissue architecture. This allows us to perform comparisons of anatomical structure-derived features. We will refer to this type of analysis as structure-based analysis.

We will use the sosta package to reconstruct and quantify pancreatic islets from human donors suffering different stages of diabetic type 1 (T1D) and healthy controls (Damond et al. 2019). We will then compare some of the features between conditions.

32.1.2 Dependencies

Code

library(sosta)
library(ggplot2)
library(dplyr)
library(tidyr)
library(SpatialExperiment)
library(ExperimentHub)
library(lme4)
library(lmerTest)
library(ggfortify)
library(osfr)

Code

set.seed(8048)

32.1.3 Load data

An assay-free version of the dataset is available through our OSF repository (used here to reduce runtime as the following analyses do not require assay data):

Code

# load data from OSF repository (for smaller download)
osf_repo <- osf_retrieve_node("https://osf.io/5n4q3/")
osf_files <- osf_ls_files(osf_repo, 
    path = "zzz", n_max = Inf,
    pattern = "Damond_noAssays.rds")
dir.create(td <- tempfile())
osf_download(osf_files, path = td)

##  # A tibble: 1 × 4
##    name                           id              local_path      meta        
##    <chr>                          <chr>           <chr>           <list>      
##  1 IMC_HumanPancreas_Damond_noAs… 682e2ea1abc2a7… /tmp/RtmpBwYgl… <named list>

Code

spe <- readRDS(file.path(td, list.files(td, ".rds")))

table(spe$patient_id, spe$patient_stage)

##        
##         Long-duration Non-diabetic  Onset
##    6089         91634            0      0
##    6126             0       158693      0
##    6134             0       181609      0
##    6180        141450            0      0
##    6228             0            0 142106
##    6264        146123            0      0
##    6278             0       185399      0
##    6362             0            0 202272
##    6380             0            0 129456
##    6386             0       127350      0
##    6414             0            0 166052
##    6418        104830            0      0

32.2 Visualize data

First, we plot a few selected fields of view (FOV).

Code

selPl <- sample(unique(spe$image_name), 4)
df <- cbind(colData(spe[, spe$image_name %in% selPl]), 
            spatialCoords(spe[, spe$image_name %in% selPl]))

df |> 
  as.data.frame() |> 
  ggplot(aes(x = cell_x, y = cell_y, color = cell_category)) + 
  geom_point(size = 0.5) + 
  facet_wrap( ~ image_name, ncol = 2) + 
  coord_equal() + 
  theme_classic() + 
  scale_color_manual(values=unname(pals::okabe(n = 5)))

32.3 Islet reconstrution

Next, we will reconstruct or segment the individual islets using a density-based approach implemented in sosta.

Code

n <- estimateReconstructionParametersSPE(
  spe, 
  marks = "cell_category", 
  imageCol = "image_name", 
  markSelect = "islet", 
  nImages = 10, # number of images for estimating parameters 
  nCores = 4
)

Code

thresSPE <- mean(n$thres)
bndwSPE <- mean(n$bndw)

Density based reconstruction

The sosta package uses a reconstruction method based on the point pattern density of the islet cells. This method needs two parameters – a bandwidth parameter that is used for the estimation of the intensity profile, and a threshold based on which the islets are reconstructed.

Code

shapeIntensityImage(
  spe, 
  marks = "cell_category", 
  imageCol = "image_name", 
  imageId = selPl[4], 
  markSelect = "islet"
)

Shown on the left is a density (pixel-level) image, and on the right a histogram of the intensity values. The method selects every pixel above a certain threshold for reconstruction. The smoothing bandwidth that generates the image is estimated using cross-validation (see function bw.diggle() of the spatstat.explore package from CRAN).

The threshold is estimated by taking the mean between the two modes of the (truncated) pixel intensity distribution. The function estimateReconstructionParametersSPE() repeats the estimation of these parameters for a selection of images in the dataset.

To make the reconstruction comparable between images we then use a fixed set of parameters for all images, in our case the mean of the individual parameters for a subset of the dataset.

Code

# run on all images
allIslets <- reconstructShapeDensitySPE(
  spe, 
  marks = "cell_category", 
  imageCol = "image_name", 
  markSelect = "islet", 
  bndw = bndwSPE, 
  thres = thresSPE, 
  nCores = 4
)

Now we can inspect the reconstruction in the sample images.

Code

df <- cbind(colData(spe[, spe$image_name %in% selPl]), 
            spatialCoords(spe[, spe$image_name %in% selPl]))

df |> 
  as.data.frame() |> 
  ggplot(aes(x = cell_x, y = cell_y, color = cell_category)) + 
  geom_point(size = 0.25) + 
  facet_wrap(~ image_name, ncol = 2) + 
  coord_equal() + 
  theme_classic() + 
  scale_color_manual(values=unname(pals::okabe(n = 5))) + 
  geom_sf(
    data = allIslets[allIslets$image_name %in% selPl, ], 
    inherit.aes = FALSE, 
    color = "firebrick", 
    fill = NA, 
    linewidth = 0.75
  )

##  Coordinate system already present. Adding new coordinate system, which will
##  replace the existing one.

The allIslets object is a simple feature collection which contains polygons (<GEOMETRY> column), a structure identifier (structID), and the image identifier (image_name). We will add some patient metadata to the object.

Code

colsKeep <- c(
  "patient_stage", 
  "tissue_slide", 
  "tissue_region", 
  "patient_id", 
  "patient_disease_duration", 
  "patient_age", 
  "patient_gender", 
  "sample_id"
)

# specify factor levels
lv <- c("Non-diabetic", "Onset", "Long-duration")

patientData <- colData(spe) |> 
  as_tibble() |> 
  group_by(image_name) |> 
  select(all_of(colsKeep)) |> 
  mutate(patient_stage = factor(patient_stage, levels = lv),
         patient_id = as.factor(patient_id)) |> 
  unique()

##  Adding missing grouping variables: `image_name`

Code

allIslets <- allIslets |> 
  dplyr::left_join(patientData, by = "image_name")

32.4 Quantification of geometric features

Now we can proceed with quantification of geometric aspects of the islets and combine them with patient information.

Code

isletMetrics <- totalShapeMetrics(allIslets)
allIslets <- allIslets |> 
  cbind(t(isletMetrics))

PCA can give us an overview of the different features. Each dot represents one structure.

Code

autoplot(
  prcomp(t(isletMetrics), scale. = TRUE), 
  x = 1, 
  y = 2, 
  data = allIslets, 
  color = "patient_stage", 
  size = 2, 
  # shape = "type", 
  loadings = TRUE, 
  loadings.colour = "steelblue3", 
  loadings.label = TRUE, 
  loadings.label.size = 3, 
  loadings.label.repel = TRUE, 
  loadings.label.colour = "black"
) + 
  theme_bw() + 
  coord_fixed() + 
  scale_color_manual(values = unname(pals::tol(n = 3)))

Code

allIslets |> 
  sf::st_drop_geometry() |> 
  select(patient_stage, rownames(isletMetrics)) |> 
  pivot_longer(-patient_stage) |> 
  filter(name %in% c("Area", "Compactness", "Curl")) |> 
  ggplot(aes(x = patient_stage, y = value, fill = patient_stage)) + 
  geom_violin() + 
  geom_boxplot(aes(fill = NULL), width = 0.3) + 
  facet_wrap( ~ name, scales = "free") + 
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) + 
  #scale_y_sqrt() +
  guides(fill = "none") + 
  theme_bw() + 
  scale_fill_manual(values = unname(pals::tol(n = 3)))

Let’s focus on the area of the islets and facet by patient to look at patient variability. As the distribution is very skewed we will use a 1/4-power transformation on the area of the islets. The transformation was chosen after inspection of the model diagnostics (see below).

Code

allIslets |> 
  sf::st_drop_geometry() |> 
  select(patient_stage, patient_id,  rownames(isletMetrics)) |> 
  pivot_longer(-c(patient_stage, patient_id)) |> 
  filter(name %in% c("Area")) |> 
  ggplot(aes(x = patient_id, y = value^(1/4), fill = patient_stage)) + 
  geom_violin() + 
  geom_boxplot(aes(fill = NULL), width = 0.3) + 
  geom_jitter(size = .5) +
  facet_wrap( ~ patient_stage, scales = "free_x") + 
  guides(fill = "none") + 
  theme_bw() + 
  ylab("Area") +
  scale_fill_manual(values = unname(pals::tol(n = 3)))

32.5 Between-sample comparison using mixed effects models

An important aspect to note is that the individual structure metrics are not independent measurements, since there are generally repeated measurments (i.e., multiple islets) per slide and per patient. Therefore, we need to account for this correlation between measurements.

To account for this, we will use mixed linear models with random effects for the patient and the individual slides (image name). The lme4 package will be use for fitting linear mixed effects models (Bates et al. 2015) and lmerTest for p-value calculations (Kuznetsova, Brockhoff, and Christensen 2017).

Code

mod <- lmer((Area)^(1/4) ~ patient_stage + (1 | patient_id) + (1 | image_name), 
            data = allIslets)

Model diagnostics

32.6 Residuals vs. fitted

Code

plot(
  mod, 
  resid(., scaled = TRUE) ~ fitted(.), 
  col = allIslets$patient_id, 
  pch = 12, 
  abline = 0, 
  xlab = "Fitted values", 
  ylab = "Standardised residuals"
)

32.7 Residuals vs. fitted per sample

Code

plot(
  mod, 
  resid(., scaled = TRUE) ~ fitted(.) | patient_id, 
  abline = 0, 
  pch = 12, 
  xlab = "Fitted values", 
  ylab = "Standardised residuals"
)

32.8 QQ plot

Code

qqnorm(resid(mod), pch = 16)
qqline(resid(mod))

Code

summary(mod)

##  Linear mixed model fit by REML. t-tests use Satterthwaite's method [
##  lmerModLmerTest]
##  Formula: (Area)^(1/4) ~ patient_stage + (1 | patient_id) + (1 | image_name)
##     Data: allIslets
##  
##  REML criterion at convergence: 9028
##  
##  Scaled residuals: 
##      Min      1Q  Median      3Q     Max 
##  -2.6027 -0.6199  0.0265  0.6129  3.0764 
##  
##  Random effects:
##   Groups     Name        Variance Std.Dev.
##   image_name (Intercept) 1.4628   1.2095  
##   patient_id (Intercept) 0.5426   0.7366  
##   Residual               7.3230   2.7061  
##  Number of obs: 1806, groups:  image_name, 844; patient_id, 12
##  
##  Fixed effects:
##                             Estimate Std. Error      df t value Pr(>|t|)    
##  (Intercept)                  9.5935     0.3924  8.8393  24.447 2.01e-09 ***
##  patient_stageOnset          -0.5624     0.5557  8.8860  -1.012    0.338    
##  patient_stageLong-duration  -1.5404     0.5563  8.9198  -2.769    0.022 *  
##  ---
##  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##  
##  Correlation of Fixed Effects:
##              (Intr) ptnt_O
##  ptnt_stgOns -0.706       
##  ptnt_stgLn- -0.705  0.498

From the fixed effects section in summary(mod), we note that there is a statistically significant difference in the transformed islet area of long-duration patients compared to non-diabetic patients, while the effect for onset patients is not significant at the 5% level. This results accounts for correlation at the patient and image level as modeled by random intercepts. The transformation (1/4-power) was chosen after inspection of the residual behavior in the model diagnostics for this dataset and feature of interest specifically.

32.9 Appendix

32.9.1 Other packages

Additional packages to calculate anatomical structure-derived features include:

The Bioconductor packages SPIAT and imcRtools offer other functions to quantify features on the anatomical structure level. These could also be input for mixed effect modeling as illustrated above.
The packages vegan and poem are useful when calculating diversity metrics within structures and / or border regions. To identify border regions and how to overlap anatomical regions with sample coordinates please have a look at the sosta vignette.

32.9.2 Notes

This chapter is based on the sosta vignette “2. Reconstruction and analysis of pancreatic islets from IMC data”.

References

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” Journal of Statistical Software 67: 1–48. https://doi.org/10.18637/jss.v067.i01.

Damond, Nicolas, Stefanie Engler, Vito R. T. Zanotelli, Denis Schapiro, Clive H. Wasserfall, Irina Kusmartseva, Harry S. Nick, et al. 2019. “A Map of Human Type 1 Diabetes Progression by Imaging Mass Cytometry.” Cell Metabolism 29: 755–68. https://doi.org/10.1016/j.cmet.2018.11.014.

Kuznetsova, Alexandra, Per B. Brockhoff, and Rune H. B. Christensen. 2017. “lmerTest Package: Tests in Linear Mixed Effects Models.” Journal of Statistical Software 82: 1–26. https://doi.org/10.18637/jss.v082.i13.