Statistical methods for unsupervised analyses and differential comparisons in high-throughput genomic data

I am interested in developing scalable statistical methods for unsupervised analyses of data from high-throughput genomic platforms, including spatially-resolved transcriptomics and single-cell RNA sequencing. Recent projects include the development of a scalable method to identify spatially variable genes (nnSVG, Weber et al. 2023) and a collaborative analysis of spatial and single-nucleus data from the locus coeruleus region in postmortem human brain samples (Weber and Divecha et al. 2023). Previously, I contributed an unsupervised analysis workflow during a collaborative analysis of spatially-resolved transcriptomics data from the dorsolateral prefrontal cortex (DLPFC) in postmortem human brain samples (Maynard and Collado-Torres et al. 2021).

Differential analysis methods compare groups of samples from different biological conditions to identify features such as differentially abundant cell types or differential expression states within cell types. I have led the development of an improved computational framework for differential analyses in high-dimensional cytometry data (diffcyt, Weber et al. 2019) and contributed to the development of a comprehensive data analysis workflow for high-dimensional cytometry data (Nowicka et al. 2019).

Collaborative projects in neuroscience and cancer biology

My statistical and computational methodological work is motivated by collaborative projects in neuroscience and cancer biology. Spatially-resolved measurements are especially informative in these systems because many of their biological features have a well-defined spatial organization.

Recent collaborations include an analysis of spatial and single-nucleus data from the locus coeruleus region in postmortem human brain samples (Weber and Divecha et al. 2023), contributions to an analysis of spatially-resolved transcriptomics data from the human dorsolateral prefrontal cortex (DLPFC) (Maynard and Collado-Torres et al. 2021), and a benchmark evaluation of methods for genetic variation-based demultiplexing of pooled single-cell RNA sequencing samples, using tumor samples from high-grade serous ovarian cancer (HGSOC) and lung adenocarcinoma (Weber et al. 2021).

Applications of spatial statistics

Spatially-resolved transcriptomics data consist of gene expression measurements for up to thousands of genes at up to thousands of spatial locations, usually within two-dimensional tissue sections. Recent advances in statistical methodology and computational implementations in spatial statistics, such as nearest-neighbor Gaussian processes (NNGPs), enable us to analyze these data in a more efficient and spatially-aware manner, for example to identify spatially variable genes or perform spatially-aware clustering. I recently led a project to adapt these methods to the context of spatially-resolved transcriptomics to identify spatially variable genes (nnSVG, Weber et al. 2023), and applied this framework to data from postmortem human brain samples from the locus coeruleus region (Weber and Divecha et al. 2023).
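
To illustrate what "spatially variable" means in practice, the following minimal Python sketch computes Moran's I, a classical spatial autocorrelation statistic, using k-nearest-neighbor weights. This is a deliberately simpler stand-in and not the nnSVG method, which is based on nearest-neighbor Gaussian processes. The function names, the toy 6 x 6 grid, the two simulated genes, and the choice of k = 4 are all assumptions made for illustration only.

```python
import math


def knn_weights(coords, k):
    """Binary weights: w[i][j] = 1 if spot j is one of the k nearest spots to spot i."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort all other spots by Euclidean distance (ties broken by index).
        dists = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            weights[i][j] = 1.0
    return weights


def morans_i(values, weights):
    """Moran's I: (n / W) * sum_ij w_ij d_i d_j / sum_i d_i^2, with d = deviation from mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant expression carries no spatial signal
    w_total = sum(sum(row) for row in weights)
    num = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / w_total) * (num / denom)


# Toy data (hypothetical): spots on a 6 x 6 grid and two simulated "genes".
coords = [(x, y) for x in range(6) for y in range(6)]
weights = knn_weights(coords, k=4)

gradient_gene = [float(x) for x, _ in coords]             # smooth left-to-right trend
checker_gene = [float((x + y) % 2) for x, y in coords]    # alternating pattern

gradient_i = morans_i(gradient_gene, weights)  # positive: neighbors are similar
checker_i = morans_i(checker_gene, weights)    # negative: neighbors are dissimilar
print(gradient_i, checker_i)
```

In this sketch, a gene whose expression varies smoothly across the tissue receives a positive score, while a gene that alternates between neighboring spots receives a negative one. Model-based approaches such as nnSVG additionally estimate quantities like a per-gene length scale, which a single summary statistic of this kind does not provide.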

Benchmarking, software infrastructure, and analysis workflows

Statistical and computational methods development requires rigorous benchmarking of new methods against existing and baseline methods. Together with several research groups, I led a project to develop guidance on how to perform different types of benchmarking studies (Weber et al. 2019). Previously, I performed a systematic benchmark comparison of clustering methods for high-dimensional cytometry data (Weber and Robinson 2016), as well as several additional comprehensive benchmarks during further computational projects (including Weber et al. 2019 and Weber et al. 2021).

I am motivated to develop accessible software infrastructure and well-documented analysis workflows to demonstrate the use of computational methods, show how different methods can be connected in analysis workflows, and help other researchers to analyze high-throughput genomic data in a rigorous and reproducible manner. To this end, I have co-led the development of software infrastructure for storing and manipulating spatially-resolved transcriptomics datasets within the R/Bioconductor framework (SpatialExperiment, Righelli, Weber, Crowell et al. 2022), and I am coordinating a collaborative effort to develop an interactive online textbook demonstrating key steps in computational analysis workflows for spatially-resolved transcriptomics data, including examples of R code and datasets (OSTA).

Open science

I support principles of open science, including the release of freely accessible open-source software, reproducible analyses, code and data resources, and publication of preprints. Open science improves the reliability and reproducibility of research outputs, and leads to faster scientific advances by enabling researchers to build on each other’s work. In January 2021, my efforts to create open code and data resources were recognized with a Research Symbiont award.