Unsupervised statistical methods

I am interested in developing improved statistical methods for unsupervised analyses in high-throughput and high-dimensional biological data, including spatially resolved transcriptomics, single-cell RNA sequencing, and mass cytometry (CyTOF). This includes techniques such as feature selection, dimension reduction, and clustering. I am also interested in benchmarking these methods, and making methods accessible through the development of user-friendly R packages and computational analysis pipelines.

Key papers:

  • Weber L.M. and Robinson M.D. (2016), Comparison of clustering methods for high‐dimensional single‐cell flow and mass cytometry data, Cytometry Part A, 89A, 12, 1084–1096. Links to: Paper, Code, Data.

    • benchmark paper comparing the performance of clustering algorithms for mass cytometry (CyTOF) data

  • Maynard K.R.*, Collado-Torres L.*, Weber L.M., Uytingco C., Barry B.K., Williams S.R., Catallini J.L. II, Tran M.N., Besich Z., Tippani M., Chew J., Yin Y., Kleinman J.E., Hyde T.M., Rao N., Hicks S.C., Martinowich K.+, Jaffe A.E.+ (2021), Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex, Nature Neuroscience, 24, 425–436. Links to: Paper, Web application, R/Bioconductor package, Code for paper, Code for web application.

    • collaboration paper investigating the spatial landscape of gene expression in human brain (dorsolateral prefrontal cortex); my contribution consisted of the development of an unsupervised analysis pipeline

Differential analyses

Methods for differential analyses are frequently used for exploratory analyses in high-throughput genomics data, for example to identify differentially abundant (DA) subpopulations of cells, or differential states (DS) of expression within cell populations. I have been involved in several projects to develop improved methods for DA and DS analyses in mass cytometry (CyTOF) data.

Key papers:

  • Weber L.M., Nowicka M., Soneson C., and Robinson M.D. (2019), diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering, Communications Biology, 2, 183. Links to: Paper, R/Bioconductor package, Code, Data.

    • development of an improved computational framework for differential analyses in mass cytometry (CyTOF) data, implemented as an R/Bioconductor software package named diffcyt

  • Nowicka M., Krieg C., Crowell H.L., Weber L.M., Hartmann F.J., Guglietta S., Becher B., Levesque M.P., and Robinson M.D. (2017; updated 2019), CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets, F1000Research, 6:748, v3. Links to: Paper, R/Bioconductor package.

    • contributed to a comprehensive data analysis pipeline for differential analyses in mass cytometry (CyTOF) data

Spatially resolved transcriptomics

My methodological work is motivated by experimental collaborations in high-throughput molecular biology. Recently, I have been involved in several collaborations on data analysis for spatially resolved transcriptomics (using the 10x Genomics Visium platform) in human brain. This is a fascinating new type of data, which measures transcriptome-wide gene expression with spatial resolution at a near single-cell level. These projects allow us to contribute to fundamental biological insights, while opening up new avenues for statistical methodological research on data analysis methods that account for the unique characteristics of these data.

Key papers:

  • Maynard and Collado-Torres et al. (2021) (collaboration on human brain; see details and link above under “Unsupervised statistical methods”)

  • Righelli D.*, Weber L.M.*, Crowell H.L.*, Pardo B., Collado-Torres L., Ghazanfar S., Lun A.T.L., Hicks S.C.+, and Risso D.+ (2021), SpatialExperiment: infrastructure for spatially resolved transcriptomics data in R using Bioconductor, bioRxiv, 2021.01.27.428431v1. Links to: Paper, R/Bioconductor package.

    • implementation of SpatialExperiment, a core data structure for storing and manipulating spatially resolved transcriptomics data in R using the Bioconductor framework

Single-cell RNA sequencing

Single-cell RNA sequencing experiments are expensive, and frequently subject to systematic issues such as batch effects. Application of improved methods for experimental design, such as sample pooling and genetic variation-based sample demultiplexing, can help alleviate these issues.

Key papers:

  • Weber L.M., Hippen A.A., Hickey P.F., Berrett K.C., Gertz J., Doherty J.A., Greene C.S., and Hicks S.C. (2020), Genetic demultiplexing of pooled single-cell RNA-sequencing samples in cancer facilitates effective experimental design, bioRxiv, v1, 371963. Links to: Paper, Code, Data, Data (controlled access).

    • evaluation of genetic variation-based sample demultiplexing in the context of an experimental collaboration on high-grade serous ovarian cancer (HGSOC), and development of a Snakemake computational pipeline


Benchmarking

Benchmarking is a crucial aspect of computational methods development. New methods must be rigorously compared against existing methods, both in terms of statistical performance and in terms of other aspects such as computational complexity and runtime. Useful contributions include both neutral benchmarks (i.e. major systematic comparisons of algorithms by researchers not involved in the development of any of the methods) and smaller (but still comprehensive) benchmarks performed by the developers of a new method to demonstrate its performance.
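
The two axes of comparison mentioned above (statistical performance and runtime) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the simulated one-dimensional data, the two toy methods (`midpoint_split` and `two_means`), and the use of simple label agreement as the performance metric. It is not the design used in any of the papers listed here; a real benchmark would use reference datasets, established methods, and several metrics.

```python
import random
import time


def make_data(n_per_group=200, seed=0):
    """Simulate 1-D values from two well-separated groups with known labels."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    values += [rng.gauss(6.0, 1.0) for _ in range(n_per_group)]
    labels = [0] * n_per_group + [1] * n_per_group
    return values, labels


def midpoint_split(values):
    """Toy method A (hypothetical): split at the midpoint of the data range."""
    cut = (min(values) + max(values)) / 2
    return [int(v > cut) for v in values]


def two_means(values, n_iter=20):
    """Toy method B (hypothetical): 1-D k-means with k = 2."""
    c0, c1 = min(values), max(values)  # initial centers at the extremes
    for _ in range(n_iter):
        assign = [int(abs(v - c1) < abs(v - c0)) for v in values]
        g0 = [v for v, a in zip(values, assign) if a == 0]
        g1 = [v for v, a in zip(values, assign) if a == 1]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return assign


def benchmark(methods, values, labels):
    """Run each method on the same data; record label agreement and runtime."""
    results = {}
    for name, method in methods.items():
        start = time.perf_counter()
        predicted = method(values)
        runtime = time.perf_counter() - start
        agree = sum(p == t for p, t in zip(predicted, labels)) / len(labels)
        # Cluster labels are arbitrary, so score the better of the two matchings.
        results[name] = {"accuracy": max(agree, 1 - agree), "runtime_s": runtime}
    return results


values, labels = make_data()
results = benchmark(
    {"midpoint_split": midpoint_split, "two_means": two_means}, values, labels
)
for name, res in results.items():
    print(f"{name}: accuracy={res['accuracy']:.3f}, runtime={res['runtime_s']:.6f}s")
```

Running every method on identical inputs with a fixed random seed keeps the comparison reproducible; runtimes will vary between machines, so they are best reported alongside the hardware used.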

Key papers:

  • Weber L.M., Saelens W., Cannoodt R., Soneson C., Hapfelmeier A., Gardner P.P., Boulesteix A.-L., Saeys Y., and Robinson M.D. (2019), Essential guidelines for computational method benchmarking, Genome Biology, 20, 125. Link to: Paper.

    • review paper summarizing our views and guidance for how to best perform different types of benchmarking studies

  • Weber and Robinson (2016) (comparison of clustering algorithms; see details and link above under “Unsupervised statistical methods”)

  • Weber et al. (2019) (development of diffcyt computational framework; see details and link above under “Differential analyses”)

Open science

Science is a collaborative human effort spanning continents and generations. Openness is crucial, since it allows us all to build on each other’s achievements and insights.

I strongly support modern practices in open science, including: providing freely available code and data repositories along with all papers to ensure reproducibility of computational analyses; developing freely available and open-source software packages to implement all new statistical methods; developing software collaboratively via platforms such as GitHub; and making papers accessible by posting preprints and publishing open access papers.

These practices have undergone enormous change within the last 10 years – especially the widespread adoption of preprints in biology through bioRxiv. I am excited to continue following these developments over the coming years and decades, and will continue to adopt best practices in open science in my research work.