Submission

Aaditya Rangan's team

introduction

title

Identifying tau-driven cell-subtypes.

short description

We wish to conduct a heterogeneity analysis and identify tau-specific cell-subtypes within a cerebral organoid dataset.

Phase 1 Submission Form

Overview / Abstract

We wish to conduct a heterogeneity analysis and search for cell-subtypes within the cerebral organoid dataset collected by [Glasauer et al. 2022], as described below.
In our recent work, we found that Alzheimer's Disease (AD) exhibits heterogeneity at a genetic level.
This heterogeneity has a hierarchical structure involving (i) three different genetic correlation patterns surrounding the MAPT-gene at the base level, which then (ii) further subdivide into disease-specific clusters [Elman et al. 2024].
The MAPT-gene implicated in this heterogeneity encodes for tau, which in turn impacts neuronal development and is a major risk factor for AD.
The organoid dataset from Glasauer et al. provides us with an excellent opportunity to study the heterogeneity linked to two such MAPT-mutations (i.e., tau-mutations).
By identifying the cell subtypes that emerge as a result of tau-mutation, we'll take an important step towards understanding the mechanisms underlying tau pathology, as well as AD.

Secondary Analysis: Research Aims

Ultimately, we aim to identify genetic heterogeneity within the cerebral organoid dataset collected by Glasauer et al.
This dataset is stored on Dryad (doi.org/10.25349/D95898) and contains sn-RNA-seq data from 62 samples, comprising ~80K cells measured across ~30K genes.
Importantly, the samples in this dataset involve organoids with tau-mutations (337VM, 406RW and 406WW) as well as isogenic controls.
Moreover, the various cells across the samples have already been characterized by putative type (e.g., excitatory pyramidal neurons, astrocytes, etc.).

The original analysis conducted in Glasauer et al. assesses (for each cell-type) the gene-expression differences between the tau-mutants and the controls, generally assuming that each cell-type is a homogeneous group.
We will conduct a follow-up analysis to identify tau-induced heterogeneity: namely, to identify tau-specific 'biclusters' within each cell-type.
Each bicluster will involve a subset of mutant cells that exhibit genetic correlations (across a subset of genes) that are not shared by the corresponding isogenic controls.
Each bicluster of this kind can be thought of as a genetically characterized tau-specific cell-subtype.
Once identified, we will perform a differential gene- and pathway-analysis on each identified cell-subtype, contrasting the likely gene-interactions within each tau-specific cell-subtype against the other subtypes, as well as against the controls.
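As a rough illustration of what a tau-specific bicluster means here, the sketch below scores a candidate set of mutant cells and genes by how much more strongly those genes co-vary within the chosen mutant cells than within the controls. This is not the loop-counting algorithm; it is a simplified stand-in built on synthetic data, and the function names (`mean_abs_correlation`, `bicluster_contrast`), array shapes and planted signal are all assumptions made for the example.

```python
import numpy as np


def mean_abs_correlation(expr):
    """Mean absolute off-diagonal Pearson correlation between the columns (genes) of expr."""
    corr = np.corrcoef(expr, rowvar=False)
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.mean(np.abs(off_diagonal)))


def bicluster_contrast(mutant_expr, control_expr, cell_idx, gene_idx):
    """Gene-gene correlation inside the candidate mutant block minus the same in all controls."""
    mutant_block = mutant_expr[np.ix_(cell_idx, gene_idx)]
    control_block = control_expr[:, gene_idx]
    return mean_abs_correlation(mutant_block) - mean_abs_correlation(control_block)


# Synthetic data (hypothetical sizes): rows are cells, columns are genes.
rng = np.random.default_rng(0)
mutant_expr = rng.normal(size=(200, 50))
control_expr = rng.normal(size=(200, 50))

# Plant a bicluster: 60 mutant cells share a hidden factor across 10 genes.
cell_idx = np.arange(60)
gene_idx = np.arange(10)
shared_factor = rng.normal(size=(cell_idx.size, 1))
mutant_expr[np.ix_(cell_idx, gene_idx)] += 2.0 * shared_factor

planted = bicluster_contrast(mutant_expr, control_expr, cell_idx, gene_idx)
background = bicluster_contrast(mutant_expr, control_expr, np.arange(60, 120), gene_idx)
print(f"planted block contrast:    {planted:.2f}")
print(f"background block contrast: {background:.2f}")
```

In this toy setting the planted block should show a much larger contrast than a block of unaffected mutant cells, which is the sense in which a bicluster is "not shared by the controls".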

To actually identify the cell-subtypes (i.e., biclusters) described above, we will use our recently developed biclustering algorithm, referred to as 'loop-counting' [Zhou et al. 2024].
This loop-counting strategy has several advantages:
To start, our strategy is the first we are aware of that can detect biclusters enriched for asymmetric gene-gene-interactions such as gating and dependency, in addition to more traditional symmetric interactions like gene-gene-correlations.
Crucially, our loop-counting framework can correct for controls, identifying only those biclusters that are disease-specific, while also correcting for covariates, such as batch number; our workflow can also naturally accommodate missing data.
Finally, our algorithm uses a permutation test (against a label-shuffled null-hypothesis) to give each bicluster found a p-value, ensuring a fixed false-discovery rate.
To the best of our knowledge, our algorithm outperforms other commonly used algorithms in the literature, including Louvain-clustering and the UMAP-clustering used in Glasauer's original analysis (see appendix of Zhou et al. 2024).
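The label-shuffled permutation test mentioned above can be sketched generically as follows. This is a minimal illustration on synthetic data, not the loop-counting implementation: the test statistic (`correlation_contrast`), the function `permutation_p_value`, and all sizes and parameters are assumptions chosen for the example.

```python
import numpy as np


def correlation_contrast(expr, is_mutant):
    """Mean absolute gene-gene correlation among mutant cells minus that among control cells."""
    def mean_abs_corr(block):
        corr = np.corrcoef(block, rowvar=False)
        mask = ~np.eye(corr.shape[0], dtype=bool)
        return np.mean(np.abs(corr[mask]))
    return mean_abs_corr(expr[is_mutant]) - mean_abs_corr(expr[~is_mutant])


def permutation_p_value(expr, is_mutant, n_permutations=500, seed=0):
    """Empirical p-value of the observed contrast under a label-shuffled null."""
    rng = np.random.default_rng(seed)
    observed = correlation_contrast(expr, is_mutant)
    null_scores = np.array([
        correlation_contrast(expr, rng.permutation(is_mutant))
        for _ in range(n_permutations)
    ])
    # The add-one correction keeps the estimate strictly positive.
    return float((1 + np.sum(null_scores >= observed)) / (1 + n_permutations))


# Synthetic data (hypothetical sizes): 80 mutant and 80 control cells, 8 genes.
rng = np.random.default_rng(1)
is_mutant = np.repeat([True, False], 80)
expr = rng.normal(size=(160, 8))
# Only the mutant cells share a hidden factor across the genes.
expr[is_mutant] += 2.0 * rng.normal(size=(80, 1))

p_signal = permutation_p_value(expr, is_mutant)
print(f"p-value with planted mutant-specific structure: {p_signal:.3f}")
```

Shuffling the mutant/control labels destroys any association between the labels and the correlation structure, so the fraction of shuffled scores that match or exceed the observed score estimates how surprising the observed score is under the null.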

In terms of a timeline, we'll start by unpacking, cleaning and running basic diagnostics on the data.
Within two months we expect to have finished the primary biclustering analysis, and after another two months we will have finished the gene- and pathway-analyses.
Along the way, we expect to spend 2-4 months preparing our results for presentation and publication.

GREI Repository Data Sets

Dryad

DOI (Digital Object identifier) of GREI Repository Dataset

doi.org/10.25349/D95898

Outcomes and Outputs

We expect to identify several statistically significant biclusters within the organoid dataset from Glasauer et al.
Notably, the original analyses did not look for the same kinds of biclusters that we described above.
For example, Glasauer et al. reported no statistically significant enrichment of mutant cells within the broader category of excitatory pyramidal neurons.

By contrast, our methods have been specifically designed to accurately assess heterogeneity within sn-RNA-seq data, and have already been shown to work on organoid-data.
Indeed, a preliminary analysis of this data-set suggests the existence of multiple distinguishable cell-subtypes within many of the original cell-types, including the excitatory pyramidal neurons.
We believe that the additional sensitivity afforded by our methodology will allow us to characterize the tau-driven subtypes within this heterogeneous landscape, paving the way for a more detailed understanding of tau pathology.

We will delineate these cell-subtypes in terms of (i) their p-value, (ii) the symmetric- and asymmetric-gene-gene-interactions that can be used to characterize that subtype, and (iii) the most prominent gene pathways and interactions which distinguish that tau-specific subtype from the corresponding control cell type.
We will also check to see if any of the subtypes are enriched for the AD-specific genes and pathways we identified in our earlier heterogeneity analysis [Elman et al. 2024].

We will write up a manuscript for publication in an AD-specific journal (e.g., the Journal of Alzheimer's Disease).
We will also prepare a poster presentation for the AAIC (Alzheimer's Association International Conference) in 2025.

One of the advantages of our biclustering methods is that they are automatic and deterministic, similar in some ways to principal-component-analysis (PCA).
These features mean that our results can be easily replicated and reproduced by other researchers.
In accordance with FAIR principles, we will share the software and scripts, as well as all the results used to perform this analysis at github.com/adirangan.
The software and results will be packaged with a short vignette (i.e., tutorial) allowing others to run the same analysis and reproduce the results themselves.
The format for the output data files, including summary statistics, will be ascii-readable tables (e.g., csv-arrays with headers).
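As a small illustration of the kind of header-labelled csv table described above, the sketch below writes and re-reads a summary table using Python's standard `csv` module. The column names and the single placeholder row are hypothetical and carry no results; the actual output schema is not specified here.

```python
import csv
import io

# Hypothetical column names, chosen only for this example.
FIELDNAMES = ["bicluster_id", "cell_type", "n_cells", "n_genes", "p_value"]


def write_summary_table(rows, handle):
    """Write one row per bicluster, preceded by a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def read_summary_table(handle):
    """Read the table back as a list of dictionaries keyed by the header."""
    return list(csv.DictReader(handle))


# Placeholder row only: these values are not results.
placeholder_rows = [
    {"bicluster_id": "example_1", "cell_type": "placeholder_type",
     "n_cells": 0, "n_genes": 0, "p_value": 1.0},
]

buffer = io.StringIO()
write_summary_table(placeholder_rows, buffer)
buffer.seek(0)
round_trip = read_summary_table(buffer)
print(buffer.getvalue())
```

Keeping the header in the file means the table can be read back by column name in most analysis environments without a separate schema file.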

This particular project does not use data directly associated with individuals, and so the CARE principles do not directly apply.

Impact / Scientific Significance

Alzheimer's Disease (AD) accounts for most dementia cases in the United States, with an estimated 6.5 million individuals over the age of 65 currently suffering from the disease, a number that is expected to increase drastically in the coming decades.
The disease comes with enormous economic costs for the country, as well as a devastating personal cost for patients and their loved ones.
While there has been a concerted effort towards treatment and prevention of AD, clinical trials have had limited success in preventing AD-related cognitive decline.
Indeed, while the FDA has approved two anti-amyloid drugs, there has been controversy over their impact on clinical worsening.
Thus, treating AD remains an ongoing public health priority, and better explaining its disease etiology will facilitate these efforts.

AD is a complex disease involving a collection of symptoms including amnestic impairment, neurodegeneration and cognitive decline that eventually leads to loss of everyday functioning.
While the etiology of AD remains unclear, it is likely due to a combination of genetic and environmental factors.
Moreover, while there are many prototypical features of AD which hold in aggregate, it is widely acknowledged that there is significant variability in how AD presents across individuals.
Several studies have attempted to characterize this phenotypical variability; however, there is still a great deal of unexplained heterogeneity in AD presentation, with the potential for distinct disease subtypes.

To make matters more complicated, there is no guarantee that a characterization of this phenotypic heterogeneity (e.g., cognitive, pathological and atrophy subtypes of AD) will help cleanly delineate homogeneous subgroups of genetic risk, or allow for better risk assessment.
With this in mind, our current project focuses on directly identifying potential sources of genetically-driven heterogeneity in AD, without relying on prior phenotypic classification.
Once identified, the structure underlying this genetic heterogeneity (in the form of tau-specific cell-subtypes) can help trace a route from genetic risk through to disease phenotype, and can help clarify which pathways are impacted by different forms of the disease etiology (e.g., certain tau-mutations).
In this sense, we hope that this project will provide an 'anchor point' we can build on to more fully characterize the downstream impacts of genetic heterogeneity in AD.

In summary, we hope that by identifying tau-specific genetic cell-subtypes we can help better reclassify and/or predict tau-pathology and AD-prognosis, including disease trajectory and disease response at the individual level.

Team

Aaditya Rangan is an associate-professor in the applied mathematics department at New York University who has worked in computational biology and bioinformatics for over two decades.
Jeremy Elman is an assistant-professor in the department of psychiatry at the University of California at San Diego, and has studied Alzheimer’s disease for over a decade.
Caroline McGrouther is an MD PhD from the University of California at San Diego, and has been working as a researcher in bioinformatics for several years.
Haosheng Zhou was a master's student of A. Rangan, and is currently working towards a PhD in statistics and applied probability at the University of California at Santa Barbara.
We initially began collaborating as part of an NIH grant (U19AG023122) and have published several papers together [e.g., Schork and Elman 2023, Zhou et al. 2024, Elman et al. 2024 and McGrouther et al. 2024].
Zhou is well on his way towards his PhD, and together with Rangan, McGrouther and Elman our team has a strong record of research involving statistical analysis in one form or another, with a strong focus on bioinformatics.
Perhaps most importantly, through the course of our collaborations we have developed, implemented and tested the biclustering method described above.
This includes the original loop-counting method [Rangan 2012, Rangan et al. 2018], as well as the more recent application to sn-RNA-seq data and the extension to incorporate asymmetric gene-gene-interactions [Zhou et al. 2024].

Considerations

As mentioned above, we recently published two papers exploring the genetic heterogeneity of AD.
These results illustrate the genetic heterogeneity of AD in terms of both the underlying genotype [Elman et al. 2024] and at the pathway-level [Schork and Elman 2023].
Furthermore, the hierarchical nature of AD genetic heterogeneity is strongly driven by stratification in the correlation-structure across SNPs surrounding the MAPT-gene.
This evidence strongly motivates our proposed study of the MAPT-mutations in the organoid-dataset of Glasauer et al.
In addition, we believe that we are uniquely poised to search for heterogeneity within this dataset, as our recently developed biclustering algorithms are specifically designed for just such a problem.
In summary, we are certain that we have the expertise, the computation tools and the manpower to carry out this project.

Supporting Documents

Provide up to 10 resources for the evaluation of your secondary research project including but not limited to:
● The persistent identifier of the dataset(s), other than GREI dataset DOIs already listed above, to be used in the proposed project (where available)
● Tools/workflows or resources to be utilized in the proposed project
● Relevant references or scientific publications that directly relate to the proposed project

Supporting Document (1)

http://www.github.com/adirangan

Supporting Document (2)

https://arxiv.org/abs/2405.00159

Supporting Document (3)

https://doi.org/10.1101/2023.05.02.23289347

Supporting Document (4)

https://doi.org/10.1101/2022.08.04.502792

Supporting Document (5)

https://pubmed.ncbi.nlm.nih.gov/36909609

Supporting Document (6)

https://doi.org/10.1371/journal.pcbi.1006105

Supporting Document (7)

https://www.sciencedirect.com/science/article/pii/S0021999111007534

Non Scored Criteria

Please complete this information. It will not be scored by the evaluation panel.

Entity Participation

Participate as an independent Team (i.e., registering as a group of individuals competing together but not on behalf of an established organization, institution, or corporation)

Research Discipline (non-scored criteria)

Alzheimer's Disease
Bioinformatics
Heterogeneity Analysis
Biclustering

IDeA State (non-scored criteria)

All Team Member Information - Name, Organization, Job Title, and Email address

[Point of Contact]: Aaditya Rangan, New York University, Associate Professor, avr209@nyu.edu.

Jeremy Elman, University of California, San Diego, Assistant Professor, jaelman@health.ucsd.edu

Caroline McGrouther, University of California, San Diego and New York University, Affiliated Researcher, ccm207@nyu.edu

Haosheng Zhou, University of California, Santa Barbara, Ph.D. Student, hzhou593@ucsb.edu

MSI (non-scored criteria)

Participation in prior DataWorks! Prizes (non-scored criteria)

Team Point of Contact Eligibility

yes

Eligibility (non-scored criteria)

Yes, I confirm that I have read and meet the terms of eligibility for this challenge
