Erin Aboelnour's team
10.5281/zenodo.8338963
Demyelinating diseases such as multiple sclerosis (MS) cause progressive disability through loss of oligodendrocytes and neurodegeneration. Toxin-induced mouse models are widely used to study demyelination, but their relevance to MS remains unclear. We integrated single-cell transcriptomic datasets from the lysophosphatidylcholine (LPC) and cuprizone (CPZ) models with the largest MS dataset to date. CPZ induced a stressed oligodendrocyte state resembling that seen in active MS lesions, while both mouse models showed an altered, inflammatory maturation state during remyelination. Microglial responses were largely conserved across models, although human MS lesions displayed greater heterogeneity. Cross-species analyses revealed CPZ-specific gene signatures present in MS, including loss of myelin-stabilizing genes. This work provides guidance for selecting the most appropriate preclinical systems to test glial-protective and remyelination-promoting therapies, accelerating translational progress in MS.
Overview of Research Project
We performed a secondary analysis of human MS single-nucleus RNA-seq (snRNA-seq) data obtained from Zenodo. The original study included both grey and white matter MS samples; our reanalysis focused on the white matter tissue so that MS phenotypes could be compared directly with the LPC and CPZ demyelination mouse models, which included only corpus callosum white matter samples. Our analyses centered on two key glial populations: mature oligodendrocytes, the myelinating cells of the brain, and microglia, the brain’s resident immune cells. We identified and annotated damage-associated oligodendrocytes and damage-associated microglia in the human and mouse data, systematically characterizing conserved and divergent phenotypes to evaluate how well the models capture MS pathology.
Inclusion of GREI Data
From the GREI repository MS dataset (DOI: 10.5281/zenodo.8338963), we reanalyzed 321,565 high-quality nuclear transcriptomes from 47 individuals. A major strength of this dataset was its detailed immunohistochemical annotation of lesion types, spanning controls and five MS tissue categories: normal-appearing white matter, active lesions, chronic active lesions, chronic inactive lesions, and remyelinated lesions. This granularity enabled us to map cell state transitions in MS and benchmark them against the experimental models. The data included the gene count matrix and metadata such as donor ID, age, sex, post-mortem interval (PMI), and broad cell type annotations.
Other Data Sources
To capture LPC and CPZ effects, we integrated three adult mouse corpus callosum sn/scRNA-seq datasets: (1) our newly generated dataset (now deposited at GSE293850), (2) LPC-treated corpus callosum data (GSE182846), and (3) CPZ-treated corpus callosum data (GSE148676). Together, these provided 112,017 nuclei and cells spanning the baseline, demyelination, and remyelination phases of treatment, allowing us to trace dynamic cellular responses in vivo.
Scientific Approach
Our central question was: what cellular and transcriptional phenotypes do the LPC and CPZ models capture that are conserved in MS? To answer it, we (1) performed high-resolution lineage annotation of mature oligodendrocyte and microglia populations in both species, (2) applied Milo for differential abundance (DA) testing across lesion stages and mouse time points, (3) ran pseudobulk differential gene expression (DGE) analyses using glmmTMB (human) and edgeR (mouse) to account for donor effects and batch variability, respectively, and (4) built a joint cross-species embedding, which revealed conserved cell states as well as species-specific divergences. These comparisons provided a mechanistic framework for evaluating the translational fidelity of the LPC and CPZ models to MS.
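Pseudobulk DGE, as in step (3) above, collapses per-cell counts into one profile per biological replicate before a count model is fitted, so that donors or samples, rather than individual cells, are the units of replication. The sketch below shows only that aggregation step, assuming a dense list-of-lists count matrix with aligned per-cell sample and cell-type labels. The function name `pseudobulk` and its inputs are hypothetical and are not taken from the project's code; the glmmTMB and edgeR model fitting is not shown.

```python
from collections import defaultdict


def pseudobulk(counts, sample_ids, cell_types):
    """Sum raw counts per (sample, cell type) pair.

    Hypothetical helper for illustration. `counts` is a list of per-cell
    count vectors sharing one gene order; `sample_ids` and `cell_types`
    are per-cell labels aligned with `counts`.
    Returns (summed profiles, number of cells per group).
    """
    if not (len(counts) == len(sample_ids) == len(cell_types)):
        raise ValueError("counts and labels must have the same length")
    profiles = {}
    n_cells = defaultdict(int)
    for vec, sample, ctype in zip(counts, sample_ids, cell_types):
        key = (sample, ctype)
        # Start a zero vector the first time a group is seen.
        acc = profiles.setdefault(key, [0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        n_cells[key] += 1
    return profiles, dict(n_cells)


# Toy example: three genes, two donors, two cell types.
counts = [[1, 0, 2], [3, 1, 0], [0, 5, 1]]
profiles, sizes = pseudobulk(counts, ["d1", "d1", "d2"], ["OL", "OL", "MG"])
print(profiles)  # {('d1', 'OL'): [4, 1, 2], ('d2', 'MG'): [0, 5, 1]}
```

Summing raw counts, rather than normalized values, keeps the resulting profiles suitable for negative-binomial count models such as those in edgeR.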
Models, Agents, and Technology Used
For computational analyses we used Jupyter notebooks, allowing transparent visualization and sharing of results. We relied on freely available, well-documented tools, including Cell Ranger, CellBender, and scverse packages. The main tools were scVI and sysVI for integration, Milo for DA, and glmmTMB and edgeR for DGE. Outputs are provided through our laboratory GitHub repository to ensure accessibility for future users interested in analyzing these datasets. Analyses were performed on the computing cluster at the University of Notre Dame.
For experimental validation, we performed RNAscope (ACDBio) and immunohistochemistry (IHC) with commercially available antibodies on LPC and CPZ mouse tissues and on NIH biobank MS brain samples. The mouse experiments extended to later time points to capture the resolution of inflammation, and select gene markers were also validated in human MS tissue.
Outcomes and Outputs
Our project directly addressed the goal of reanalyzing human MS patient data and performing cross-species comparisons to identify conserved transcriptional programs in two widely used mouse models. We successfully compared LPC, CPZ, and MS sn/scRNA-seq datasets at high resolution, defining dynamic changes in glial subpopulations during de- and remyelination. This analysis established correspondence between mouse and human MS glial phenotypes, providing a framework for how these models can be leveraged for mechanistic studies and therapeutic discovery. A manuscript describing these findings is under revision at Nature Communications. From these efforts, we generated three major community resources: (1) an integrated dataset of the LPC and CPZ models, (2) a new integration of human white matter MS samples, and (3) a cross-species embedding. In our publication, we provide searchable .csv outputs of all DA and DGE analyses.
Methods and Metadata
Human MS data were reanalyzed following strict quality control (removal of nuclei with >10% mitochondrial counts, >10,000 genes, or >50,000 reads). Age, sex, PMI, and read depth did not differ significantly between MS donors and controls. The data were integrated using scVI, enabling robust annotation of the oligodendrocyte and microglia lineages. We applied Milo to test DA across lesion types and ran pseudobulk DGE using glmmTMB, with donor as a random effect.
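The human QC thresholds above can be expressed as a per-nucleus filter. This is a minimal sketch, not the project's pipeline: it assumes human gene symbols with an `MT-` prefix for mitochondrial genes, and it uses total counts in the matrix as a stand-in for "reads", which is an assumption. The function names `qc_metrics` and `passes_human_qc` are hypothetical.

```python
def qc_metrics(gene_names, counts, mito_prefix="MT-"):
    """Per-nucleus QC metrics from one count vector (human symbols assumed)."""
    total = sum(counts)
    n_genes = sum(1 for c in counts if c > 0)
    mito = sum(c for g, c in zip(gene_names, counts) if g.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total else 0.0
    return {"total_counts": total, "n_genes": n_genes, "pct_mito": pct_mito}


def passes_human_qc(m, max_pct_mito=10.0, max_genes=10_000, max_counts=50_000):
    """Keep a nucleus unless it exceeds any of the stated thresholds.

    Total counts stand in for "reads" here (an assumption).
    """
    return (
        m["pct_mito"] <= max_pct_mito
        and m["n_genes"] <= max_genes
        and m["total_counts"] <= max_counts
    )


genes = ["MT-CO1", "PLP1", "MBP"]
kept = qc_metrics(genes, [5, 45, 50])      # 5% mitochondrial counts
dropped = qc_metrics(genes, [20, 40, 40])  # 20% mitochondrial counts
print(passes_human_qc(kept), passes_human_qc(dropped))  # True False
```

Because the stated rule removes nuclei *above* each threshold, the filter keeps values equal to the cutoff.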
Mouse data were reprocessed from raw FASTQ files, mapped with Cell Ranger, cleaned of ambient RNA with CellBender, and quality filtered analogously to the human dataset (removing nuclei with >5% mitochondrial counts, >20% ribosomal counts, or <200 genes, as well as outliers with high gene or read counts). Integration was performed with sysVI to handle batch effects between single-cell and single-nucleus data. DA testing (Milo) and DGE (edgeR) paralleled the human pipeline.
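The mouse filter above combines fixed thresholds with an outlier rule whose exact form the report does not specify. The sketch below substitutes a median plus 5 × MAD upper bound, a common heuristic, as an explicit assumption. It also assumes mouse symbols with an `mt-` mitochondrial prefix and approximates ribosomal genes by the `Rps`/`Rpl` prefixes. All function names are hypothetical.

```python
from statistics import median

MITO_PREFIX = "mt-"             # mouse mitochondrial genes, e.g. mt-Co1
RIBO_PREFIXES = ("Rps", "Rpl")  # approximate: also matches e.g. Rps6ka1


def mouse_qc_metrics(gene_names, counts):
    """Per-cell totals, detected genes, and % mitochondrial/ribosomal counts."""
    total = sum(counts)
    n_genes = sum(1 for c in counts if c > 0)
    mito = sum(c for g, c in zip(gene_names, counts) if g.startswith(MITO_PREFIX))
    ribo = sum(c for g, c in zip(gene_names, counts) if g.startswith(RIBO_PREFIXES))

    def pct(x):
        return 100.0 * x / total if total else 0.0

    return {"total_counts": total, "n_genes": n_genes,
            "pct_mito": pct(mito), "pct_ribo": pct(ribo)}


def upper_mad_cutoff(values, n_mads=5.0):
    """Median + n_mads * MAD; an assumed rule for 'high' outliers."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med + n_mads * mad


def mouse_qc_mask(metrics, n_mads=5.0):
    """True for cells kept under the stated thresholds plus the assumed MAD rule."""
    gene_cut = upper_mad_cutoff([m["n_genes"] for m in metrics], n_mads)
    count_cut = upper_mad_cutoff([m["total_counts"] for m in metrics], n_mads)
    return [
        m["pct_mito"] <= 5.0
        and m["pct_ribo"] <= 20.0
        and 200 <= m["n_genes"] <= gene_cut
        and m["total_counts"] <= count_cut
        for m in metrics
    ]


# Toy cells: the second fails on mitochondrial %, the last is a high outlier.
cells = [
    {"total_counts": t, "n_genes": g, "pct_mito": m, "pct_ribo": 5.0}
    for t, g, m in [(1000, 500, 1.0), (1100, 520, 8.0), (900, 480, 1.0),
                    (1050, 510, 1.0), (950, 490, 1.0), (20000, 5000, 1.0)]
]
print(mouse_qc_mask(cells))  # [True, False, True, True, True, False]
```

A MAD-based bound adapts to each dataset's depth, which matters when single-cell and single-nucleus libraries with different count distributions are filtered together; in practice it would be computed per sample rather than across all cells.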
Finally, human and mouse datasets were integrated into a joint embedding, enabling systematic cross-species comparisons. We converted mouse gene symbols to their human homologues and subset both datasets to the shared homologous genes. We confirmed cell identity similarities using MetaNeighbor and used gene scores derived from the DGE lists to show conserved pathway enrichment in damage-associated glia.
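The report does not describe the orthology table or how many-to-one mappings were handled. One conservative option is sketched below under stated assumptions: a precomputed mouse-to-human symbol dictionary (for example, exported from an orthology resource such as Ensembl BioMart), with any human symbol targeted by more than one mouse gene discarded. The function names and the symbols `GeneA1`, `GeneA2`, and `GENEA` are made up for illustration.

```python
from collections import Counter


def one_to_one(mouse_to_human):
    """Drop human symbols targeted by more than one mouse gene."""
    hits = Counter(mouse_to_human.values())
    return {m: h for m, h in mouse_to_human.items() if hits[h] == 1}


def shared_homologues(human_genes, mouse_genes, mouse_to_human):
    """(mouse, human) symbol pairs measured in both datasets, sorted by human symbol."""
    mapping = one_to_one(mouse_to_human)
    human_set = set(human_genes)
    pairs = [(m, mapping[m]) for m in set(mouse_genes)
             if m in mapping and mapping[m] in human_set]
    return sorted(pairs, key=lambda pair: pair[1])


# Toy mapping; GeneA1/GeneA2 -> GENEA is a hypothetical many-to-one case.
mouse_to_human = {"Plp1": "PLP1", "Mbp": "MBP", "Cx3cr1": "CX3CR1",
                  "GeneA1": "GENEA", "GeneA2": "GENEA"}
human = ["PLP1", "MBP", "GENEA", "CD68"]
mouse = ["Plp1", "Mbp", "Cx3cr1", "GeneA1", "GeneA2"]
print(shared_homologues(human, mouse, mouse_to_human))
# [('Mbp', 'MBP'), ('Plp1', 'PLP1')]
```

The returned pairs can then be used to rename the mouse features and restrict both count matrices to the same gene order before joint integration.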
Conclusions
Our findings highlight distinct and shared transcriptional programs across species. CPZ induced a disease-associated oligodendrocyte state overlapping with active MS lesions, marked by stress-response signatures, supporting its use in drug testing for oligodendrocyte protection. CPZ and LPC models revealed oligodendrocyte states with impaired myelin gene expression during demyelination and altered maturation during remyelination, validated by in vivo experiments. In contrast, human microglia displayed greater heterogeneity and lacked proliferation signals observed in mice, although core demyelination responses were conserved. These results support strategic use of LPC and CPZ models to probe specific pathways relevant to MS.
Standards, Resources, and Tools
Analyses used open-source, field-standard tools with transparent documentation and reproducible workflows, run at the command line and in Jupyter notebooks using R and Python.
Replicability and Reproducibility
We demonstrated replicability by applying an alternative integration method (Harmony) to the mouse datasets, which yielded consistent glial states. Human cell states were validated against the original study’s labels and showed high concordance. Reproducibility was ensured by depositing raw mouse data in GEO and releasing all analysis scripts and workflows on GitHub with YAML environment files. As a next step, we will publish the final integrated datasets and outputs to Zenodo and build a Shiny app for easy exploration of our analyses. Together, these measures support transparency, reuse, and interoperability for the broader research community.
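The report does not state how concordance with the original labels was quantified. One simple option, shown here only as a sketch, is to cross-tabulate new versus reference labels and report, for each new label, its most frequent reference label and the fraction of cells that share it. The function name and the label strings are hypothetical. A single-number summary such as scikit-learn's `adjusted_rand_score` is a common alternative, and the same cross-tabulation could compare Harmony and sysVI cluster assignments.

```python
from collections import Counter, defaultdict


def label_concordance(new_labels, reference_labels):
    """Map each new label to its most frequent reference label.

    Returns {new_label: (best_reference_label, fraction_of_cells)}.
    """
    table = defaultdict(Counter)
    for new, ref in zip(new_labels, reference_labels):
        table[new][ref] += 1
    summary = {}
    for new, refs in table.items():
        best_ref, n_best = refs.most_common(1)[0]
        summary[new] = (best_ref, n_best / sum(refs.values()))
    return summary


new = ["OL_stressed", "OL_stressed", "OL_stressed", "MG_1", "MG_1"]
ref = ["Oligo", "Oligo", "OPC", "Microglia", "Microglia"]
result = label_concordance(new, ref)
print(result)
```

Reporting a per-label fraction, rather than one global score, shows which specific states diverge from the original annotation.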
Scientific Contributions
This project provides the first systematic cross-species comparison of transcriptional programs in human MS lesions and two widely used mouse demyelination models. By generating high-resolution cellular maps of glial states across lesion stages and model time courses, our work clarifies where mouse models faithfully capture human pathology and where they diverge. This resource fills a major gap in the MS field, where model systems are indispensable but often applied without clear evidence of their translational relevance. The resulting datasets and analytical framework now provide the research community with a roadmap for interpreting model-derived findings in the context of human disease.
Impact on Diagnosis, Treatment, and Prevention
Our findings refine how experimental models should be used for therapeutic discovery. Specifically:
Together, these insights sharpen the field’s ability to design model-based studies that address the most clinically relevant aspects of MS biology. While our work does not yet directly alter diagnosis or clinical management, it lays essential groundwork for identifying and testing new therapeutic strategies that could accelerate remyelination, protect vulnerable glial populations, and ultimately improve outcomes for people living with MS.
Completed Work
The major goals of the project were completed, and the resulting work is currently under review at Nature Communications. Using state-of-the-art computational tools, we generated curated mouse, human, and cross-species integrations, performed differential abundance testing, and carried out comprehensive transcriptional analyses of glial cell types central to MS pathology.
One revision to the original scope was the addition of a workflow to define gene coexpression meta-modules. The workflow was successfully implemented, but most modules captured patient-specific B cell phenotypes that were not comparable with the mouse models. These findings are scientifically valuable and will inform future human-focused projects, but they were beyond the immediate scope of this study.
Constraints and Data Resources
The primary constraint was integrating heterogeneous sc/snRNA-seq datasets, both within and across species. We addressed this by remapping raw sequencing files, applying strict quality controls, and leveraging advanced integration frameworks to mitigate batch effects. Astrocytes, another important glial population, were undersampled in the mouse data, so they could not be compared with the MS cohort. We also did not stratify patients by transcriptional phenotype, focusing instead on broad transcriptional phenotypes across the cohort. Future studies with even larger cohorts may provide better insight into the frequency of the specific glial responses we identified.
Research Quality
(a) We validated computational outputs both analytically and experimentally. Integration of sc/snRNA-seq datasets across species was carefully managed with cutting-edge analytical tools to address donor and batch effects. In the mouse datasets, IHC confirmed key transcriptional findings and supported the cell type annotations and damage-associated states. Selected transcriptional changes were also validated in MS tissue, supporting their biological relevance.
(b) Barriers included the limited availability and diversity of MS samples for IHC, which largely restricted validation to chronic active lesions. We also could not apply cell-to-cell communication tools because they do not yet account for strong batch effects. Despite these challenges, rigorous QC, robust statistical frameworks, and experimental validation ensured high-quality findings. Future work will aim to uncover how modulating stress and inflammatory pathways affects oligodendrocyte (OL) stability and survival.
Reuse and Analytical Challenges
We encountered few difficulties reusing the Zenodo dataset. Despite its large size (>5 GB), it was easy to download and well annotated with the metadata needed to account for donor and batch effects. The original files were formatted for Seurat in R, so we converted them to H5AD for use with Python scverse packages. The authors did not share computational outputs, and the available R code was not easily adaptable, so all analyses presented here were reimplemented for our study. One aspect we could not replicate was the multi-factorial analysis used for patient stratification, which would have required substantial time beyond the scope of this project. Future work will explore whether mouse-derived glial states are broadly represented in MS patients or limited to specific subgroups.