The Cell Type Knowledge Explorer is a publicly accessible atlas for navigating multimodal cell type data of the mammalian brain.
Abstract / Overview
The generation and analysis of single-cell data are scaling rapidly, yet few tools exist for interactive exploration of the derived cell type taxonomies. The Cell Type Knowledge Explorer (CTKE) is a publicly accessible web application for browsing data associated with the multimodal cell census of primary motor cortex, and was developed in close collaboration with the BRAIN Initiative Cell Census Network (BICCN). The CTKE provides access to single-cell transcriptomic and epigenomic profiling of human, marmoset, and mouse primary motor cortex, with additional datasets assessing the spatial distribution, cell morphology, and electrical properties of cell types. The CTKE was developed as a data reuse resource for brain cell types.
Our DataWorks team is composed of computational biologists and technologists from the Allen Institute for Brain Science who work collaboratively on the analysis, organization, and presentation of our data. Originally formed as part of an NIH-sponsored grant on developing reproducible Data Standards in Neuroscience for the BRAIN Initiative, the team came together with the goal of developing a data reuse platform centered on organizing the knowledge and data that define cell types in the brain. Key supporting members of the team are neuroinformatics experts, data analysts, and experts on data reuse applications.
Collaborative work is done through daily, focused interactions on product design and implementation, using both in-person and virtual communication. Data management is handled together with our technology partners, who carefully manage archival data, reproducibility, and provenance. Developing an application of this scope drew on data and analyses generated by many of the BICCN participating laboratories, which were also consulted on product development and presentation.
The project began on September 10th, 2020, with an initial timeframe of 2 years. The main goals of this project were to establish standard approaches for characterizing cell types and for developing transcriptomic taxonomies. Standards included definition and quantification of cell types, application of cell type ontology and nomenclature, integration of multiple sources of data, and engagement with the cell type community on their improvement and adoption. The resulting Cell Type Knowledge Explorer (CTKE) was first launched on March 13th, 2022.
The Cell Type Knowledge Explorer was developed specifically for reuse of data from the BICCN consortium. The information presented is freely and publicly accessible and links to many relevant external sources. Individual web pages include information from multiple reference projects and associated publications. Pages are divided into themes (e.g., “transcriptomics”), and each section includes links to the source data, to exploration tools, and to a project summary page for access to primary data. Each section also has a series of documented images based on these data. The CTKE is supported by a knowledge base that enables cell type search and links to an external provisional cell ontology. Finally, additional text, metadata, and relational links put the information on each page in an appropriate biological context. In summary, this data reuse resource combines public data from multiple sources to present a novel view of cell types in primary motor cortex.
All researchers aiming to reuse data or develop data reuse applications should ideally link back to the original data sources and associated publications. This gives users more direct access to the original data and minimizes confusion about data versions, sources, and provenance. To increase robustness, researchers should keep meticulous documentation of how these data are used in a new application. Code used for processing data and visualizing results should be kept in public repositories.
The CTKE reuses a collection of independently generated datasets and integrates them to present a novel view of cell type organization. Each contributing dataset describes many cell types within a single modality and its resulting taxonomy. The CTKE instead combines appropriate subsets of each dataset with an overarching ontology to focus on the multimodal features of a single cell type.
We leveraged FAIR principles throughout to facilitate reuse of our data. First, each page of the application links to project summaries, which are ID-based permanent pages that link to relevant data, metadata, funding, details, and associated publications (Findable and Accessible). Second, metadata associated with each project are saved as part of a formal ontology, and cell subclasses are assigned aliases that are part of a controlled vocabulary, allowing direct comparison of data and metadata between studies and direct integration into the CTKE (Interoperable). The formal ontology itself is developed in a way that supports interoperability with other ontologies. Finally, the CTKE itself is an example of the Reusable principle, as this product reuses data from multiple sources.
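As a rough illustration of how controlled-vocabulary aliases can make records from separate studies directly comparable, the following minimal Python sketch maps study-specific labels to a shared alias and then groups records by cell type rather than by dataset. The alias table, the record layout, and the function names are all hypothetical and chosen for illustration; this is not the CTKE code or its actual vocabulary.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: study-specific label -> shared subclass alias.
ALIASES = {
    "L5 ET": "L5 ET",
    "L5 PT": "L5 ET",
    "Pvalb": "Pvalb",
    "PVALB": "Pvalb",
}


def normalize(label):
    """Return the shared alias for a study-specific label, or raise ValueError."""
    key = label.strip()
    if key not in ALIASES:
        raise ValueError(f"label not in controlled vocabulary: {label!r}")
    return ALIASES[key]


def group_by_cell_type(records):
    """Group (modality, label, value) records from separate studies by alias."""
    grouped = defaultdict(dict)
    for modality, label, value in records:
        grouped[normalize(label)][modality] = value
    return dict(grouped)


# Placeholder records standing in for per-modality summaries from different studies.
records = [
    ("transcriptomics", "L5 PT", "marker summary A"),
    ("morphology", "L5 ET", "reconstruction summary B"),
    ("transcriptomics", "PVALB", "marker summary C"),
]
views = group_by_cell_type(records)
```

In this sketch, labels that differ between studies resolve to the same alias, so each entry of `views` collects the modalities available for one cell type. Labels outside the vocabulary raise an error instead of being merged silently.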
While the CTKE was built collaboratively with dedicated teams of software developers and data ontologists, several aspects of its design make it possible to recreate our approach. First, all data highlighted in the application are publicly available in existing data archives. Second, the code used to generate and store cell type-centric data visualizations and associated metadata is open source and freely available in an accompanying GitHub repository. Third, we provide a general framework for the development of data-driven ontologies (see Supplemental links) for single-cell transcriptomics datasets that can be used to power the search capabilities present in the Cell Type Knowledge Explorer.
Potential for Community Engagement and Outreach
The Allen Institute has been a major resource for neuroscience data for nearly 20 years, since the development and completion of the Allen Mouse Brain Atlas in 2006. Since then, the Institute has produced numerous public resources that have facilitated the process of scientific discovery and, we hope, positively influenced data sharing, reuse, and open science. Our continuing focus on open neuroscience is now in a multi-year phase aimed at understanding cell types in the brain and bridging cell types and brain function, to better elucidate the structure and function of healthy brains and their alterations in disease.
The focus and mission of the Institute have been central to planning our scientific goals and are a strength of our collaborative approach. We believe in transparency through rapid sharing of data and results and in FAIR data science, both of which are fundamental to our working model. The CTKE is a direct synthesis of its constituent datasets and, as a data reuse resource, can encourage community engagement in the active science of cell types. We believe that the CTKE brings together some of the best features of the Allen Institute products developed for the community.
Supporting Information (Optional)
Include links to relevant and publicly accessible website page(s), up to three relevant publications, and/or up to five relevant resources.