An open pipeline for antiviral drug discovery
short description
To nucleate a global antiviral pipeline to prevent future pandemics, we created a new model for open science accelerated drug discovery.
Submission Form - Scored Questions
Please complete all questions related to your project. All questions in this section are scored.
Category of submission
Data sharing
Overview / Abstract

The COVID-19 pandemic has demonstrated the enormous cost of being unprepared for the emergence of new pandemics, with the current global death toll estimated to be at least 24.6 million people. Preventing future pandemics will require a stockpile of direct-acting oral antivirals with activity against viruses of human pandemic potential. While governments are now beginning to invest in antiviral discovery, the cost and complexity of drug discovery programs and the need for broad coverage require coordination of global resources to efficiently focus effort, reduce duplication, and enable pharma and academia to easily initiate new discovery programs to produce differentiated antivirals.

In our earlier COVID Moonshot effort, we leveraged an open science strategy of rapidly disclosing compound, structural, and assay data into the public domain, spurring numerous groups to build on our work and contributing key data to the discovery of Ensitrelvir, now approved in Japan and fast-tracked by the USFDA. We build on this success in our NIAID-funded AI-driven Structure-enabled Antiviral Platform (ASAP) Discovery Consortium, developing a recipe for sharing valuable discovery data for an entire portfolio of antiviral discovery programs. This new data sharing model aims to enable the global antiviral discovery community to assess validated targets, track our progress, build on our data, and rapidly initiate their own discovery efforts from our structures, assays, reagents, and chemical matter.

Data sharing or reuse recipe title

An open science pipeline for small molecule drug discovery that aims to nucleate a global pipeline of differentiated therapeutics, accelerate the study of target biology, and communicate opportunities for new discovery while ensuring clinical developability of discovery assets.

Data Sharing or Reuse Practices

Our Philosophy: 

  • Drug discovery generates useful outputs besides drugs: We treat every major research output as a first-class product, rather than a costly byproduct of discovery.
  • Prioritize impact: We prioritize access to resources that will save others significant time and money in initiating new discovery efforts or researching viral biology.
  • Fast but FAIR: All data is ultimately deposited in durable, domain-focused, FAIR repositories where it can be found even if not needed for decades, while we can rapidly stage new data during the slow deposition process, and index everything in a highly organized manner via our open source website. 
  • Minimize friction: We aim to minimize friction to access and use our data and reagents.
  • Explicit licenses: We aim to conform to the Reproducible Research Standard and use the least restrictive licenses whenever possible.

Website: Our website [] indexes all data in an easily navigable manner and stages data in the (often slow) process of deposition to durable community-focused FAIR-compliant databases. Our website is open source [] and based on the popular Hugo static site generator, making it possible to fully build and serve for free via GitHub Pages. The website source code can be easily forked and tailored (by modifying YAML data files) to support other organizations.


Our data is indexed in two major ways:

By discovery program: indexes all data associated with each drug target, enabling researchers to easily initiate their own programs, monitor our progress, purchase chemical probes to study viral biology, or browse or download data on structures or compound activities for research purposes.

By research output type: indexes all data generated by type of research output, enabling researchers to access detailed assay protocols, browse or download X-ray structures, directly purchase potent antiviral compounds at cost, access assay data, or obtain entire target enabling packages to initiate their own discovery programs.
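The two indexes described above can be thought of as two groupings of the same underlying records. A minimal Python sketch, assuming a hypothetical flat list of output records (the field names `program`, `output_type`, and `title`, and the sample entries, are illustrative only and are not ASAP's actual schema):

```python
from collections import defaultdict

# Hypothetical records; field names and entries are illustrative only.
RECORDS = [
    {"program": "target-A", "output_type": "structure", "title": "fragment screen"},
    {"program": "target-A", "output_type": "assay_protocol", "title": "biochemical assay"},
    {"program": "target-B", "output_type": "structure", "title": "apo structure"},
]


def build_index(records, key):
    """Group record titles by the value of `key`, preserving input order."""
    index = defaultdict(list)
    for record in records:
        index[record[key]].append(record["title"])
    return dict(index)


# The same records, indexed in the two ways described in the text.
by_program = build_index(RECORDS, "program")
by_output_type = build_index(RECORDS, "output_type")
```

Keeping a single list of records and deriving both views from it means a new output only has to be registered once to appear under both its program and its output type.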


The major types of output data we share for each drug discovery program are:
Targeting Opportunities summarizes relevant information about each target for drug hunters, including the relevant domain and binding sites, notable chemical matter, rationale for antiviral effect, and other useful information for setting up and prosecuting a drug discovery campaign.

All Molecules synthesized, assayed, and released are listed in a searchable and downloadable form. Assay data is also ultimately deposited in durable FAIR domain-focused repositories such as ChEMBL.

ASAP generates hundreds of novel X-ray Structures, which are rapidly released in searchable, interactive, downloadable form via Fragalysis. All structures are ultimately deposited in the RCSB and PDBe.

ASAP rapidly publishes Preprints describing progress to each major milestone of each discovery program to communicate important trends in the data, such as structure-activity relationships (SAR).

ASAP uses the analysis of circulating variants and deep mutational scanning (DMS) technologies to identify functionally tolerated target protein mutations to inform the design of antivirals robust to resistance.

Each Target Product Profile (TPP) describes the desired characteristics of the drug product the program aims to produce to treat particular diseases. ASAP works with global stakeholders to align TPPs to ensure they meet the needs of communities and fulfill our mission of global, equitable, and affordable access to therapies.

Each Target Candidate Profile (TCP) describes the objectives we aim to achieve to produce preclinical candidate molecules for the corresponding TPP.

Each assay cascade—set of assays and progression criteria we use to efficiently achieve the TCP—is described in detail, providing complete protocols for in-house assays through, and listing vendors and assay catalog names from Contract Research Organizations (CROs). 

Target Enabling Package (TEP)—the complete data package needed to enable a structure-based drug discovery campaign against a target. Each TEP contains relevant protein constructs for targets; protocols for expression, purification, and crystallography; structures from an X-ray fragment screen; small molecule hits; and biochemical assay protocols with at least one validated inhibitor with quantifiable activity. Plasmids are made available via AddGene under the unrestricted OpenMTA, protocols via, and chemical matter purchasable directly from Enamine at cost. 
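The TEP contents listed above can be treated as a checklist. The sketch below is a hypothetical Python data structure that mirrors those components; the class name, field names, and example values are assumptions made for illustration, not an ASAP schema.

```python
from dataclasses import dataclass, field


@dataclass
class TargetEnablingPackage:
    """Checklist of the TEP components named in the text (hypothetical field names)."""
    target: str
    protein_constructs: list = field(default_factory=list)
    expression_purification_protocols: list = field(default_factory=list)
    crystallography_protocols: list = field(default_factory=list)
    fragment_screen_structures: list = field(default_factory=list)
    small_molecule_hits: list = field(default_factory=list)
    biochemical_assay_protocols: list = field(default_factory=list)
    validated_inhibitors: list = field(default_factory=list)

    def missing_components(self):
        """Return the names of components that are still empty."""
        return [name for name, value in vars(self).items()
                if name != "target" and not value]


# Example with placeholder values: only the constructs have been registered so far.
tep = TargetEnablingPackage(target="example-target",
                            protein_constructs=["construct-1"])
missing = tep.missing_components()
```

A check like this could flag which parts of a package still need to be produced or deposited before it is announced as complete.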

Assay data: All assay data is made available for hit-to-lead immediately, and released quarterly for the subsequent lead optimization phase after review to ensure we can patent a single compound to enable downstream licensing practices that ensure globally equitable and affordable access. (Details of ASAP’s open licensing model are being prepared in a separate preprint.)
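The two-speed release policy above can be sketched as a small function. This is an illustration under stated assumptions, not ASAP's actual process: "quarterly" is taken to mean the first day of the next calendar quarter, review is assumed to finish before that date, and the phase labels and function name are hypothetical.

```python
from datetime import date


def release_date(phase, measured_on):
    """Earliest public release date under the assumed policy.

    Assumptions (not from the source): "quarterly" means the first day of the
    next calendar quarter, and review is complete by that date.
    """
    if phase == "hit-to-lead":
        return measured_on  # released immediately
    if phase == "lead-optimization":
        # First month of the quarter after the one containing measured_on.
        next_quarter_month = 3 * ((measured_on.month - 1) // 3) + 4
        if next_quarter_month > 12:
            return date(measured_on.year + 1, 1, 1)
        return date(measured_on.year, next_quarter_month, 1)
    raise ValueError(f"unknown phase: {phase!r}")
```

For example, under these assumptions a lead optimization measurement made in mid-May would become public on 1 July, while a hit-to-lead measurement made on the same day would be public immediately.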


Our scope: ASAP’s data sharing paradigm focuses on pandemic preparedness, but can be applied to any small molecule drug discovery pipeline for the public benefit where the goal is to enable limited global resources to self-organize to combat major threats to human health in areas such as viral pandemics, antimicrobial resistance, neglected tropical diseases, rare diseases, or other areas where therapeutics are sorely needed. 

Why this is novel: Even premier nonprofit organizations dedicated to drug discovery for neglected diseases, such as the Drugs for Neglected Diseases Initiative (DNDi) [] and the Medicines for Malaria Venture [], focus on high-level conceptual overviews rather than on details that would actually accelerate global drug discovery efforts.

How do we know that sharing drug discovery data can have impact? Because we’ve done it successfully in our COVID Moonshot project—the direct predecessor to ASAP. First, we found that publicly and openly sharing X-ray structures and biochemical potency data for molecules we synthesized for our drug discovery effort led to a large number of scientists around the world accessing this data and using it in their own research. Second, though it took us a number of months to assemble all the openly shared data into a preprint providing additional insights into structure-activity relationships and other findings, this preprint was widely downloaded (over 10,000 times). Third, we know of at least one instance in which data we rapidly shared into durable discipline-focused FAIR repositories like the RCSB inspired a marketed drug for SARS-CoV-2, as our data is cited in the paper describing the discovery of Ensitrelvir by Shionogi (now approved in Japan and fast-tracked by the USFDA).

Open science data sharing is essential for making the most of limited global investment in drug discovery for pandemic preparedness (and other areas): Drug discovery is an incredibly expensive enterprise, where the total costs of failed discovery efforts are now so high that each approved drug costs over $2.5B in R&D investment. However, an enormous amount of valuable data, resources, reagents, and materials are generated in the course of a discovery program that can accelerate or jump-start other drug discovery efforts that take differentiated paths or aid biologists aiming to better understand viral biology to probe weaknesses. While most companies seek to block competitors from building on this data, the limited global investment in drug discovery for pandemics, the enormous threat posed to humanity, and the lack of market incentive make it essential for these global resources to be used efficiently. By sharing drug discovery data openly, we can simultaneously coordinate global discovery efforts simply by making it possible for other efforts to know when we are working on similar targets and chemical series, help nucleate differentiated drug discovery efforts by enabling other teams to save huge amounts of time and money by starting from target-enabling packages, and aid researchers in better understanding viral biology by providing them with cheap chemical probes that will help dissect biochemical pathways.

While every additional effort that shares discovery data accelerates all discovery efforts, not all efforts need to share data for our approach to help coordinate and accelerate discovery. As we have seen, closed commercial discovery efforts can benefit from open efforts to reduce discovery timelines, which still improves the overall efficiency with which limited global resources are deployed.

How to learn from this project

We believe our data sharing model is easy to understand and straightforward to replicate now that it has been established with ASAP. However, it was non-obvious to produce initially without having the benefit of hindsight from our COVID Moonshot predecessor project, where questions of what data was most valuable for rapidly launching differentiated antiviral discovery programs and catalyzing antiviral research were rapidly answered while the world’s attention turned to COVID. 

We have clearly identified the high-value data produced by structure-based small molecule discovery programs, as well as mechanisms for sharing this data that strike a balance between rapid dissemination and slower but durable sharing in persistent domain-focused repositories that adhere to FAIR principles. We have also provided a clear approach to indexing this data in multiple useful ways that enable other efforts working in this area to easily coordinate their efforts, initiate new efforts in a manner that saves time and cost, reduce overall duplication of effort, and obtain resources that will aid in key fundamental research.

The ASAP Pipeline and Outputs web pages provide a simple blueprint for data sharing. The data sharing recipe can effectively be “read off” of these structures and replicated by other open drug discovery efforts. In addition, our website is open source [] and based on the popular Hugo static site generator, making it possible to fully build and serve for free via GitHub Pages. The website source code can be easily forked and tailored (by modifying YAML data files) to support other organizations.

Adoption of practice by peers

Detailed publication: We plan to write a living best practices paper describing in detail the research outputs we are sharing, the rationale for doing so, important considerations for doing so in an appropriate and useful manner, and specific recommendations for processes and workflows. This publication will aim to propose a concrete “version 1.0” best practice for openly sharing drug discovery data, with the goal of building consensus across the field to continually improve practice. We will take inspiration from the Living Journal of Computational Molecular Sciences (LiveCoMS), where best practices papers can be continually updated as new learnings are integrated, and periodically re-reviewed by external referees.

Software: Where possible, we aim to implement the process of data capture, annotation, and sharing as open source modular software that can also be shared and adapted to other drug discovery efforts.

Persuading peer organizations to adopt our approach: Once established for antivirals through our connections with other NIAID-funded Antiviral Drug Discovery (AViDD) Centers, we aim to systematically engage with other efforts for which this model would be suitable for disseminating knowledge and coordinating limited global resources in drug discovery—such as antimicrobial resistance, neglected tropical diseases, and rare diseases. We have already been invited to present our data sharing approach in detail to the Drugs for Neglected Diseases Initiative (DNDi) and the Medicines for Malaria Venture (MMV) for them to consider adoption of this scheme.

Sustainability: We have spoken with the NIH NIAID ODSS data science team, who are enthusiastic about assisting other NIH-funded Antiviral Drug Discovery (AViDD) Centers in adopting our data sharing recipe and have sustained funding within the NIH to build and maintain persistent data commons infrastructure. While ODSS has substantial data science and data sharing expertise, they currently lack relevant expertise around sharing the chemical, structural, and drug discovery data that our recipe focuses on.

Prize: Notably, we aim for the prize money to be distributed among the junior scientists participating in ASAP who are systematically underpaid by institutions (something we are unable to correct due to institutional policies) but whose tireless effort has been the basis for our ability to share this highly valuable discovery data with the world.


The COVID Moonshot was a spontaneous global open science collaboration nucleated by the common desire to rapidly discover, develop, and manufacture an inexpensive but effective oral SARS-CoV-2 antiviral. When the Haitao Yang lab in Shanghai was forced to shut down after sharing the structure of the SARS-CoV-2 main viral protease online 14 Feb 2020, they transmitted the plasmid sequence to collaborators at Diamond Light Source in the UK to carry the torch.

Frank von Delft at Diamond/Oxford (along with many colleagues) was able to rapidly prosecute a high-throughput X-ray fragment screen, releasing 78 structures of small druglike molecules bound to the protease online (and on Twitter) by 18 Mar 2020. Alpha Lee (PostEra) quickly joined the effort, setting up an online data sharing infrastructure to disseminate data and engage with interested scientists around the world. Enamine, a synthetic chemistry CRO in Ukraine, committed to synthesize molecules essentially at cost. Nir London (Weizmann) set up biochemical assays to assess potency, and launched a crowdsourcing effort to solicit designs that grew the fragments into leads. Ed Griffen (MedChemica) joined as the lead medicinal chemist to reshape the Moonshot into a serious drug discovery effort. John Chodera (MSKCC) scaled computational chemistry on the worldwide volunteer distributed computing network Folding@home, which rapidly grew to over a million volunteers and became the world’s first exascale computing network. Annette von Delft (Oxford) joined to coordinate cellular assays and clinical translation, and Ben Perry and Peter Sjö (Drugs for Neglected Diseases Initiative) joined to coordinate preclinical strategy. Numerous junior researchers enabled the open science effort to deliver a novel oral antiviral against SARS-CoV-2 within 18 months, without a patent. The WHO Access to COVID Tools Accelerator awarded the effort $11M to carry the antiviral to clinic-readiness, and the effort is currently working with generics manufacturers aiming to begin clinical trials early next year with the goal of global equitable and affordable access.

ASAP builds on this team, adding an explicit Data Infrastructure Core to manage and disseminate data we generate. ASAP is currently funded by a NIAID Antiviral Drug Discovery (AViDD) U19 award to run multiple antiviral discovery programs for pandemic preparedness. ASAP is led by an administrative core headed by John Chodera, here the team lead.

Key principles
small molecule drug discovery; structure-based drug discovery; open science drug discovery; open science drug discovery for the public good
Video: How to learn from this practice
Research disciplines
small molecule drug discovery; direct-acting antivirals; antiviral research; structure-based drug discovery
Supporting Documentation
Include links to relevant and publicly accessible website page(s), up to three relevant works that resulted from using your data sharing or reuse recipe or which were integral to the development of these practices publications, and/or up to three relevant resources.
Supporting Documentation 1
Supporting documentation 2
Supporting documentation 3
Supporting documentation 4 (optional)
Supporting documentation 5 (optional)
Team information - Not Scored
Please respond to these questions related to team participation in the challenge. Your responses to questions in this section will not be scored by judges.
Entity Participation
My team is participating as an independent team
IDeA State Status (not scored)
n/a, I am not participating as part of an entity
Minority Serving Institution (not scored)
n/a, I am not participating as part of an entity
Participation in 2022 DataWorks! Prize
no, our team captain and/or the majority of our team did not participate in the 2022 DataWorks! Prize
Eligibility Requirements
yes, I have read and understand the eligibility requirements
