To understand and assess the uncertain effects of COVID-19 on people affected by cancer, CCC19 was founded in March 2020 and developed a robust and agile strategy to collect and disseminate prospective, granular, uniformly organized information on patients with cancer diagnosed with COVID-19 — at scale and as rapidly as possible. This systematic data sharing recipe included three key components: data sourcing and acquisition, data management, and data model sharing.
Taking inspiration from existing best practices, CCC19 sought to accelerate clinical research by facilitating data sharing amongst 126 cancer institutions across North America and eventually logged more than 19,000 cases – the largest registry of its kind. Data standardization is managed through existing clinical vocabularies whenever possible. Through continuous quality assurance of contributed data from participating institutions, CCC19 ensures compliance and standardization with registry-based research standards. Our knowledge is publicly accessible because of the direct features of REDCap that enable local reusability, and open code sharing on GitHub. To further emphasize best practice “recipes” to advance biological and biomedical research activities, all CCC19 publications, the data model, and derived variable code are publicly available. CCC19’s transparent and streamlined approach to data management demonstrates the power of data sharing practices to advance scientific discovery and human health.
Crowdsourcing for a Catastrophe: The COVID-19 and Cancer Consortium Cookbook
The CCC19 data sharing and reuse strategy balances managing collaboration between multiple institutions with the goal of rapidly delivering reliable research on the impact of COVID-19 on people with cancer. The CCC19 data sharing strategy encompasses three ingredients: data sourcing and acquisition, data management, and data model sharing.
Data sourcing and acquisition
CCC19 was inspired by many efforts, such as AACR Project GENIE’s goal to accelerate precision oncology by aggregating genomic and clinical data from multiple cancer centers and institutions. As a “rare condition”, the confluence of COVID-19 and cancer could not be understood at any single institution, region, or country. Due to complex jurisdictional considerations, CCC19 was necessarily limited to North America. Anonymous data acquisition is accomplished using REDCap, a secure, open-source, web-based software platform designed to support the customization of data entry forms through branching logic – users can “choose their own adventure” to quickly focus on the relevant data items. The CCC19’s model for streamlining data collection aims to accelerate the pace of discoveries and bridge the gap between research findings and clinical applications.
The overall vision of CCC19’s data management strategy is to maximize accuracy and completeness while enabling a flexible data model in the face of a great deal of uncertainty. All variables collected have an “unknown” option and the survey instrument is editable by the anonymous person or persons filling out the information. Following the example of the Center for International Blood and Marrow Transplant Research (CIBMTR) quality control strategy, CCC19 developed a quality score system and rubric used to evaluate case reports for targeted improvement. With a point person at each participating institution, it has been possible to objectively improve the quality score through the reduction of unknown and missing data, without revealing the identifies of the survey respondents. The CCC19 REDCap project uses the built-in REDCap audit trail, which logs all changes to the project. Given the sensitivity of medical information, even when de-identified, row-level data is not available without additional safeguards; aggregate information is available through interactive dashboards.
CCC19 employs a systematic and comprehensive review of contributed data and procedures to ensure accuracy, reliability, and compliance with established protocols and standards. Data completeness and quality of data are scored based on the number and severity of identified quality concerns. Participating institutions are graded on a Bronze to Platinum scale to motivate researchers to attain high standards based on the quantity and quality of data contributions. Furthermore, cross-sectional and longitudinal cohort studies were employed to determine COVID-19 outcomes and effectiveness of certain treatments in a faster time frame than could be achieved with conventional clinical trials (https://www.cell.com/cancer-cell/fulltext/S1535-6108(20)30553-5). The high volume of data and the combined expertise of CCC19 enabled efficient research on the effects of COVID-19 on people with cancer.
FAIR data model sharing
CCC19 emphasizes accessibility with clear usage rights and proper metadata, interoperability through REDCap, and reusability to facilitate collaboration and accelerate scientific advancements. Variables are mapped to existing standard terminologies whenever possible, enabling reuse of the data model by other endeavors. Given the flexibility and extensibility of REDCap, institutions can build and maintain local instances for direct data entry and allow for site-specific variables. To maximize the transparency of the CCC19 consortium, the data dictionary is made publicly available through an easily findable and accessible GitHub repository: https://github.com/covidncancer/CCC19_dictionary. The R script used to create all the derived variables used in published analyses and a spreadsheet describing the derived variables is also made freely available through the same GitHub repository. To facilitate the dissemination of scientific knowledge, all conducted scientific inquiries, manuscripts, and articles have been made available without paywall restrictions. This decision allows researchers, scholars, students, and the general public to freely access and benefit from the findings of CCC19.
The CCC19 has addressed key aspects of data sourcing, management, sharing, and accessibility. By embracing open access principles, CCC19 demonstrates a commitment to knowledge sharing and the global advancement of science. The CCC19 facilitates knowledge dissemination and enhances the transparency and reproducibility of research to rapidly deliver reliable research on the impact of COVID-19 on people with cancer.
CCC19 facilitates large scale collaboration
Since its inception in March 2020, the CCC19 has facilitated data collection and analyses of patients diagnosed with cancer and COVID-19. During the height of the COVID-19 pandemic, CCC19 urgently came together to inform patients, healthcare staff, research teams, and healthcare systems all over the world.
Prior to the pandemic, there had not been well-established frameworks for collecting and sharing data across different institutions rapidly; either many adjustments would have been needed to utilize pre-existing databases, or new ones would need to be created entirely. To enable collaboration on a wider and faster scale, the CCC19 implemented the already established REDCap to encourage data sharing between major institutions, as well as regional and smaller health institutions that traditionally did not contribute due to their lower number of clinical cases. REDCap even served organizations beyond the United States of America, to Canada, Mexico and elsewhere. These collaborative efforts were extended even further through REDCap’s feature of a remotely accessible online survey tool, as affected patients were encouraged to directly take part and request their healthcare providers to submit their information into REDCap. In order to generate larger datasets, the CCC19 has fostered a highly diverse and collaborative environment and has more than 600 active members.
CCC19 provides a reproducible recipe for sharing data within a multi-institutional collaboration
Without REDCap, researchers would not have been able to collect large amounts of data through a single platform. Rather, they would have had to draw data from multiple different databases, which would have resulted in more laborious integration; additional time and resources would have been needed for data cleaning, processing, and validation before implementation. The CCC19’s REDCap provided a streamlined, centralized data capture tool that facilitated expansive cancer and COVID-19 patient data collection and sharing.
CCC19 also provided the means for other researchers to reproduce the REDCap project at their own institutions. The CCC19 data dictionary, the derived variables R script, and a spreadsheet of the derived variables with their values are supplied by the CCC19 through its free repository on GitHub. Open access publications all include the list of specific derived variables used in the analysis, and/or pseudocode sufficient to reproduce the analysis completely. Moreover, the patient advocacy committee has developed easy-to-understand videos, Q&A forums, and pamphlets to increase transparency and help the general public understand the research findings of the CCC19.
The CCC19, through its publications and REDCap project, has collected more information than otherwise would have been collected by a singular institution alone, with a rapidity and scale unmatched by similar efforts around the globe. With the ability to acquire large datasets with constrained resources, researchers are able to conduct research more efficiently with reliable results, leading to improved understanding of COVID-19, cancer, and their juxtaposition.
CCC19 mobilized vaccine research and treatment efforts for people with cancer during COVID-19
During the height of the pandemic, the CCC19 came together to impact research and medical practice at the discipline level by collecting data from across North America. Data collected by CCC19 yielded research that evaluated COVID-19 treatments and vaccines and provided evidence for clinical and public health decisions, well in advance of prospective clinical trials. The CCC19 also brought about improved interpretation of COVID-19-related data through uniting an interdisciplinary team of various health specialists, biostatisticians, and other experts. Through large-scale data sharing efforts, the CCC19 has catalyzed advancements and breakthroughs across the research and medical community.
CCC19 responded to a novel concern swiftly and expansively
Prior to the establishment of the CCC19, such subsequent data analyses, research, and medical advancements would not have been able to occur in the timely manner demanded by the COVID-19 pandemic. The CCC19 has introduced fast and wide-scale data collection, sharing, and reuse that has quickly furthered the state of collaboration, efficiency, and quality in research and medical practices, especially as they relate to cancer and COVID-19. In just three years, the CCC19 has published numerous manuscripts which have been utilized and cited in thousands of other publications.
CCC19 has been a catalyst for careers
Many early members of the CCC19 were trainees directly impacted by the pandemic and motivated to make a difference for their patients. Through their involvement in the CCC19, they were able to lead publications, present at major national and international conferences, and obtain faculty positions at prestigious academic universities.
Introduced as a mission goal of the National Cancer Policy Forum in 2009, “learning health systems” describes a framework for modern healthcare organizations that aims to rapidly integrate large volumes of healthcare data in an iterative and real-time fashion to encourage high-quality clinical decision-making at scale. The CCC19 represents a collaborative case study in the implementation of a learning health system model, which could be used to inform similar efforts across myriad healthcare applications. The success of the CCC19 is largely reflective of the following core principles of data sharing practice:
Principle 1: Team science and data governance
A pillar of successful implementation of learning health system models is the establishment of a cooperative interdisciplinary team. The CCC19 swiftly created a multi-institutional collaboration in response to a pressing need for rapid data assembly. A primary mode of outreach was social media platforms including X (previously known as Twitter). Iterative sharing of information via scientific conferences, news releases, and scientific publication can each contribute to team growth.
Upon establishment of a data sharing partnership, it is critical to clarify expectations for best practices regarding future collection and sharing. The CCC19 required a Data Transfer and Use Agreement (DTUA) for participating institutes and established working groups with expertise leveraged to meet specific needs of the evolving multidisciplinary group. Such practices can ensure effective participation among diverse users, facilitating successful dissemination of large-scale data.
Principle 2: Flexible data infrastructure
As data complexity and availability grew, a versatile framework and infrastructure for collection, analysis, and sharing has become increasingly critical. Our team utilized REDCap as the foundation for data collection, as it is widely standardized and does not require informatics expertise. REDCap also allows for iterative feedback to modify survey instruments as knowledge and data evolve in real time.
Principle 3: Transparency
Documentations outlining the data curation process were shared through scientific publications and repositories. Manuscripts were open access to improve accessibility, and data were made available upon request to the CCC19 team. These approaches of developing flexible, yet transparent data structures promote collaboration, iteration, and adaptation within a learning healthcare system.
The core principles and methodology of the CCC19 are primarily disseminated through traditional scientific channels–rigorous peer reviewed journals and scientific conferences. As evidenced by the rapid growth of the CCC19 to over 100 participating institutions in months following its inception, social media platforms can also be catalysts for nearly instantaneous connections that were historically slow to develop via conventional forms of scientific collaboration. Our diverse and international collaborators have subsequently developed a global presence among the oncology community.
As awardees of the 2023 DataWorks prize, an additional primary goal of our team would be to share our experience in diverse scientific settings, including non-oncologic academic conferences and peer-reviewed scientific journals. In doing so, we can ideally obtain the largest outreach to both academic and public audiences with the ultimate aim of shaping data sharing practices among the general population. Several of the groups’ collaborative prior publications have yielded exceptionally high Altmetric attention scores, suggesting that these data have captured widespread interest within both the scientific community and general public. For peers interested in applying our practices to their own needs and interests, we recommend the initial development of a plan for collaboration utilizing the core principles outlined above, which may be applied in a variety of settings to address complex and multidisciplinary problems.
We believe that many of our innovations are reusable and extensible beyond the current COVID-19 pandemic. The spirit of collaboration around a common threat with unknown consequences, the rigorous but inclusive governance structure, and the mechanisms of outreach even in the context of anonymity can all be broadly applied. We have also learned valuable lessons that would help advance and ameliorate the next generation of similar efforts, such as the importance of standardized operating procedures, engagement of consortium members at all levels of training and diversity of backgrounds, as well as the consideration of potential disagreements such as academic authorship and the ever-present possibility of bureaucratic entanglement. Notably, our achievements were made with very limited funding and were driven primarily by the generosity and passion of the members and participating sites of our consortium.
How CCC19 started
The COVID-19 and Cancer Consortium (CCC19) is a crowdsourced effort by the medical and patient advocacy community to investigate the implications of COVID-19 for people with cancer. CCC19 started as a voluntary initiative in March 2020 and grew into a multinational endeavor by June 2023 with 126 participating institutions and data from 19,275 patients with cancer. Initial incubation of the idea for the consortium began through the website HemOnc.org and through social media channels, most notably the site formerly known as Twitter. A medical resident, Dr. Aakash Desai, highlighted the scarcity of research through a tweet which was retweeted broadly by #medtwitter influencers with collectively tens of thousands of followers. Spurred by Desai’s tweet, a team at Vanderbilt University Medical Center (VUMC) spearheaded the formation of the CCC19 data collection mechanism. Within 72 hours, the VUMC team had developed a REDCap database, gained an IRB exemption for collecting de-identified clinical data, and obtained the CCC19.org domain name from a Scottish eco-village. During this same period a governance structure was established by 5 founding institutions. By May 2020, CCC19 had expanded to nearly 90 institutions; over time, hundreds of individuals have joined the effort, with more than 600 active members currently.
A multifaceted approach to data management
A patient-centric data management philosophy was conceived by the CCC19 steering committee, consisting of experts in oncology, hematology, viral epidemiology, clinical informatics, and biostatistics. Other aspects of CCC19 leadership were tackled by domain expert sub-committees: biostatistics, publications, funding, and a patient advocacy committee (PAC). Central to the consortium, the PAC was built on other successful endeavors of engaging patients such as the Count Me In program, which encourages patients with cancer to share their personal medical records with clinical researchers, and OpenNotes, which provides patients more access to their medical records. The success of CCC19 is best attributed to the patients with cancer whose clinical data and perspectives built CCC19, and the researchers who collaborated with CCC19 despite not being financially compensated for their time.