Submission

introduction

title

Advancing Health Equity with AI-driven Analysis

short description

Using AI/ML to analyze data from global repositories, we uncover health disparities and inform interventions for equitable health outcomes.

Phase 1 Submission Form

Overview / Abstract

This project leverages data from GREI repositories and food desert datasets to analyze clinical trials, patient-reported outcomes, and social determinants of health. This study aims to identify how community-level factors like food access contribute to health disparities in treatment outcomes. Utilizing HeartBeat, a proprietary AI-powered data management tool, and Dorothy, an advanced AI for NLP and data synthesis, we will integrate diverse datasets, including food access data from USDA, CDC, and other sources, to uncover patterns that can inform targeted public health interventions.

Secondary Analysis: Research Aims

Our project will conduct a secondary analysis to uncover community-level factors driving disparities in treatment outcomes across ethnic groups, rural communities, and low-income populations. By integrating data from seven GREI repositories and food desert datasets, we aim to generate actionable insights for public health interventions.

Data Utilization

We will use structured and unstructured data from the following GREI repositories, accessed via APIs for efficient data retrieval:

Zenodo: Structured clinical trial data, including patient demographics and outcomes, plus audiovisual data. Roughly 50 datasets and media files will be analyzed.
Dryad: Patient-reported outcomes with qualitative data such as patient satisfaction and symptom reports, along with images or structured text. Around 30 datasets will be used.
Harvard Dataverse: Data on social determinants of health, including socioeconomic and environmental factors. We will use ~20 datasets, including structured text and databases.
Figshare: Research outputs, including reports, structured graphics, and images relevant to public health. Approximately 25 outputs will be analyzed.
Mendeley Data: Biomedical and clinical datasets, including scientific formats and software applications, with ~40 datasets.
Open Science Framework (OSF): Multidisciplinary research materials, including scientific documents and plain text. Around 20 datasets will be used.
Vivli: Clinical research data, including individual participant-level data (IPD) and clinical trial data, with ~15 datasets.
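
As a rough illustration of the API-based retrieval described above, the sketch below builds a search URL and summarizes a response for one repository. It assumes Zenodo's public records search endpoint (`https://zenodo.org/api/records` with `q`, `size`, and `page` parameters) and a response shaped like `{"hits": {"hits": [...]}}`; the helper names `build_search_url` and `summarize_hits` and the sample payload are hypothetical and are not part of HeartBeat or Dorothy. Each of the other repositories exposes its own API and would need its own adapter.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: Zenodo's public records search API.
ZENODO_RECORDS_URL = "https://zenodo.org/api/records"


def build_search_url(query: str, size: int = 25, page: int = 1) -> str:
    """Build a search URL for the records endpoint (no request is sent here)."""
    params = urlencode({"q": query, "size": size, "page": page})
    return f"{ZENODO_RECORDS_URL}?{params}"


def summarize_hits(payload: dict) -> list:
    """Pull the DOI and title out of each record in a search response."""
    records = payload.get("hits", {}).get("hits", [])
    return [
        {"doi": rec.get("doi"), "title": rec.get("metadata", {}).get("title")}
        for rec in records
    ]


# Illustrative payload only: the DOI and title are placeholders, not real records.
sample_payload = json.loads("""
{"hits": {"hits": [
  {"doi": "10.0000/example.1", "metadata": {"title": "Example trial dataset"}}
]}}
""")

url = build_search_url("clinical trial outcomes", size=50)
summary = summarize_hits(sample_payload)
```

To query the live service, the URL could be passed to `urllib.request.urlopen` and the body decoded with `json.load`; rate limits and authentication requirements vary by repository and would need to be checked first.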

Methods and Analysis

Using HeartBeat and Dorothy, the project will involve:

Data Integration: HeartBeat will harmonize structured and unstructured data across datasets, including images, reports, and raw data. Dorothy will perform natural language processing to extract insights from textual data.
API Connection: Data from the GREI repositories will be accessed via APIs, allowing for automated, scalable data retrieval and integration into our analysis pipeline.
AI-Driven Analysis: Dorothy will use NLP to analyze qualitative data, identify themes, and generate summaries. HeartBeat will employ statistical and machine learning techniques such as regression and survival analysis to detect trends and correlations.
Ethical Considerations: We will follow FAIR principles to ensure data is findable, accessible, interoperable, and reusable. CARE principles will guide the ethical handling of sensitive data, especially for vulnerable communities.
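
The integration and regression steps listed above can be sketched in miniature as follows. This is a minimal illustration under stated assumptions, not the HeartBeat implementation: the field names (`area_id`, `response_rate`, `low_access_share`), the helper functions, and every data value are made up for the example, and ordinary least squares stands in for whichever regression model the project ultimately uses.

```python
from statistics import mean

# Made-up rows for illustration only; these are not project data.
outcomes = [
    {"area_id": "A", "response_rate": 0.65},
    {"area_id": "B", "response_rate": 0.60},
    {"area_id": "C", "response_rate": 0.50},
    {"area_id": "D", "response_rate": 0.58},  # no matching food access row
]
food_access = [
    {"area_id": "A", "low_access_share": 0.10},
    {"area_id": "B", "low_access_share": 0.20},
    {"area_id": "C", "low_access_share": 0.40},
]


def join_on_area(left, right):
    """Inner-join two lists of dicts on the shared 'area_id' key."""
    right_by_id = {row["area_id"]: row for row in right}
    return [
        {**row, **right_by_id[row["area_id"]]}
        for row in left
        if row["area_id"] in right_by_id
    ]


def ols_fit(xs, ys):
    """Return (slope, intercept) of a simple least-squares line y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


merged = join_on_area(outcomes, food_access)
slope, intercept = ols_fit(
    [row["low_access_share"] for row in merged],
    [row["response_rate"] for row in merged],
)
```

In this toy data a negative slope would indicate that areas with lower food access show poorer outcomes; real analyses would need to adjust for confounders and account for how areas are matched across sources.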

Timeline

The project will be completed in 6 months with key milestones:

Months 1-2: Data Acquisition and Preparation
- Secure datasets from all repositories via API.
- Conduct data quality checks and harmonize datasets.
Months 3-4: Data Integration and Initial Analysis
- Integrate datasets and perform preliminary analysis.
- Identify key patterns and correlations.
Months 5-6: Final Analysis and Dissemination
- Complete the AI-driven analysis and validate findings.
- Prepare results for dissemination through publications and public health channels.

GREI Repository Data Sets

Dataverse
Dryad
Figshare
Mendeley Data
Open Science Framework (OSF)
Vivli
Zenodo (CERN and Northwestern University)

DOI (Digital Object Identifier) of GREI Repository Dataset

We will utilize datasets from the following GREI repositories via API, incorporating structured and unstructured data such as images, reports, and raw datasets:
- Zenodo: http://doi.org/10.17616/R3QP53
- Dryad: http://doi.org/10.17616/R34S33
- Harvard Dataverse: http://doi.org/10.17616/R3C880
- Figshare: http://doi.org/10.17616/R3PK5R
- Mendeley Data: http://doi.org/10.17616/R3DD11
- Open Science Framework (OSF): http://doi.org/10.17616/R3N03T
- Vivli: http://doi.org/10.17616/R3SB9S

Outcomes and Outputs

Our project will uncover key factors driving health disparities, influencing scientific research and public health policy.

Research Findings and Expected Outcomes

We will identify community-level factors like socioeconomic status, healthcare access, and food access that contribute to disparities in treatment outcomes. By integrating clinical trial data with social determinants of health and food desert data from sources such as the USDA Food Access Research Atlas and Feeding America's Map the Meal Gap, we aim to reveal patterns explaining these disparities. The findings will guide targeted public health interventions aimed at reducing inequities.

Sharing and Dissemination of Findings

Our findings will be shared widely through:

Publications: Submitting research to peer-reviewed journals such as the American Journal of Public Health, using open-access options where available, for visibility among researchers, policymakers, and health professionals.
Data Repositories: Depositing datasets and analysis scripts in GREI repositories like Zenodo and Dryad, and linking to food desert data sources such as the USDA Food Access Research Atlas and Map the Meal Gap, with detailed metadata for enhanced discoverability.
Conferences: Presenting at major public health conferences, such as the Annual Meeting of the American Public Health Association (APHA), to engage diverse stakeholders.
Online Presence: Creating a project webpage with interactive data visualizations, downloadable resources, and links to data repositories.

FAIR and CARE Principles

We are committed to following both FAIR and CARE principles:

FAIR Compliance: Data will be Findable, Accessible, Interoperable, and Reusable, with persistent identifiers and standardized metadata for easy access and reuse.
CARE Compliance: We will handle data ethically, especially data involving vulnerable populations, with community engagement to respect their perspectives.

Replicability and Reproducibility

To ensure replicability and reproducibility:

Comprehensive Documentation: All data processing and analysis procedures will be thoroughly documented and shared alongside the datasets.
Open-Source Tools: Analysis will be conducted using open-source tools, with scripts shared on platforms like GitHub to facilitate replication.
Validation: We will re-run key analyses on subsets of the data to check that findings hold, strengthening the robustness and credibility of our conclusions.
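
One way the subset-based validation above might be approached is to re-estimate an effect on repeated random subsamples and check that its direction is stable. The sketch below is only an illustration under assumptions: the data points are made up, the 70% subsample fraction and 20 repeats are arbitrary choices, and the function names are hypothetical.

```python
import random
from statistics import mean


def ols_slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    return sxy / sxx


def subsample_slopes(points, fraction=0.7, repeats=20, seed=0):
    """Re-estimate the slope on random subsets drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    k = max(2, int(len(points) * fraction))
    return [ols_slope(rng.sample(points, k)) for _ in range(repeats)]


# Made-up points: x = share of residents with low food access, y = outcome rate.
points = [
    (0.05, 0.71), (0.10, 0.66), (0.20, 0.61),
    (0.25, 0.55), (0.35, 0.52), (0.40, 0.47),
]

slopes = subsample_slopes(points)
sign_stable = all(s < 0 for s in slopes)
```

If the sign or size of the estimate changed substantially across subsamples, that would flag a finding as fragile before it is reported.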

By adhering to these principles and strategies, our project will deliver impactful, ethical, and accessible insights for the broader research and public health community.

Impact / Scientific Significance

Our project will significantly advance public health by addressing health disparities through advanced data analysis. By focusing on community-level factors influencing treatment outcomes across ethnic groups, rural communities, and low-income populations, our work will have broad implications for healthcare practices and public health policies.

Scientific Contributions
Integrating clinical trial data, patient-reported outcomes, social determinants of health, and food access data from sources such as the USDA Food Access Research Atlas and Map the Meal Gap, our research will uncover complex factors driving health disparities. This holistic approach marks a critical step forward in understanding and addressing these disparities.

Health Equity: Our findings will show how factors like socioeconomic status, healthcare access, and food deserts impact treatment outcomes, offering insights to develop targeted interventions that promote equity.
Innovative Data Reuse: Leveraging GREI and food desert data demonstrates how secondary analysis can reveal new insights without additional data collection, maximizing existing datasets' value and showcasing the potential of data reuse in public health.

Impact on Healthcare
Our research will inform clinical practices and public health strategies:

Personalized Interventions: Identifying factors that contribute to health disparities will allow healthcare providers to design more effective, community-specific interventions, improving outcomes for marginalized groups.
Policy Influence: Findings will guide policymakers in resource allocation, prioritizing community-level interventions to reduce disparities, including improving access to healthy food in underserved areas.

Best Practices in Data Reuse
Our project will serve as a model for data reuse in public health:

Methodological Innovation: We will document and share data harmonization techniques, offering a blueprint for others to replicate or adapt our approach.
Ethical Data Management: Adhering to FAIR and CARE principles, we will ethically manage sensitive data, particularly for vulnerable populations, contributing to guidelines for ethical data reuse.
Enhancing Reusability: By depositing our datasets with comprehensive metadata into GREI repositories and linking to food desert data sources, we will support future research and enhance data accessibility.

Long-Term Impact
The project’s impact will extend beyond immediate findings:

Building Research Capacity: By integrating diverse datasets, we will inspire more researchers to engage in secondary analysis, expanding health disparities research.
Educational Outreach: Our findings and methods will be incorporated into educational materials, training future public health professionals to address health disparities.

Our project will have a lasting impact by advancing scientific knowledge, shaping public health policies, and improving health outcomes in underserved communities.

Team

The Data Love Co., co-founded by Jasmine Motupalli and Irzana Golding, combines expertise in AI, data science, and strategic leadership to address complex challenges through advanced analytics.

Jasmine Motupalli: Jasmine, with an M.S. in Operations Research, has a strong background in data science and statistical analysis. She has led AI-driven projects in both government and corporate sectors, including the development of predictive analytics tools at the Pentagon and customer intelligence strategies at Gusto. She is a PhD student at the University of Denver researching the role of emerging technologies in promoting equity. Jasmine is responsible for leading the statistical analysis and AI integration in this project.
Irzana Golding: Irzana holds a B.Sc. in Mathematics and Statistics and specializes in business intelligence and data strategy. She has led large-scale data management initiatives at Cisco and Gusto, focusing on the implementation of AI-driven analytics and the orchestration of complex data ecosystems. Irzana will guide the project’s strategic direction and contribute her expertise in data integration and analysis.

We collaborate closely, leveraging our complementary skills and a shared commitment to innovation. Regular strategic meetings and collaborative tools ensure effective communication and project management. Our combined experience in statistical analysis, AI, and data management positions us to deliver robust, actionable insights for this project.

Considerations

Key considerations to ensure the success of this project include:

Data Quality: Rigorous validation processes will be employed to ensure the accuracy and completeness of all datasets used. This includes cross-referencing data with existing records and ensuring that metadata is comprehensive and up-to-date.
Stakeholder Engagement: We will engage with public health experts and community representatives throughout the project to ensure that our findings are relevant, actionable, and respectful of the communities involved.
Ethical Compliance: Adherence to ethical standards, particularly regarding the use of sensitive data, will be paramount. This includes ensuring that all data use aligns with the principles of FAIR and CARE, and that all findings are disseminated in a way that respects the rights of the communities involved.

Supporting Documents

Provide up to 10 resources for the evaluation of your secondary research project, including but not limited to:
- The persistent identifier of the dataset(s), other than GREI dataset DOIs already listed above, to be used in the proposed project (where available)
- Tools/workflows or resources to be utilized in the proposed project
- Relevant references or scientific publications that directly relate to the proposed project

Supporting Document (1)

https://dataloveco.com/wp-content/uploads/2024/08/Supporting-Documents.pdf

Supporting Document (2)

https://github.com/dataloveco

Supporting Document (3)

https://doi.org/10.5195/jmla.2018.283

Supporting Document (4)

https://doi.org/10.1038/s41562-016-0021

Supporting Document (5)

https://doi.org/10.1093/jamia/ocz047

Supporting Document (6)

https://doi.org/10.1177/00333549141291S203

Supporting Document (7)

https://doi.org/10.1007/s11606-024-08708-8

Supporting Document (8)

https://doi.org/10.1353/hpu.2015.0083

Supporting Document (9)

https://doi.org/10.1177/136346150003700307

Supporting Document (10)

https://map.feedingamerica.org/

Non Scored Criteria

Please complete this information. It will not be scored by the evaluation panel.

Entity Participation

Participate as an Entity (i.e., registering as a group of individuals competing together on behalf of a legally established organization, institution, or corporation)

Legal Entity Organization Name

The Data Love Co.
7494 S Downing Cir E
Centennial, CO 80122

Research Discipline (non-scored criteria)

1. Public Health
2. Data Science and Analytics
3. Health Informatics
4. Social and Behavioral Sciences
5. Epidemiology

IDeA State (non-scored criteria)

All Team Member Information - Name, Organization, Job Title, and Email address

Point of Contact/Team Leader: Jasmine Motupalli, The Data Love Co., Co-CEO, jasmine@dataloveco.com
Data Lead: Irzana Golding, The Data Love Co., Co-CEO, irzana@dataloveco.com

MSI (non-scored criteria)

Participation in prior DataWorks! Prizes (non-scored criteria)

DataWorks! Prize Prior Participation - Team Name

N/A

Team Point of Contact Eligibility

yes

Eligibility (non-scored criteria)

Yes, I confirm that I have read and meet the terms of eligibility for this challenge
