Submission

introduction
title
Machine learning and suicidal trajectories
short description
Leveraging predictive and causal machine learning to understand suicidal trajectories from childhood to adulthood for effective prevention
Phase 1 Submission Form
Overview / Abstract

Suicide is a significant and escalating public health concern. In 2022, around 50,000 lives were lost to suicide in the U.S., 13.2 million people reported suicidal ideation, and 1.6 million reported a suicide attempt (HHS). Suicide is the second leading cause of death among youth aged 10 to 24 years, with rates increasing by around 50% over the past two decades (CDC). The likelihood of engaging in suicidal thoughts and behaviors (STB) varies over the life course, and understanding these dynamics is critical for effective prevention.

While previous work has identified risk and protective factors associated with STBs, no studies have focused on the transitions from suicidal thoughts to behaviors using state-of-the-art machine learning techniques for prediction and causal inference. The proposed projects aim to use two longitudinal datasets from the GREI to examine suicidal trajectories and their risk and protective factors. We hope this work informs targeted suicide prevention strategies.

Secondary Analysis: Research Aims

This study will use two secondary, longitudinal, national datasets available through the GREI repositories: 1) ABCD: The Adolescent Brain Cognitive Development Study (11,878 participants aged 9-15 years; five waves, 2016-2022), available through Mendeley Data and NDA; and 2) Add Health: The National Longitudinal Study of Adolescent to Adult Health (20,000 participants aged 13-43 years; five waves, 1994-2018), available through Dataverse and ICPSR. STBs are assessed in both datasets via parent- and participant-report surveys, which capture current and past suicidal ideation and attempts. Each dataset also contains a wide range of potential risk and protective factors, such as sociodemographics, social determinants of health (SDoH), family history, and Census-linked, neighborhood-level external data. Tables 1 (ABCD) and 2 (Add Health) describe the sample characteristics. We will use re-sampling and weighting techniques to ensure representative samples across waves and demographics.

For Aim 1, we plan to predict the following outcomes: 1) transition from no STB to suicidal ideation (SI); 2) transition from SI to suicide attempt (SA). We will use a combination of supervised classification and feature importance algorithms to identify the risk and protective factors most strongly predictive of these transitions. We hypothesize that SDoH will emerge as predictive factors of STB transitions across developmental stages. We will design our predictive model using the GPBoost algorithm, which combines tree-boosting with mixed effects models for supervised classification of longitudinal data. To assess model fairness across demographics, we will evaluate metrics such as predicted positive rate, predicted positive group rate, and group size ratio. SHapley Additive exPlanations (SHAP) will be applied to identify the variables that contribute most to model predictions, both overall and by sociodemographics and their intersections.
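
As an illustration of how the two Aim 1 outcomes could be derived from wave-level data, the sketch below labels consecutive wave pairs for each participant. It is a minimal example under stated assumptions, not the project's pipeline: the record layout, the status codes ("none", "SI", "SA"), and the function name label_transitions are all hypothetical, and a direct move from no STB to SA is left unlabeled for the first outcome.

```python
from collections import defaultdict

# Hypothetical long-format records: (participant_id, wave, status).
# The status codes "none", "SI", and "SA" are illustrative, not dataset fields.
records = [
    ("p1", 1, "none"), ("p1", 2, "SI"), ("p1", 3, "SA"),
    ("p2", 1, "none"), ("p2", 2, "none"), ("p2", 3, "SI"),
    ("p3", 1, "SI"), ("p3", 2, "SI"),
]


def onset_label(prev_status, next_status):
    """Outcome 1: 1 if none -> SI, 0 if none -> none, None if not at risk."""
    if prev_status != "none" or next_status == "SA":
        return None  # assumption: a direct none -> SA move is left unlabeled
    return 1 if next_status == "SI" else 0


def attempt_label(prev_status, next_status):
    """Outcome 2: 1 if SI -> SA, 0 if SI -> SI or none, None if not at risk."""
    if prev_status != "SI":
        return None
    return 1 if next_status == "SA" else 0


def label_transitions(records):
    """Return one row per consecutive wave pair for each participant."""
    by_person = defaultdict(list)
    for pid, wave, status in records:
        by_person[pid].append((wave, status))

    rows = []
    for pid, visits in by_person.items():
        visits.sort()  # order each participant's visits by wave
        for (w0, s0), (w1, s1) in zip(visits, visits[1:]):
            rows.append({
                "id": pid,
                "from_wave": w0,
                "to_wave": w1,
                "onset_si": onset_label(s0, s1),
                "si_to_sa": attempt_label(s0, s1),
            })
    return rows


rows = label_transitions(records)
for row in rows:
    print(row)
```

Rows where a label is None are not in the risk set for that outcome and would be excluded when training the corresponding classifier.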

For Aim 2, the causal effects of risk and protective factors for transitions to STB will be estimated. While traditional machine learning is limited to explaining patterns and correlations in data, causal machine learning goes beyond associative prediction by modeling interventions. We hypothesize that SDoH will have a significant causal effect on STBs across developmental stages. Dynamic double machine learning (DoubleML) will be used to approximate the Conditional Average Treatment Effects (CATEs) for a given longitudinal outcome (i.e., suicidal trajectory) to understand the impacts of treatment (e.g., a protective factor) on important subgroups and their intersections.
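
The core idea behind double machine learning can be sketched with a static, constant-effect simplification: residualize both the treatment and the outcome on the covariates using models fit on held-out folds (cross-fitting), then regress the outcome residuals on the treatment residuals. The example below is a minimal sketch on simulated data and is not the dynamic, heterogeneous-effect estimator proposed above. It uses simple linear regressions as stand-ins for the machine learning nuisance models, and all names (fit_line, dml_partial_linear, TRUE_THETA) are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def dml_partial_linear(x, d, y, n_folds=2):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + e."""
    n = len(y)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    res_d, res_y = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        # Nuisance models are fit on the training folds only.
        a_d, b_d = fit_line([x[i] for i in train], [d[i] for i in train])
        a_y, b_y = fit_line([x[i] for i in train], [y[i] for i in train])
        # Residuals are computed on the held-out fold.
        for i in held_out:
            res_d[i] = d[i] - (a_d + b_d * x[i])
            res_y[i] = y[i] - (a_y + b_y * x[i])
    numerator = sum(rd * ry for rd, ry in zip(res_d, res_y))
    denominator = sum(rd * rd for rd in res_d)
    return numerator / denominator


# Simulated data in which the covariate x confounds treatment d and outcome y.
rng = random.Random(0)
TRUE_THETA = 0.5
x = [rng.gauss(0, 1) for _ in range(2000)]
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [TRUE_THETA * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat = dml_partial_linear(x, d, y)
naive_slope = fit_line(d, y)[1]  # ignores x, so it absorbs the confounding
print(f"cross-fitted estimate: {theta_hat:.3f}, naive slope: {naive_slope:.3f}")
```

In this simulation the naive regression of outcome on treatment overstates the effect because it ignores the confounder, while the residual-on-residual estimate should land near the simulated value. Estimating CATEs would additionally require letting the effect vary with covariates.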

The datasets will be prepared for analysis through December. From January to February 2025, Aim 1 will be tested. From March to April 2025, Aim 2 will be tested. Results will be drafted from May to June 2025, and the algorithms used will be developed into our Suicide Risk Dashboard.

GREI Repository Data Sets
Dataverse
Mendeley Data
DOI (Digital Object Identifier) of GREI Repository Dataset
Add Health, Wave I: https://doi.org/10.15139/S3/11900
Add Health, Wave II: https://doi.org/10.15139/S3/11917
Add Health, Wave III: https://doi.org/10.15139/S3/11918
Add Health, Wave IV: https://doi.org/10.15139/S3/11920
ABCD: https://doi.org/10.15154/1523041
Outcomes and Outputs

Results of the proposed analyses will have two overarching outcomes: 1) advance understanding of the risk and protective factors that predict, and causally contribute to, the emergence of STB across development; and 2) determine which risk and protective factors (e.g., SDoH and sociodemographics) would be most valuable for effective prevention strategies.

We are dedicated to maintaining ethical, responsible, and fair AI practices throughout our work so that existing disparities or biases are not reinforced. This is especially critical for novel approaches like those proposed in this application. Fairness methods will be applied across our work using libraries such as Fairlearn and IBM AI Fairness 360. Best practices for transparency, replicability, and reproducibility of our results will be followed by publishing and sharing our open-source analysis code through the AI Hub’s GitHub.
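
The group-level fairness quantities named under Aim 1 can be computed directly from binary model predictions. The sketch below is illustrative only and does not use Fairlearn or AI Fairness 360. The definitions follow one common convention and are assumptions here: group size ratio is a group's share of the sample, predicted positive rate is a group's share of all predicted positives, and predicted positive group rate is the fraction of a group predicted positive. The function name and example data are hypothetical.

```python
from collections import Counter


def group_fairness_summary(groups, predictions):
    """Summarize binary predictions by group.

    groups: list of group labels, one per person.
    predictions: list of 0/1 predicted labels aligned with groups.
    """
    size = Counter(groups)
    positives = Counter(g for g, p in zip(groups, predictions) if p == 1)
    total_n = len(groups)
    total_positives = sum(positives.values())

    summary = {}
    for g in size:
        summary[g] = {
            # Share of the whole sample that belongs to this group.
            "group_size_ratio": size[g] / total_n,
            # Share of all predicted positives that fall in this group.
            "predicted_positive_rate": (
                positives[g] / total_positives if total_positives else 0.0
            ),
            # Fraction of this group's members predicted positive.
            "predicted_positive_group_rate": positives[g] / size[g],
        }
    return summary


# Hypothetical example: six people in group A, four in group B.
groups = ["A"] * 6 + ["B"] * 4
predictions = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
summary = group_fairness_summary(groups, predictions)
for name, metrics in summary.items():
    print(name, metrics)
```

Comparing a group's predicted positive rate with its group size ratio indicates whether that group is over- or under-represented among the people the model flags.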

We aim to share our findings through scientific journals and have identified the following candidates for submission: JAMA + AI, Nature Machine Intelligence, and Lancet Psychiatry. Our findings will also be presented at conferences and symposia hosted by organizations such as the International Association for Suicide Prevention and the American Foundation for Suicide Prevention.

However, scientific publications and conferences reach limited and specific audiences, and we would like to make our results accessible to all. Proposed analyses and their pipelines will be integrated into the AI Hub’s Suicide Risk Dashboard. This Dashboard is currently under development and will feature dataset management, interactive visualizations, and data analysis workflows that simplify and encourage responsible and reproducible AI/ML applications in suicide research and policy. Adding these novel analyses to the Dashboard will offer valuable information to researchers, healthcare professionals, policymakers, and more.

The FAIR Guiding Principles will directly inform our outcomes and outputs. This work follows the findability principle by using public data from the GREI. Our Suicide Risk Dashboard extends this principle by implementing standardized workflows for new datasets through automated codebook scanning and metadata mapping functions that generalize across datasets. We will support accessibility by sharing open-source code and automated workflows for our statistical analyses and algorithms through the Suicide Risk Dashboard. The Dashboard integration will also support interoperability, primarily by enabling comparison with other datasets. Lastly, reusability is at the core of our proposal, as we hope these novel techniques will be impactful and generalizable for suicide prevention.

Impact / Scientific Significance

Individuals in many communities across the US experience STB; however, some populations are disproportionately affected. The US Department of Health and Human Services, in its “2024 National Strategy for Suicide Prevention,” calls for health equity in suicide prevention to better meet the needs of diverse populations. In addition, in its 2020 “Ring the Alarm” report, the Congressional Black Caucus described how the effectiveness of prevention strategies for youth of color might be limited, given that many existing programs have been tested in majority White populations.

Previous studies have shown heterogeneous trajectories of STBs dependent upon race/ethnicity, sex, sexual orientation, and SES disparities. However, none of this work has focused on understanding the transitions within STBs leveraging current, state-of-the-art machine learning techniques for prediction and causal inference. The novel approaches proposed here will help determine the interacting features contributing to transitions in STBs and how these relate to sociodemographics and SDoH.

Ideation-to-action frameworks for understanding suicide, such as the Three-Step Theory (3ST), are often used to inform prevention and intervention efforts. These theories distinguish between the development of SI and the transition from SI to SAs as distinct processes. Large, nationally representative, longitudinal datasets are required to understand causal risk and protective factors for these transitions, and existing studies have been limited. These proposed studies would be the first to analyze the most updated waves of the ABCD and Add Health datasets within the same analysis workflow to compare risk and protective factors across development.

Our approach will support the tailoring of suicide prevention and intervention strategies for specific sociodemographic and SDoH profiles along the STB spectrum.

Team

The Artificial Intelligence (AI) Hub at New York University’s McSilver Institute was established in 2022 to investigate how AI-driven systems can be used to equitably address public health challenges relating to race and poverty. Among the AI Hub’s areas of focus are research and policy initiatives aimed at mental health and youth suicide prevention, especially for underserved communities. Our team holds a strong foundation in statistical analysis and public health research. The Assistant Director of Research, Dr. Kara Emery, is a research data scientist with a record of publications and applications using machine learning. Our Subject Matter Expert (SME), Dr. Arielle H. Sheftall, is an Associate Professor at the University of Rochester Medical Center and a leading researcher in youth suicidal behaviors. Another SME, Dr. John Dixon, is an AI and machine learning expert with over 20 years of experience developing tools and applications. Our Senior Program Associate, Ezra Solidum, has a background in public health and research experience in mental health and health equity. Our Full-Stack Software Developer, Jasmine Falk, is a software engineer and researcher committed to building ethical technology. Our Research Data Scientist, Adrian Harris, is a biostatistician with expertise in behavioral health and machine learning ethics. Our research employs a collaborative problem-solving approach with the diverse expertise necessary to tackle public health issues.

Considerations

Tables 1 and 2 in the Supporting Documents show the sample sizes of the ABCD and Add Health datasets. The sample sizes should be adequate for our planned analyses. We will use inverse probability-of-censoring weighting (IPCW) to adjust for attrition across waves and the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance, with the goal of unbiased performance across waves and demographic subsamples. Furthermore, the AI Hub has the compute infrastructure to run the proposed models and can use NYU’s High-Performance Computing GPU clusters if necessary.
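
To make the weighting idea concrete, the sketch below computes inverse probability-of-censoring weights from stratum-level retention rates: each participant who remains in the study is weighted by the inverse of the estimated probability of remaining, so that retained participants also stand in for similar participants who dropped out. This is a simplified illustration, not the planned implementation. In practice the retention probability would likely come from a model of dropout given covariates, and the stratum labels and the function name ipcw_weights are hypothetical. SMOTE is not shown.

```python
from collections import defaultdict


def ipcw_weights(strata, retained):
    """Inverse probability-of-censoring weights from stratum retention rates.

    strata: list of stratum labels, one per participant at baseline.
    retained: list of booleans, True if the participant is observed at follow-up.
    Returns one weight per participant; participants lost to follow-up get 0.
    """
    n_total = defaultdict(int)
    n_kept = defaultdict(int)
    for s, r in zip(strata, retained):
        n_total[s] += 1
        if r:
            n_kept[s] += 1

    weights = []
    for s, r in zip(strata, retained):
        if r:
            # Estimated P(retained | stratum) is n_kept / n_total; invert it.
            weights.append(n_total[s] / n_kept[s])
        else:
            weights.append(0.0)
    return weights


# Hypothetical example: half of the "low" stratum drops out, none of "high".
strata = ["low"] * 4 + ["high"] * 4
retained = [True, True, False, False, True, True, True, True]
weights = ipcw_weights(strata, retained)
print(weights)
```

With these weights, the retained participants in each stratum sum to that stratum's baseline size, which restores the baseline composition of the sample at follow-up.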

The McSilver Institute for Poverty Policy and Research has a history of developing evidence-based interventions, services, and programs. It creates the tools, training, and infrastructure needed by front-line workers (e.g., clinicians, teachers, and case managers) so they can deploy research-derived interventions. Thus, the AI Hub is well-positioned to design effective suicide prevention and intervention efforts.

Supporting Documents
Provide up to 10 resources for the evaluation of your secondary research project including but not limited to:
● The persistent identifier of the dataset(s), other than GREI dataset DOIs already listed above, to be used in the proposed project (where available)
● Tools/workflows or resources to be utilized in the proposed project
● Relevant references or scientific publications that directly relate to the proposed project
Non Scored Criteria
Please complete this information. It will not be scored by the evaluation panel.
Entity Participation
Participate as an Entity (i.e., registering as a group of individuals competing together on behalf of a legally established organization, institution, or corporation)
Legal Entity Organization Name
New York University
Research Discipline (non-scored criteria)
biostatistics
public health
suicide prevention
data science
psychiatry
IDeA State (non-scored criteria)
No
All Team Member Information - Name, Organization, Job Title, and Email address
Dr. Kara Emery (Point of Contact)
Organization: NYU McSilver Institute
Job Title: Assistant Director of Research
email: kara.emery@nyu.edu

Dr. Arielle Sheftall
Organization: NYU McSilver Institute, University of Rochester Medical Center
Job Title: Subject Matter Expert, Associate Professor
email: as19064@nyu.edu

Dr. John Dixon
Organization: NYU McSilver Institute
Job Title: Subject Matter Expert
email: jd5533@nyu.edu

Ezra Solidum
Organization: NYU McSilver Institute
Job Title: Senior Program Associate
email: es5488@nyu.edu

Jasmine Falk
Organization: NYU McSilver Institute
Job Title: Full-Stack Software Developer
email: jasmine.falk@nyu.edu

Adrian Harris
Organization: NYU McSilver Institute
Job Title: Research Data Scientist
email: ah5588@nyu.edu
MSI (non-scored criteria)
No
Participation in prior DataWorks! Prizes (non-scored criteria)
No
Team Point of Contact Eligibility
yes
Eligibility (non-scored criteria)
Yes, I confirm that I have read and meet the terms of eligibility for this challenge