Help NOAA better forecast changes in Earth’s magnetic field!
The efficient transfer of energy from the solar wind into the Earth’s magnetic field causes geomagnetic storms. The resulting variations in the magnetic field increase errors in magnetic navigation. The disturbance-storm-time index, or Dst, is a measure of the severity of a geomagnetic storm.
As a key specification of the magnetospheric dynamics, the Dst index is used to drive geomagnetic disturbance models such as NOAA/NCEI’s High Definition Geomagnetic Model - Real-Time (HDGM-RT). Additionally, magnetic surveyors, government agencies, academic institutions, satellite operators, and power grid operators use the Dst index to analyze the strength and duration of geomagnetic storms.
Empirical models were proposed as early as 1975 to forecast Dst solely from solar-wind observations at the Lagrangian (L1) position by satellites such as NOAA’s Deep Space Climate Observatory (DSCOVR) or NASA's Advanced Composition Explorer (ACE). Over the past three decades, several models have been proposed for solar-wind forecasting of Dst, including empirical, physics-based, and machine learning (ML) approaches. While the ML models generally perform better than models based on the other approaches, there is still room to improve, especially when predicting extreme events. More importantly, we seek solutions that work on the raw, real-time data streams and are robust to sensor malfunctions and noise.
In this challenge, your task is to develop models for forecasting Dst that push the boundary of predictive performance, under operationally viable constraints, using the real-time solar-wind (RTSW) data feeds from NOAA’s DSCOVR and NASA’s ACE satellites. Improved models can provide more advanced warning of geomagnetic storms and reduce errors in magnetic navigation systems.
The goal of this challenge is to develop models for forecasting Dst that 1) push the boundary of predictive performance 2) under operationally viable constraints 3) using specified real-time solar-wind data feeds. More information on the dataset, performance metric, and submission specifications is provided below.
Finalists and runners-up will be determined by performance on the private test set. These participants will then have the opportunity to submit their code to be audited using an out-of-sample verification set. The top 4 eligible teams that pass this final check will be awarded prizes.
The data for this challenge consists of solar wind measurements collected from two satellites: NASA's Advanced Composition Explorer (ACE) and NOAA's Deep Space Climate Observatory (DSCOVR). Your goal is to predict the Disturbance Storm-Time Index (Dst), a measure of magnetic activity, from the data available up to the time of prediction. For any given timestep, you are tasked with forecasting Dst at both the current time (t0) and one hour into the future (t+1).
Note: This is a real-time prediction task. Therefore, your solution may not use data captured later in time to predict Dst, and it may not take Dst as an input.
Dst values are measured by 4 ground-based stations near the equator. These values are then averaged to provide a measurement of Dst for any given hour. However, these values are not always provided in a timely manner. Your goal is to be able to predict Dst in real-time for both the current hour and the next hour. For example, if the current timestep is 10:00 am, you are tasked with predicting the Dst values for both 10:00 am and 11:00 am.
To ensure similar distributions between the training and test data, the data is separated into three non-contiguous periods. All data are provided with a period and timedelta multi-index which indicates the relative timestep for each observation within a period, but not the real timestamp. The period identifiers and timedeltas are common across datasets.
Three different time-series datasets are provided as features, in addition to the Dst labels:
Solar wind data collected from ACE and DSCOVR satellites
Smoothed sunspot counts
Coordinate positions for ACE and DSCOVR
Dst values averaged across the four stations
One more note about the training data: historical training data is provided to you, but the testing environment will be set up to simulate a real-time environment. It is up to you to figure out how to align and use the appropriate training data, being careful not to leak any "future" information.
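As a minimal sketch of one leakage-safe alignment, the example below stamps each hour's statistics with the end of that hour, so a prediction at time t only sees readings from before t. It assumes minute-level solar wind rows, a hypothetical period name ("train_a"), and synthetic bz_gsm values; none of these details are specified above, and other window lengths or statistics may work better.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the solar wind table: three hours of
# minute-level readings in one hypothetical period.
minutes = pd.to_timedelta(np.arange(180), unit="min")
solar_wind = pd.DataFrame({
    "period": "train_a",
    "timedelta": minutes,
    "bz_gsm": np.arange(180, dtype=float),
})

def hourly_features(sw: pd.DataFrame) -> pd.DataFrame:
    """Aggregate readings per hour and make them available one hour later."""
    # Start of the hour each reading falls in.
    hour_start = sw["timedelta"].dt.floor("h")
    agg = sw.groupby(["period", hour_start])["bz_gsm"].agg(["mean", "std"])
    agg.columns = ["bz_gsm_mean", "bz_gsm_std"]
    agg = agg.reset_index()
    # Statistics over [t - 1h, t) are only known at t, so shift the stamp.
    agg["timedelta"] = agg["timedelta"] + pd.Timedelta(hours=1)
    return agg.set_index(["period", "timedelta"])

features = hourly_features(solar_wind)
print(features)
```

With this stamping, the row indexed at 1 hour summarizes minutes 0 through 59 only, so joining it to the label at 1 hour uses no later observations.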
SOLAR WIND DATA
The primary feature data are provided in train_values.csv. They are composed of solar-wind readings from the ACE and DSCOVR satellites.
bx_gse - Interplanetary-magnetic-field (IMF) X-component in geocentric solar ecliptic (GSE) coordinates (nanotesla (nT))
by_gse - Interplanetary-magnetic-field Y-component in GSE coordinates (nT)
bz_gse - Interplanetary-magnetic-field Z-component in GSE coordinates (nT)
theta_gse - Interplanetary-magnetic-field latitude in GSE coordinates (defined as the angle between the magnetic vector B and the ecliptic plane, being positive when B points North) (degrees)
phi_gse - Interplanetary-magnetic-field longitude in GSE coordinates (the angle between the projection of the IMF vector on the ecliptic and the Earth–Sun direction) (degrees)
bx_gsm - Interplanetary-magnetic-field X-component in geocentric solar magnetospheric (GSM) coordinates (nT)
by_gsm - Interplanetary-magnetic-field Y-component in GSM coordinates (nT)
bz_gsm - Interplanetary-magnetic-field Z-component in GSM coordinates (nT)
theta_gsm - Interplanetary-magnetic-field latitude in GSM coordinates (degrees)
phi_gsm - Interplanetary-magnetic-field longitude in GSM coordinates (degrees)
temperature - Solar wind ion temperature (kelvin (K))
source - Starting in 2016, the solar wind data for any given point in time can be sourced from either DSCOVR or ACE satellites depending on the quality. "ac" denotes it was sourced from ACE, and "ds" from DSCOVR.
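The latitude and longitude definitions above can be illustrated with a short sketch that derives both angles from the three field components. The function name imf_angles is hypothetical, and wrapping the longitude into [0, 360) is an assumption; the dataset may use a different range or sign convention.

```python
import numpy as np

def imf_angles(bx, by, bz):
    """Return (theta, phi) in degrees for IMF components given in nT.

    theta: angle between B and the XY (ecliptic) plane, positive northward.
    phi: angle of the XY projection of B from the X axis (Earth-Sun line),
         wrapped into [0, 360) by assumption.
    """
    bx, by, bz = np.asarray(bx, float), np.asarray(by, float), np.asarray(bz, float)
    theta = np.degrees(np.arctan2(bz, np.hypot(bx, by)))
    phi = np.degrees(np.arctan2(by, bx)) % 360.0
    return theta, phi

theta, phi = imf_angles([1.0, 0.0, -1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0])
print(theta, phi)
```

Comparing angles computed this way against the provided theta and phi columns is one possible consistency check on rows with suspect sensor readings.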
SATELLITE COORDINATE DATA
The ACE and DSCOVR satellites are not stationary: they orbit the L1 point, which stays in a relatively constant position with respect to the Earth as the Earth revolves around the Sun. This positional information might further improve the forecasting of Dst values.
train_satellite_positions.csv records the daily positions of the DSCOVR and ACE spacecraft in Geocentric Solar Ecliptic (GSE) coordinates for projections in the XY, XZ, and YZ planes. The columns for each spacecraft are denoted by the suffix _ace or _dscovr. Note: some dates are missing for DSCOVR.
gse_x - Position of the satellite in the X direction of GSE coordinates (km)
gse_y - Position of the satellite in the Y direction of GSE coordinates (km)
gse_z - Position of the satellite in the Z direction of GSE coordinates (km)
The Sun exhibits a well-known, periodic variation in the number of spots on its disk over a period of about 11 years, called a solar cycle. In general, large geomagnetic storms occur more frequently during the peak of these cycles. Sunspot numbers might allow for calibration of models to the solar cycle.
Sunspot numbers are provided at a monthly frequency. Because sunspot numbers are reliably projected, these numbers can be used for the current month, even if the "real" number of sunspots has yet to be recorded.
Sunspot values are indexed by the first corresponding day in the Dst values (train_labels.csv).
smoothed_ssn - Monthly sunspot numbers, smoothed
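Because the sunspot values arrive far less often than the hourly Dst timesteps, they have to be carried forward onto the hourly index before they can be used as a feature. The sketch below shows one way to do that with a forward fill inside each period. The period name and the sunspot values are made up for illustration, and the same pattern could apply to the daily satellite positions.

```python
import pandas as pd

# Hypothetical hourly index for one period spanning 31 days.
hours = pd.to_timedelta(range(31 * 24), unit="h")
hourly_index = pd.MultiIndex.from_product(
    [["train_a"], hours], names=["period", "timedelta"]
)

# Made-up smoothed sunspot numbers, one at day 0 and one at day 30.
ssn_index = pd.MultiIndex.from_arrays(
    [["train_a", "train_a"], pd.to_timedelta([0, 30 * 24], unit="h")],
    names=["period", "timedelta"],
)
sunspots = pd.DataFrame({"smoothed_ssn": [65.4, 70.1]}, index=ssn_index)

# Place each value on its own timestep, then carry it forward within the
# period so no value is ever taken from a later timestep.
aligned = sunspots.reindex(hourly_index).groupby(level="period").ffill()
print(aligned.iloc[[0, 719, 720]])
```

Grouping by period before filling keeps a value from one period from spilling into the start of the next.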
The labels are the Dst values for the current timestep, and the following timestep. You are not allowed to use historical Dst values for prediction.
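For training, the two targets can be built from the hourly Dst series by pairing each timestep with the value one hour later. This is a minimal sketch on synthetic data; the column name dst, the target names t0 and t1, and the period name are assumptions. Dst appears here only as the training target, never as a model input.

```python
import pandas as pd

# Synthetic hourly Dst labels for one hypothetical period.
labels = pd.DataFrame({
    "period": ["train_a"] * 4,
    "timedelta": pd.to_timedelta(range(4), unit="h"),
    "dst": [-7.0, -10.0, -10.0, -6.0],
})

# t0 is Dst at the current hour; t1 is Dst one hour ahead in the same period.
labels["t0"] = labels["dst"]
labels["t1"] = labels.groupby("period")["dst"].shift(-1)

# The last hour of each period has no t+1 value, so it is dropped.
targets = labels.dropna(subset=["t1"])[["period", "timedelta", "t0", "t1"]]
print(targets)
```

Shifting within each period avoids pairing the last hour of one period with the first hour of another.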
Performance is evaluated according to Root Mean Squared Error (RMSE). RMSE will be calculated on t0 and t+1 separately, and then averaged.
This metric is implemented in scikit-learn, with the squared parameter set to False.
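The scoring rule can also be written out directly. The sketch below computes RMSE for each horizon with NumPy and averages the two results; the function names are illustrative and this is not the official scoring code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def challenge_score(t0_true, t0_pred, t1_true, t1_pred):
    """Mean of the RMSE at t0 and the RMSE at t+1."""
    return 0.5 * (rmse(t0_true, t0_pred) + rmse(t1_true, t1_pred))

score = challenge_score(
    t0_true=[-10.0, -20.0], t0_pred=[-10.0, -20.0],  # perfect at t0
    t1_true=[-12.0, -25.0], t1_pred=[-10.0, -23.0],  # off by 2 nT at t+1
)
print(score)
```

Because the two RMSE values are averaged after the square root, a model that is much worse at t+1 than at t0 cannot hide that behind a large number of accurate t0 predictions.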
This is a code execution challenge! Rather than submitting your predicted labels, you'll package everything needed to do inference and submit that for containerized execution.
A cartoon of the Earth's magnetosphere. The Dst or disturbance-storm-time index is a measure of the “ring current” (blue) around the Earth. The ring current is an electric current carried by charged particles trapped in the magnetosphere.
The efficient transfer of energy from solar-wind into the Earth’s magnetic field causes geomagnetic storms. The resulting ground magnetic field variations increase the errors of systems that use Earth’s natural magnetic field as a pointing reference.
The Dst or disturbance-storm-time index is a measure of the severity of the geomagnetic storm. More specifically, the negative deflection of the Earth's magnetic field due to the ring current (see the above figure) is measured by the Dst index. As a key specification of the magnetospheric dynamics, the Dst index is used to drive geomagnetic disturbance models such as NOAA/NCEI’s High Definition Geomagnetic Model - Real-Time (HDGM-RT). Additionally, magnetic surveyors, government agencies, academic institutions, satellite operators, and power grid operators use the Dst index to analyze the strength and duration of geomagnetic storms.
The Dst is calculated as an average deflection of the horizontal component of the magnetic field observed at four near-equatorial ground observatories. The more intense the geomagnetic storm is, the more negative the Dst value becomes. However, the observatory-based Dst values are not very useful for real-time magnetic modeling due to latency, instrument outages, and connectivity issues. Over the past four decades, several models have been proposed for solar-wind forecasting of Dst. Here, instead of relying on ground measurements, models predict the Dst values based solely on solar-wind measurements by satellites such as NOAA’s Deep Space Climate Observatory (DSCOVR) or NASA's Advanced Composition Explorer (ACE), situated approximately 1.6 million kilometers away from Earth along the Sun-Earth line. Since radio communication is faster than the solar wind, the solar-wind measurements provide 15-30 minutes of lead time before the storm arrives at Earth.
NOAA’s National Centers for Environmental Information (NCEI), in partnership with the University of Colorado’s Cooperative Institute for Research in Environmental Sciences (CIRES), is conducting an open data-science challenge to forecast Dst using the real-time solar-wind (RTSW) data in an operationally viable setup. Recent advances in machine learning research hold immediate promise for improving Dst forecasting, even for participants without formal training in space physics. The right challenge in this context can identify solutions that are both operationally viable and highly accurate.
At the Cooperative Institute for Research In Environmental Sciences (CIRES), more than 800 environmental scientists work to understand the dynamic Earth system, including people’s relationship with the planet. CIRES is a partnership of NOAA and the University of Colorado Boulder, and its areas of expertise include weather and climate, changes at Earth’s poles, air quality and atmospheric chemistry, water resources, and solid Earth sciences. The vision at CIRES is to be instrumental in ensuring a sustainable future environment by advancing scientific and societal understanding of the Earth system.
The transfer of energy from solar wind to Earth's magnetic field can cause massive geomagnetic storms, wreaking havoc on key infrastructure systems like GPS, satellite communication, and electric power transmission. The severity of these geomagnetic storms is measured by the Disturbance Storm-time Index, or Dst.
The goal of the MagNet: Model the Geomagnetic Field challenge was to develop models for forecasting Dst that 1) push the boundary of predictive performance, 2) under operationally viable constraints, and 3) using specified real-time solar-wind data feeds. This is a hard problem where the best approaches are not evident at the outset. Competitors were tasked with improving forecasts both for the current Dst value (t0) and for Dst one hour in the future (t+1).
Over the course of the competition, DrivenData saw over 600 participants and an impressive 1,200 submissions. The number of submissions is especially notable given the technical constraints of the code execution environment and the limit of 3 submissions per week.
Among the winners, we saw a variety of creative solutions. Competitors used a combination of Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), Convolutional Neural Networks (CNN), and Light Gradient Boosting Machines (LightGBM) to secure the top leaderboard positions. In addition to using different models, competitors experimented with various time windows and imputation methods to deal with sensor malfunctions and missing data.
The top four prize-winners achieved RMSE values of 11.1 to 11.5 nT on the private test set, beating the benchmark of 15.2 nT. Interestingly, an ensemble of the top four models did best of all, with an RMSE of 10.6 nT, a 30% reduction from the benchmark!