MagNet: Model the Geomagnetic Field

Overview

Help NOAA better forecast changes in Earth’s magnetic field!

The efficient transfer of energy from the solar wind into Earth’s magnetic field causes geomagnetic storms. The resulting variations in the magnetic field increase errors in magnetic navigation. The disturbance-storm-time index, or Dst, is a measure of the severity of a geomagnetic storm.

As a key specification of the magnetospheric dynamics, the Dst index is used to drive geomagnetic disturbance models such as NOAA/NCEI’s High Definition Geomagnetic Model - Real-Time (HDGM-RT). Additionally, magnetic surveyors, government agencies, academic institutions, satellite operators, and power grid operators use the Dst index to analyze the strength and duration of geomagnetic storms.

Empirical models that forecast Dst solely from solar-wind observations have been proposed since as early as 1975. Today these observations come from satellites at the first Lagrangian point (L1), such as NOAA’s Deep Space Climate Observatory (DSCOVR) and NASA's Advanced Composition Explorer (ACE). Over the past three decades, several models have been proposed for forecasting Dst from the solar wind, including empirical, physics-based, and machine learning approaches. While machine learning models generally outperform the other approaches, there is still room for improvement, especially when predicting extreme events. More importantly, we seek solutions that work on the raw, real-time data streams and are robust to sensor malfunctions and noise.

Your Task

In this challenge, your task is to develop models for forecasting Dst that push the boundary of predictive performance, under operationally viable constraints, using the real-time solar-wind (RTSW) data feeds from NOAA’s DSCOVR and NASA’s ACE satellites. Improved models can provide earlier warning of geomagnetic storms and reduce errors in magnetic navigation systems.


Guidelines

Problem description

The goal of this challenge is to develop models for forecasting Dst that 1) push the boundary of predictive performance, 2) run under operationally viable constraints, and 3) use the specified real-time solar-wind data feeds. More information on the dataset, performance metric, and submission format is provided below.

Finalists and runners-up will be determined by performance on the private test set. These participants will then have the opportunity to submit their code to be audited using an out-of-sample verification set. The top 4 eligible teams that pass this final check will be awarded prizes.


The features in this dataset

Overview

The data for this challenge consist of solar wind data collected from two satellites: NASA's Advanced Composition Explorer (ACE) and NOAA's Deep Space Climate Observatory (DSCOVR). Your goal is to predict the Disturbance Storm-Time Index (Dst), a measure of magnetic activity, from the provided data up to the time of prediction. For any given timestep, you are tasked with forecasting Dst both at the current time (t0) and one hour into the future (t+1).

As described above, Dst must be forecast solely from solar-wind observations made at the first Lagrangian point (L1) by satellites such as NOAA’s Deep Space Climate Observatory (DSCOVR) or NASA's Advanced Composition Explorer (ACE).

Note: This is a real-time prediction task. Your solution therefore may not use data captured after the time of prediction, and it may not take Dst as an input.

Dst is measured by four ground-based stations near the equator, whose readings are averaged to give one Dst value per hour. However, these values are not always available in a timely manner. Your goal is to predict Dst in real time for both the current hour and the next hour. For example, if the current timestep is 10:00 am, you are tasked with predicting the Dst values for both 10:00 am and 11:00 am.

To ensure similar distributions between the training and test data, the data are split into three non-contiguous periods. All data are indexed by a (period, timedelta) multi-index: period identifies the period, and timedelta gives the time elapsed since the start of that period rather than a real timestamp. Period identifiers and timedeltas are common across datasets, so rows from different files with the same (period, timedelta) refer to the same moment.
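A minimal sketch of loading one of these files into the (period, timedelta) multi-index with pandas, assuming each CSV stores period and timedelta as plain columns with timedelta strings such as "0 days 00:00:00" (as in the example rows in this section). The helper load_indexed is hypothetical, and the in-memory sample, including its dst column name, is made up for illustration.

```python
import io

import pandas as pd


def load_indexed(csv_source) -> pd.DataFrame:
    """Hypothetical helper: load a challenge CSV indexed by (period, timedelta).

    Assumes the file has 'period' and 'timedelta' columns, with timedelta
    stored as strings such as "0 days 00:00:00".
    """
    df = pd.read_csv(csv_source)
    # Convert relative-time strings into Timedelta values.
    df["timedelta"] = pd.to_timedelta(df["timedelta"])
    # Index by (period, timedelta) and sort chronologically within each period.
    return df.set_index(["period", "timedelta"]).sort_index()


# Tiny in-memory stand-in for train_labels.csv; the 'dst' column name and
# the values are made up for illustration.
sample = io.StringIO(
    "period,timedelta,dst\n"
    "train_a,0 days 01:00:00,-5\n"
    "train_a,0 days 00:00:00,-7\n"
    "train_b,0 days 00:00:00,-12\n"
)
labels = load_indexed(sample)
print(labels.loc[("train_a", pd.Timedelta(hours=1)), "dst"])  # -5
```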

Training Data

Three different time-series datasets are provided as features, in addition to the Dst labels:

filename	description	frequency
train_values.csv	Solar wind data collected from ACE and DSCOVR satellites	minutely
train_sunspots.csv	Smoothed sunspot counts	monthly
train_satellite_positions.csv	Coordinate positions for ACE and DSCOVR	daily
train_labels.csv	Dst values averaged across the four stations	hourly

One more note about the training data: historical training data are provided to you, but the testing environment will be set up to simulate real-time conditions. It is up to you to align and use the appropriate training data, taking care not to leak any "future" information, that is, anything recorded after the time of prediction.
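One possible leak-free alignment, sketched with pandas under stated assumptions: minutely solar-wind rows are averaged into hourly buckets, and each minute is credited to the next full hour, so the features for hour h use only data recorded strictly before h. Whether the minute stamped exactly at the prediction time is available in the real-time environment is not stated here, so this sketch conservatively excludes it. The helper hourly_features and the synthetic data are hypothetical, and the mean is just one choice of aggregate.

```python
import pandas as pd

HOUR = pd.Timedelta(hours=1)


def hourly_features(solar_wind: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: average minutely solar wind into leak-free hourly rows.

    Assumes 'period' and 'timedelta' columns (timedelta as a Timedelta dtype).
    A minute at time x is assigned to hour floor(x) + 1h, so the row for hour h
    summarizes only minutes in [h - 1h, h): data strictly before time h.
    """
    df = solar_wind.copy()
    df["timedelta"] = df["timedelta"].dt.floor("h") + HOUR
    # Non-numeric columns such as 'source' are left out of the averages.
    feature_cols = df.select_dtypes("number").columns.tolist()
    return df.groupby(["period", "timedelta"])[feature_cols].mean()


# Three hours of synthetic minutely data for one period.
minutes = pd.timedelta_range(start="0 days", periods=180, freq="min")
sw = pd.DataFrame({"period": "train_a", "timedelta": minutes, "speed": range(180)})
hourly = hourly_features(sw)
print(hourly.loc[("train_a", HOUR), "speed"])  # 29.5, the mean of minutes 0-59
```

The resulting hourly frame is keyed by (period, timedelta) like the labels, so it can be joined directly to train_labels.csv.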

SOLAR WIND DATA

The primary feature data are provided in train_values.csv. They are composed of solar-wind readings from the ACE and DSCOVR satellites.

bx_gse - Interplanetary-magnetic-field (IMF) X-component in geocentric solar ecliptic (GSE) coordinates (nanotesla (nT))
by_gse - IMF Y-component in GSE coordinates (nT)
bz_gse - IMF Z-component in GSE coordinates (nT)
theta_gse - IMF latitude in GSE coordinates, defined as the angle between the magnetic vector B and the ecliptic plane, positive when B points north (degrees)
phi_gse - IMF longitude in GSE coordinates, defined as the angle between the projection of the IMF vector on the ecliptic and the Earth–Sun direction (degrees)
bx_gsm - IMF X-component in geocentric solar magnetospheric (GSM) coordinates (nT)
by_gsm - IMF Y-component in GSM coordinates (nT)
bz_gsm - IMF Z-component in GSM coordinates (nT)
theta_gsm - IMF latitude in GSM coordinates (degrees)
phi_gsm - IMF longitude in GSM coordinates (degrees)
bt - IMF magnitude (nT)
density - Solar wind proton density (N/cm^3, particles per cubic centimeter)
speed - Solar wind bulk speed (km/s)
temperature - Solar wind ion temperature (kelvin (K))
source - The satellite a reading came from. Starting in 2016, the solar wind data for any given point in time can be sourced from either DSCOVR or ACE, depending on data quality: "ac" denotes ACE and "ds" denotes DSCOVR.
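The latitude and longitude angles can be recomputed from the Cartesian components, which is a handy sanity check on the definitions above. This is a sketch assuming the usual conventions (theta measured from the X-Y plane, phi measured from +X toward +Y and wrapped into [0, 360)); the same formulas are assumed to apply to the GSM columns. For the example row that follows, the recomputed angles come out near 11.2° and 151.6° rather than the listed 11.09° and 153.37°, presumably because the file's angles are derived from higher-resolution data before averaging (an assumption). The function name imf_angles is hypothetical.

```python
import math


def imf_angles(bx: float, by: float, bz: float) -> tuple[float, float]:
    """Hypothetical helper: (theta, phi) in degrees from IMF components.

    theta: latitude, the angle between B and the X-Y (ecliptic) plane,
           positive when bz > 0 (B points north).
    phi:   longitude of B's projection on the X-Y plane, measured from +X
           (the Earth-Sun direction) toward +Y, wrapped into [0, 360).
    """
    theta = math.degrees(math.atan2(bz, math.hypot(bx, by)))
    phi = math.degrees(math.atan2(by, bx)) % 360.0
    return theta, phi


# Components from the example row (bx_gse, by_gse, bz_gse).
print(imf_angles(-5.55, 3.0, 1.25))
```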

Example row:

column	value
period	train_a
timedelta	0 days 00:00:00
bx_gse	-5.55
by_gse	3.0
bz_gse	1.25
theta_gse	11.09
phi_gse	153.37
bx_gsm	-5.55
by_gsm	3.0
bz_gsm	1.25
theta_gsm	11.09
phi_gsm	153.37
bt	6.8
density	1.53
speed	383.92
temperature	110237.0
source	ac

SATELLITE COORDINATE DATA

The ACE and DSCOVR satellites are not stationary. Each orbits the L1 point, which stays in a relatively constant position with respect to Earth as Earth revolves around the Sun. The positional information may further improve Dst forecasts.

train_satellite_positions.csv records the daily positions of the DSCOVR and ACE spacecraft in geocentric solar ecliptic (GSE) coordinates, given as X, Y, and Z components from which projections onto the XY, XZ, and YZ planes can be formed. The columns for each spacecraft are distinguished by the suffix _ace or _dscovr. Note: some dates are missing for DSCOVR.

gse_x - Position of the satellite in the X direction of GSE coordinates (km)
gse_y - Position of the satellite in the Y direction of GSE coordinates (km)
gse_z - Position of the satellite in the Z direction of GSE coordinates (km)

Example row:

period	timedelta	gse_x_ace	gse_y_ace	gse_z_ace	gse_x_dscovr	gse_y_dscovr	gse_z_dscovr
train_a	0 days 00:00:00	1522376.9	143704.6	149496.7	NaN	NaN	NaN
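One way the positions might be used, offered as an assumption rather than a prescribed method: a rough ballistic estimate of how long the solar wind measured at the spacecraft takes to reach Earth, i.e. the GSE X distance divided by the bulk speed. Combining the two example rows (gse_x_ace = 1522376.9 km, speed = 383.92 km/s) gives roughly 66 minutes; at 1,000 km/s the same distance takes about 25 minutes. The function name is hypothetical, and the estimate ignores the spacecraft's Y/Z offset and the distance between Earth and the magnetosphere's outer boundary.

```python
def propagation_delay_minutes(gse_x_km: float, speed_km_s: float) -> float:
    """Hypothetical helper: rough solar-wind travel time from spacecraft to Earth.

    Simplifying assumptions: the wind moves radially along -X at the measured
    bulk speed, and Earth sits at the GSE origin (the spacecraft's Y/Z offset
    and the magnetosphere's standoff distance are ignored).
    """
    if speed_km_s <= 0:
        raise ValueError("speed must be positive")
    return gse_x_km / speed_km_s / 60.0


# ACE position from the example row, with the example solar-wind speed,
# and the same position with a fast 1,000 km/s wind for comparison.
slow = propagation_delay_minutes(1522376.9, 383.92)
fast = propagation_delay_minutes(1522376.9, 1000.0)
print(round(slow, 1), round(fast, 1))  # 66.1 25.4
```

Because positions are daily and solar wind is minutely, the latest available daily position would need to be carried forward to each minute before applying such a feature.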

SUNSPOT NUMBERS

The Sun exhibits a well-known, periodic variation in the number of spots on its disk over a period of about 11 years, called a solar cycle. In general, large geomagnetic storms occur more frequently during the peak of these cycles. Sunspot numbers might allow for calibration of models to the solar cycle.

Sunspot numbers are provided at a monthly frequency. Because sunspot numbers are reliably projected, the value for the current month can be used even if the "real" number of sunspots has yet to be recorded.

Each monthly sunspot row is indexed by the timedelta of the first day of that month that appears in the Dst values (train_labels.csv).

smoothed_ssn - Monthly sunspot numbers, smoothed

Example row:

period	timedelta	smoothed_ssn
train_a	0 days 00:00:00	65.4
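Because sunspot rows are monthly and indexed at the start of each month, one way to attach them to hourly rows is a backward as-of join within each period, which picks the current month's value. A sketch using pandas.merge_asof; the helper name attach_sunspots is hypothetical, and all values except 65.4 (from the example row) are made up.

```python
import pandas as pd


def attach_sunspots(hourly: pd.DataFrame, sunspots: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: attach the current month's smoothed_ssn to hourly rows.

    Both frames need 'period' and 'timedelta' columns. With
    direction="backward", each hourly row receives, within its own period,
    the last sunspot row whose timedelta is <= its timedelta.
    """
    # merge_asof requires both frames to be sorted by the 'on' key.
    left = hourly.sort_values("timedelta")
    right = sunspots.sort_values("timedelta")
    merged = pd.merge_asof(left, right, on="timedelta", by="period", direction="backward")
    return merged.sort_values(["period", "timedelta"]).reset_index(drop=True)


sunspots = pd.DataFrame({
    "period": ["train_a", "train_a", "train_b"],
    "timedelta": pd.to_timedelta(["0 days", "31 days", "0 days"]),
    "smoothed_ssn": [65.4, 70.0, 10.0],
})
hourly = pd.DataFrame({
    "period": ["train_a"] * 4 + ["train_b"],
    "timedelta": pd.to_timedelta(["0 days", "15 days", "31 days", "40 days", "5 days"]),
})
out = attach_sunspots(hourly, sunspots)
print(out["smoothed_ssn"].tolist())  # [65.4, 65.4, 70.0, 70.0, 10.0]
```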

Labels

The labels are the Dst values for the current hour (t0) and the following hour (t+1). You may not use historical Dst values as inputs for prediction.
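Since train_labels.csv holds one Dst value per hour, the t+1 target has to be derived from it. A minimal sketch, assuming columns named period, timedelta, and dst (the last name is an assumption): each hour is paired with the label one hour later in the same period by joining on timestamps, which avoids mis-pairing across gaps or period boundaries. The helper make_targets and the sample values are hypothetical.

```python
import pandas as pd

HOUR = pd.Timedelta(hours=1)


def make_targets(labels: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pair each hour's Dst (t0) with the next hour's (t1).

    Assumes columns 'period', 'timedelta' (Timedelta dtype) and 'dst'.
    Joining on timestamps instead of shifting rows keeps gaps and period
    boundaries from pairing non-consecutive hours; hours without a
    following label are dropped by the inner join.
    """
    current = labels[["period", "timedelta", "dst"]]
    # Move every row back one hour so that, after the join, the row for
    # hour t carries the Dst observed at t + 1h.
    following = current.assign(timedelta=current["timedelta"] - HOUR)
    merged = current.merge(following, on=["period", "timedelta"], suffixes=("", "_next"))
    merged = merged.rename(columns={"dst": "t0", "dst_next": "t1"})
    return merged.sort_values(["period", "timedelta"]).reset_index(drop=True)


# Made-up hourly labels; train_a has no 3h row, so neither 2h nor 4h
# gets a t+1 pair.
labels = pd.DataFrame({
    "period": ["train_a"] * 4 + ["train_b"] * 2,
    "timedelta": pd.to_timedelta([0, 1, 2, 4, 0, 1], unit="h"),
    "dst": [-7, -5, -10, -3, -12, -15],
})
targets = make_targets(labels)
print(targets)
```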

Performance metric

Performance is evaluated according to Root Mean Squared Error (RMSE). RMSE will be calculated on t0 and t+1 separately, and then averaged.

This metric is implemented in scikit-learn as mean_squared_error with the squared parameter set to False.
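Written out, the score is (RMSE(t0) + RMSE(t+1)) / 2, where RMSE = sqrt(mean((y - ŷ)²)) over all predicted hours; lower is better. The sketch below computes it with NumPy alone so that it does not depend on the scikit-learn version (the squared parameter of mean_squared_error was deprecated in scikit-learn 1.4 in favor of root_mean_squared_error). The function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def challenge_score(t0_true, t0_pred, t1_true, t1_pred) -> float:
    """Hypothetical helper: average of the t0 RMSE and the t+1 RMSE."""
    return (rmse(t0_true, t0_pred) + rmse(t1_true, t1_pred)) / 2.0


# RMSE at t0 is 3.0 and at t+1 is 1.0, so the averaged score is 2.0.
print(challenge_score([0, 0], [3, -3], [0, 0], [1, 1]))  # 2.0
```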

Submission format

This is a code execution challenge! Rather than submitting your predicted labels, you'll package everything needed to do inference and submit that for containerized execution.

The execution environment will simulate real-time conditions, subject to data-availability constraints. Complete details on preparing the executable code submission are provided in the challenge's submission documentation.

About the project

Figure: A cartoon of Earth's magnetosphere. The Dst, or disturbance-storm-time index, is a measure of the “ring current” (blue) around Earth. The ring current is an electric current carried by charged particles trapped in the magnetosphere.

Project background

The efficient transfer of energy from the solar wind into Earth’s magnetic field causes geomagnetic storms. The resulting ground magnetic field variations increase errors in systems that use Earth’s natural magnetic field as a pointing reference.

The Dst, or disturbance-storm-time index, is a measure of the severity of a geomagnetic storm. More specifically, the Dst index measures the negative deflection of Earth's magnetic field caused by the ring current (see the figure above). As a key specification of the magnetospheric dynamics, the Dst index is used to drive geomagnetic disturbance models such as NOAA/NCEI’s High Definition Geomagnetic Model - Real-Time (HDGM-RT). Additionally, magnetic surveyors, government agencies, academic institutions, satellite operators, and power grid operators use the Dst index to analyze the strength and duration of geomagnetic storms.

Dst is calculated as the average deflection of the horizontal component of the magnetic field observed at four near-equatorial ground observatories. The more intense the geomagnetic storm, the more negative the Dst value becomes. However, observatory-based Dst values are of limited use for real-time magnetic modeling because of latency, instrument outages, and connectivity issues. Over the past four decades, several models have been proposed for forecasting Dst from the solar wind. Instead of relying on ground measurements, these models predict Dst solely from solar-wind measurements made by satellites such as NOAA’s Deep Space Climate Observatory (DSCOVR) or NASA's Advanced Composition Explorer (ACE), situated approximately 1.5 million kilometers from Earth along the Sun-Earth line. Because radio signals travel much faster than the solar wind, these measurements provide 15-30 minutes of lead time before a storm arrives at Earth.

NOAA’s National Centers for Environmental Information (NCEI), in partnership with the University of Colorado’s Cooperative Institute for Research in Environmental Sciences (CIRES), is conducting an open data-science challenge to forecast Dst using the real-time solar-wind (RTSW) data in an operationally viable setup. Recent advances in machine learning hold immediate promise for improving Dst forecasting, even for participants without formal training in space physics. A well-designed challenge in this context can identify solutions that are both operationally viable and highly accurate.

About NOAA NCEI / CIRES

NOAA's National Centers for Environmental Information (NCEI) hosts and provides public access to one of the most significant archives of environmental data on Earth. NCEI contributes to the mission of NOAA's National Environmental Satellite, Data, and Information Service (NESDIS) by developing new products and services that span the science disciplines and enable better data discovery.

At the Cooperative Institute for Research in Environmental Sciences (CIRES), more than 800 environmental scientists work to understand the dynamic Earth system, including people’s relationship with the planet. CIRES is a partnership of NOAA and the University of Colorado Boulder, and its areas of expertise include weather and climate, changes at Earth’s poles, air quality and atmospheric chemistry, water resources, and solid Earth sciences. The vision at CIRES is to be instrumental in ensuring a sustainable future environment by advancing scientific and societal understanding of the Earth system.
