NASA Tournament Lab


Mars Spectrometry: Detect Evidence for Past Life

This challenge seeks innovative methods to automatically help analyze and interpret mass spectrometry data related to Mars exploration.


Did Mars ever have environmental conditions that could have supported life? This is one of the key questions in the field of planetary science. Answering it will not only inform our expectations about whether there is life elsewhere in the universe, but it can also help us better understand how and why life developed on Earth.

NASA missions like the Curiosity and Perseverance rovers carry a rich array of instruments suited to collect data and build evidence towards answering this question. One particularly powerful capability they have is collecting rock and soil samples and taking measurements that can be used to determine their chemical makeup. These chemical signatures can indicate whether the environment's conditions could have sustained life.

While sending these complex robots and their delicate instruments over 500 million kilometers through space and landing them autonomously on Mars are awe-inspiring feats of engineering, the challenges do not stop there. Communication between rovers and Earth is severely constrained, with limited transfer rates and short daily communication windows. When scientists on Earth receive sample data from the rover, they must rapidly analyze them and make difficult inferences about the chemistry in order to prioritize the next operations and send those instructions back to the rover.

Improving methods for analyzing planetary data will help scientists conduct mission operations more quickly and effectively and maximize scientific learning. The longer-term goal is to deploy sufficiently powerful methods onboard rovers to autonomously guide science operations and reduce reliance on a "ground-in-the-loop" control operations model.



In this challenge, your goal is to build a model to automatically analyze mass spectrometry data collected for Mars exploration in order to help scientists in their analysis towards understanding the present and past habitability of Mars.

Specifically, the model should detect the presence of certain families of chemical compounds in data collected from performing evolved gas analysis (EGA) on a set of analog samples. The winning techniques may be used to help analyze data from Mars, and potentially even inform future designs for planetary mission instruments performing in-situ analysis. In other words, one day your model might literally be out-of-this-world!

Some additional notes:

Understanding the data. The mass spectrometry data used in this competition can require specialist knowledge to interpret. See the Problem Description page for discussion of the data that may inform your data processing and feature engineering. If you have questions, please feel welcome to ask on the community forum.

External data use. As noted in the Challenge Rules, external data and pre-trained models are allowed in this competition as long as they are freely and publicly available with permissive open licensing. Note that many mass spectral reference libraries, including those from the National Institute of Standards and Technology (NIST), are not available with open licensing and are therefore not allowed in the competition. If you use any external datasets or pre-trained models, you are required to share details about them publicly in the competition discussion forum in order to be eligible for a prize. If you have any questions, please ask on the forum.

Research nature. A focus of this challenge is to feature a new dataset for research and to engage planetary geologists, analytical chemists, and data scientists in working with it. As with any research dataset like this one, initial algorithms may pick up on correlations that are incidental to the task. Solutions in this challenge are intended to serve as a starting point for continued research and development. The challenge organizers intend to make the data available online after the competition for ongoing improvement.


Timeline and Prizes

  • Competition End Date: April 18, 2022, 11:59 p.m. UTC
  • Prize Amounts:
    • 1st Place: $15,000
    • 2nd Place: $7,500
    • 3rd Place: $5,000
    • Bonus: $2,500


A subset of the test data for this competition comes from the SAM testbed, a replica of the Sample Analysis at Mars (SAM) instrument suite onboard the Curiosity rover. The top five participants ranked by performance on just the SAM testbed samples will be invited to submit a brief write-up of their methodology. A judging panel of subject matter experts will review the finalists' write-ups and select a winner based on their solution's technical merits and its potential to be applied to future data.


The competition will have two phases with a timed release of additional labels that can be used for training. See the Problem Description page for more details about the dataset splits.

  • Phase 1: Development – February 17–March 17, 2022
  • Phase 2: Final Training – March 18–April 18, 2022 (Validation set labels released)



  1. Click the 'Learn More' button and navigate to the challenge page
  2. Once on the DrivenData portal, click the “Compete” button in the sidebar to enroll in the competition
  3. Get familiar with the problem through the overview and problem description. You might also want to reference some of the additional resources from the about page.
  4. Download the data from the data tab
  5. Create and train your own model. The benchmark blog post is a good place to start
  6. Use your model to generate predictions that match the submission format
  7. Click “Submit” in the sidebar, and “Make new submission”. You’re in!


Problem description

In this challenge, your goal is to detect the presence of certain families of chemical compounds in geological material samples using evolved gas analysis (EGA) mass spectrometry data collected for Mars exploration missions. These families are of rocks, minerals, and ionic compounds relevant to understanding the present and past conditions for life on Mars.

The data for this challenge comes from laboratory instruments at NASA's Goddard Space Flight Center and Johnson Space Center that are affiliated with the Sample Analysis at Mars (SAM) science team. SAM is an instrument suite aboard the Curiosity rover on Mars. For more about SAM and the SAM team, see the "About" page.






Each observational unit in the dataset is a physical sample. The features for each sample are the mass spectrometry measurements from EGA and are provided as individual CSV files. There are four dimensions given in long format:

  • time – Time in seconds elapsed since a reference start time.
  • temp – Temperature of the sample in ºC at time of the measurement.
  • m/z – Mass-to-charge ratio of ion being measured.
  • abundance – Rate of ions detected, per second. Typically, all abundance values are compared in a relative way within one sample's analysis run. (Note that different samples will have abundances in different units, discussed more later.)
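As a rough illustration of the long format described above, the following sketch reads a tiny stand-in for one sample file, pivots it so that each row is a time step and each column is an m/z value, and then takes a single time slice as a mass spectrum. The column headers (`time`, `temp`, `m/z`, `abundance`) are assumed to match the dimension names listed above, and the numbers are made up for the example; a real file would be read with `pd.read_csv(path)`.

```python
import io

import pandas as pd

# Made-up stand-in for one sample's CSV file (values are illustrative only).
csv_text = """time,temp,m/z,abundance
0.0,30.0,4.0,1.0e-8
0.0,30.0,18.0,2.0e-10
0.0,30.0,32.0,1.0e-10
1.0,31.5,4.0,1.1e-8
1.0,31.5,18.0,4.0e-10
1.0,31.5,32.0,1.0e-10
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Wide layout: one row per time step, one column per m/z value.
wide = sample.pivot_table(index="time", columns="m/z", values="abundance")

# A mass spectrum is one time slice of the wide table.
spectrum = wide.loc[1.0]
print(spectrum)
```

The wide layout is convenient for commercial-instrument runs, where every m/z is measured at every time step. For runs where only one m/z is measured at a time, the same pivot would leave most cells empty.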


Mass spectrometry (MS) is an analytical method that can be used to determine the composition of a sample. First, the substance under analysis is ionized: the molecules are transformed into ions (electrically charged particles). During this ionization process, fragmentation occurs as energetically unstable molecular ions dissociate. These ions pass through a stage called the mass analyzer, which separates them by their mass-to-charge ratio (m/z), and then the abundances (often, counts) of the separated ions are measured by an ion detector. The output measurements are typically visualized as a mass spectrum, a histogram with abundance on the y-axis and m/z on the x-axis. To infer the composition of the sample under analysis, scientists can use domain knowledge of how materials fragment under ionization or compare the mass spectrum to reference spectra measured from known substances.

Plot of mass spectrum for sample S0235.
Example of a mass spectrum. This mass spectrum shows a large peak at m/z=4.0 and smaller peaks at 18.0 and 32.0. Plotted data is for sample S0235 taken at a time snapshot at time=218.376.

Evolved gas analysis (EGA), used in generating the data for this challenge, is an analytical technique that involves heating a sample and measuring the gases released with a mass spectrometer. By introducing temperature as an additional dimension, EGA can provide more information about a sample's chemistry than mass spectrometry alone. In EGA, the sample is steadily heated up in an oven. Gases are released by desorption, dehydration, or decomposition, and these gases flow to the mass spectrometer using a carrier gas (helium in the case of the SAM instrument). The measurements by the mass spectrometer are collected as time series, and scientists can use the mass spectra to identify the gases produced from the sample over time. Based on domain knowledge of how different materials produce gases as they are heated, the composition and mineralogy of the sample can be inferred.

The figures below show the EGA data for an example sample from the training data. The mass spectrometry data is collected as a time series, and the sample's temperature over time is also collected. In the data for this competition, the temperature is already joined to the ion abundances such that each mass spectrometer measurement has an associated temperature.

Plot of ion abundance vs. time for sample S0235.
Sample S0235 ion abundance plotted over time, with each m/z plotted as a separate time series with m/z values of 4.0, 18.0, and 32.0 highlighted. In contrast to the previous mass spectrum showing a time snapshot, we can see that m/z=18.0 and 32.0 peak at different times in the analysis run, corresponding to different temperatures of the sample. 


Plot of temperature vs. time for sample S0235.
Temperature (°C) over time for sample S0235.



Some notes on how scientists typically interpret the EGA–MS data:

  • In chemical analysis, it is common to compare the relative amounts of different substances (e.g., a hydrated sulfate mineral such as gypsum [CaSO4•2H2O] releases 2 moles of water and one mole of SO2 when thermally decomposed). Accordingly, scientists will typically interpret mass spectrometry abundances collected from one sample in a relative way. Sometimes, mass spectra are normalized as "relative abundance" from 0 to 100 with the highest abundance value mapped to 100. (Note that this normalization should happen after background subtraction, discussed later in this section.)
  • As previously discussed, peaks in ion abundances indicate an increase of the respective ions. Scientists typically look for known combinations of certain ions in certain ratios as evidence of certain gases having evolved (been produced) from the sample. The temperature at which those peaks were measured is accordingly the temperature at which the respective gases evolved. This is indicative of certain chemical reactions (such as thermal decomposition) and gives information about the composition of the sample.
    • Note that the shape of the peak can matter—whether the gas evolved in a narrow or broad temperature band gives information about the underlying chemical reaction and/or mineralogy.
    • Ions of a given m/z may have more than one peak in one EGA run. This means that there were different chemical reactions at different temperatures that eventually led to that ion being detected.
    • It can be possible that ions from a given m/z result from more than one compound. For example, carbon dioxide (CO2), carbon monoxide (CO), and nitrogen gas (N2) all have major ions or fragments that appear at m/z 28. If a sample releases multiple gases (or there is atmospheric background) with such overlap, scientists have to separate out the different contributions when analyzing the data.
    • Scientists will often integrate the abundance curve in the time domain for a given m/z value when considering how much of that ion was measured in the EGA run. The integration turns the abundance from a time series of rate values (counts per second) to a quantity (counts).
  • Helium (He) is used as a carrier gas in all EGA runs in this competition. That means the presence of helium is not a meaningful signal in the classification task. Helium ions will typically show up in the data as ions detected with an m/z value of 4.0 and are usually disregarded.
  • Some ions have a background presence in the gas passing through the mass spectrometer, as evidenced by a relatively constant non-zero abundance value over the entire run, across the temperature range. This can happen for various reasons, such as contamination from the atmosphere. Scientists typically subtract this background to clean the data.
    • The simplest way to subtract the background is to take the initial value for a given m/z and subtract it from that m/z value's whole time series. This works well if the background abundance is constant over the run.
    • Background ions do not always have static abundances. Sometimes they may increase or decrease over the EGA run. In such cases, scientists may do more sophisticated subtraction, such as fitting a line or even a polynomial to the measurements and subtracting that.
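The interpretation notes above can be sketched as a small preprocessing routine: drop the helium carrier gas at m/z 4.0, subtract each ion's initial value as a constant background, scale to relative abundance from 0 to 100, and integrate each ion over time. This is only a minimal sketch under stated assumptions, not the method used by the SAM team. The function names are hypothetical, clipping negative values to zero after subtraction is a choice made here for simplicity, and the constant-background assumption will not hold for runs with drifting background.

```python
import numpy as np
import pandas as pd


def preprocess(sample: pd.DataFrame) -> pd.DataFrame:
    """Drop helium, subtract a constant background, scale to 0-100."""
    df = sample[sample["m/z"] != 4.0].copy()
    df = df.sort_values(["m/z", "time"])
    # Constant background: the first abundance recorded for each m/z.
    first = df.groupby("m/z")["abundance"].transform("first")
    # Clipping at zero is an assumption of this sketch.
    df["abundance"] = (df["abundance"] - first).clip(lower=0.0)
    # Relative abundance within this one sample's run (after subtraction).
    peak = df["abundance"].max()
    df["rel_abundance"] = 100.0 * df["abundance"] / peak if peak > 0 else 0.0
    return df


def integrate(df: pd.DataFrame, column: str = "abundance") -> pd.Series:
    """Trapezoidal integral over time per m/z (rate -> quantity)."""
    areas = {}
    for mz, group in df.groupby("m/z"):
        t = group["time"].to_numpy()
        y = group[column].to_numpy()
        areas[mz] = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))
    return pd.Series(areas)


# Made-up run with three ions measured at three time steps.
sample = pd.DataFrame({
    "time": [0.0, 1.0, 2.0] * 3,
    "temp": [30.0, 40.0, 50.0] * 3,
    "m/z": [4.0] * 3 + [18.0] * 3 + [32.0] * 3,
    "abundance": [9.0, 9.0, 9.0, 1.0, 5.0, 1.0, 2.0, 2.0, 4.0],
})
clean = preprocess(sample)
areas = integrate(clean)
print(areas)
```

For a drifting background, the constant subtraction could be swapped for a fitted line or polynomial, as the notes describe; that variant is not shown here.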



The data for this competition has been collected by multiple labs at NASA's Goddard Space Flight Center and Johnson Space Center. There are differences in the feature distributions, and this can be captured by distinguishing two kinds of instruments that were used to conduct the measurements:

  1. Commercial instruments—the data comes from commercially manufactured instruments that have been configured as SAM analogs at the Goddard and Johnson labs
  2. SAM testbed—the data comes from the SAM testbed at Goddard, a replica of the SAM instrument suite on Curiosity

The instrument type for each sample in the competition data is indicated by the instrument_type column of the metadata.csv file, with values commercial and sam_testbed.
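A quick way to see how the two instrument types are represented is to count them in the metadata. The sketch below uses a made-up stand-in for metadata.csv; the `sample_id` column name and the sample IDs are assumptions for illustration, while `instrument_type` and its two values come from the description above.

```python
import pandas as pd

# Made-up stand-in; in practice this would be pd.read_csv("metadata.csv").
metadata = pd.DataFrame({
    "sample_id": ["S0000", "S0001", "S0002"],  # hypothetical IDs
    "instrument_type": ["commercial", "commercial", "sam_testbed"],
})

# Number of samples per instrument type.
counts = metadata["instrument_type"].value_counts()

# IDs of the SAM testbed samples only.
is_sam = metadata["instrument_type"] == "sam_testbed"
sam_ids = metadata.loc[is_sam, "sample_id"].tolist()
print(counts)
print(sam_ids)
```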

Notable differences you will see between data collected from these two types of instruments are as follows:

  • Commercial instruments measure ion abundance as ion current in amps (Coulombs per second), while the SAM testbed measures abundance as counts per second. This results in their respective samples having drastically different orders of magnitude for their abundance values. As noted in the previous section, however, the key idea is to compare relative abundance values within one sample's run, and not to compare absolute abundance values across samples.
  • Commercial instrument runs will have ion abundance measurements for all m/z values at every timestep of measurement. The SAM testbed can only measure abundance for one m/z value at a time—the mass spectrometer scans across m/z values in ascending order and cycles through its range of detection.
  • Commercial instrument runs were generally configured to collect data for whole number m/z values from 0.0 to 100.0. The SAM testbed detects ions for a larger range of m/z values, up to 534.0 or 537.0, and data sometimes includes fractional m/z values.
    • In general, ions relevant to the detection of the label classes for this competition will be within the 0.0–100.0 range.
    • In general, if fractional m/z abundances are significant, they will be highly correlated to those of the nearest whole number m/z, e.g., m/z=1.9 will be highly correlated with 2.0. For EGA data, it is generally enough to only look at the whole number m/z values and ignore the fractional m/z values.
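Because the two instrument types sample m/z differently, one possible way to get a fixed-size representation for both is to keep only whole-number m/z values in the 0–100 range and summarize abundance per temperature bin. The sketch below is one such approach under stated assumptions, not a prescribed method: the function name, the 100 ºC bin width, the 0–1000 ºC range, and the use of the maximum as the summary statistic are all arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd


def temp_bin_features(sample, bin_width=100.0, max_temp=1000.0, max_mz=100):
    """Max abundance per (temperature bin, whole-number m/z) cell."""
    df = sample.copy()
    # Keep whole-number m/z values inside the 0..max_mz range.
    is_whole = np.isclose(df["m/z"], df["m/z"].round())
    df = df[is_whole & (df["m/z"] >= 0) & (df["m/z"] <= max_mz)]
    df = df.assign(mz=df["m/z"].round().astype(int))
    # Assign each measurement to a temperature bin; out-of-range rows drop.
    edges = np.arange(0.0, max_temp + bin_width, bin_width)
    bins = pd.cut(df["temp"], bins=edges, labels=False, include_lowest=True)
    df = df.assign(temp_bin=bins).dropna(subset=["temp_bin"])
    df["temp_bin"] = df["temp_bin"].astype(int)
    feats = df.groupby(["temp_bin", "mz"])["abundance"].max().unstack(fill_value=0.0)
    # Reindex so every sample yields the same grid, filling gaps with 0.
    return feats.reindex(
        index=range(len(edges) - 1), columns=range(max_mz + 1), fill_value=0.0
    )


# Made-up SAM-testbed-like rows: one m/z per time step, some out of scope.
sample = pd.DataFrame({
    "time": [0.0, 0.1, 0.2, 0.3],
    "temp": [50.0, 50.5, 150.0, 150.5],
    "m/z": [18.0, 1.9, 18.0, 534.0],
    "abundance": [10.0, 3.0, 40.0, 7.0],
})
features = temp_bin_features(sample)
print(features.shape)
```

Binning by temperature rather than by time step sidesteps the fact that the SAM testbed measures only one m/z at a time, at the cost of losing fine peak shape within each bin.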

There are many fewer SAM testbed samples in the competition dataset than commercial samples—a consequence of the SAM testbed's uniqueness and specialized purpose. Note also that some label classes are not represented in the training data, but may be present in the test set used for final evaluation.

We expect modeling the SAM testbed samples will be a hard task! However, the ability to accurately classify samples analyzed with the SAM testbed is important to our competition sponsors. Accordingly, competitors with the five best-performing solutions on just the SAM testbed samples in the test set will be considered as finalists for the bonus prize. The bonus prize will be awarded for the best modeling methodology based on a submitted write-up. See "Competition Timeline and Prizes".



Additional unlabeled data is provided that can be used in developing your model. These samples are provided with features only. You may find these useful for unsupervised or semi-supervised methods. We look forward to seeing what you come up with!

Note that some of the samples in the supplemental dataset were run under different experimental parameters than the primary data for the competition. This may cause the physics and chemistry in the EGA to be different and result in different distributions in the data. The differences in parameters are indicated for each sample in the supplemental_metadata.csv file. In summary, the differences are:

  • Oxygen or nitrogen was used as the carrier gas. This is indicated in the carrier_gas column by o2 and n2 respectively. Samples which use helium will have he. In all samples of the primary data, helium is used.
  • The run was conducted at a different pressure. This is indicated by the different_pressure column, which will be 1 for a different pressure than the primary data and 0 for the same.

Samples which have carrier_gas as he and different_pressure as 0 can be expected to behave similarly to the primary data.
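Selecting the supplemental samples that are expected to behave like the primary data is a simple filter on the two columns described above. In this sketch the data frame is a made-up stand-in for supplemental_metadata.csv, and the `sample_id` column and its values are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-in; in practice: pd.read_csv("supplemental_metadata.csv").
supp = pd.DataFrame({
    "sample_id": ["S1000", "S1001", "S1002", "S1003"],  # hypothetical IDs
    "carrier_gas": ["he", "o2", "he", "n2"],
    "different_pressure": [0, 0, 1, 0],
})

# Helium carrier gas and the same pressure as the primary data.
same_conditions = (supp["carrier_gas"] == "he") & (supp["different_pressure"] == 0)
comparable = supp[same_conditions]
print(comparable["sample_id"].tolist())
```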



You are provided with multilabel binary labels for the training set. There are ten label classes, each indicating presence of material in the sample belonging to the respective rock, mineral, or ionic compound families:

  1. Basalt
  2. Carbonate
  3. Chloride
  4. Iron Oxide
  5. Oxalate
  6. Oxychlorine (chlorate, perchlorate)
  7. Phyllosilicate
  8. Silicate
  9. Sulfate
  10. Sulfide

Each sample can have any number of class assignments. A 1 indicates that a compound from that family is present in the sample, and a 0 indicates otherwise.





Submissions and evaluation

Note that this competition's limits on rate of submissions are stricter than usual, in order to reduce the impact of overfitting and luck on the results. The dataset in this competition is relatively small, reflecting the specialized nature of the task—there is just not that much data out there for EGA for Mars planetary science. Please see the submissions page for submission restriction details and the status of your available submissions.



The data for this competition is split into three sets: train, validation, and test. You will be submitting predictions for samples in the validation and test splits.

The performance metric evaluated on the validation set will be used for leaderboard ranking while the competition is open, but will not be used for final ranking and prize determination. Labels for the validation set will be released at the beginning of Phase 2: Final Training on March 18, 2022 so that you will have more data available for training your final model.

Final ranking and prizes will be based on performance on the test set. There is also a bonus prize awarded for best modeling methodology for the SAM testbed samples, with eligible finalists selected based on their performance on the SAM testbed samples within the test set.



The format for the submission file is a CSV file. Each row should correspond to one sample and there will be one column for each label class. For each sample and each label class, there should be a numerical score in the range [0.0, 1.0] that represents the confidence of the prediction that a compound belonging to that label class family is present in the sample.


For example, if you predicted a confidence of 0.5 for every label class for sample S0760, then the .csv file that you submit would look like:

sample_id,basalt,carbonate,chloride,iron_oxide,oxalate,oxychlorine,phyllosilicate,silicate,sulfate,sulfide
S0760,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5
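A file in this layout can be produced with pandas. The sketch below builds a one-row submission with a constant score of 0.5 and writes it to an in-memory buffer; the column names are taken from the example header above, and writing to a buffer rather than to a file on disk is only to keep the example self-contained.

```python
import io

import pandas as pd

LABELS = [
    "basalt", "carbonate", "chloride", "iron_oxide", "oxalate",
    "oxychlorine", "phyllosilicate", "silicate", "sulfate", "sulfide",
]

# One row per sample, one column per label class, constant 0.5 scores.
sample_ids = ["S0760"]
submission = pd.DataFrame(
    0.5, index=pd.Index(sample_ids, name="sample_id"), columns=LABELS
)

# Scores must stay inside [0.0, 1.0].
values = submission.to_numpy()
assert values.min() >= 0.0 and values.max() <= 1.0

# Use submission.to_csv("submission.csv") to write an actual file.
buffer = io.StringIO()
submission.to_csv(buffer)
print(buffer.getvalue())
```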







Performance is evaluated according to aggregated log loss. Log loss (a.k.a. logistic loss or cross-entropy loss) penalizes confident but incorrect predictions. It also rewards confidence scores that are well-calibrated probabilities, meaning that they accurately reflect the long-run probability of being correct. This is an error metric, so a lower value is better.

Log loss for a single observation is calculated as follows:

$$L_{\log}(y, p) = -\left(y \log(p) + (1 - y) \log(1 - p)\right)$$

where y is a binary variable indicating whether the label is correct and p is the user-predicted probability that the label is present. The loss for the entire dataset is the summed loss of individual observations.

The log loss scores across target label classes are aggregated with an unweighted average. This treats each observation and each label class equally, regardless of prevalence in the evaluation set.
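For local validation, the aggregated metric can be approximated with a few lines of NumPy. This is a sketch of the standard binary log loss averaged over label classes, not the official scoring code. Two details are assumptions: predictions are clipped away from exactly 0 and 1 to avoid infinite losses, and the per-class loss is averaged (rather than summed) over observations before the unweighted average across classes.

```python
import numpy as np


def aggregated_log_loss(y_true, y_pred, eps=1e-15):
    """Mean over label classes of the mean binary log loss per class."""
    y = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs (eps is an assumption of this sketch).
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    losses = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    per_class = losses.mean(axis=0)  # one value per label column
    return float(per_class.mean())  # unweighted average across classes


# Two samples, two label classes (made-up labels).
y_true = np.array([[1, 0], [0, 1]])
uninformed = aggregated_log_loss(y_true, np.full((2, 2), 0.5))
confident_wrong = aggregated_log_loss(
    y_true, np.array([[0.01, 0.99], [0.99, 0.01]])
)
print(uninformed, confident_wrong)
```

Predicting 0.5 everywhere gives a loss of ln 2, while confidently wrong predictions are penalized far more heavily, which is the behavior described above.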

Good luck!

Good luck and enjoy the challenge! We and our partners at NASA look forward to seeing your approaches and hope to incorporate what we learn into future space missions. Check out the benchmark blog post for tips on how to get started. If you have any questions, you can always visit the user forum!