This challenge is Part 1 of a multi-phased challenge. To participate in the other stages please visit https://deid.drivendata.org. See the complete challenge rules here. Additionally, further details about the other stages can be found below.
The Public Safety Communications Research Division (PSCR) of the National Institute of Standards and Technology (NIST) invites members of the public to join the Differential Privacy Temporal Map Challenge (DeID2). This multi-stage challenge will award up to $276,000 to advance differential privacy technologies by building and measuring the accuracy of algorithms that de-identify data sets containing temporal and geographic information with provable differential privacy.
The DeID2 Challenge is composed of three contests:
A Better Meter Stick for Differential Privacy Contest, which is a metric competition to develop new metrics by which the quality of privatized data produced by differential privacy algorithms can be assessed.
The Differential Privacy Temporal Map Contest, which is a series of algorithm sprints that will explore new methods in differential privacy for geographic time series data that preserve the utility of the data as much as possible while guaranteeing privacy.
Open Source and Development Contest, which is open only to leading teams at the end of the final algorithm sprint.
There are no fees or qualifications needed to enter any stage, and teams can participate in either or both the metric competition and the algorithm sprints. The metric competition and the first algorithm sprint will run simultaneously. Please note that while this challenge is open to international participation, the Team Lead must be a US citizen or permanent resident of the US or its territories. The Team Lead is the sole person who will accept the cash prizes on behalf of the team, but we encourage the make-up of the team to include solvers globally.
Large data sets containing personally identifiable information (PII) are exceptionally valuable resources for research and policy analysis in a host of fields such as emergency planning and epidemiology. This project seeks to engage the public in developing algorithms that de-identify data sets containing PII, producing data sets that remain valuable tools yet cannot compromise the privacy of the individuals whose information they contain.
Previous NIST PSCR differential privacy projects (NIST Differential Privacy Synthetic Data Challenge and The Unlinkable Data Challenge: Advancing Methods in Differential Privacy - collectively referred to as DeID1) demonstrated that crowdsourced challenges can make meaningful advancements in this difficult and complex field. Those previous contests raised awareness of the problem, brought in innovators from outside the privacy community, and demonstrated the value of head-to-head competitions for driving progress in data privacy. This Differential Privacy Temporal Map Challenge hopes to build on these results by extending the reach and utility of differential privacy algorithms to new data types.
Temporal map data is of particular interest to the public safety community. There are a number of different situations where this type of data is important, such as in epidemiology studies, resource allocation, and emergency planning. Yet the ability to track a person’s location over a period of time presents particularly serious privacy concerns. The Differential Privacy Temporal Map Contest invites solvers to develop algorithms that preserve data utility while guaranteeing privacy. Learn more about this competition here.
The A Better Meter Stick for Differential Privacy Contest invites solvers to present a concept paper detailing metrics with which to assess the accuracy and quality of outputs from algorithms that de-identify data sets containing temporal map data. High-quality metrics developed from this contest may be used to evaluate differential privacy algorithms submitted to the final algorithm sprint of the Differential Privacy Temporal Map Contest.
The long-term objective of this project is to develop differential privacy algorithms that are robust enough to use successfully with any data sets - not just those that are provided for these contests. The Open Source Development Contest provides leading teams with an opportunity to further develop their software to increase its utility and usability for open-source audiences.
The DeID2 Challenge is implemented by DrivenData and HeroX under contract with NIST PSCR. This website is not owned or operated by the Government. All content, data, and information included on or collected by this site is created, managed, and owned by parties other than the Government.
Public safety use cases of temporal map data include emergency planning, epidemiologic analysis, and policy setting. High-quality data is required to perform sound analyses in these areas. Both time and space segments may be sparsely populated yet critically important. Further, these sparsely populated segments are inherently at greater risk of linkage attacks, in which auxiliary and possibly completely unrelated data sets are combined with records containing sensitive information to uniquely identify individuals. Although differential privacy offers a formal guarantee against linkage attacks, it offers no guarantee about the accuracy of the output data.
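To illustrate why accuracy is not guaranteed, here is a minimal sketch that adds Laplace noise to counts, one common way to achieve differential privacy for count queries. It is not the method used by any contest algorithm. The function names, the epsilon value, and the two example counts are hypothetical, and the sketch assumes each individual changes a count by at most 1, which real temporal map data (where one person appears in many time segments) generally does not satisfy without extra steps.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_count(count, epsilon, rng):
    # Assumption: one individual changes the count by at most 1
    # (sensitivity 1), so the noise scale is 1 / epsilon.
    return count + laplace_noise(1.0 / epsilon, rng)


def mean_relative_error(count, epsilon, trials, rng):
    # Average of |noisy - true| / true over repeated noisy releases.
    total = 0.0
    for _ in range(trials):
        noisy = privatize_count(count, epsilon, rng)
        total += abs(noisy - count) / count
    return total / trials


rng = random.Random(0)
# Hypothetical counts for one densely and one sparsely populated segment.
for label, count in [("dense segment", 5000), ("sparse segment", 3)]:
    err = mean_relative_error(count, epsilon=0.5, trials=2000, rng=rng)
    print(f"{label}: true count {count}, mean relative error {err:.2%}")
```

The noise has the same scale for every cell, so its effect relative to the true count is far larger in the sparse segment. This is the kind of accuracy loss that a good metric should be able to detect.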
In this contest, NIST PSCR seeks novel metrics by which to assess the quality of differentially private algorithms on temporal map data. Submissions should provide robust metrics that can be applied to a wide variety of data sets involving temporal and spatial data. Solvers are encouraged to provide examples of how their proposed metrics will improve use-case outcomes.
Better Meter Stick for Differential Privacy Contest Guidelines
NIST PSCR is interested in creative, effective and insightful approaches to evaluating the outputs from differential privacy algorithms, especially those involving temporal map data. The area of data privatization is growing rapidly, as is our understanding of the quality of privatized data.
NIST PSCR invites solvers to develop metrics that best assess the accuracy of the data output by the algorithms that de-identify temporal map data. In particular, methods are sought that:
Measure the quality of data with respect to temporal or geographic accuracy/utility, or both.
Evaluate data quality in contexts beyond this challenge.
Are clearly explained, and straightforward to correctly implement and use.
As you propose your evaluation metrics, be prepared to explain their relevance and how they would be used. These metrics may be your original content, based on existing work, or any combination thereof. If your proposed metrics are based on existing work or techniques, please provide citations. Participants will be required to submit both a broad overview of proposed approaches and specific details about the metric definition and usage. Additionally, we are interested in how easily an approach can accommodate large data sets (scalability) and how well it can translate to different use cases (generalizability).
To help the community propose better metrics, NIST PSCR will review the executive summary section of any metric submitted by November 30, 2020 and provide high-level feedback. NIST PSCR will only review and comment on the executive summary section. This feedback is intended to help participants develop more complete and thoughtful metrics. All feedback will be provided by December 3, 2020. Participation in this step is optional, but it is a good opportunity to check whether you are on the right track.
Each submission should contain only one metric. Each individual/team lead is allowed one submission. Organizations may have multiple teams, but each team must have a different team lead. All submissions are due by January 5, 2021, 10pm EST and should include a title, brief description of the proposed metric, introduction to the team, and a completed submission template (found here).
Please carefully review the contents of the Competitors’ Resources tab. This tab includes the data sets described below, a sample submission to this contest, a submission template, and a document sharing “tips and tricks” for developing a generalized metric that effectively measures the practical utility of the privatized data.
Each submission should include a demonstration of the metric on at least one data set. The data may be real or synthesized. NIST PSCR is providing one example of temporal map data that may be used, including 4 data sets: 1) a ground truth data set; 2) a privatized data set of poor quality; 3) a privatized data set of moderate quality; and 4) a supplementary data set with demographic characteristics of map segments. These data sets can be downloaded from the Competitors’ Resources tab. Additional information about the data can be found here.
Participants may use these provided data sets, or they may use or create their own. Any data sets used must be freely and publicly available (which includes the provided data above), or created by the submitting participants. Preference is also given to data sets with usefulness for public safety.
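As a rough idea of what a demonstration on a data set could look like, the sketch below scores two privatized data sets against a ground truth using total variation distance between normalized counts. This is only one simple candidate, not the contest's sample submission or a recommended metric. The toy counts, keyed by (map segment, time segment), and the function name are hypothetical and do not reflect the schema of the provided data.

```python
def snapshot_score(ground_truth, privatized):
    """Total variation distance between two normalized count distributions.

    Returns 0.0 for identical distributions and 1.0 for no overlap.
    """
    cells = set(ground_truth) | set(privatized)
    # Noisy counts can be negative, so clamp them to zero before normalizing.
    gt = {c: max(ground_truth.get(c, 0), 0) for c in cells}
    pv = {c: max(privatized.get(c, 0), 0) for c in cells}
    gt_total = sum(gt.values())
    pv_total = sum(pv.values())
    if gt_total == 0 or pv_total == 0:
        raise ValueError("both data sets need at least one positive count")
    return 0.5 * sum(abs(gt[c] / gt_total - pv[c] / pv_total) for c in cells)


# Hypothetical toy counts keyed by (map segment, time segment).
ground_truth = {("A", 1): 40, ("A", 2): 10, ("B", 1): 30, ("B", 2): 20}
moderate = {("A", 1): 38, ("A", 2): 12, ("B", 1): 29, ("B", 2): 21}
poor = {("A", 1): 10, ("A", 2): 40, ("B", 1): 25, ("B", 2): 25}

print("moderate quality:", snapshot_score(ground_truth, moderate))
print("poor quality:", snapshot_score(ground_truth, poor))
```

A lower score means the privatized counts are closer to the ground truth. Because the counts are normalized, this sketch ignores differences in overall totals and treats every cell equally, which is a limitation a real submission would need to discuss.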
This A Better Meter Stick for Differential Privacy Contest will award up to $29,000 to the top-ranked submissions and to submitted metrics receiving the most votes from public voting as follows:
A Better Meter Stick for Differential Privacy Prize Structure
(total prize purse of $29,000)
Winners are selected by the Judges, based on the evaluation of submissions against the Judging Criteria. Up to $25,000 will be awarded to winners in up to four tiers. Submissions that have similar quality scores may be given the same rankings with up to 10 winners total:
First prize: Up to 2 winners of $5,000
Second prize: Up to 2 winners of $3,000
Third prize: Up to 3 winners of $2,000
Fourth prize: Up to 3 winners of $1,000
People’s Choice Prize
Winners are selected by public voting on submitted metrics that have been pre-vetted by NIST PSCR for compliance with minimum performance criteria. Up to a total of $4,000 will be awarded to up to four winners.
People’s Choice: Up to 4 winners of $1,000
August 24, 2020
Open to submissions: October 1, 2020
Executive Summaries due for optional preliminary review: November 30, 2020 10:00pm EST
Complete submissions due: January 5, 2021 10:00pm EST
NIST PSCR Compliance check (for public voting): January 5-6, 2021
January 8, 2021 9:00am EST - January 21, 2021 10:00pm EST
Judging and Evaluation: January 5 - February 2, 2021
February 4, 2021
Submissions to the Metric Paper Contest will undergo initial filtering to ensure they meet minimum criteria before they are reviewed and evaluated by members of the expert judging panel. These minimum criteria include:
Submitter or submitting team meets eligibility requirements,
All required sections of the submission form are completed,
Proposed metric is coherently presented and plausible,
Each submission should contain only one metric. Each individual/team lead is allowed one submission.
Submissions that have passed the initial filtering step will be reviewed for Technical Merit by members of the expert judging panel, evaluated against the evaluation criteria listed below, and scored based on the relative weightings shown.
Clarity (30/100 points)
Metric explanation is clear and well written, defines jargon and does not assume any specific area of technical expertise. Pseudocode is clearly defined and easily understood.
Participants clearly address whether the proposed metric provides snapshot evaluation (quickly computable summary score) and/or deep dive evaluation (generates reports locating significant points of disparity between the real and synthetic data distributions), and explain how to apply it.
Participants thoroughly answer the questions, and provide clear guidance on metric limitations.
Utility (40/100 points)
The metric effectively distinguishes between real and synthetic data.
The metric represents a breadth of use cases for the data.
Motivating examples are clearly explained and fit the abstract problem definition.
Metric is innovative, unique, and likely to lead to greater, future improvements compared with other proposed metrics.
Robustness (30/100 points)
Metric is feasible to use for large volume use cases.
The metric has flexible parameters that control the focus, breadth, and rigor of evaluation.
The proposed metric is relevant in many different data applications that fit the abstract problem definition.
Each metric must be submitted as a separate submission. Metrics may be oriented towards map data, temporal sequence data, or combined temporal map data. Successful submissions to the Metric Paper Contest will include:
A brief description of the proposed metric (note that this will be included with your title when identifying your metric during the public voting stage),
An introduction to the submitter or submitting team that includes brief background and expertise, and optional explanation of the author’s interest in the problem,
A PDF document, using the provided template, with a minimum length of 2 pages that thoughtfully and clearly addresses what the metric is, why it works, and how it addresses the needs of data users, drawing on results from application on data. The document must include the following three sections and address the points outlined below.
Executive Summary (1-2 pages) Please provide a 1-2 page, easily readable review of the main ideas. This is likely to be especially useful for people reading multiple submissions during the public voting phase. The executive summary should be readily understood by a technical layperson and include:
The high-level explanation of the proposed metric, reasoning and rationale for why it works
An example use case
Any technical background information needed to understand the metric. (Note that these metric write-ups should be accessible to technical experts from a diverse variety of disciplines. Please provide clear definitions of any terms/tools that are specific to your field, and provide a clear explanation for any properties that will be relevant to your metric definition or defense.)
A written definition of the metric, including English explanation and pseudocode that has been clearly written and annotated with comments. Code can also be included (optionally) with the submission.
Explanation of parameters and configurations. Note that this includes feature-specific configurations. For instance, a metric could reference “demographic features” or “financial features” for specific treatment, and given a new data set with a new schema, the appropriate features could be specified in a configuration file without loss of generalizability.
Walk-through examples of metric use in snapshot mode (quickly computable summary score) and/or deep dive mode (generates reports locating significant points of disparity between the real and synthetic data distributions) as applicable to the metric.
Describe the metric’s tuning properties that control the focus, breadth, and rigor of evaluation
Describe the discriminative power of the proposed metric: how well it identifies points of disparity between the ground truth and privatized data
Describe the coverage properties of the proposed metric: how well it abstracts/covers a breadth of uses for the data
Address the feasibility of implementing the proposed metric. For instance, what are the computation time and resource requirements for the metric when running on data? How does the metric scale with an increase in variables, map segments, time segments, and records? This information may include empirical results (e.g. runtime) or theoretical results (e.g. mathematical properties). Feel free to provide assumptions about hardware (e.g. CPU model, memory, operating system) and feature constraints.
Provide examples of 2-3 very different data applications where the metric can be used.
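For the deep dive mode mentioned in the list above, a report could be as simple as ranking cells by how far the privatized counts fall from the ground truth. The following is a minimal sketch under that assumption, not a prescribed report format. The function name, the row fields, the `top_k` tuning parameter, and the toy counts are all hypothetical.

```python
def deep_dive_report(ground_truth, privatized, top_k=3):
    """Return the top_k cells with the largest absolute count difference."""
    cells = set(ground_truth) | set(privatized)
    rows = []
    for cell in cells:
        gt = ground_truth.get(cell, 0)
        pv = privatized.get(cell, 0)
        rows.append({
            "cell": cell,
            "ground_truth": gt,
            "privatized": pv,
            "abs_diff": abs(gt - pv),
        })
    # Largest disparity first; ties are broken by cell key so the
    # report is the same on every run.
    rows.sort(key=lambda r: (-r["abs_diff"], r["cell"]))
    return rows[:top_k]


# Hypothetical toy counts keyed by (map segment, time segment).
truth = {("A", 1): 40, ("A", 2): 10, ("B", 1): 30, ("B", 2): 20}
noisy = {("A", 1): 10, ("A", 2): 40, ("B", 1): 25, ("B", 2): 25}

for row in deep_dive_report(truth, noisy, top_k=2):
    print(row)
```

Here `top_k` acts as a simple tuning parameter for the breadth of the report. The sort makes the cost grow roughly as n log n in the number of cells, which is the kind of scaling statement the feasibility question asks for.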
All participants 18 years or older are invited to register to participate except for individuals from entities or countries sanctioned by the United States Government.
A Participant (whether an individual, team, or legal entity) must have registered to participate in order to be an eligible Participant.
Cash prizes are restricted to eligible Participants who have complied with all of the requirements under section 3719 of title 15, United States Code as contained herein. At the time of entry, the Official Representative (individual or team lead, in the case of a group project) must be age 18 or older and a U.S. citizen or permanent resident of the United States or its territories. In the case of a private entity, the business shall be incorporated in and maintain a place of business in the United States or its territories.
Employees, contractors, directors and officers (including their spouses, parents, and/or children) of HeroX and DrivenData, Inc. and each of their respective parent companies, subsidiaries and affiliated companies, distributors, web design, advertising, fulfillment, judging and agencies involved in the administration, development, fulfillment and execution of this Challenge will not be eligible to compete in this Challenge.
Participants may not be a Federal entity or Federal employee acting within the scope of their employment. Current and former NIST PSCR Federal employees or Associates are not eligible to compete in a prize challenge within one year from their exit date. Individuals currently receiving PSCR funding through a grant or cooperative agreement are eligible to compete but may not utilize the previous NIST funding for competing in this challenge. Previous and current PSCR prize challenge participants are eligible to compete. Non-NIST Federal employees acting in their personal capacities should consult with their respective agency ethics officials to determine whether their participation in this competition is permissible. A Participant shall not be deemed ineligible because the Participant consulted with Federal employees or used Federal facilities in preparing its entry to the Challenge if the Federal employees and facilities are made available to all Participants on an equitable basis.
Participants, including individuals and private entities, must not have been convicted of a felony criminal violation under any Federal law within the preceding 24 months and must not have any unpaid Federal tax liability that has been assessed, for which all judicial and administrative remedies have been exhausted or have lapsed, and that is not being paid in a timely manner pursuant to an agreement with the authority responsible for collecting the tax liability. Participants must not be suspended, debarred, or otherwise excluded from doing business with the Federal Government.
Multiple individuals and/or legal entities may collaborate as a group to submit a single entry and a single individual from the group must be designated as an Official Representative for each entry. That designated individual will be responsible for meeting all entry and evaluation requirements.
Please see the Official Rules on challenge.gov for rules and complete terms and conditions.
Thank you to all of those that tuned in on Tuesday to watch our live webinar. If you missed it, you can watch the recording below or download the slide deck.
During the webinar, we announced the extension of the deadline for the optional preliminary review of executive summaries to November 30, 2020 10:00pm EST. If you submit your executive summary before this deadline, you will receive feedback from the NIST team by December 3, 2020.
Maybe you've had some questions, thoughts, or ideas about the challenge so far, but you're still wondering where to take them? There's a quick, easy-to-use way to ask questions and start conversations about the DeID2 - A Better Meter Stick for Differential Privacy challenge: the challenge forum.
Interested? Simply go to the forum to see what people are already saying. If you'd like to start a new conversation, click "New topic" and begin crafting your message.
This is a great way to start connecting with other community members around different aspects of the challenge, gain insights, and even collaborate! Keep in mind that HeroX regularly checks in on the forum, so it's also a great way to get in touch with us about any questions (or suggestions) you might have.