The Unlinkable Data Challenge: Advancing Methods in Differential Privacy

Propose a mechanism to enable the protection of personally identifiable information while maintaining a dataset's utility for analysis.

Challenge Overview

The digital revolution has radically changed the way we interact with data. In the pre-digital age, personal data had to be deliberately requested, stored, and analyzed. The inefficiency of poring over printed or even hand-written data made it difficult and expensive to conduct research. It also acted as a natural barrier that protected personally identifiable information (PII): it was extremely difficult to use a multitude of sources to identify particular individuals included in shared data.

Our increasingly digital world turns almost all of our daily activities into data collection opportunities, from the more obvious entry into a web form to connected cars, cell phones, and wearables. Dramatic increases in computing power and innovation over the last decade, along with increasingly automated data collection by both public and private organizations, make it possible to combine and use the data from all of these sources to complete valuable research and data analysis.

At the same time, these same increases in computing power and innovation can also be used to the detriment of individuals through linkage attacks, in which auxiliary, and possibly completely unrelated, datasets are combined with records in a dataset that contains sensitive information in order to uniquely identify individuals.
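
A linkage attack can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two toy datasets, the field names, the choice of quasi-identifiers (ZIP code, birth year, sex), and the helper `link_records` are all hypothetical and do not come from any challenge dataset.

```python
# Hypothetical toy data: every name, field, and value is invented for illustration.
# A "de-identified" dataset: names removed, but quasi-identifiers retained.
deidentified_records = [
    {"zip": "02139", "birth_year": 1975, "sex": "F", "incident": "domestic disturbance"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "incident": "burglary report"},
    {"zip": "02141", "birth_year": 1990, "sex": "F", "incident": "medical emergency"},
]

# An unrelated auxiliary dataset that includes names (for example, a public directory).
public_directory = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "02141", "birth_year": 1990, "sex": "F"},
    {"name": "Dana Example", "zip": "02141", "birth_year": 1990, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(sensitive, auxiliary, keys=QUASI_IDENTIFIERS):
    """Map the index of each sensitive record to a name when exactly one
    auxiliary record shares all of its quasi-identifier values."""
    names_by_key = {}
    for person in auxiliary:
        key = tuple(person[k] for k in keys)
        names_by_key.setdefault(key, []).append(person["name"])

    reidentified = {}
    for index, record in enumerate(sensitive):
        candidates = names_by_key.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            reidentified[index] = candidates[0]
    return reidentified


reidentified = link_records(deidentified_records, public_directory)
for index, name in reidentified.items():
    print(name, "->", deidentified_records[index]["incident"])
```

In this toy example the first two records are linked to a single named person each, while the third stays ambiguous because two people in the auxiliary data share the same quasi-identifier values.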

This valid privacy concern is unfortunately limiting the use of data for research, including datasets within the Public Safety sector that might otherwise be used to improve the protection of people and communities. Due to the sensitive nature of the information contained in these types of datasets and the risk of linkage attacks, these datasets can’t easily be made available to analysts and researchers. In order to make the best use of data that contains PII, it is important to disassociate the data from PII. There is a utility vs. privacy tradeoff, however: the more a dataset is altered, the more likely it is that the de-identified dataset will have reduced utility for analysis and research purposes.
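
The utility vs. privacy tradeoff can be made concrete with differential privacy, the approach named in the challenge title. The following is a minimal sketch, assuming the standard Laplace mechanism applied to a counting query (sensitivity 1) with privacy parameter epsilon. The function names and the example count are hypothetical, and a production implementation would need a cryptographically secure noise source and care around floating-point behavior.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two independent
    exponential variables, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with Laplace noise. Adding or removing one record
    changes a count by at most 1, so the noise scale is 1 / epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=20000, seed=0):
    """Estimate the average absolute error of the noisy count.
    The fixed seed is for a reproducible demo only."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(true_count, epsilon, rng) - true_count)
    return total / trials


# Smaller epsilon means stronger privacy and, on average, larger error.
errors = {eps: mean_abs_error(100, eps) for eps in (0.1, 1.0, 10.0)}
for eps, err in errors.items():
    print(f"epsilon={eps}: mean absolute error is about {err:.2f}")
```

The expected absolute error of Laplace noise equals its scale, so the error here shrinks roughly in proportion to 1 / epsilon: weaker privacy guarantees buy more accurate answers, and stronger guarantees cost accuracy.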

Currently popular de-identification techniques are not sufficient. Either PII is not sufficiently protected, or the resulting data no longer represents the original data. Additionally, it is difficult or even impossible to quantify the amount of privacy that is lost with current techniques.

This competition is about creating new methods, or improving existing methods of data de-identification, in a way that makes de-identification of privacy-sensitive datasets practical. A first phase hosted on HeroX will ask for ideas and concepts, while later phases executed on Topcoder will focus on the performance of developed algorithms.


What Can You Do Right Now?

  • Click ACCEPT CHALLENGE above to sign up for the challenge
  • Read the Challenge Guidelines to learn about the requirements and rules
  • Share this challenge on social media using the icons above. Show your friends, your family, or anyone you know who has a passion for discovery.
  • Start a conversation in our Forum to ask questions or connect with other innovators.

Challenge Updates

Your Questions Answered

Feb. 1, 2019, 11:46 a.m. PST by Kyla Jeffrey

Did you attend the Match #2 Webinar? 

We answered your most pressing questions. If you missed it, you can watch the recording below and review the written Q&A here.


As a reminder, we are quickly approaching the Match 2 deadline on February 9, 2019 at 21:00 EST (New York). 

If you have any additional questions, please head over to the Topcoder Forums and we will do our best to answer them for you.

Q & A Webinar

Jan. 14, 2019, 5:48 p.m. PST by Kyla Jeffrey

We are hosting a Q & A webinar with NIST and Topcoder on Tuesday, Jan 15 at 12:30 pm ET (New York). This will be a great opportunity to meet the sponsors, learn more, and ask your burning questions.


Match 2 Launches Today!

Jan. 11, 2019, 6:30 p.m. PST by Kyla Jeffrey

We are thrilled to announce that the second marathon match officially opens for submission today!

The Differential Privacy Synthetic Data Challenge entails a sequence of three marathon matches run on the Topcoder platform, asking contestants to design and implement their own synthetic data generation algorithms, mathematically prove their algorithm satisfies differential privacy, and then enter it to compete against others’ algorithms on empirical accuracy over real data, with the prospect of advancing research in the field of Differential Privacy.
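
For readers new to the idea, one very simple way to generate differentially private synthetic data is sketched below. It is not a contestant's algorithm or the challenge's reference method; it assumes a single categorical column with a publicly known domain, uses a Laplace-noised histogram, and samples synthetic values from it. The function name `synthesize`, the category labels, and the parameter values are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two independent
    exponential variables, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def synthesize(records, domain, epsilon, n_synthetic, seed=0):
    """Draw synthetic values from a noisy histogram of `records`.

    `domain` must be fixed in advance and independent of the data. Each
    record falls into exactly one bin, so adding or removing one record
    changes one count by 1 and Laplace noise of scale 1 / epsilon suffices.
    Clamping and sampling are post-processing of the noisy counts.
    The fixed seed is for a reproducible demo only, not for real releases.
    """
    rng = random.Random(seed)
    counts = {value: 0 for value in domain}
    for value in records:
        counts[value] += 1

    noisy = [max(0.0, counts[v] + laplace_noise(1.0 / epsilon, rng)) for v in domain]
    if sum(noisy) == 0:  # fall back to uniform if every noisy count was clamped
        noisy = [1.0] * len(domain)
    return rng.choices(domain, weights=noisy, k=n_synthetic)


# Hypothetical incident categories, invented for illustration.
domain = ["assault", "fire", "theft", "vandalism"]
records = ["theft", "theft", "assault", "fire", "theft"]
synthetic = synthesize(records, domain, epsilon=1.0, n_synthetic=10)
print(synthetic)
```

Real datasets have many correlated columns, which is where the difficulty lies: a full histogram over all columns grows too large to estimate accurately, so practical algorithms must choose which statistics to measure under a fixed privacy budget.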

Anyone is welcome to participate in Match 2, regardless of whether or not you participated in Match 1. If you’re not a differential privacy expert, and you’d like to learn, we’ll have tutorials to help you catch up and compete!

For more information on this data challenge funded through Public Safety Communications Research (PSCR) at NIST, or any federal government challenge, go to

We are also hosting a webinar on Tuesday, Jan 15 at 12:30 pm ET (New York). This will be a great opportunity to meet the sponsors, learn more, and ask any questions. You can register for the webinar here.


Head over to the Topcoder Challenge Page to view the full details

Congratulations to the Winners in Match #1 of the Differential Privacy Synthetic Data Challenge

Jan. 1, 2019, 8:41 p.m. PST by Kyla Jeffrey

Congratulations to the Top 5 winners, and to all contestants in this Challenge!  The winners in Match #1 of the Differential Privacy Synthetic Data Challenge are:

1st ($10 000) - 781 953 - jonathanps
2nd ($7 000) - 736 780 - ninghui
3rd ($5 000) - 664 623 - rmckenna
4th ($2 000) - 93 955 - manisrivastava
5th ($1 000) - 82 414 - privbayes

For more details on the final review and scoring, head over to the Topcoder forum.

Differential privacy is an emerging research area. The work that each of you has done, and will continue to do, is critical to developing the knowledge and resources essential to expanding research in this area. You are helping NIST grow a diverse community whose work is maturing into robust solutions. Your participation in this data challenge is incredibly important: it highlights different approaches that will become the basis for future growth and innovation, while helping NIST establish a measurement-based approach to fostering data-driven R&D in this area. Your involvement with PSCR ensures that this community considers the practical, applied use of differential privacy in public safety applications of the future.

Deadline Tomorrow!

Nov. 29, 2018, 2:21 p.m. PST by Kyla Jeffrey

There are fewer than 24 hours left to submit, or update your submission, for the first Marathon Match in the NIST Differential Privacy Synthetic Data Challenge! The deadline is November 30 at 15:00 ET (New York).


Head over to Topcoder to submit your entry.


If you haven't already read our 4 tips for success, we recommend checking them out here.
