Improving Privacy Strength Yet Preserving Utility
short description
This proposal presents a technique to achieve unlinkability by scaling the privacy strength as the number of data owners increases.
Solution Overview - Describe how your approach optimizes the balance of differential privacy and data analysis utility.
Current privacy deployments can only estimate heavy-hitter populations (top-k) and cannot satisfy both privacy and utility. Privacy leakage is measured with respect to the set difference of at least one data owner. Such a model can only reduce privacy leakage at the cost of utility.

However, the query itself is not evaluated or taken into consideration by differential privacy alone. Thus, even a small privacy leakage may completely violate a data owner's privacy, depending on the query.

For example, suppose we query everyone at the Brooklyn Bridge to count how many people are currently there. Regardless of the cryptographic technique or privacy mechanism used, the act of responding signals to an adversary that the data owner was indeed at the Brooklyn Bridge.

K-Privacy's adversary model instead concerns the data owners themselves and the difficulty of linking data to a particular owner. Privacy strength therefore increases as the number of data owners increases: the additional data owners blend together and provide privacy protection for one another.

For example, for location privacy a single data owner submits multiple locations simultaneously. Using K-Privacy, the resulting noise is cancelled out. The adversary must now determine which location a particular data owner is actually at, which becomes more difficult as the number of data owners increases.
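As an illustrative toy sketch of the decoy idea (our own simplification, not the published K-Privacy protocol; all names such as `submit_report` and the uniform-decoy scheme are hypothetical): each data owner reports several locations at once, only one of which is real, so an adversary seeing a single report can do no better than guess among its entries.

```python
import random

LOCATIONS = ["bridge", "park", "library", "cafe"]

def submit_report(true_location, k=3):
    """Owner reports k locations: the real one plus k-1 uniform decoys."""
    decoys = random.sample([l for l in LOCATIONS if l != true_location], k - 1)
    report = [true_location] + decoys
    random.shuffle(report)  # hide which entry is the real one
    return report

def adversary_guess(report):
    """From a single report, the adversary can only guess among the k entries."""
    return random.choice(report)

random.seed(0)
owners = [random.choice(LOCATIONS) for _ in range(10_000)]
reports = [submit_report(loc) for loc in owners]
hits = sum(adversary_guess(r) == loc for r, loc in zip(reports, owners))
accuracy = hits / len(owners)
print(f"adversary linking accuracy: {accuracy:.2f}")  # about 1/k in expectation
```

In this toy model the linking accuracy is pinned near 1/k per report; the point made above is that blending improves further as more data owners participate, since decoy locations coincide with other owners' genuine reports.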
Which randomization mechanisms does your solution use?
If other, please list and explain
K-Privacy, a newly proposed privacy mechanism. Please refer to "K Privacy: Towards improving privacy strength while preserving utility", Ad Hoc Networks, Volume 80, November 2018, Pages 16-30.

A single data owner submits multiple locations (or none), making it challenging to determine the exact location. However, utility is preserved by K-Privacy's multiple-round estimation. We have deployed this as CrowdZen on the UCLA campus.
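The utility side can be sketched with a simple debiasing step (again a toy model with hypothetical uniform decoys and an invented `debias` helper, not the published multi-round estimator): because the decoy distribution is known, its expected contribution to each location's count can be subtracted out, recovering accurate population estimates.

```python
import random
from collections import Counter

LOCATIONS = ["bridge", "park", "library", "cafe"]
K = 3  # locations submitted per data owner: 1 real + 2 decoys (our assumption)

def debias(observed, n_owners):
    """Invert E[obs] = t + (n - t) * p, where p is the per-location decoy rate."""
    p = (K - 1) / (len(LOCATIONS) - 1)  # chance a given other location is a decoy
    return {loc: (observed.get(loc, 0) - n_owners * p) / (1 - p)
            for loc in LOCATIONS}

random.seed(1)
n = 20_000
true_locs = random.choices(LOCATIONS, weights=[5, 3, 1, 1], k=n)

observed = Counter()
for loc in true_locs:
    observed[loc] += 1  # the real location
    for d in random.sample([l for l in LOCATIONS if l != loc], K - 1):
        observed[d] += 1  # uniform decoys

estimates = debias(observed, n)
truth = Counter(true_locs)
for loc in LOCATIONS:
    print(f"{loc}: true={truth[loc]} estimated={estimates[loc]:.0f}")
```

The decoys inflate every raw count, but since their expected rate is known in closed form, the corrected estimates track the true counts closely while individual reports remain ambiguous.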
Is your proposed solution an improvement or modification of previous algorithms in differential privacy or a substantially new algorithm?
new algorithm
If other, please explain
K-Privacy, a new privacy mechanism whose privacy strength increases with the number of data owners while preserving utility.
Provided that there is a known relationship between the fields and the analysis (research question such as regression, classification, clustering), how does your approach determine the number and order of the randomized mechanism being utilized?
By calibrating the privacy noise and the number of estimation rounds executed.
Provided that there is NOT a known relationship between the fields and the analysis (unknown research question), is there a prescribed sequence of privacy techniques that will always perform the best regardless of data?
Yes, as long as there is a minimum number of data owners participating.
How does your proposed solution differ from existing solutions? What are the advantages vs existing solutions? Disadvantages?
K-Privacy takes the formulated query into account, rather than considering the dataset alone when deriving a privacy-leakage metric.
How well does your solution optimize utility for a given privacy budget (the utility-privacy frontier curve) and how does it accomplish this for each of the research classes (regression, classification, clustering, and unknown research question) and each of the data types (numeric, geo-spatial, and class)?
Our adversary model is defined in terms of unlinkability; the privacy budget is only one characteristic of it.

Increasing the number of data owners does not add noise, in contrast to other models, which focus on heavy hitters only. A growing population of data owners is part of our adversary model and is what provides unlinkability; other models must instead add more noise to achieve stronger privacy.
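To illustrate how adding data owners does not degrade accuracy, consider a toy simulation (hypothetical uniform-decoy model with an invented `max_relative_error` helper, not the published K-Privacy estimator): each owner submits one real location plus uniform decoys, counts are debiased, and the worst relative estimation error shrinks as the population grows.

```python
import random
from collections import Counter

LOCATIONS = ["a", "b", "c", "d"]
K = 3  # reports per owner: 1 real + 2 uniform decoys (our assumption)

def max_relative_error(n_owners, seed=0):
    """Worst relative error of the debiased count estimate over all locations."""
    rng = random.Random(seed)
    true_locs = rng.choices(LOCATIONS, k=n_owners)
    observed = Counter()
    for loc in true_locs:
        observed[loc] += 1
        for d in rng.sample([l for l in LOCATIONS if l != loc], K - 1):
            observed[d] += 1
    p = (K - 1) / (len(LOCATIONS) - 1)  # known per-location decoy rate
    truth = Counter(true_locs)
    return max(abs((observed[l] - n_owners * p) / (1 - p) - truth[l]) / n_owners
               for l in LOCATIONS)

for n in (1_000, 10_000, 100_000):
    print(n, f"{max_relative_error(n):.4f}")
```

In this sketch, more participants mean both a harder linking problem for the adversary and smaller relative error for the analyst, with no extra noise injected.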
Describe other data types and/or research questions that your Solution would handle well. How would performance (in terms of privacy and utility) be maintained and why? Describe other data types and/or research questions that your Solution would not handle well. How would performance (in terms of privacy and utility) degrade and why?
Research questions in which privacy strength should grow as the number of data owners increases. Our published results show that utility remains constant while privacy strength increases.
How do the resource requirements for your Solution scale with the amount of data? Describe how the computational requirements of your Solution at different volumes of data can be handled using current computing technological capabilities. If your Solution requires advances in technology, describe your vision and anticipated availability for the types and scope of technological advances necessary.
The computational requirements are efficient, with minimal demands.

We have deployed our system, CrowdZen, on the University of California, Los Angeles campus.
Please reference a dataset you suggest utilizing as a use case for testing algorithms. Is there existing regression, classification, and clustering analysis of this data? If so, Please describe.
Location data
Propose an evaluation method for future algorithm testing competitions.
unlinkability of location traces