“When we were presenting, I got the feeling this was something everyone had been waiting for. So, I hope we can contribute something to the community,” says Elisabet Lobo-Vesga, PhD student at the Information Security division, Department of Computer Science and Engineering. “People are more and more concerned about privacy, and organisations are trying different mechanisms to help preserve users’ privacy when analysing their data.”

Part of the solution is a concept called differential privacy: a method of protecting privacy by slightly altering the response from a database when data is requested.
“The response is less exact than the actual data in the database,” Elisabet explains. “Let's say we have a database with personal information and we want to know the number of people who pick their noses. In our data, Alice, Bob, and Charlie are the ones who do that, so the real answer should be 3. However, we don't want to expose them. How do we protect their privacy? We want our response to be as similar as possible to what the answer would be if one of them weren’t in the database. We do this by randomly adding −1, 0, or 1 to the answer. That way we ensure that, with some probability, an attacker looking at the answer cannot tell whether, say, Bob was or wasn't included in the count. This is the whole concept of differential privacy. But in practice, the noise we add is a bit more complex.”
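The nose-picking example can be sketched in a few lines of Python. This is only an illustration of the idea described above; the names and the tiny database are made up, and real differential privacy systems use calibrated noise (such as the Laplace mechanism) rather than a uniform choice of −1, 0, or 1.

```python
import random

def noisy_count(database, predicate):
    """Count matching records, then perturb the result with random
    noise of -1, 0, or +1, as in the simplified example above."""
    true_count = sum(1 for person in database if predicate(person))
    return true_count + random.choice([-1, 0, 1])

# Hypothetical database for illustration: (name, picks_nose).
people = [("Alice", True), ("Bob", True), ("Charlie", True), ("Dana", False)]

# The true count is 3, so the reported answer is 2, 3, or 4 --
# an attacker cannot tell from the answer alone whether Bob was counted.
answer = noisy_count(people, lambda p: p[1])
```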
Sought-after accuracy
Elisabet Lobo-Vesga has created a programming language that allows non-privacy experts to create algorithms that preserve privacy.
Differential privacy is a strong mechanism used by companies like Amazon and Google. The downside is that you don’t know how accurate the information is when you query a database.
“Our programming language makes it possible to also get a notion if you can trust the information. It will tell you the error margin, so you know if you can actually publish the result or if it’s just random nonsense.”
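The article does not detail how the error margin is computed, but for the standard Laplace mechanism such a bound has a simple closed form: noise with scale b = sensitivity/ε satisfies |noise| ≤ b·ln(1/α) with probability 1 − α. The sketch below assumes that textbook mechanism; the actual language may derive its bounds differently.

```python
import math

def laplace_error_bound(sensitivity, epsilon, confidence=0.95):
    """With the given confidence, a Laplace-noised answer lies within
    this margin of the true value. (Illustrative, not the article's
    actual implementation.)"""
    scale = sensitivity / epsilon  # noise scale of the Laplace mechanism
    alpha = 1.0 - confidence       # allowed failure probability
    return scale * math.log(1.0 / alpha)

# A counting query has sensitivity 1. With epsilon = 0.5, the 95% error
# margin is 2 * ln(20), roughly 6 -- so an answer of 3 could plausibly
# come from any true count between 0 and 9.
bound = laplace_error_bound(1.0, 0.5)
```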
For a single variable, accuracy can be determined fairly easily with a mathematical equation. The tricky part is combining multiple variables, or performing operations on them, which is common when analysing data.
“Knowing the accuracy of just one random variable is easier than knowing the accuracy of an operation on several random variables. What’s important to know is whether the variables are dependent or not, since you need to handle them differently, with different bounds. To keep track of dependence or independence, we put a tag on each query so we can see where the variables come from. That way, we know how to handle them.”
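A small simulation shows why dependence matters for accuracy bounds. Adding two *independent* noise terms with variance 1 gives a combined variance of 2, but adding the *same* noise term to itself gives variance 4 — so treating dependent queries as independent would understate the error. This is a generic statistical illustration, not the specific bookkeeping the language performs; Gaussian noise is used purely for convenience.

```python
import random
import statistics

random.seed(42)
N = 100_000

# Independent noise: variances add, Var(X + Y) = 1 + 1 = 2.
independent = [random.gauss(0, 1) + random.gauss(0, 1) for _ in range(N)]

# Dependent noise (same sample reused): Var(X + X) = Var(2X) = 4.
dependent = []
for _ in range(N):
    x = random.gauss(0, 1)
    dependent.append(x + x)

print(statistics.variance(independent))  # close to 2
print(statistics.variance(dependent))    # close to 4
```

Tagging each query with its provenance, as Elisabet describes, is what lets the system decide which of these two regimes applies when results are combined.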
Elisabet and her colleagues are now running the final tests of the language’s implementation before its release.
“I think in every scenario where you can apply differential privacy, this could be useful.”
A rewarding experience
Presenting at the conference both reassured Elisabet that her research is relevant and gave her some valuable feedback from the programming language community.
“People were friendly and really interested in my research. It was awesome to see that people care about what I’m doing. I also got some good feedback and realised some problems that may come with my implementation, so I could start looking for solutions.”
Eighteen graduate students participated in the ACM Student Research Competition by submitting posters explaining their research. The top three presented their research before a panel of judges and conference attendees, and received prizes of $500, $300, and $200, respectively.
See the prize-winning poster (pdf)