Every week, Assas Legal Innovation meets professionals to discuss legal innovation.
For this new edition, Mihnea Dumitrascu met Hamza Harkous, a researcher at the École Polytechnique Fédérale de Lausanne (EPFL) and a developer of AI-based systems in the fields of privacy and security.
All his articles are available on his blog.
Who are you? What did you study?
I finished my PhD on the topic of “Data-Driven, Personalized Usable Privacy” in June 2017. I’m currently a postdoctoral researcher at EPFL in Switzerland. Generally, I’m a researcher working at the intersection of privacy, machine learning, and human-computer interaction. I love building usable products that have an impact on a wide user base.
You’ve developed several AI-driven systems. Could you tell us more about them?
I’ve worked on these systems:
- PrivySeal: PrivySeal shows users what information third-party apps can obtain, beyond what they actually need, through the access permissions they are granted to data stored in the cloud (e.g., files on Google Drive or Dropbox). Using machine learning and data visualization techniques, we show users what topics the apps would learn they are interested in, who appears in photos with them, what their opinions are, etc. We call these “Far-Reaching Insights”. In other words, we use the users’ own data (images and documents) as a medium to inform them of the risks these applications pose to their privacy (see the sketch after this list).
- Modemos: My colleague Rémi Lebret and I created this site in collaboration with Privately SA. It showcases tools developed to protect children’s privacy online, from hate-speech detection to emotion recognition and image-safety classification.
- PriBot/Polisis: These are apps for analyzing privacy policies with the power of AI.
I think the last system has the most impact, and it’s also my favorite.
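To give a flavor of how “Far-Reaching Insights” of this kind can be derived, here is a minimal, purely illustrative sketch of inferring a user’s topics of interest from their documents with an off-the-shelf topic model. The corpus, topic count, and library choices are assumptions made for the example; this is not the actual PrivySeal pipeline.

```python
# Illustrative sketch (NOT the actual PrivySeal pipeline): infer what
# topics a third-party app with full file access could learn about a user.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for a user's cloud documents.
documents = [
    "quarterly budget forecast and revenue projections",
    "hiking trip itinerary for the Alps next summer",
    "medical insurance claim form and hospital invoice",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(documents)

# Fit a small topic model over the user's files.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(doc_term)

# Print the top words per topic, i.e. what the app could infer.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"Topic {idx}: {', '.join(top)}")
```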
Let’s focus on PriBot and Polisis. How did the idea of creating such a system come about?
The idea started when my colleague Kassem Fawaz and I were brainstorming for a submission to the workshop on the future of privacy notices at SOUPS 2016. Chatbots and AI assistants were big at the time, and we thought it would be great if they could answer questions about privacy policies for us. We came up with some mockups, and the idea looked very promising if it could be made to work. That kicked off all the subsequent research on how to analyze privacy policies to answer such questions, which gave birth to both Polisis and PriBot.
Did you work with lawyers on this project?
We did not work with lawyers directly. However, one of the datasets we used was annotated by graduate law students within the Usable Privacy project.
Can you explain more precisely what each system consists of?
PriBot is the first automated question-answering (QA) chatbot for privacy policies. You can ask it questions about any privacy policy (provided it can retrieve it). It then uses the policy to answer free-form questions in real time, with high accuracy and relevance.
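As a rough illustration of the general idea (not PriBot’s actual models), answering a free-form question over a policy can be sketched as ranking the policy’s segments by semantic similarity to the question. The embedding model and policy snippets below are placeholder choices for the example.

```python
# Minimal sketch of QA over a privacy policy via semantic similarity:
# embed the question and each policy segment, return the best match.
# The model name is an off-the-shelf choice, not the one PriBot uses.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy policy segments, invented for the example.
segments = [
    "We collect your email address and location to personalize ads.",
    "You may request deletion of your account data at any time.",
    "We share aggregated usage statistics with third-party partners.",
]

question = "Can I delete my data?"

seg_emb = model.encode(segments, convert_to_tensor=True)
q_emb = model.encode(question, convert_to_tensor=True)

# Rank segments by cosine similarity to the question.
scores = util.cos_sim(q_emb, seg_emb)[0]
best = int(scores.argmax())
print(segments[best])
```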
Polisis is a unique way of visualizing privacy policies. Using deep learning, it allows you to know what information a company is collecting about you, what it shares, and much more. You don’t have to read the full policy with all the legal jargon to understand what you are signing up for.
But if you’re interested in a brief overview of the technical details of how these systems work, I’d recommend checking my blog post, specifically the sections “No Magic Pill” and “A Hierarchical Approach”.
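For intuition only, here is a heavily simplified stand-in for that kind of segment classification: a TF-IDF bag-of-words model with one-vs-rest logistic regression over invented toy labels. The real system is described in the blog post and paper; the category names and training examples below are made up for illustration.

```python
# Simplified sketch of labeling policy segments with high-level privacy
# categories. Toy data and labels are invented; this is not Polisis itself.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

segments = [
    "We collect your email address when you register.",
    "Your data may be shared with advertising partners.",
    "You can opt out of marketing emails in your settings.",
    "We collect device identifiers and share them with analytics firms.",
]
labels = [
    ["first-party-collection"],
    ["third-party-sharing"],
    ["user-choice"],
    ["first-party-collection", "third-party-sharing"],
]

# Encode the multi-label targets as a binary indicator matrix.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

# One binary classifier per category over TF-IDF features.
clf = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(segments, y)

pred = clf.predict(["We may disclose your information to our partners."])
print(mlb.inverse_transform(pred))
```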
I know this is a tricky question, albeit an important one: can someone just use PriBot and Polisis and never read those long terms and conditions anymore? How trustworthy is this system?
We have an extensive accuracy evaluation in our paper. For example, we reached more than 80% average accuracy in classifying policy segments into different categories. However, as with any machine learning system, it can make mistakes; even a system with 99% accuracy will. So these tools cannot be legally binding or a complete replacement for the full privacy policy. They are a way to get a quick glance without reading the policy. So to answer your question: yes, someone could rely on them if their only interest is getting a quick look. In Polisis, we already surface supporting evidence from the policy when you hover over the graph, so people can dig into the policy from our interface. But if someone cares about the more precise details, they’d have to read the full policy.
What will happen if firms introduce new practices or clauses into their policies that have never been used before?
That’s an important point. If these new aspects fall under one of the categories we detect, we can still classify them. If it’s a totally new aspect, that’s different. Let’s say all policies start talking in detail about using blockchain for data privacy. We can’t give insights about this without adding new annotated data to our system. We’ll have to train our system on this new data; only then will it be able to handle these new topics.
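Continuing the toy classifier sketched earlier (same variables; purely illustrative, not the actual retraining pipeline), supporting a genuinely new topic boils down to adding annotated examples for it and retraining:

```python
# Continuation of the earlier toy sketch: a brand-new topic (e.g.
# blockchain-based data handling, a made-up category here) needs its own
# annotated examples before the model can recognize it.
segments += [
    "We record consent receipts on a public blockchain.",
    "Data access events are anchored to a distributed ledger.",
]
labels += [["blockchain"], ["blockchain"]]

# Refit with the extended label set; only after this does "blockchain"
# exist as a predictable category.
y = mlb.fit_transform(labels)
clf.fit(segments, y)
```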