The first step to data privacy is admitting you have a problem, Google

Opinion

One of the joys of academic research is that if you do it right, you can prove the truth. In the case of computer science professor Douglas Leith, that truth is that Google has been taking detailed notes of every telephone call and SMS message made and received through the default Android apps.

It didn’t tell us, it didn’t give us the option to stop it, and it didn’t say what it was doing with all that data.

Google didn’t have a leg to stand on. It coughed to the caper and promised to do better, as it has done before, such as when its Street View mapping mobiles were shown to be veritable black holes of Wi-Fi suckage. That history of leglessness suggests the company loves data a little too much for self-control. Google’s dataholic behavior may take more than promises to fix.

That Google’s problem is our problem is eloquently illustrated by Leith’s paper on his research. The question he asks is simple: what do the Android Messages and Dialer apps send to Google? The answer could only be found by an impressive display of mid-to-high-level infosec skills, backed up by lots of hard work and determination.

In brief, Leith set up a man-in-the-middle attack on his phones to crack open the data links’ HTTPS/SSL encryption. He dug out as much as possible about the services Google was using to log this data, which involved doing the sort of things we’re not supposed to do, like side-loading APKs from third-party app stores. This can be safe if you know what you’re doing, which Professor Leith does. And that’s all just the start: there’s a ton of raw binary to analyse next.

Because it follows the protocols of science, the paper is a splendid how-to on hacking your own phone. It stands as witness to why attempts to limit security analysis should be fiercely resisted. Looking at you, Governor Mike Parson. Yet as Leith admits, even after all that, he couldn’t fully answer the question he asked himself.

The amount of data, the number of different ways it can move across the network, and the potential extra layers of in-app encryption mean you’re going to bail once you’ve got enough, not when you’ve got it all. That’s before you factor in the ever-shifting behavior of apps that update constantly on an OS that is itself a moving target, never mind that the telemetry actually collected could change from day to day, even hour by hour.

So if an actual professor of computer science can’t find out everything about the data privacy of just two apps, what chance do the rest of us have? And what if the security we demand to keep our data safe from attackers is instead shielding it from our own scrutiny, and so protecting abuse?

Let’s tackle that by assuming good faith: that the abuse isn’t the product of evil intent but of bad habits brought on by dataholic intoxication. It’s part of a more general problem, that complex systems built by humans to achieve goals can encourage undesirable patterns. Organizations explicitly recognize this when it comes to code security; we know we can’t always get it right, they say, so here are bug bounties for anyone who gets to the vulnerabilities before the bad guys.

That works. So let’s extend the idea to privacy violation bounties. Find us breaking the rules and endangering our customers’ privacy, and we’ll reward you. They’ll love that, right? The logic’s the same, though; if you’ve made a mistake, you want to find it before it bites you and your customers, and does it matter whether the mistake is one in code or one in process? Encouraging bright, well-motivated people to help you here means more Professor Leiths will make you better.

The other major change in behavior that would help eradicate the class of error uncovered here is simple documentation. Much of the data involved was sent under cover of “analytics,” vaguely defined and never explained; some of it was deemed essential and some optional. “We’re going to use this to make things work better” is about as much as you’re told.

That’s hooey. Google knows what every byte of that data is, and what it’s used for. So should we. The information exists. Let’s have it. Let’s have it in properly structured documents, available through an automated freedom of information request system – don’t want to make it too burdensome, do we? – with types, structures, APIs, and purpose. Keep it up to date. Give it to us in ways we can use to automate our checks. All this is standard DevOps scaffolding: loop us in, and do it properly. Isn’t that how DevOps is supposed to work?
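What might “ways we can use to automate our checks” look like? Here is a minimal sketch, assuming a vendor published a machine-readable telemetry manifest listing each field’s type, purpose, and whether it is optional. The manifest format, the field names, and the `audit_payload` helper are all hypothetical illustrations, not anything Google publishes; the payload is assumed to have already been captured and decoded, which, as Leith’s paper shows, is the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredField:
    """One entry in a HYPOTHETICAL published telemetry manifest."""
    type_name: str   # declared type, e.g. "int" or "str"
    purpose: str     # human-readable reason the field is collected
    optional: bool   # True if the user is supposed to be able to switch it off


# Hypothetical manifest for a messaging app; field names are illustrative only.
MANIFEST = {
    "app_version": DeclaredField("str", "crash triage", optional=False),
    "event_type": DeclaredField("str", "feature usage counts", optional=True),
    "timestamp_ms": DeclaredField("int", "ordering of events", optional=True),
}

# Map declared type names onto Python types for a simple isinstance check.
_TYPES = {"int": int, "str": str}


def audit_payload(manifest, payload, optional_enabled):
    """Compare one decoded telemetry payload against the manifest.

    Returns a list of problems: fields that were never declared, fields whose
    value does not match the declared type, and optional fields that were
    sent even though the user opted out.
    """
    problems = []
    for name, value in sorted(payload.items()):
        decl = manifest.get(name)
        if decl is None:
            problems.append(f"undeclared field: {name}")
            continue
        expected = _TYPES.get(decl.type_name)
        if expected is not None and not isinstance(value, expected):
            problems.append(
                f"type mismatch: {name} expected {decl.type_name}, "
                f"got {type(value).__name__}"
            )
        if decl.optional and not optional_enabled:
            problems.append(f"optional field sent while opted out: {name}")
    return problems


if __name__ == "__main__":
    # A made-up captured payload containing one field the manifest never mentions.
    observed = {"app_version": "1.2", "event_type": "call_ended", "caller_hash": "ab12"}
    for problem in audit_payload(MANIFEST, observed, optional_enabled=False):
        print(problem)
```

Run against that made-up payload with analytics switched off, the sketch prints `undeclared field: caller_hash` followed by `optional field sent while opted out: event_type`. The point is not the dozen lines of Python but that, once the documentation exists in a structured form, checking a vendor’s behavior against its promises becomes a routine test rather than a research project.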

In return for taking responsibilities seriously, companies who do take the pledge to handle their dataholism should get that good faith recognized if GDPR problems do subsequently occur. As with all regulation, the demonstration of bona fides goes a long way to mitigate offences. That’s a bang-up benefit.

It’s to Professor Leith’s credit that he did this work, and to Google’s that it took it on the chin. It is to nobody’s credit that an industry which already has the tools, the experience, and the motivation to vastly simplify such work has failed to do so, or even to have a proper discussion about it.

Dataholism may not be curable, but it can be controlled: sign the pledge, sober up, and fly right. ®