One lesson from Strava: AI is making data privacy harder

Image credit: Nutthanun Gunthasen / Shutterstock.com

Last week, fitness app company Strava found itself in hot water after security analysts revealed that its public “heat map” – a feature of its social fitness platform that uses GPS data from its users to map where they’ve been jogging or walking – was inadvertently revealing the locations of secret military facilities and the movements of personnel who use the app.

Oops!

It’s embarrassing for Strava, of course, but there’s far more to the story than a big-data privacy goof or even the security implications for such facilities. The Strava heat-map incident also provides stark evidence that data privacy is becoming so complex that companies whose business models rely on data collection are struggling to balance added value against meaningful privacy. It used to be a relatively simple matter of providing customers with an opt-out/opt-in tick box and a link to manage their privacy settings. Not anymore.

Because apps like Strava work better with more data, Strava benefits from keeping user opt-outs to a minimum. As Arielle Pardes at Wired points out, Strava (and other fitness apps) come out of the box with all the privacy settings preset to “opt in” – in other words, sharing is on by default. Customers can manually opt out or adjust their privacy settings to their liking, but it’s a cumbersome process that appears designed to discourage such adjustments.

And even if you tick all the boxes, it doesn’t necessarily mean your data is private in the sense you might think the word “private” means:

Even if you’ve turned all of your settings to private and stopped sharing your activity, it doesn’t mean that your data is “private.” Strava, and many other fitness apps, reserve the right to store and then share your data as long as it’s aggregated or anonymized—but as Strava’s heatmap made clear, it’s not always as anonymous as it seems.
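Why isn’t aggregation enough? A heat map is, at bottom, just raw GPS points binned into grid cells and counted – and in a place where only one person is exercising, those “anonymous” cells trace that one person’s route. Here’s a minimal sketch in Python with entirely synthetic coordinates (an illustration of the general technique, not Strava’s actual pipeline):

```python
from collections import Counter

# Purely synthetic (lat, lon) points -- none of this is real Strava data.
popular_route = [(40.7128 + step * 1e-4, -74.0060) for step in range(100)]
remote_route  = [(2.0392 + step * 1e-4, 45.3182) for step in range(100)]

# 500 users log the popular city route; a single user logs the remote one.
raw_points = popular_route * 500 + remote_route

def heatmap(points, cell=1e-3):
    """Bin raw GPS points into grid cells and count them, as a heat map does."""
    return Counter((round(lat / cell), round(lon / cell)) for lat, lon in points)

cells = heatmap(raw_points)

# The published "anonymized" artifact is just cell coordinates and counts --
# no names, no user IDs. But the low-count cells still trace the lone user's
# path almost exactly.
quiet_cells = sorted(c for c, n in cells.items() if n < 50)
print(f"{len(quiet_cells)} low-traffic cells, forming one clearly visible route:")
print(quiet_cells[:5], "...")
```

Strip out the user IDs and the low-count cells around the remote site still light up on the map – which is essentially what the analysts spotted around military bases.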

Also, there’s no guarantee all of your data will stay anonymous:

Consider, also, that it’s possible to de-anonymize some of Strava’s data by making a request to the company’s API. And Strava doesn’t make any promises about what it won’t do with your data. In the past, it’s sold its location data to cities looking to parlay information about where cyclists bike to create better bike lanes. Mostly harmless, sure, but Strava has the potential to sell your data elsewhere too.
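The mechanics of that de-anonymization are mundane, which is rather the point. The sketch below is hypothetical – the host, endpoint, parameters and response shape are invented for illustration and are not Strava’s real API – but the pattern matches what the quote describes: take a location surfaced by the supposedly anonymous heat map, then ask the platform’s API who has recorded activity there.

```python
import requests  # third-party HTTP client: pip install requests

# Everything below is hypothetical -- the host, endpoint, parameters and
# response shape are invented for illustration and are NOT Strava's real API.
API_BASE = "https://api.example-fitness-app.invalid/v1"  # placeholder host
TOKEN = "YOUR_ACCESS_TOKEN"                               # placeholder credential

def activities_near(lat, lon, radius_km=1.0):
    """Ask a (hypothetical) activity API what it exposes around one point."""
    resp = requests.get(
        f"{API_BASE}/activities/search",
        params={"lat": lat, "lon": lon, "radius_km": radius_km},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. athlete names, activity times, full GPS traces

# Feed in the coordinates of a quiet cluster spotted on the heat map, and the
# "anonymous" aggregate turns back into individual records.
print(activities_near(lat=2.04, lon=45.32))
```

The point isn’t this particular call; it’s that aggregated output and a queryable API only need to be joined once for the anonymity to evaporate.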

Again, this in itself isn’t a new issue – we’ve known for years that when your business model thrives on data collection, you need as many users participating as possible to deliver the most value for both your users and your actual customers (advertisers, content partners and companies looking to buy your big data). So from a pure business point of view, there’s more incentive in convincing users to give up their privacy – with their consent, preferably, but by quiet default if necessary.

What’s changing, writes Zeynep Tufekci in the New York Times, is that it’s increasingly difficult not only for consumers to give consent with a full understanding of the risks of allowing their data to be collected and used, but for the companies themselves to understand what exactly those risks are:

Part of the problem with the ideal of individualized informed consent is that it assumes companies have the ability to inform us about the risks we are consenting to. They don’t. Strava surely did not intend to reveal the GPS coordinates of a possible Central Intelligence Agency annex in Mogadishu, Somalia — but it may have done just that. Even if all technology companies meant well and acted in good faith, they would not be in a position to let you know what exactly you were signing up for.

One reason companies can’t advise customers on privacy risks, Tufekci adds, is the growing use of machine learning in big data analytics, which can take seemingly inconsequential data and combine it to reveal facts and trends the company wasn’t expecting to find. This has been one of the touted benefits of big data – surfacing potentially monetizable patterns in customer data that it would never have occurred to product development teams to look for. But it also muddies the data privacy waters even further:

A challenging feature of machine learning is that exactly how a given system works is opaque. Nobody — not even those who have access to the code and data — can tell what piece of data came together with what other piece of data to result in the finding the program made. This further undermines the notion of informed consent, as we do not know which data results in what privacy consequences.
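Tufekci’s point is easy to demonstrate on a toy scale. In the Python sketch below (synthetic data, invented label), two attributes that look harmless in isolation – roughly when someone trains and roughly how far they go – each predict a sensitive label no better than a coin flip, yet a small model given both together predicts it almost perfectly. Nothing in either field, read alone at consent time, hints at that risk.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 5000

# Two "harmless" synthetic attributes per user.
start_hour = rng.uniform(5, 22, n)    # typical workout start time (hour of day)
route_km   = rng.uniform(2, 15, n)    # typical route length in km

# An invented sensitive label that depends only on the *combination* of the two
# fields (early-starter XOR short-route), so neither field is revealing on its
# own -- exactly the situation a consent checkbox can't describe in advance.
sensitive = ((start_hour < 13.5) ^ (route_km < 8.5)).astype(int)

def test_accuracy(features):
    """Train a small model on the given feature columns and score it on held-out data."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, sensitive, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))

print("start hour alone  :", test_accuracy(start_hour.reshape(-1, 1)))   # ~0.5, chance level
print("route length alone:", test_accuracy(route_km.reshape(-1, 1)))     # ~0.5, chance level
print("both combined     :", test_accuracy(np.column_stack([start_hour, route_km])))  # ~0.99
```

Swap in real behavioural data and a production-scale model, and the same effect plays out across combinations nobody thought to check – which is why neither the user nor the company can enumerate the privacy consequences up front.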

Needless to say, the ramifications of all this go way beyond Strava and fitness apps. All kinds of businesses (including telcos and other service providers) are aiming to leverage big data and machine learning/AI to some degree or other.

There’s no easy solution to this, but companies should start by revisiting their data privacy policies (assuming they have one, and if not, why the hell not?) and their big data strategies – in terms of consumer protections, consent and the unintended consequences of letting AI software crunch data in mysterious ways.
