January 30, 2018 · February 12, 2018

The more you know

A few people sent me the Times article on Strava’s global usage and paths in heat map form this week. Leaving aside the alarming headlines and shares for a second, I wanted to really think through one question: what are the real issues here? The data on Strava, whether in aggregate form or not, was already publicly available on their site. You can look up certain areas and paths and see the top people who biked, ran and swam them. I think this feature has been there for as long as the site has existed. The fun of it is that you discover other great athletes near you who run the same routes, an interesting super-local social network of sorts based on paths. There’s a certain magic there that makes big cities like New York City feel more like a village.

Strava’s contract has always been: join our community, share your paths with other enthusiasts like yourself, and maybe learn about new paths near you that you wouldn’t otherwise have known about. If you wanted to opt out and keep your data private, Strava has always offered a way to do that. Its settings, like those of any other social sharing app, make it easy to block sharing and keep your paths private if you so choose. Strava is right to offer those tools and to put privacy settings in the hands of the end user, who ultimately should be in control of whether their data is released and shown elsewhere.

The first issue

At the same time, however, most people don’t know better; they are too busy to dig into every setting on every platform. There’s a lot of friction involved in getting end users to update settings, and most people either don’t know or don’t care. So if it really matters, platforms should take care not only to inform users but also to do their best to protect them. Many employers, including the government, have to work harder to keep their employees informed about these settings and about how to protect themselves and their organizations. Can we really rely on users to change the default settings on any system, knowing that most will leave the defaults on, the same way they never change the passwords on their wireless routers?

I’m reminded of a question Apple asks you every so often, one that probably makes you think about your battery life more than about potential privacy leaks: “‘________’ has been using your location in the background. Do you want to continue allowing this?” (Speaking of Apple, take a look at how many location-related settings there are in iOS; you can get very granular with this stuff and still not be able to cover it all.)

The second issue

Why do we trust all our data to Apple and Google, knowing full well they can read and hear everything, but then panic over smaller companies and third parties having our information? Is it somehow different if you’re a small service or startup rather than a large one?

The third issue

As another exercise, let’s fast-forward and think about a fully decentralized future (I hope), where you are fully in control of your data and own your data security keys, individually or in aggregate. If a leak happened, would people just blame themselves? People always want someone to blame, but given the choice to manage their own security keys and data, I’m sure a lot of people would rather not deal with it and would trade that control for the convenience of data in a centralized cloud.

The fourth issue

Data aggregates can get you into trouble even if each individual data point is harmless. An example: I know where you are right now based on the location in your most recent Instagram or Snapchat story. That’s only one data point, so maybe I can’t do much with it. But if I had a direct API feed of your full history, it would expose a lot more about your life: patterns, paths, timestamps and locations over time, from which I can derive real meaning.
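To make the aggregation point concrete, here is a minimal Python sketch. It assumes a made-up location history of timestamped points; the coordinates and the names `history`, `grid_cell` and `likely_home` are hypothetical and do not correspond to any real Instagram, Snapchat or Strava API. No single point says much, but counting where someone shows up overnight is already enough to guess where they probably live.

```python
from collections import Counter
from datetime import datetime

# Hypothetical location history: (timestamp, latitude, longitude).
# The coordinates are invented for illustration only.
history = [
    (datetime(2018, 1, 22, 7, 5), 40.7128, -74.0061),
    (datetime(2018, 1, 22, 23, 40), 40.7306, -73.9866),
    (datetime(2018, 1, 23, 1, 15), 40.7309, -73.9868),
    (datetime(2018, 1, 23, 12, 30), 40.7580, -73.9855),
    (datetime(2018, 1, 24, 2, 10), 40.7312, -73.9871),
    (datetime(2018, 1, 24, 18, 0), 40.7128, -74.0059),
]


def grid_cell(lat, lon, precision=3):
    """Snap a coordinate to a coarse grid (3 decimal places is roughly 100 m)."""
    return (round(lat, precision), round(lon, precision))


def likely_home(points, night_start=22, night_end=6):
    """Guess a 'home' cell: the grid cell seen most often between night_start and night_end."""
    night_cells = Counter(
        grid_cell(lat, lon)
        for ts, lat, lon in points
        if ts.hour >= night_start or ts.hour < night_end
    )
    if not night_cells:
        return None
    cell, count = night_cells.most_common(1)[0]
    return cell, count


if __name__ == "__main__":
    print(likely_home(history))
```

On this sample data, the three overnight points all fall into the same roughly 100 m cell, so `likely_home(history)` returns `((40.731, -73.987), 3)`. A real analysis would likely use far more points and better heuristics; the sketch only shows how little it takes once the individual points are aggregated.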

This isn’t new to the world of mobiles and location data either; it comes up in medicine regularly. Here’s a paper from PLoS Med on ‘Ethical and Practical Issues Associated with Aggregating Databases’ from ten years ago:

Participants who consented to the collection of their data for use in a particular study, or inclusion in a particular database, may not have consented to “secondary uses” of those data for unrelated research, or use by other investigators or third parties. There is concern that institutional review boards (IRBs) or similar bodies will not approve of the formation of aggregated databases or will limit the types of studies that can be done with them, even if those studies are believed by others to be appropriate, since there is a lack of consensus about how to deal with re-use of data in this manner.

Combined databases can raise other important ethical concerns that are unrelated to the original consent process. For example, they may make it possible for investigators to identify individuals, families, and groups. Such concerns may be exacerbated in settings where there is the possibility of access to data by individuals who are not part of the original research team.

If you can opt out of sharing your individual data, you should also have an option to opt out of the “secondary uses” of that data. It’s not enough to let others decide how they anonymize keys and uniquely identifiable information in datasets. You should be able to opt out not only at the individual level but also of the secondary use process, because you don’t know about or have control over that process (and it likely won’t benefit you or the primary study).

These aren’t easy problems to solve. Each of them trades some level of privacy for convenience, for wanting to be noticed, and for wanting free platforms (which are ad-supported, and which then take your data for secondary uses). Ultimately, caring for data properly will come down to a new sort of contract between an individual and a service: the individual’s right to hold their data for their own use, their right to take back that data whenever they choose, and their right to be forgotten altogether.