Madhumita Venkataramanan: My Identity For Sale

Madhumita Venkataramanan’s story about the lucrative trade in our so-called “anonymous” data won an Evert Clark/Seth Payne Award for Young Science Journalists in 2015. Venkataramanan is currently the European Technology Correspondent at the Financial Times.

I’m a 26-year-old British Asian woman, working in media and living in an SW postcode in London.

EVERT CLARK/SETH PAYNE AWARD FOR YOUNG SCIENCE JOURNALISTS

The Clark/Payne award recognizes outstanding reporting and writing in any science field. This story was honored in the magazine category in 2015.

I’ve previously lived at two addresses in Sussex and two others in north-east London. While I was growing up, my family lived in a detached house, took holidays to India every year, donated to medical charities, did most of the weekly shopping online at Ocado and read the Financial Times. Now, I rent a recently converted flat owned by a private landlord and have a housemate. I’m interested in movies and startups, have taken five holidays (mostly to visit friends abroad) in the last 12 months and I’m going to buy flights within 14 days. My annual income is probably between £30,000 and £39,999. I don’t have a TV or like watching scheduled television, but enjoy on-demand services such as Netflix and NOW TV. I passed through Upper Street in north London every day last week. I can cook a little but tend to eat out or get takeaways often; foreign foods (Thai and Mexican) are my favourite. I don’t own any furniture and don’t have children. I’ve never been married. I often eat with my university friends on weeknights. I don’t care for cars or own one. I dislike any form of housework, and have a cleaner who lets herself in when I’m at work. I shop for groceries at Sainsbury’s, but only because it is on my way home. I am not attached to my neighbourhood, and have no contact with my neighbours; I like the idea of living abroad some day. I prefer working as part of a team rather than alone; I’m ambitious and it is important to me that my family thinks I’m doing well. I often go to the pub on Fridays after work. At home, I am far more likely to be browsing restaurant reviews than managing my finances or looking at property prices online. I am rarely swayed by others’ views.

This motley set of characteristics, desires, thoughts and attitudes comes very close to defining me as a person. It’s also a precise and accurate description of what a group of companies I had never heard of — personal-data trackers — has learned about me.

Earlier this year, I became curious about the personal-data economy. It has grown relentlessly into a multibillion-pound business of tracking, packaging and selling data picked up from our public records and our private lives. As I dug deeper into the world of trackers, it reinforced my anxieties about a profit-led system designed to log behaviour every time we interact with the connected world. I was aware that the data generated by apps and services I use daily — from geolocation and cookies to social-media tracking and credit-card transactions — was building a record of my past.

Combine this with public information such as Land Registry, council tax and voter-registration data, daily location routes and social-media posts, and these benign data sets reveal a lot — such as whether you’re political, outgoing, ambitious, pessimistic, uptight or a risk taker.

Even as you’re reading this — you may be sedentary, but your smartphone can reveal your location and even your posture — your life is being converted into such a data package; once it has been compiled into lists (interested in technology, subscribes to magazines, probably male, professional, high earner) by intermediaries known as data brokers, it’s sold on to data aggregators and analysts and eventually any company.

Ultimately, you are the product.

Under the EU’s Data Protection Directive, implemented into UK law in 1998, personal data can be sold to third parties only with your consent and once it has been stripped of your name and any unique identifiers such as your National Insurance number. Data is “personal” when third parties are able to link the information to an individual, even if the person holding the data cannot make this link. Simple examples of “personal data” are: full address, credit-card number, bank statements, criminal record etc. Third parties can process the data for their own interests as long as the data subject has consented (this can be assumed, not explicit) and can access or rectify any incomplete and inaccurate entries (Article 12).

But particulars such as your postcode, age and gender can be traded — because they are not personal, but “pseudonymous”, defined by the EU Data Protection Regulation (a draft law that has yet to be adopted) as “personal data that cannot be attributed to a specific data subject without the use of additional information”.

But the data business has outgrown the directive. With the globalisation of data flows, there is so much additional information available that the risk of “pseudonymous” data being identifiable has multiplied. In the hands of commercial organisations with basic statistics skills, subsets of data can be unlawfully cross-referenced with other data points about you, to identify you with ease. “Removing someone’s name [from a list] to make them anonymous, that’s an idea that went out in computer security 15 years ago,” says computer scientist Joss Wright, from the University of Oxford’s Internet Institute, who focuses on data anonymisation and privacy-enhancing technologies. “If I know where you live and your salary and how many children you have and your medical conditions and your location patterns, then what’s in a name?”

In other words, truly anonymising data about an individual is much more difficult than removing their name. In fact, the more data points collected for an individual, the more likely their record is to be unique. According to Pew Research, an average adult Facebook user has 338 friends – that’s at least 338 columns or “dimensions” of data per user.

A data set of your mobile-phone locations over an hour will have over 500 dimensions (a phone beams its location to a cellular tower every seven seconds or so); health data, depending on how many types of records you have, could have thousands of dimensions; and genetic data is one-million dimensional (we have about 1.3m genes in total). Once this is matched with any other data set that contains constants such as your age, sex and address, you’ve been found. In other words, unexpected information can become “personal” when combined with enough other relevant bits of data.
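The "over 500 dimensions" figure follows from simple arithmetic on the seven-second interval quoted above. A minimal sketch, assuming one location reading per interval and nothing else; the function name and constants are illustrative only:

```python
SECONDS_PER_HOUR = 60 * 60
PING_INTERVAL_SECONDS = 7  # "every seven seconds or so", as quoted in the text


def location_dimensions(hours, interval_seconds=PING_INTERVAL_SECONDS):
    """Count the whole location readings collected over `hours` hours."""
    return (hours * SECONDS_PER_HOUR) // interval_seconds


print(location_dimensions(1))   # 514 readings in one hour
print(location_dimensions(24))  # 12342 readings in one day
```

Each reading is one more column in which two people's records can differ, which is why a longer trace is more likely to be unique.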

Latanya Sweeney, director of the Data Privacy Lab at Harvard University, has shown that roughly 87 per cent of people in the US can be uniquely identified by the combination of just three facts about them — zip code, age and sex. Given that Sweeney was referring to a five-digit zip code in the US, which addresses about 320 million US citizens, rather than the 65 million UK citizens serviced by our longer postcodes, this likelihood is far higher for UK citizens.
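The idea behind Sweeney's finding can be illustrated on a toy scale. The sketch below is not her method or her data: the six records are made up, and `unique_fraction` is a hypothetical helper that simply counts how many records have a combination of fields that nobody else shares.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Share of records whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Hypothetical, made-up records: no names, only "pseudonymous" fields.
records = [
    {"postcode": "SW1", "age": 26, "sex": "F"},
    {"postcode": "SW1", "age": 26, "sex": "M"},
    {"postcode": "SW1", "age": 26, "sex": "F"},
    {"postcode": "N1", "age": 41, "sex": "M"},
    {"postcode": "N1", "age": 41, "sex": "M"},
    {"postcode": "N1", "age": 58, "sex": "F"},
]

print(unique_fraction(records, ["sex"]))                     # nobody is unique
print(unique_fraction(records, ["postcode", "age"]))         # one in six
print(unique_fraction(records, ["postcode", "age", "sex"]))  # two in six
```

No single field singles anyone out here, but each added field raises the share of people who are the only match. A finer-grained field, such as a full UK postcode in place of a short prefix, would be expected to push that share higher still.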

Your data on the market: electoral-register details, house prices, driving records, credit-card purchases, NHS hospital data, phone locations and app details.
 

Telecoms providers such as Vodafone, Telefonica, EE and Verizon have disclosed that they sell anonymised location data (without MAC numbers) packaged into user categories to retailers who want to know their customers’ footfall patterns. Free apps are the most effective way for third parties to acquire your locations. “Apps like Flashlight, Mirror, the gaming apps — any that ask you for your location are sending it back to the developer continuously, who can sell it to advertisers,” says Schoen. “That’s why they’re free.”

In March 2013, researchers from the MIT Media Lab studied 15 months of anonymised locations, donated by an unspecified mobile provider, of 1.5 million people in an unnamed European country. They had customers’ location stream on an hourly basis, but didn’t know who they were. “We found that we needed just four approximated places and times to uniquely identify 95 per cent of people. About 50 per cent could be identified from just two points,” says Yves-Alexandre de Montjoye, first author of the resulting paper and PhD student at the Human Dynamics lab in the Media Lab.
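A toy version of that experiment shows why a handful of points is enough. This is a sketch under stated assumptions, not the researchers' code: the three traces are invented, each point is a hypothetical (place, hour) pair, and `matching_users` is an illustrative helper.

```python
def matching_users(traces, known_points):
    """Return pseudonymous IDs whose trace contains every known (place, hour) point."""
    known = set(known_points)
    return sorted(uid for uid, points in traces.items() if known <= points)


# Hypothetical traces keyed by pseudonymous ID: no names anywhere.
traces = {
    "user_a": {("Bank", 8), ("Upper Street", 9), ("Soho", 19)},
    "user_b": {("Bank", 8), ("Camden", 9), ("Soho", 19)},
    "user_c": {("Bank", 8), ("Upper Street", 9), ("Brixton", 19)},
}

# An observer who knows only where someone was at a few moments:
print(matching_users(traces, [("Bank", 8)]))
print(matching_users(traces, [("Bank", 8), ("Upper Street", 9)]))
print(matching_users(traces, [("Bank", 8), ("Upper Street", 9), ("Soho", 19)]))
```

One known point leaves all three candidates, two points leave two, and three points leave only `user_a`. Anyone who can tie those three sightings to a real person has then tied that person to the whole trace.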

The Snowden files showed that the NSA and GCHQ piggyback on Google’s ad-tracking cookies to identify and monitor specific users.
 

An example on its website captured by MedConfidential shows female patient OS060900, aged 81-85, who had five conditions diagnosed in October 2010. She had 257 hospital visits, mostly as an outpatient, but also had a five-day stay at which point eight conditions were diagnosed. If OmegaSolver had her postcode too, it could easily match this with electoral records for her. The company declined to comment.
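The kind of matching described here amounts to a join on shared fields. The sketch below uses entirely fictional data and assumes, purely for illustration, a public list that carries the same postcode, age band and sex fields as the pseudonymous record; the `link` helper and every value in it are hypothetical.

```python
def link(pseudonymous, register, keys):
    """For each pseudonymous row, list register names that agree on all `keys`."""
    matches = {}
    for row in pseudonymous:
        matches[row["id"]] = [
            person["name"]
            for person in register
            if all(person[k] == row[k] for k in keys)
        ]
    return matches


# Fictional pseudonymous record: an ID instead of a name.
hospital = [{"id": "P001", "postcode": "AB1 2CD", "age_band": "81-85", "sex": "F"}]

# Fictional public list (assumed to hold the same fields, plus names).
register = [
    {"name": "Resident One", "postcode": "AB1 2CD", "age_band": "81-85", "sex": "F"},
    {"name": "Resident Two", "postcode": "AB1 2CD", "age_band": "31-35", "sex": "F"},
    {"name": "Resident Three", "postcode": "EF3 4GH", "age_band": "81-85", "sex": "F"},
]

print(link(hospital, register, ["sex"]))
print(link(hospital, register, ["postcode", "age_band", "sex"]))
```

Matching on sex alone returns all three residents. Adding postcode and age band leaves a single name against record `P001`.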

Care.data was put on hold in February, but will be restarted soon across roughly 500 GP practices, according to NHS England. Tim Kelsey, NHS national director for patients and information, was unavailable for comment.

American bank Capital One is using personalised data to decide which financial products to show first-time customers. It uses the services of New York-based data-tracking firm [x+1], which claims it can determine personal details (not names) of a website visitor within 200 milliseconds.

According to cofounder Ted Shergalis, it analyses information about users’ devices, browsers and location through IP addresses; it also buys postcode data and profiles about users’ hobbies and interests from online tracking companies and data brokers.

In 2010, the Wall Street Journal approached [x+1] for a piece on how its customer profiling worked. Capital One allowed [x+1] to identify two volunteers — Thomas Burney, with no children, a graduate who worked in management, owned his home; and Carrie Isaac, a young small-town mother with a midscale income. The Journal’s exposé showed how, based on [x+1]’s assessment, Capital One promoted to Isaac some of its least generous cards, while Burney saw only one, the Capital One Prestige Platinum, which included no initial interest or annual fee. Both participants had assumed they were offered the only available options, and were unaware of the tailoring.

During an average day, I wake to my phone, fumble for it on the bedside table and take in the data dump. Today, there are 12 emails, three tweets, four Facebook notifications, two Endomondo prompts and a reminder about a meeting.

One of the largest data brokers, Acxiom, which works with Facebook, has data on over 750 million consumers, with up to 1,500 data points on each.
 

As long as my phone is on, my movements can be tracked. Until August 2013, the bins at Bank tube station were picking up phones’ unique MAC numbers via their Wi-Fi, without the knowledge of phone owners — a technology built by London startup Presence Orb and implemented by Renew London. While it was tracking passers-by, Renew — now in administration — would know if I was the same person who passed by yesterday, my specific route and how fast I was travelling.

That’s exactly what the sensors in the Godiva shop do. ShopperTrak, a Chicago company, counts the people passing by Godiva via their phones, and modifies the display accordingly. According to Russell Evans, VP of global marketing, ShopperTrak also uses in-store Wi-Fi sensors to track customers’ phones, so it knows if they came back.

The screen in Tesco’s forecourt has a camera installed by London-based Amscreen, as it has in 500 stores around the UK. “The screens are not just passive TVs — they are like giant mobiles with sim cards, 3G and GPS. The algorithm, made by French company Quividi, knows someone is standing in front of it and figures out your sex and age, while also recording the time and location,” says Mike Lemmings, head of marketing and product development at Amscreen. Demographic patterns can then be sold to advertisers. Tesco now knows I am a 26-year-old female and, if it combined this with my Clubcard data, it could find out what I buy and where I live, as well as my demographic details.

I decided to piece together what these companies really knew about me. I spoke to Eyeota, a data-analytics profiler with offices in Singapore, Berlin and Sydney. It has partnerships with a range of websites, so when I visit their pages, it can place a cookie on my browser. It also buys data from Experian to enhance each cookie profile. Using my browsing activity, Eyeota assigns my cookie up to a thousand attributes ranging from sex and region to type of job, whether I have children, own a car, like to buy Star Wars memorabilia and so on. It never finds out my name, but it knows more about me than my neighbours do.

My Eyeota cookie knows that I’m a 26- to 35-year-old female, working in the media/internet industry.

I’m interested in entertainment, particularly movies, and in entrepreneurship and startups. I intend to buy flights in the next 14 days.

Eyeota also buys data from Experian’s Mosaic database: a collection of 15 demographic groups and 66 lifestyle types based on your postcode and calculated from a variety of data sources which Experian would not disclose (it could mention only voter-registration records and census records as examples). It matches you up to a type.

My Mosaic type runs to 16 pages, with predictions about everything from my financial circumstances to my view on the world. It includes an accurate description of my ethnicity, age, education, profession, home life and financial circumstances.

Because Eyeota buys this profile, it knows I probably take taxis rather than buses when I get home late, that I am unlikely to visit DIY stores, that I spend a large part of my income on eating out and long-haul, use sites such as Airbnb, and that most of my friends are people I met at university. It can then sell this information to the highest bidder.

Profiling individuals has been a gold rush. “It expanded post-9/11 because governments were trying to prevent the next 9/11,” Sparapani says. “This was all to try to predict who is related to whom, engaged in terrorist activity, or laundering money. A new ecosystem was established.” Now, the data is traded commercially and sustained by advertisers — the Interactive Advertising Bureau says revenues in the US last year hit an all-time high of $42.8bn. A large part of it is based on selling targeted user data on Facebook and Google. A range of more sensitive data is now available, leaving us much more exposed.


Now our tax records could be exposed in a similar way to our NHS records. “[In July], we had a meeting in Parliament about a proposal from HM Revenue and Customs to sell our tax records to people in the City and remove names and addresses from it but assign a unique ID,” Ross Anderson says. Such a plan to sell off anonymised tax records is described as “borderline insane” by David Davis. “If they’re [like] the NHS, the unique ID might be postcode. If all the tax records at a postcode are lumped together, there are only a few houses it could be,” says Anderson. HMRC has already launched a pilot, releasing data, such as names and addresses of companies that are VAT registered, to three credit-ratings agencies, including Experian. “It’s really hard to see how this is anything but commercial, frankly. [Companies] are going to make value by adding together tax records, social records, existing databases and health records. They’re going to know more about us than we know ourselves,” says Davis. “This is statewide identity theft.”

And the more data that becomes available, the easier it will be to identify you. “All biometric sensor data sources are going to be pretty easy to re-identify,” says Scott Peppet, privacy lawyer at the University of Colorado. “Think about your heartbeat or how you walk, or your pattern of exercise. There’s gonna be no one who has the identical patterns to you.”

In June 2013, the CIA’s chief technology officer Ira Hunt gave a talk at GigaOM’s Structure:Data conference in New York about the importance of wearables such as the Fitbit to the security services. He said not only could you infer an individual’s sex, height and weight from Fitbit data, but that they were “100 per cent guaranteed to be identified simply by [their] gait”. When contacted by WIRED, Fitbit representative Katie Henry said, “We use Mixpanel (and other analytics providers) to understand how customers use Fitbit. Mixpanel, just like all our third-party service providers, is prevented from using any personally identifiable data for any other purposes.”

As the data we generate about ourselves continues to grow exponentially, brokers and aggregators are moving on from real-time profiling — they’re cross-linking data sets to predict our future behaviour. Decisions about what we see and buy and sign up for aren’t made by us any more; they were made long before. The aggregate of what’s been collected about us previously — which is near impossible for us to see in its entirety — defines us to companies we’ve never met. What I am giving up without consent, then, is not just my anonymity, but also my right to self-determination and free choice. All I get to keep is my name.

You may read this story in its original format on Wired.