One of the most fundamental questions of the modern web is how online companies should generate revenue. Today's web reflects primarily one answer to that question: a web where everything is free, but we pay for it with our privacy. The web has become a dystopian surveillance state in which companies stalk their unsuspecting victims across the web, extracting maximal profit by stripping away every shred of privacy and dignity, socializing the risks of data breaches and other harms onto users while privatizing all the monetary benefit of exploiting them. Social media platforms often generate the majority of their revenue by selling hyper-targeted advertising built on algorithmically mining every second of their unwilling and unwitting users' lives. Yet those same companies go to great lengths to argue that they are not "selling" their users' data. What does it mean for a company to "sell" our data in today's data-hungry world?
The question of whether social media companies "sell" their users' data was thrust back into the spotlight last week when the British Parliament released a trove of internal emails from Facebook's senior executives. Among them was a chain in which none other than Mark Zuckerberg himself proposed charging developers a monetary fee to access user data, which they could pay by purchasing advertising, selling items or simply writing Facebook a check. The company took great pains last week to emphasize that it never followed through with the proposal. Yet the mere fact that its founder had openly discussed quite literally charging a per-user fee for access to user data drove home that the company views its users as monetizable assets to be exploited for profit, rather than as the people a benevolent company is trying to connect, generating revenue only where doing so would not conflict with its public-good vision.
Moreover, Zuckerberg's equation of advertising revenue with writing a check demonstrates that the company sees little difference between selling access through advertising and selling access for a check.
It is worth noting the stark difference between how Facebook describes its two billion account holders internally and how it describes them in its public materials. In public statements about commercializing account holders, Facebook goes to great lengths to use humanizing language like "people," "person" and "anyone." Yet in the 250 pages of internal emails released last week, which show the unvarnished language Facebook's executives use among themselves to describe account holders, the word "customer" shows up only once, in reference to the Royal Bank of Canada's "customers." The word "people" appears in only two brief passages, both in the context of account holders providing content (whether posts or engagements) of value to Facebook. The word "human" never appears at all. In contrast, the word "user" appears throughout the 250 pages, most notably when Zuckerberg himself refers to Facebook's two billion account holders as "users" while proposing that the company could directly charge developers a fee of "$0.10/user per year."
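The article does not say how these counts were produced. As a hypothetical illustration of how such a tally could be reproduced, the Python sketch below counts case-insensitive whole-word occurrences of each term, allowing a trailing "s" for simple plurals, and assumes the emails have already been extracted from the released documents into plain text. The term list, the plural rule and the sample sentence are assumptions; different choices (for instance, whether "username" should count) would change the totals.

```python
import re
from collections import Counter

# Terms compared in the paragraph above (assumed list for illustration).
TERMS = ("user", "customer", "people", "person", "human")

def term_counts(text, terms=TERMS):
    """Count case-insensitive whole-word matches of each term, allowing a trailing 's'.

    Word boundaries keep 'username' from counting as 'user' and 'humane' from
    counting as 'human', while possessives like "users'" still match.
    """
    counts = Counter()
    for term in terms:
        pattern = re.compile(rf"\b{re.escape(term)}s?\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts

# Hypothetical sample text, not a quotation from the released emails.
sample = "Charge $0.10/user per year. Users' posts help people; a bank's customer."
print(term_counts(sample))
```

Run over the full extracted text, the same function would yield per-term totals comparable to the ones described above, subject to the counting rules chosen.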
It seems that Facebook's account holders are not "customers," since that label would afford them a certain level of dignity and imply a relationship built on the company providing them a valuable service in a mutual transaction. They are "people" only in public statements and in the context of extracting monetizable behaviors from them. The rest of the time they are dehumanized as "users," a reminder that to Facebook we are merely datapoints and login accounts, not real human beings whose lives are being exploited and monetized for its benefit.
Over the last two weeks the company's executives have sought to draw a distinction between monetizing users through advertising and monetizing them by boxing up their data for download, as do the data brokers the company formerly purchased data from.
Zuckerberg offered that "we've never sold anyone's data," while Facebook's Vice President of Advertising, Rob Goldman, argued: "We don't sell people's data. Period. That's not a dodge or semantics, it's a fact. We don't sell or share personal information."
Of course, it is worth noting that Facebook has previously argued that providing access to outside companies did not constitute "sharing" so long as it considered them to be "partners."
Are Zuckerberg and Goldman right that Facebook does not "sell" user data? The answer revolves around what it means to "sell" data.
When we think about "selling" user data, we typically think of a company boxing up the personal information of its customers and selling it as downloadable ZIP files with per-user and flat-rate pricing. Indeed, the enormous world of data brokers exists to do precisely this. Many companies we do business with, from the grocery and brick-and-mortar stores we shop at to the newspapers and magazines we subscribe to, box up their subscriber information and sell those lists for a profit.
Verizon reminded us this summer that paying a subscription fee doesn't guarantee a company won't turn a side profit by further monetizing its customers through ads and even by outright selling their data. For all the naïve talk about how a fee-based Facebook would end surveillance, Verizon's example shows that even companies that charge for their services will still monetize their users on the side.
Walgreens offers a useful comparison to Facebook's definition of "selling" user data. While most Americans likely believe that their prescriptions are protected from any form of exploitation under medical privacy laws like HIPAA, it turns out that those laws permit pharmacies like Walgreens to monetize their customers through advertising. Specifically, pharmaceutical companies can pay Walgreens to send an advertisement for a drug trial to all of its customers who suffer from a particular medical condition. The pharmaceutical company itself is never given a list of patients; it merely hands the ad to Walgreens and pays a fee, and Walgreens sends the mailers itself.
For all intents and purposes, Walgreens has created an offline, physical-mail advertising model that mimics the hyper-targeted digital ads that clog the online world. Like Facebook, the company is careful to argue that it does not "sell" its customer data; it merely sells access to those customers to show them advertisements. Yet consider a Walgreens customer who receives a mailer on behalf of a third-party company they've never heard of, targeted because of a prescription they filled at Walgreens and believed was confidential. To that customer, the distinction between "selling data" and "selling access" is likely meaningless. As far as they are concerned, Walgreens sold their data. Notably, when asked why it does not explicitly inform customers at purchase time that it will use their prescriptions to sell access to them, the company noted that under HIPAA, selling access to customers does not "require patient authorization."
Facebook is therefore in good company among businesses that draw a distinction between selling access to their users for advertising and boxing up their data as downloadable ZIP files.
Just what is Facebook selling? In his statement last week, Zuckerberg compared Facebook to cloud computing companies like Amazon and Google. Yet developers turn to cloud vendors to rent unique hardware and software environments, not data. As an Amazon, Google or Microsoft customer, you are renting empty computers to fill with your own data; the cloud companies offer no access to their customers' data of any kind.
In contrast, Facebook is in reality renting access to data. Its sole value proposition to developers is access to its two billion users. A giant manufacturer building solar power arrays doesn't turn to Facebook to rent petabytes of storage and tens of thousands of processors and GPUs to run simulations and neural models. It turns to an actual cloud computing vendor.
The developers that turn to Facebook are there for one purpose: to reach Facebook's two billion users.
Does that count as Facebook "selling" the data of two billion users? It certainly constitutes "selling access."
To put it another way, if Facebook genuinely believes that developers view it as a traditional cloud computing vendor and that it is not "selling" its users' data, then it could simply shut down all of its user APIs and allow developers to run their applications on Facebook without any ability to publish, consume or otherwise interact with its users. If access to users is genuinely not any part of Facebook's value proposition to developers, then this would not have the slightest impact on usage of its platforms.
After all, Amazon has a robust cloud computing business without offering its cloud customers any access to the personal private information of its Amazon.com customers.
In arguing that its business model does not count as "selling data," the company offered the defense that "It's how the internet works, not just how Facebook works." In short, when asked whether its business model was morally defensible, the company responded not by arguing that it was, but by arguing that "everyone else does it," so it is acceptable for Facebook to do it too.
This is noteworthy because it is exactly the same defense the company offered me when I asked about its former practice of purchasing intimate data about its two billion users from commercial data brokers. Asked about the ethics of doing so, and especially about the opacity of its practices and its failure to tell users more about what was happening with their data, the company argued that everyone else does the same thing, so it is acceptable for it to do so too.
Of course, the claim that Facebook does not "sell" its users' data glosses over the fact that it is often compelled by governments to "provide" its users' private, intimate data under court order.
In addition to "selling access" to advertisers and developers who want to reach its two billion users, Facebook also makes data available in other ways. Demographers wishing to map specific combinations of traits and interests, or to understand how they change over time, can use advertising campaigns to derive population-scale insights.
Similarly, advertisers running ads that link back to their sites know that every person following that link possesses the specific traits the ad targeted. An ad targeting Catholic women aged 25 to 30 who are interested in football will deliver clickthroughs from precisely those individuals to the advertiser's site.
A New York Times editorial this week argued that such clickthroughs constitute a form of data sale: advertisers pay Facebook to receive traffic from specific demographics, so the IP addresses that subsequently visit their sites are known to belong to users with those traits.
In other words, if Facebook considers giving a data broker a phone number and getting back demographic selectors about that person to be "buying" data, then a company paying Facebook to receive IP addresses along with demographic selectors would seem to fall into a similar category of data transaction.
Facebook pushed back against the editorial, arguing that because advertisers have only an IP address and not the person's name or contact details, such data is in effect "anonymous." In Facebook's view, as long as a person's name and contact information are not attached to a record, their IP address alone does not serve as a unique identifier. As Goldman put it, "what makes it anonymous is that you won't know who those people are," only their IP address.
In reality, there are countless ways outside companies can link an IP address back to a specific person. Numerous data brokers sell the most recent IP address used by each person in their databases, tying IP addresses to the address information those people enter into sites across the web, such as when ordering products or filling out surveys. As with all data broker datasets, however, it is unclear how current or accurate this information is.
Larger advertisers, including data brokers themselves, already track their customers across the web using cookies and know the most recent IP address each customer used to access their website or mobile app. They can run tens of thousands or even millions of ad campaigns on Facebook, one for each demographic of interest, and simply cross-reference the IP addresses of each campaign's clickthroughs against their own records of which IP address is associated with each customer. While imperfect, such linking is no more error-prone than the processes data brokers and companies already use.
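The cross-referencing step described above amounts to a join on IP address. The Python sketch below is a minimal illustration under stated assumptions: the customer table, campaign definitions and addresses (taken from the ranges reserved for documentation) are all hypothetical, and it assumes the advertiser can see the IP address of each clickthrough in its own server logs. It also shows why such linking is imperfect: a shared IP address, such as a household or office network, maps to every customer last seen there.

```python
from collections import defaultdict

# Hypothetical advertiser records: customer ID -> most recent IP address seen via cookies or app logins.
customer_last_ip = {
    "cust-001": "192.0.2.10",
    "cust-002": "198.51.100.7",
    "cust-003": "203.0.113.42",
}

# Hypothetical campaigns: the traits each ad targeted and the IP addresses that clicked through.
campaigns = [
    {"selectors": {"gender": "female", "age": "25-30", "religion": "Catholic", "interest": "football"},
     "click_ips": ["192.0.2.10", "203.0.113.99"]},
    {"selectors": {"housing": "renter", "pet": "dog"},
     "click_ips": ["192.0.2.10", "198.51.100.7"]},
]

def link_clickthroughs(customer_last_ip, campaigns):
    """Attach each campaign's targeted traits to every known customer whose last-seen
    IP address matches one of that campaign's clickthroughs.

    Returns (profiles, unmatched): accumulated traits per customer ID, and the
    clickthrough IPs that matched no known customer.
    """
    # Invert the table so each lookup is a dictionary access. Several customers
    # can share one IP (a household or office network), so each IP maps to a list.
    ip_to_customers = defaultdict(list)
    for cust_id, ip in customer_last_ip.items():
        ip_to_customers[ip].append(cust_id)

    profiles = defaultdict(dict)
    unmatched = set()
    for campaign in campaigns:
        for ip in campaign["click_ips"]:
            matches = ip_to_customers.get(ip)
            if not matches:
                unmatched.add(ip)
                continue
            for cust_id in matches:
                profiles[cust_id].update(campaign["selectors"])
    return dict(profiles), unmatched

profiles, unmatched = link_clickthroughs(customer_last_ip, campaigns)
print(profiles["cust-001"])  # traits from both campaigns this customer clicked
print(unmatched)
```

Here cust-001 accumulates the traits of both campaigns it clicked, cust-003 gains nothing, and the unrecognized address ends up in `unmatched`. In practice, stale or shared IP addresses are the main source of the error the paragraph above acknowledges.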
Even if a clickthrough does not come from an existing customer, the demographic information it implies can be used to vastly enrich that visitor's experience on the site and the company's record of any purchase they make.
Imagine a user visits the site of a consumer products company out of the blue. The company knows nothing about that user beyond their approximate geographic location, inferred from their IP address, and rough estimates of their demographics and purchasing power based on the kind of computer and browser they are using. Now imagine instead that the user arrived through the company's Facebook ad targeting female millennial Bernie Sanders supporters in New York City who rent, have a dog, work in the financial industry and love luxury coffee. The company now knows quite a lot about that person and can tailor the landing page to present a hand-selected set of highly relevant products. If the person makes a purchase, the company can then append all of those demographic selectors from Facebook to the customer's profile for future customization and marketing.
Does the fact that this third-party company received demographic selectors from Facebook that it used to customize its site and enrich its customer record mean that Facebook "sold" it that data? The company would not have received that demographic information from Facebook without paying for it.
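The enrichment in the consumer products example can be sketched in a few lines. This Python sketch assumes the advertiser tags each of its ad links with its own campaign identifier (here via the widely used `utm_campaign` query parameter) and keeps a table of the audience it paid to target for each campaign; the campaign name, URL, email address and table are hypothetical. In this sketch the selectors are implied by the targeting the advertiser bought rather than transmitted as a separate file.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical table kept by the advertiser: its own campaign ID -> the audience it paid to target.
CAMPAIGN_SELECTORS = {
    "nyc-fin-coffee": {
        "gender": "female",
        "generation": "millennial",
        "politics": "Bernie Sanders supporter",
        "city": "New York City",
        "housing": "renter",
        "pet": "dog",
        "industry": "finance",
        "interest": "luxury coffee",
    },
}

def selectors_from_landing_url(url):
    """Return the selectors implied by the campaign tag on a landing URL, or {} if untagged."""
    params = parse_qs(urlparse(url).query)
    campaign_id = params.get("utm_campaign", [None])[0]
    # Return a copy so callers can modify the result without altering the campaign table.
    return dict(CAMPAIGN_SELECTORS.get(campaign_id, {}))

def record_purchase(profile, landing_url):
    """Append the inferred selectors to a customer profile when a purchase is made."""
    profile.setdefault("inferred", {}).update(selectors_from_landing_url(landing_url))
    return profile

url = "https://shop.example.com/landing?utm_source=facebook&utm_campaign=nyc-fin-coffee"
profile = record_purchase({"email": "buyer@example.com"}, url)
print(profile["inferred"]["interest"])
```

The same lookup could drive landing-page customization before any purchase; the profile update simply makes the inferred traits persistent.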
At the same time, Facebook argues that since the data it sends to advertisers is identified by IP address rather than by mailing address, phone number or name, it should be considered "anonymous" and thus its transfer doesn't count as "selling" data.
Under this justification, Facebook could box up the totality of its two billion users' personal data and sell it at $0.10 per user per year as downloadable ZIP files, so long as those files had each person's name, address and phone number stripped out and used only their IP address as an identifier.
As any data scientist or privacy expert realizes, however, the wealth of online data available means that an IP address is frequently enough to connect an "anonymized" record back to a real person.
Arguing that a customer record is "anonymous," and thus that transferring it does not constitute "selling" data, merely because it uses an IP address instead of a phone number as its identifier is simply false in today's data-drenched world. Facebook of all companies knows this.
Even if a record were stripped of all identifiers, including its IP address, unique combinations of characteristics could be used to readily reidentify it by comparing it against other holdings like data broker archives. In essence, the unique pattern of our behaviors acts as a digital fingerprint that can be used to reidentify us from our behavioral traces alone.
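The fingerprinting idea can be made concrete with a linkage attack, the risk that privacy measures such as k-anonymity are designed to quantify. In the Python sketch below, every record, name and attribute is hypothetical: it first measures how many "anonymized" records have a combination of traits shared with no other record, then joins those unique combinations against a named, broker-style dataset. Real reidentification would draw on far more attributes and fuzzier matching; this only shows the mechanism.

```python
from collections import Counter

# Traits that identify no one alone but can single a person out in combination (assumed for illustration).
QUASI_IDENTIFIERS = ("zip", "age", "pet", "coffee")

# Hypothetical "anonymized" records: no name, phone number or IP address.
anonymized = [
    {"zip": "10003", "age": 27, "pet": "dog", "coffee": "luxury", "interest": "fertility treatment"},
    {"zip": "10003", "age": 27, "pet": "cat", "coffee": "luxury", "interest": "payday loans"},
    {"zip": "10011", "age": 34, "pet": "dog", "coffee": "none", "interest": "diabetes support"},
    {"zip": "10011", "age": 34, "pet": "dog", "coffee": "none", "interest": "running clubs"},
]

# Hypothetical named records, standing in for a data broker archive.
broker = [
    {"name": "Jane Roe", "zip": "10003", "age": 27, "pet": "dog", "coffee": "luxury"},
    {"name": "Ann Lee", "zip": "10003", "age": 27, "pet": "cat", "coffee": "luxury"},
    {"name": "Sam Poe", "zip": "10011", "age": 34, "pet": "dog", "coffee": "none"},
]

def fingerprint(record):
    """The tuple of quasi-identifier values that acts as a behavioral fingerprint."""
    return tuple(record[key] for key in QUASI_IDENTIFIERS)

def unique_fraction(records):
    """Share of records whose fingerprint appears exactly once (k = 1 in k-anonymity terms)."""
    counts = Counter(fingerprint(r) for r in records)
    return sum(1 for r in records if counts[fingerprint(r)] == 1) / len(records)

def link(anon_records, named_records):
    """Map names to 'anonymous' attributes wherever a fingerprint is unique in both datasets."""
    named_counts = Counter(fingerprint(r) for r in named_records)
    name_by_fp = {fingerprint(r): r["name"] for r in named_records if named_counts[fingerprint(r)] == 1}
    anon_counts = Counter(fingerprint(r) for r in anon_records)
    return {
        name_by_fp[fingerprint(r)]: r["interest"]
        for r in anon_records
        if anon_counts[fingerprint(r)] == 1 and fingerprint(r) in name_by_fp
    }

print(unique_fraction(anonymized))  # 0.5: half of these records are one of a kind
print(link(anonymized, broker))
```

Sam Poe is not linked because two anonymized records share his fingerprint, which is the kind of protection k-anonymity aims for; the two one-of-a-kind records, by contrast, are tied to names immediately.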
Facebook's stance that stripping common identifiers is sufficient to render data "anonymous," even with an IP address attached, helps explain its approach to its academic research initiative, Social Science One, and its view that it is acceptable to make its two billion users' private, intimate information available to academics across the world so long as that information is "anonymous."
Asked last month about its perspective on data sales, the company did not respond. It also did not immediately respond to a request for comment on where it draws the threshold of anonymity for user data.
Putting this all together, companies like Facebook may attempt to draw legal distinctions between "selling data" and "selling access," and to argue that data identified only by IP address is "anonymous," but the reality is that the general public sees all of these monetization practices as the same exploitation of their personal privacy for monetary gain. Instead of arguing semantics, companies should take genuine steps toward regaining the trust of their users, starting with coming clean about every way they exploit their users' data and every way they have considered using it, and no longer hiding behind arcane legal definitions. In the end, companies that ask the public to trust them must earn that trust.