The 23andMe data breach reveals the vulnerabilities of our interconnected data

Users' genetic information was accessed during a hacker attack on 23andMe's user databases. (Shutterstock)

On Oct. 6, news broke that 23andMe, the Google-backed company that collects genetic material from thousands of people for ancestry and genetic predisposition tests, had suffered a massive data breach.

But as it turns out, the company’s servers were not hacked. Rather, hackers targeted hundreds of individual user accounts, allegedly those that had weak or reused passwords. After gaining access to the accounts, hackers could leverage 23andMe’s “DNA relatives matches” function to get information about thousands of other people.

This data breach challenges how we think about privacy, data security and corporate accountability in the information economy.

Shared information

Genetic information databases have a notable feature: anyone’s DNA data also reveals information about others who share part of their genetic code with them. When someone sends a sample to 23andMe, the company has genetic information about that person and their relatives even if those relatives didn’t send a sample or consent to any data collection. Their data is inevitably intertwined.

This isn’t just a characteristic of genetic data. Most data is about more than one person because data often describes shared features between people.

The ramifications of overlooking how personal data affects others extend to the entire information economy. Every individual choice about personal data has spillover effects on others. People are exposed to consequences — ranging from financial loss to discrimination — stemming from data practices that depend not only on information about themselves, but also on information about others.

User data-collection agreements can lead to indirect harm to third parties. For example, the negative impacts of the Cambridge Analytica scandal extended far beyond those whose data the company collected.

This predicament underscores the collective impact of individual data decisions.

Data analytics

Algorithms powered by artificial intelligence draw inferences by analyzing the relationships between data points. AI algorithms rely on databases containing information about multiple people to learn things about a particular person or a particular group.

Companies draw conclusions about people by analyzing data collected from others, making probabilistic assessments based on personal characteristics and relationships. Companies continue to add information about people to their datasets daily. And, the more people a dataset like the one built by 23andMe includes, the less someone’s choice not to be part of it matters.

AI-powered algorithms analyze user information and the connections and relationships with other people’s data. (Shutterstock)

Similarly, every time a user agrees to the collection, processing or sharing of personal information, it also affects others who share similarities with the user. These collective assessments make data processing profitable, such as through marketing, data sales and business decisions based on consumer behaviour.

Equity issues

The interconnected nature of data isn’t a coincidence — it’s at the core of how businesses operate in the information economy. This also creates equity issues.

In the 23andMe case, hackers are offering the assembled genetic information for sale, with lists that include thousands of people. Hackers reportedly assembled and put up for sale lists of people with Ashkenazi Jewish ancestry.

Individuals on the list now face an increased risk of discrimination or harassment, as the leaked data includes names and locations. Hackers could do the same for people with a propensity for type 2 diabetes, Parkinson’s disease or dementia, all of which 23andMe measures, putting them at risk of other harms, from raised insurance premiums to employment discrimination.

Data’s collective risks

We often fail to acknowledge the interconnected nature of data because we’re fixated on each individual. As a consequence, companies can exploit one person’s agreement to legitimize data practices involving others. Companies’ legal obligations to obtain individual agreements for data collection fail to recognize broader interests beyond those of the person who agreed.

We need privacy laws attuned to how the information economy works. Providing consent on behalf of others, as 23andMe users did when they clicked “I agree,” would be illegitimate under any meaningful notion of consent. To contain group data harms like those this hack produced, we need substantive rules about what companies can and can’t do.

Prohibitions on indiscriminate data collection and risky data uses would avoid leaving unsuspecting individuals as collateral damage. Because corporate data practices can affect everyone, companies’ safety obligations should protect everyone too.

This article is republished from The Conversation, an independent nonprofit news site dedicated to sharing ideas from academic experts.

It was written by: Ignacio Cofone, McGill University.

Ignacio Cofone does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.