The limits of ancestry DNA tests, explained

23andMe wants to sell you vacations based on your DNA. But what are they really basing that on?

Danush Parvaneh/Vox

Identical twins have virtually identical DNA. So you’d think if a set of twins both sent in a DNA sample for genetic ancestry testing, they’d get the exact same results, right?

Not necessarily, according to a recent investigation by the Canadian Broadcasting Corporation. In fact, the journalists demonstrated that twins don’t often get the same results from a single company. And across the industry, estimates of where an individual’s ancestors lived can differ significantly from company to company.

In one instance, the consumer genetics company 23andMe told one twin she was 13 percent “Broadly European.” The other twin’s test, meanwhile, showed she had just 3 percent “Broadly European” ancestry, and had more DNA matched to other, more specific regions in Europe. What’s more, when the twins had their DNA tested by five companies, each one gave them different results.

One computational biologist told the CBC that the differences in the results were “mystifying.”

So what accounts for these differences? Overall, discrepancies in ancestry testing don’t mean that genetic science is a fraud, or that the companies are just making up these numbers. They have more to do with the limitations of the science and some key assumptions companies make when analyzing DNA for ancestry.

The companies that provide ancestry testing, like Ancestry.com and 23andMe, deliver the precise ancestral breakdown of their customers’ DNA. They might say, for example, someone’s ancestry is 25 percent Italian, 74 percent East Asian, and 0.1 percent Sardinian. They also market their product in a way that suggests their test reveals something deeply meaningful about you.

In its ancestry reports, 23andMe says “your DNA tells the story of who you are, and how you’re connected to populations around the world.” The company is now even partnering with Airbnb to help customers plan “heritage” vacations in places where their ancestors lived.

What’s not always obvious from these reports is that they’re based on estimates that can vary from company to company, and have built-in sources of error. Your results from one company can even change over time as the company signs up more users, and gathers more data.

The companies do have webpages that explain these limitations, but you have to dig a bit to find them. You could very easily purchase one of these kits without coming across them.

Consumer genetic testing is growing explosively. According to the MIT Technology Review, 26 million people or more have taken a genetic ancestry test. Tech Review also found that in 2018, the number of tests purchased surpassed sales of all previous years combined.

These genetic tests commonly cost $60 to $100 — or more if they contain health information. As the market grows, consumers need to be aware of what exactly these tests are telling them (and even more so when it comes to information about health and wellness).

So what does it really mean when a company tells you you’re 40 percent Greek? To answer that question, it’s useful to understand what exactly these tests are looking for, and the assumptions they have to make to guess your ancestry.

“What’s important to understand is that genetics can guide answers” about ancestry, says Joe Pickrell, a geneticist and the CEO of Gencove, a company that sells genetic testing hardware and software to other companies. “There’s no time machine, no crystal ball.”

Even though genetic ancestry tests deliver precise percentages about our heritage, the reports are best thought of as estimates, based on imperfect data.

Step by step, let’s walk through why.

First step: spit in a tube

There are about 3 billion base pairs — the individual letter instructions of our genetic code — that make up the human genome. When you spit into a tube and send it off to a company like 23andMe, Ancestry.com, or MyHeritage, they don’t bother looking at every single letter. That would be overkill.

All humans have about 99.9 percent of their DNA in common. So instead, to speed up the process, the tests look out for the locations on the genome where people commonly vary from one another.

These are spots where you might have the nucleotide (the molecule that forms one half of a base pair) adenine and I have thymine. That’s it. In all, these single-letter changes in our DNA can help explain why one person is taller than another, or why one has brown eyes and another green.

In science jargon, these variations are called single nucleotide polymorphisms, or SNPs (pronounced “snip”). Companies can analyze half a million SNPs or more in an ancestry test.

When a genetic testing company gets a tube of your saliva in the mail, it first has to extract the DNA from it. “You remove the cell debris, proteins, all of the things that are not DNA,” Yaniv Erlich, the chief science officer at MyHeritage, explains. They make copies of your DNA, then break those strands up into shorter chunks.

The chunks are then fed into a machine called a genotyping array. These arrays kind of — and this is an absolute simplification — work like a coin sorter, but for SNPs. They’ll tell the companies which versions of SNPs you’ve inherited, and at what location in the genome.

Many SNPs are meaningless when it comes to our health. But they can be useful starting points for tracing ancestry. That’s because, like everything else in our genome, SNPs are passed down through the generations. The more SNPs we share in common with another person, the more likely we share a similar, and more recent, ancestry. Your ancestry is estimated by comparing your SNP results with a genetic database of people with known ancestries (more on this in the next section).

“We’re talking about 99.9 percent accuracy for these arrays,” Erlich says. But even with that high level of accuracy, when you process 1 million places in the genome, you might get 1,000 errors. Those small errors alone can help explain why one twin might have slightly different results from another. (This source of error is why the health results you get back from genetic testing companies may show discrepancies too.)
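The arithmetic behind that figure can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not any company's actual quality-control method: the function names are made up for this example, and the twin calculation assumes that errors in the two samples happen independently of each other, which is a simplification.

```python
def expected_errors(num_snps: int, accuracy: float) -> float:
    """Expected number of miscalled SNPs in a single sample."""
    return num_snps * (1.0 - accuracy)


def expected_twin_mismatches(num_snps: int, accuracy: float) -> float:
    """Expected number of SNPs where two genetically identical samples
    get different calls because exactly one of the two calls is wrong.

    Assumption (for illustration only): errors in the two samples are
    independent, and the rare case where both calls are wrong is ignored.
    """
    error_rate = 1.0 - accuracy
    return num_snps * 2 * error_rate * (1.0 - error_rate)


# The figures quoted in the article: 1 million SNPs at 99.9 percent accuracy.
print(round(expected_errors(1_000_000, 0.999)))           # about 1,000 errors
print(round(expected_twin_mismatches(1_000_000, 0.999)))  # about 2,000 mismatches
```

Under these assumptions, two samples from identical twins could disagree at roughly twice as many spots as there are errors in either sample alone, because each twin's test makes its own mistakes.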

Second step: your DNA is compared to the DNA of other people with known ancestries

Errors aside, the genotyping we get from each of the consumer testing companies should be just about identical to one another (that is, if the companies are looking at the same set and number of SNPs). But how companies analyze that raw data varies. And that’s why one company’s ancestry results might look a bit different from another’s.

Here’s how it works.

Companies like 23andMe, Ancestry.com, and MyHeritage compare your set of SNPs to known reference groups (SNPs that tend to be found in people of, say, Greek origin). The tests are looking for evidence that you have common ancestors with people in the reference group.

But the reference group each company uses can be different. And the reference groups are changing all the time.

As STAT News reports, people who used these tests just a few years ago are now finding their results have changed. The companies say this is a feature of their product, and that as they get better at predicting genetic ancestry, they’ll pass that information on to consumers. Yet it also undercuts their marketing, which implies that their tests reveal something fundamental about you.

Another limitation: These reference groups are largely based on people who are self-reporting their ancestry. These people may be pretty confident that they know where their families come from, but it’s not a perfect measure.

Ancestry DNA companies can often track down European DNA to specific countries. But if you’re a minority, your report might be vaguer. Prior to this past summer, 23andMe could only match people to just three broad regions in sub-Saharan Africa, which is an enormous area with a lot of geographic and ethnic diversity. And that’s just because there aren’t as many African people in these companies’ reference data sets.

“Imagine you’re from a small town in Spain,” Pickrell says. “If [the testing companies] have a bunch of people from that small town, they can match you against them really effectively.” But if they don’t have people from that specific small town, they might just determine you’re broadly Spanish, or European.

Third step: a computer program takes a best guess at your ancestral makeup

One last step remains: These companies don’t just match you to ancestors; they assign you very specific percentages. So an ancestry DNA test might reveal you are 23 percent European, 24 percent Chinese, and so on.

This is where computer programs come in. “The algorithm says, ‘Let’s try to put ancestors together in different combinations, to get a similar variation [of SNPs] that you have,’” Erlich says.

And it’s imperfect, especially in differentiating among ancestries that look very genetically similar.

“There is very little [genetic] variation in Europe,” Erlich says, as an example. People from England, for instance, really don’t look all that genetically different than people from Ireland, he says. So the computer is more likely to make the mistake that a person’s ancestors were English (and not Irish), than to make the mistake that a person is English (and not Taiwanese).

What’s more, the programs have to make some guesses about how far back in time your ancestors in a particular place lived. This also is imperfect, with a range of error.

The computer programs are also sensitive to the small errors built into the genotyping process. And, again, the program’s output depends on the reference DNA the company has in its database.
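To make the idea of "trying combinations" concrete, here is a deliberately tiny sketch in Python. It is not the algorithm any of these companies uses; it is a toy that assumes just two reference populations, a handful of SNPs, and a simple trial-and-error search. The function name, the reference frequencies, and the genotypes are all invented for illustration.

```python
def estimate_mix(genotype, ref_a, ref_b, steps=100):
    """Return the share of population A (between 0 and 1) that best
    explains a genotype, using a simple grid search.

    genotype: copies of a given variant at each SNP (0, 1, or 2)
    ref_a, ref_b: how common that variant is in each reference
                  population (hypothetical frequencies between 0 and 1)
    """
    best_fraction, best_error = 0.0, float("inf")
    for i in range(steps + 1):
        fraction = i / steps
        error = 0.0
        for copies, freq_a, freq_b in zip(genotype, ref_a, ref_b):
            # Copies we would expect from this blend of the two populations.
            expected = 2 * (fraction * freq_a + (1 - fraction) * freq_b)
            error += (copies - expected) ** 2
        if error < best_error:
            best_fraction, best_error = fraction, error
    return best_fraction


# Made-up reference frequencies for two clearly different populations.
ref_a = [0.9, 0.1, 0.8, 0.2]
ref_b = [0.1, 0.9, 0.2, 0.8]

print(estimate_mix([2, 0, 2, 0], ref_a, ref_b))  # looks entirely like A
print(estimate_mix([1, 1, 1, 1], ref_a, ref_b))  # looks like an even mix
```

In this toy setup, the closer the two reference populations' frequencies are to each other, the less the error changes as the blend shifts, so a single miscalled SNP or a different reference panel can move the estimated percentage a lot. That is one way to picture why similar populations are hard to tell apart.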

Remember: DNA ancestry isn’t the same as heritage

Here’s something else that’s important to remember: Ancestry DNA tests don’t tell you where each member on your family tree lived. Instead, they tell you how much of their DNA you’ve inherited.

That’s why siblings can get different reports from DNA ancestry services (even though they share the exact same relatives). “It’s possible that your brother might have inherited a piece of DNA from one of your ancestors that you did not,” Pickrell says.

Recall that you inherit half your DNA from your mom and half your DNA from your dad. But your dad may not pass on to you all the genes he inherited from, for example, the Sardinian side of his family.

As you move further and further back in time on your family tree, “there’s some possibility that you’ve inherited no DNA from one of your ancestors,” Pickrell says. Does that mean you’re not related to that person? No. Does that mean you’re barred from making pierogis with their time-worn recipe? Of course not. They’re still a part of your family tree, and a part of your heritage.
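A quick calculation shows how fast any one ancestor's contribution shrinks. The sketch below is illustrative only: the function names are made up, and the numbers are averages that assume every ancestor in a generation is a different person. Real inheritance happens in chunks, so the actual share from a given ancestor can be higher or lower than the average, and for distant ancestors it can be zero.

```python
def ancestors_in_generation(generations_back: int) -> int:
    """Number of ancestor slots on the family tree that many generations
    back (2 parents, 4 grandparents, 8 great-grandparents, and so on)."""
    return 2 ** generations_back


def expected_share(generations_back: int) -> float:
    """Average fraction of your DNA that comes from one ancestor that many
    generations back, given that half comes from each parent."""
    return 0.5 ** generations_back


for g in range(1, 11):
    print(
        f"{g:2d} generations back: "
        f"{ancestors_in_generation(g):5d} ancestors, "
        f"about {expected_share(g) * 100:.3f} percent of your DNA from each"
    )
```

Ten generations back, the tree has 1,024 slots, and each ancestor accounts for roughly a tenth of a percent of your DNA on average, which is small enough that some of them may have left no detectable trace at all.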

DNA is not the same as heritage. DNA ancestry tests sort your DNA by the geographic regions you likely inherited it from. But not everything about our family histories is geographic.

These tests don’t tell us about the languages our ancestors spoke, the food they ate, or whether they were celebrated or persecuted. They don’t say much about how our ancestors lived or traveled. For instance, “Ashkenazi [a.k.a. Eastern European] Jewish populations, who were very migratory, tended to marry within the group,” Pickrell says. 23andMe could match you to an Ashkenazi heritage but maybe not a specific geographic location.

Human history is a messy, migratory affair, much too complicated to track simply using our DNA. And the exact percentages of where our DNA comes from may not matter either. If your sibling inherits slightly more Scandinavian DNA than you, does that make them more Scandinavian? No.

For these reasons, many are uncomfortable with the idea of heritage as something that needs to be corroborated with DNA evidence — or that people belong to a certain ethnic group based on a trivial amount of ancestry. DNA ancestry opens a small door into our past. We can learn things like the fact that many tens of thousands of years ago, humans and Neanderthals mated, though we can only speculate (and in fascinating ways) as to why.

But it’s just a small door, an imperfect guide.

“It’s valuable information, but it’s never going to be, on its own, definitive,” Pickrell says.