(ORDO NEWS) — In June 2000, two rival teams of researchers shook hands as a token of their shared success with a major milestone in biology, the blueprint of the human genome.
What started as an incomplete map of our chromosomes has since grown into a vast trove of individual sequences from all over the globe and in many cases far into the past.
Somewhere in this ocean of deciphered DNA is the history of our common humanity.
Unfortunately, reading it is easier said than done. The problem is not just the sheer volume of data: subtle differences between samples, a variety of file formats, and analysis methods that give priority to different types of errors all stand in the way of a unified interpretation.
Now researchers at the Big Data Institute (BDI) at the University of Oxford in the UK have taken a significant step forward by merging a forest of over 3,600 individual sequences from 215 populations into one huge tree.
The branches of the tree are made up of a mind-boggling 231 million ancestral lineages. At its base are roots represented by eight ancient, high-quality human genome sequences, while thousands of smaller, fragmentary ancient samples help confirm their place in the deep past.
Among them are three Neanderthal genomes, one Denisovan genome and a small family that lived in Siberia more than four thousand years ago.
“Essentially, we reconstruct the genomes of our ancestors and use them to form a series of related evolutionary trees, which we call ‘tree sequences,’” says geneticist Anthony Wilder Wohns, who led the study while doing his PhD at the BDI.
“Then we can estimate when and where those ancestors lived.”
Their tree-sequence method relies on what is known as a succinct data structure: a way of storing data in close to the minimum possible space while still allowing it to be queried quickly.
We face a similar trade-off when saving files on our own computers: compressing documents and filing them away in long chains of folders saves space, while keeping everything on the desktop makes each file quicker to find.
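To make the "small but still quick to query" idea concrete, here is a general illustration only; the study's tree sequences are not bit vectors, and the class name `RankBitVector` is hypothetical. The sketch packs bits into 64-bit words and keeps one running count per word, so "how many 1s occur before position i?" takes a single lookup plus one popcount instead of a scan. Real succinct structures use a two-level directory so the extra space becomes negligible; this one-level version keeps the sketch short.

```python
class RankBitVector:
    """Toy succinct-style bit vector: packed bits plus a small rank directory."""

    WORD = 64  # bits stored per word in this sketch

    def __init__(self, bits):
        self.n = len(bits)
        self.words = []
        # Pack the 0/1 list into integers, 64 bits at a time.
        for start in range(0, self.n, self.WORD):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.WORD]):
                if bit:
                    word |= 1 << offset
            self.words.append(word)
        # cumulative[k] = number of 1-bits in words[0..k-1]
        self.cumulative = [0]
        for word in self.words:
            self.cumulative.append(self.cumulative[-1] + bin(word).count("1"))

    def access(self, i):
        """Return bit i without unpacking the whole vector."""
        return (self.words[i // self.WORD] >> (i % self.WORD)) & 1

    def rank1(self, i):
        """Count 1-bits in positions [0, i): one directory lookup plus one popcount."""
        word_index, offset = divmod(i, self.WORD)
        count = self.cumulative[word_index]
        if offset:
            mask = (1 << offset) - 1
            count += bin(self.words[word_index] & mask).count("1")
        return count


bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0] * 20)
print(bv.rank1(65))  # → 33
```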
In this case, the tree sequence exploits the relationships shared between adjacent trees along the genome, so branches they have in common need to be stored only once, which makes large amounts of information much easier to study.
By turning the data into graphs, with nodes representing different lineages and mutations placed along the edges that connect them, massive genetic databases can be squeezed into a relatively small space while remaining easy for algorithms to search for interesting statistics.
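A minimal sketch of that graph encoding follows, loosely modelled on the node, edge and mutation tables of the open-source tskit library but not using it; the genealogy, mutation positions and helper names (`local_tree`, `carries`, `genotypes`) are made up for illustration. Each edge says "this parent passed DNA to this child over this stretch of the genome", so the edges from node 4 to samples 0 and 1 are stored once yet appear in both local trees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    left: float   # start of the genomic interval (inclusive)
    right: float  # end of the genomic interval (exclusive)
    parent: int
    child: int


@dataclass(frozen=True)
class Mutation:
    position: float
    node: int  # the mutation sits on the edge directly above this node


# Hypothetical toy data: samples 0-3, ancestors 4-8, genome length 100.
# The topology changes at position 60, but edges 4->0 and 4->1 span both trees.
EDGES = [
    Edge(0, 100, 4, 0), Edge(0, 100, 4, 1),
    Edge(0, 60, 5, 4), Edge(0, 60, 5, 2), Edge(0, 60, 6, 5), Edge(0, 60, 6, 3),
    Edge(60, 100, 7, 2), Edge(60, 100, 7, 3), Edge(60, 100, 8, 4), Edge(60, 100, 8, 7),
]
MUTATIONS = [Mutation(10, 4), Mutation(30, 2), Mutation(80, 7)]


def local_tree(edges, position):
    """Child -> parent map of the tree that applies at one genomic position."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def carries(edges, mutation, sample):
    """True if `sample` inherits `mutation` in the local tree at the mutation's site."""
    parent = local_tree(edges, mutation.position)
    node = sample
    while node is not None:
        if node == mutation.node:
            return True
        node = parent.get(node)  # None once we pass the root
    return False


def genotypes(edges, mutations, samples):
    """Expand the compact encoding back into a sample-by-site 0/1 matrix."""
    return [[int(carries(edges, m, s)) for m in mutations] for s in samples]


print(genotypes(EDGES, MUTATIONS, [0, 1, 2, 3]))
# → [[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
```

Storing the two trees separately would take 12 edge records here instead of 10; the saving is tiny in a toy, but across millions of similar neighbouring trees it is what keeps the structure compact.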
“The strength of our approach is that it makes very few assumptions about the underlying data and can include both modern and ancient DNA samples,” says Wohns.
Labelling the sequences with their geographic locations allowed the team to estimate where certain common ancestors might once have lived and how they moved.
This not only reveals events we already suspected, such as human populations migrating out of Africa, but also hints at shifts in population density among ancestral groups we still know little about, such as the Denisovans.
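The article doesn't spell out how ancestral locations are estimated, so the sketch below uses a simple, assumed rule rather than the study's exact procedure: place each ancestor at the geographic midpoint of its children, working from the youngest nodes upward. It handles only a single tree and ignores weighting by genomic span; the tree, times, coordinates and function names are hypothetical.

```python
import math


def to_xyz(lat, lon):
    """Latitude/longitude in degrees -> point on the unit sphere."""
    la, lo = math.radians(lat), math.radians(lon)
    return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))


def to_latlon(x, y, z):
    """3-D vector (any length) -> latitude/longitude in degrees."""
    return (math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x)))


def midpoint(points):
    """Geographic midpoint: average the 3-D vectors, then project back onto the sphere."""
    vectors = [to_xyz(lat, lon) for lat, lon in points]
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(3)]
    # Undefined if the vectors cancel out (e.g. two exactly antipodal points).
    return to_latlon(*mean)


def estimate_ancestor_locations(parent, sample_locations, time):
    """Assumed rule: each ancestor sits at the midpoint of its children."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    locations = dict(sample_locations)
    # Younger nodes first, so every child is placed before its parent.
    for node in sorted(children, key=lambda n: time[n]):
        locations[node] = midpoint([locations[c] for c in children[node]])
    return locations


# Hypothetical tree: ancestor 3 of samples 0 and 1; ancestor 4 of sample 2 and node 3.
PARENT = {0: 3, 1: 3, 2: 4, 3: 4}
TIME = {0: 0, 1: 0, 2: 0, 3: 1, 4: 2}
SAMPLES = {0: (0.0, 0.0), 1: (0.0, 90.0), 2: (90.0, 0.0)}  # (lat, lon)

locations = estimate_ancestor_locations(PARENT, SAMPLES, TIME)
print({node: tuple(round(c, 1) for c in loc) for node, loc in locations.items() if node >= 3})
```

Averaging on the sphere rather than averaging raw latitudes and longitudes avoids nonsense results near the poles and the 180° meridian.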
Thanks to the efficiency of this process, the already impressive tree has plenty of room to grow as more genetic data becomes available in the future.
Adding millions more genomes will make future results more accurate, as each new sequence is pinpointed within a genealogy that spans the whole world.
“This genealogy allows us to see how every individual’s genetic sequence is related to every other, across all points of the genome,” says BDI evolutionary geneticist Yan Wong.
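One way to read "related to every other, across all points of the genome" is the time to the most recent common ancestor (TMRCA) of two sequences, which can differ from one stretch of DNA to the next. The sketch below uses a made-up four-sample genealogy whose tree changes at position 60, with hypothetical helper names, and averages the TMRCA over the genome weighted by how much sequence each local tree covers.

```python
# Hypothetical toy genealogy: (left, right, parent, child), genome length 100.
EDGES = [
    (0, 100, 4, 0), (0, 100, 4, 1),
    (0, 60, 5, 4), (0, 60, 5, 2), (0, 60, 6, 5), (0, 60, 6, 3),
    (60, 100, 7, 2), (60, 100, 7, 3), (60, 100, 8, 4), (60, 100, 8, 7),
]
# Hypothetical node ages (samples are at time 0).
TIME = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 2.0, 6: 3.0, 7: 1.5, 8: 4.0}


def path_to_root(edges, node, position):
    """List of nodes from `node` up to the root of the local tree at `position`."""
    parent = {child: par for left, right, par, child in edges if left <= position < right}
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def tmrca(edges, time, a, b, position):
    """Age of the most recent common ancestor of samples a and b at one position."""
    ancestors_of_a = set(path_to_root(edges, a, position))
    for node in path_to_root(edges, b, position):
        if node in ancestors_of_a:
            return time[node]
    raise ValueError("samples are not connected at this position")


def mean_tmrca(edges, time, a, b):
    """Genome-wide TMRCA, weighting each local tree by the span it covers."""
    breakpoints = sorted({x for left, right, _, _ in edges for x in (left, right)})
    weighted = total = 0.0
    for left, right in zip(breakpoints, breakpoints[1:]):
        weighted += tmrca(edges, time, a, b, left) * (right - left)
        total += right - left
    return weighted / total


print(mean_tmrca(EDGES, TIME, 2, 3))  # → 2.4
```

Samples 2 and 3 share a fairly old ancestor (age 3.0) on the left of the genome but a much younger one (age 1.5) on the right, which a single tree could not express.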
More broadly, there is no reason why the same approach could not be applied to other species, which may one day help create a global tapestry of life on Earth.
“While the focus of this study is on humans, this method is applicable to most living creatures, from orangutans to bacteria,” says Wohns.
“It can be especially useful in medical genetics to separate true associations between genetic regions and diseases from spurious associations arising from the common history of our ancestors.”
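To see how shared ancestry can produce a spurious association, consider invented counts for two hypothetical populations in which a variant is common where a disease is common, yet carrying the variant makes no difference within either group. This sketch uses the classical Mantel-Haenszel stratified risk ratio with coarse population labels standing in for shared ancestry; it is not the study's method, which would draw on the genealogy itself.

```python
# Invented counts per population:
# (diseased carriers, all carriers, diseased non-carriers, all non-carriers)
STRATA = {
    "population A": (24, 80, 6, 20),   # variant common, disease rate 30% either way
    "population B": (2, 20, 8, 80),    # variant rare, disease rate 10% either way
}


def risk_ratio(d1, n1, d0, n0):
    """Disease risk in carriers divided by disease risk in non-carriers."""
    return (d1 / n1) / (d0 / n0)


def pooled_risk_ratio(strata):
    """Ignore ancestry: add the groups together, then compare carriers with non-carriers."""
    d1, n1, d0, n0 = (sum(column) for column in zip(*strata.values()))
    return risk_ratio(d1, n1, d0, n0)


def mantel_haenszel_risk_ratio(strata):
    """Compare carriers with non-carriers within each group, then combine the groups."""
    numerator = denominator = 0.0
    for d1, n1, d0, n0 in strata.values():
        total = n1 + n0
        numerator += d1 * n0 / total
        denominator += d0 * n1 / total
    return numerator / denominator


print(round(pooled_risk_ratio(STRATA), 2))           # → 1.86
print(round(mantel_haenszel_risk_ratio(STRATA), 2))  # → 1.0
```

Pooling everyone suggests carriers are almost twice as likely to be ill, but the apparent effect disappears once the comparison is made within each ancestral group.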