(ORDO NEWS) — A lot of conspiracy theories about the origin of SARS-CoV-2 have accumulated over two and a half years. Some of them did not go beyond rumors and local chats, others made it to the US Senate.
But none of the theories of the laboratory origin of the new coronavirus has yet received recognition in the scientific community. Recently, one German and two American biologists put forward yet another purported proof that the virus is man-made.
At the request of N + 1, biochemist Georgy Kurakin, an associate member of the Royal Society of Biology, explains how the authors tried to prove the laboratory origin of SARS-CoV-2 this time through statistics – and why this attempt did not convince scientists.
Although it is believed that SARS-CoV-2 came to humans from bats, it has not yet been possible to catch that fateful bat. We don’t even know who caught the new disease first (read more about the search for “patient zero” in the material “Gifts of Love”).
Therefore, some politicians are still investigating in the hope of catching red-handed the hapless laboratory assistant – or, conversely, the evil genius – who created a new virus and let it escape.
And virologists are peering into the only reliable source of information about the origin of the coronavirus – its genome.
From time to time, individual scientists announce that they have finally discerned traces of laboratory manipulations in it (read about previous statements of this kind in the material “You yourself are artificial”). But on closer examination, all these suspicions have turned out to be groundless.
What they are looking for
In the early stages, they tried to look for the most obvious sign of the laboratory origin of SARS-CoV-2 – the so-called “gluing”.
If it suddenly turned out that the genome of a virus consists of pieces of viruses that are unrelated to each other, which they could hardly exchange in natural conditions, then this would be a strong argument in favor of the fact that a chimera created in the laboratory escaped into the world.
The new coronavirus doesn’t look “glued” in the lab – the differences from known viruses are scattered throughout its genome. It is another matter that one small piece could have been pasted in: the part of the S-protein gene that encodes the region with which the virus binds to the ACE2 receptor on the host cell.
The nucleotide sequence that corresponds to the S-protein binding site differs from its homologue in the closest relative RaTG13 (bat coronavirus) more than the rest of the genome.
But in itself this fact does not say anything about the origin. Differences can also arise during natural recombination between viral genomes, which occurs very often.
At the beginning of the pandemic, virologists suggested that this part of the S-protein gene could have come to SARS-CoV-2 from the Javan pangolin coronavirus. But when scientists compared the S-protein sequences of different viruses and built a phylogenetic tree for them, they saw no signs of recent recombination.
So if recombination did happen, it happened long ago: it was the common ancestor of SARS-CoV-2 and the pangolin virus that received this region. And that ancestor exchanged genes with the great-grandfather of the first SARS-CoV.
So the version that someone purposefully pasted a gene section from the pangolin virus into the bat coronavirus turned out to be a dud – the region is similar, but not the same, and its pedigree is lost in the past.
When no such seam could be found, suspicions fell on the furin site, a short section in the S protein that is cleaved by the protease furin. The presence of this site makes the coronavirus more pathogenic.
The closest relatives of SARS-CoV-2 do not have this site – this allowed supporters of the laboratory theory to present the furin site as a sign of interference in the virus genome.
But even then it was known that many other wild coronaviruses have similar furin sites. And that they arose many times independently – so that SARS-CoV-2 could well acquire a furin site without outside help.
Now vigilant analysts of viral genomes have come up with a new argument in favor of the “laboratory” nature of the coronavirus.
What they have found now
One of the main tools used by genetic engineers is restriction enzymes (also known as restriction endonucleases). This is a large group of enzymes that are isolated from bacteria and are able to cleave DNA in a strictly defined place.
Each restriction enzyme can bind to DNA only where it finds a specific short sequence of nucleotides (4–8 of them). This sequence is called a restriction site. The restriction enzyme sits on it and cuts the DNA either within the site itself or to the side of it.
It usually cuts in such a way that one of the two DNA strands ends up a little longer. With this overhanging end, the molecule can stick to another DNA molecule, provided the same restriction enzyme has worked on it.
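How an enzyme "sees" its site can be illustrated with a few lines of Python. This is only a minimal sketch: the helper names `reverse_complement` and `find_sites` and the toy sequence are invented for illustration, and EcoRI with its recognition sequence GAATTC is used simply as a familiar textbook enzyme, not one discussed in the preprint. Because DNA is double-stranded, the search also checks the reverse complement of the site.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, G<->C)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions where `site` occurs on either strand."""
    targets = {site, reverse_complement(site)}
    hits = []
    for start in range(len(dna) - len(site) + 1):
        if dna[start:start + len(site)] in targets:
            hits.append(start)
    return hits


# Toy sequence, made up for this example; it contains GAATTC twice.
toy_dna = "TTGAATTCAAGGCCGAATTCTT"
print(find_sites(toy_dna, "GAATTC"))  # [2, 14]
```

The function only reports where the recognition sequence sits. Where exactly the cut falls, and what overhang it leaves, depends on the particular enzyme and is not modeled here.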
Molecular geneticists use restriction enzymes if they need to, for example, insert a new piece into the chain or replace one fragment with another. In this case, two different endonucleases are usually used in each experiment: one makes a cut on one side of the gene, the other on the other.
After them, different sticky ends remain, which helps to correctly connect the cut gene with other pieces of DNA.
If someone were to genetically manipulate SARS-CoV-2, they would probably also use restriction enzymes. These enzymes, alas, leave no traces of their work on the DNA. What one can do is look at the restriction sites in the virus genome.
By themselves, restriction sites cannot be considered a sign of a man-made organism. They are found in any genome, because bacteria protect themselves from viruses by cutting up suspicious sequences – and they use hundreds of different enzymes to do this.
There are now over 800 different restriction enzymes available to molecular biologists, so there are plenty of places in your genome and in your cat’s genome that some of these molecular scissors can recognize and cut. But that doesn’t mean your cat has escaped from the lab. Or that you have, for that matter.
The authors of the new preprint (that is, a scientific article that has not been peer-reviewed) said that the location of the restriction sites – which they dubbed the “endonuclease fingerprint” – in the coronavirus genome allegedly indicates that this genome has been edited.
In fact, this term already exists in molecular genetics, but it means something completely different. “Endonuclease fingerprinting” is a method in which the presence or absence of restriction sites helps to detect mutations in the DNA of organisms.
This method has nothing to do with the bioinformatic analysis of the genome and the search for traces of “artificiality”. Apparently, the authors of the work either came up with a new term, or used the old one not quite correctly.
Either way, they calculated that the sites for one particular pair of endonucleases (BsaI/BsmBI) in the genome of the new coronavirus are “more regular” than would be expected based on statistics from other coronaviruses.
According to the authors, molecular geneticists, when creating a new virus, would necessarily work with relatively short sections of DNA – and the sites where restriction enzymes cut would be distributed more evenly than in “wild” viruses.
The researchers plotted the length of the longest fragment between neighboring restriction sites against the total number of such fragments. By this measure, SARS-CoV-2 really does stand out from the crowd.
In total, they found five BsaI/BsmBI sites, which divide the genome of the new coronavirus into approximately equal sections – which, according to the authors of the article, are suspiciously short.
Against the background of other coronaviruses, SARS-CoV-2 really stands out. But before drawing far-reaching conclusions from this, it would be good to make sure that the anomaly is really significant. And here questions arise about the methodology of the article.
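The measure described here (number of fragments and the length of the longest one) is easy to sketch in Python. The code below is an illustration under stated assumptions, not the preprint's actual pipeline: the function names are invented, the cut is placed at the start of the recognition sequence (the real enzymes cut a few nucleotides to the side of it), the genome is treated as linear, and the site positions in the example are hypothetical round numbers rather than the real coordinates in SARS-CoV-2. The recognition sequences GGTCTC (BsaI) and CGTCTC (BsmBI) are the standard ones.

```python
import re

# Recognition sequences of the two enzymes, each listed together with its
# reverse complement, since a site can sit on either strand.
SITES = ("GGTCTC", "GAGACC",   # BsaI
         "CGTCTC", "GAGACG")   # BsmBI


def site_positions(genome: str) -> list[int]:
    """Sorted 0-based start positions of every BsaI/BsmBI recognition site."""
    positions = set()
    for site in SITES:
        # A lookahead is used so that overlapping matches are not skipped.
        for match in re.finditer(f"(?={site})", genome):
            positions.add(match.start())
    return sorted(positions)


def fragment_lengths(genome_length: int, positions: list[int]) -> list[int]:
    """Lengths of the pieces a linear genome falls into when cut at `positions`."""
    bounds = [0, *positions, genome_length]
    return [right - left for left, right in zip(bounds, bounds[1:]) if right > left]


def summarize(genome_length: int, positions: list[int]) -> tuple[int, float]:
    """Return (number of fragments, longest fragment as a share of the genome)."""
    lengths = fragment_lengths(genome_length, positions)
    return len(lengths), max(lengths) / genome_length


# Hypothetical positions in a 30,000-nucleotide genome, chosen for illustration.
evenly_spaced = [5000, 10000, 15000, 20000, 25000]
clustered = [1000, 2000, 3000, 4000, 5000]
print(summarize(30000, evenly_spaced))  # 6 fragments, longest is 1/6 of the genome
print(summarize(30000, clustered))      # 6 fragments, longest is 5/6 of the genome
```

Five cuts always give six fragments of a linear genome. What differs between an "even" and a "clustered" layout is only how long the longest fragment is, and that is the quantity the authors compared across coronaviruses.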
First of all: why BsaI/BsmBI? Molecular biologists have more than 800 restriction enzymes in their arsenal. The choice of a specific tool depends on exactly where you need to cut and glue the genome and which genes you want to insert into it.
It is not clear why these two enzymes in particular deserve attention. Moreover, if you look at some other pair, there is a good chance of seeing a completely different picture.
Imagine that you have five coins in your hands and you toss them several hundred times. Probably one day they will form some recognizable shape: for example, four of them will fall in a rectangle, and the fifth will lie somewhere in its center. And you will not perceive it as an anomaly – you just made many attempts.
Each pair of restriction endonucleases is also an “attempt”, which could give a completely different location of restriction sites.
But the authors of the work, for some reason, devoted all their attention to the one pair that gives an abnormally neat picture. This statistical error is called selective presentation of data, or cherry picking.
Incidentally, denialists and conspiracy theorists of all stripes have repeatedly been caught doing this, from creationists to climate change deniers: the former clutched at isolated anomalies in geological layers, the latter at short-term changes in the trend of the average annual temperature.
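The "many attempts" argument can be put into numbers with a back-of-the-envelope calculation. The sketch below rests on loud assumptions: the function name `chance_of_fluke` is invented, the per-pair probability of 0.001 is a purely hypothetical figure, and the pairs are treated as independent attempts, which real enzyme pairs are not. It is meant only to show the direction of the effect, not to estimate anything about the preprint.

```python
from math import comb


def chance_of_fluke(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries shows a
    pattern that any single try shows with probability `p_single`."""
    return 1.0 - (1.0 - p_single) ** attempts


enzymes = 800             # rough number of enzymes quoted in the article
pairs = comb(enzymes, 2)  # number of distinct enzyme pairs: 319600
p_single = 0.001          # hypothetical chance that one pair looks "too neat"

print(pairs)
print(chance_of_fluke(p_single, pairs))
```

Even if a neat layout were a one-in-a-thousand event for any single pair, with hundreds of thousands of possible pairs it becomes practically certain that some pair will show it. A pattern found for one pair therefore says little unless the choice of that pair was justified in advance.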
But even assuming that this anomaly is statistically significant, does it prove the laboratory origin of the coronavirus? This could be the case if the hypothetical authors were engaged in the creation of a virus with a neat genome. But malevolent biologists would behave differently and seek to increase virulence.
To do this, one would have to shuffle the genes or change their important parts – and genes in any organism are arranged without much regularity, rather untidily.
And besides, they have different lengths. And the regions important for virulence can even be very short, so it is problematic to spread them evenly throughout the genome.
As a result, the distribution of restriction sites in a man-made virus can be different, depending on the length and number of DNA segments inserted into it – and it does not have to look neat at all.
Therefore, many scientists believe that it is useless to look for neatly placed restriction sites to prove laboratory intervention.
Professor Benjamin Neumann of Texas A&M University even likens this kind of search to numerology: “It’s like you convert the genome sequence into numbers, find the sum of those numbers, and compare it to the number of the Beast.”
Finally: are the authors of the work sure that other viruses do not carry such anomalies in their genome? Judging by the graphs from the preprint, some natural coronaviruses, which the authors themselves do not suspect, turned out to be even more “artificial” than SARS-CoV-2 according to the same metric.
Then, by rights, the question of their origin should have been raised as well (although in that case one would have to settle the difficult question of a cut-off: above what value should a virus be considered artificial?).
In addition, all five suspicious restriction sites have homologues in evolutionarily related coronaviruses. “If you try to reconstruct the common ancestor of SARS-CoV-2 and [its] closest relatives (which they didn’t seem to do in the original work), it turns out that it also had all five sites.”
This means that the coronavirus could also acquire its neatly located restriction sites without outside help, simply by inheriting them from its ancestor.
The authors of the preprint are resurrecting the gluing hypothesis in a new guise: this time they are looking in the virus genome not for the glued parts themselves, but for the places where parts could have been joined. But at the same time they do not explain what, in fact, was glued together.
Artificially assembled genomes can be identified phylogenetically – different sections turn out to be descended from different viruses. Moreover, most likely, genetic engineers would take some known viruses, and the coincidence of individual sections of an artificial virus with them would be one hundred percent.
But almost all parts of the SARS-CoV-2 genome show maximum (though not complete!) identity with the same bat coronavirus, RaTG13. The exception is the already mentioned binding region of the S protein, which is exactly where recombination could have happened.
But the authors of the preprint do not look for any traces of editing in this controversial area, focusing instead on the uniform distribution of restriction sites throughout the genome. “In general, I think that evolutionary arguments are also against the original work,” sums up Bazykin.
This preprint, like many others, caused a slight stir in the scientific community – other virologists began to recalculate and double-check its results. But it, too, will most likely share the fate of its conspiracy-minded predecessors – and will not be accepted for publication by any prestigious journal.
This happened earlier, for example, with a sensational preprint about insertions in the genome of a new coronavirus, allegedly originating from HIV-1. After disputes and reviews, the authors even withdrew it from bioRxiv themselves.
The new preprint is still in place – but the scientific community has already delivered its verdict. As microbiologist Alex Crits-Christoph put it, “There are a lot of things in science that we say are ‘wrong’, but this preprint is just a Misconception.”
And attempts to find something suspicious in the coronavirus genome will certainly not end there. There will be new works – and you can start to guess what the next theory about a man-made coronavirus will be based on. We are already mentally preparing to take it apart. Stay tuned!