The coronavirus pandemic postponed Stephanie Goya’s plan to finally defend her dissertation and complete her PhD in genomics. Like many scientists around the world, Goya, a virologist in Buenos Aires, quickly changed course to focus on the new coronavirus.
“A lot of hands were needed, so with my expertise in viral genomics, I could help with different projects,” said Goya, who works for Argentina’s Dr. Ricardo Gutiérrez Children's Hospital. “I love it. I love to help society to bring expertise in something helpful, and this is the most helpful work I have ever done.”
The pandemic’s deadly grip has sparked a global race to understand how the virus is evolving and spreading — and the clues are in its genetic code. Goya is now part of a worldwide network of thousands of scientists trying to map and understand the genomic makeup of SARS-CoV-2, the scientific name for the new coronavirus, in near real time. They’re drilling into the virus’s structure to uncover clues about how it works, how it spreads, and ultimately, how it can be treated.
There is a lot to learn because this pathogen is new in humans. What excites Goya and other scientists is the unprecedented level of information scientists are sharing at a rapid pace. It’s all possible through advances in genomic sequencing technology and improvements in the scientific culture of sharing.
Their work is helped along by online initiatives such as GISAID, which facilitates the analysis and exchange of information. More than 20,000 people around the globe are registered to use its data.
“It’s a beginning of a new era,” Goya said.
SARS-CoV-2 is complex. It contains a code of nearly 30,000 letters that represent the tiny structural units, or nucleotides, that make up the genome. Newer sequencing technology and algorithms have made it possible to sequence virus samples in a matter of days or even hours, whereas in the past it would have taken weeks.
Scientists have used full genomic sequencing to understand and respond to other outbreaks, most recently Ebola. But never before has it been used at this speed and scale.
Critical to comprehending the nature of this virus is scientists’ willingness and ability to share information from the start, as opposed to delaying the release of data until the full publication of its analysis, which can take months or years.
One of Africa’s leading scientists, Christian Happi, is heading the effort to map the genome of the new coronavirus across the continent. He directs the African Center of Excellence for Genomics of Infectious Diseases at Redeemer’s University in Nigeria and had helped sequence the Ebola genome, then used sequencing technology to track its spread. In February, Happi’s lab in Ede received a sample of SARS-CoV-2 from a patient. He immediately got to work and shared the results.
“We had the whole genome of the virus lined up, the whole genetic map,” Happi said. “That was unprecedented because we were able to do it in 48 hours.”
Tracking how the virus spreads
Halfway across the world in New York City, geneticist Harm van Bakel has been racing to map the new coronavirus, too. The lab he runs at the Icahn School of Medicine at Mount Sinai is collecting samples from infected patients across New York City and countries that don’t yet have the lab capacity.
“Given the number of samples we’re currently processing, we sequence maybe 100 viruses every two to three days,” van Bakel said.
Van Bakel normally studies the spread of other pathogens such as seasonal influenza. He stressed that the speed at which researchers are able to sequence and understand so many samples of the new coronavirus allows them to track its transmission. That’s because as the virus spreads, “it accumulates small changes in its genetic code,” van Bakel said.
These changes occur because when a virus infects someone new, it makes lots of copies, creating new virus particles. The machinery that does this isn’t perfect. It can make small mistakes as it replicates. Those mistakes — or mutations — give each virus its own unique tag, like a scratch on a car.
“It doesn’t necessarily impact how the car functions, but it allows you to differentiate one particular car from a different car of the same type,” van Bakel said.
These scratches help scientists identify the path of this virus, while also tracking whether any of those changes impact the virus’s behavior, which scientists continue to monitor.
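The "scratches" idea can be illustrated with a short sketch: count the positions at which two aligned sequences differ, then see which known sample a new one most resembles. This is only a toy illustration, not the method used by the labs described here. The function names and the 12-letter sequences are invented for the example, and real genomes run to nearly 30,000 letters.

```python
def count_differences(seq_a, seq_b):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def closest_match(query, references):
    """Return (name, differences) for the reference most similar to the query."""
    return min(
        ((name, count_differences(query, seq)) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )


# Toy sequences invented for illustration only; they are not real viral data.
toy_references = {
    "sample_A": "ACGTACGTACGT",
    "sample_B": "ACGAACGAACGA",
}
toy_query = "ACGAACGAACGT"

name, diffs = closest_match(toy_query, toy_references)
print(name, diffs)  # sample_B 1
```

In this toy case the new sample shares two of its "scratches" with sample_B and differs from it at only one position, which suggests the two are more closely related than the new sample and sample_A.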
By piecing together this globally shared sequencing information, Happi has been able to see how the virus spread to Nigeria from China. Van Bakel was able to glean that the virus in New York appeared very similar to the one that was circulating in Europe.
“And what that tells us in return is that as the virus spread from Asia, it didn’t come directly to New York — but rather, it took a detour through Europe,” van Bakel said.
Data generated by scientists like Happi and van Bakel is helping other researchers understand where variations of the coronavirus have spread around the world. That piece of the puzzle could help policymakers respond to new outbreaks.
Emma Hodcroft, a molecular epidemiologist at the University of Basel in Switzerland and Nextstrain, has been downloading that data to create a kind of global map of the virus called a phylogenetic tree.
The branches represent evolutionary relationships of the virus. The whole map currently includes more than 10,000 sequences of the new coronavirus.
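To give a rough sense of how such a tree can be assembled, the sketch below repeatedly groups the two most similar clusters of sequences, a simple textbook approach known as UPGMA. It is an assumption-laden toy and not the method Nextstrain uses; real pipelines rely on more sophisticated statistical models. The names `v1` to `v4` and their 8-letter sequences are made up for the example.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def upgma(seqs):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters."""
    # Map each cluster (a name or a nested tuple) to the names it contains.
    clusters = {name: [name] for name in seqs}
    while len(clusters) > 1:
        def average_distance(pair):
            members_a, members_b = clusters[pair[0]], clusters[pair[1]]
            total = sum(hamming(seqs[m], seqs[n]) for m in members_a for n in members_b)
            return total / (len(members_a) * len(members_b))

        a, b = min(combinations(clusters, 2), key=average_distance)
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


# Toy sequences invented for illustration only; they are not real viral data.
toy_sequences = {
    "v1": "AAAAAAAA",
    "v2": "AAAAAAAT",
    "v3": "CCAAAAAA",
    "v4": "CCGAAAAA",
}

tree = upgma(toy_sequences)
print(tree)  # (('v1', 'v2'), ('v3', 'v4'))
```

Each pair of parentheses is a branch point: `v1` and `v2` sit on one branch, `v3` and `v4` on another, mirroring how closely related samples end up near each other on the tree.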
“So, if we can find out what were the dangers beforehand, how did this virus spread effectively between different states or between different cities, we can keep an eye on that as we come out of lockdown to make sure that we don't give the virus that advantage when we try and start re-allowing movement and reopening shops and this kind of thing,” Hodcroft said.
A growing culture of sharing
Being able to source and analyze all this data is no small feat: It requires a credible system for sharing this information and scientists who are willing to participate. Several platforms now exist, such as GenBank, EMBL-EBI, and a global consortium, the International Nucleotide Sequence Database Collaboration. One of the main public-private initiatives that Hodcroft and others take part in is GISAID, the nonprofit Global Initiative on Sharing All Influenza Data.
With scientific advisers across the world, GISAID was already a well-oiled system when the coronavirus hit.
“We were called earlier this year, the first week of January, by our partners in China and various public health laboratories to see if we could assist with the sharing of a newly emerging coronavirus,” said GISAID’s founder, Peter Bogner. “It's an exponential growth that is staggering.”
Anyone can access GISAID, so long as they register and agree to credit the scientists whose data they use in any resulting research. Bogner said those conditions helped relieve tension among scientists who may have been reluctant to share data prepublication because “they were worried about being scooped.”
The initiative has existed since 2008. It emerged from a system of labs around the world that track and share genetic data for flu viruses. GISAID is a critical source for identifying strains for developing annual flu vaccines.
Global health hinges on this kind of collaboration. But there are major gaps: Not all countries have the capacity to collect and sequence the new coronavirus, which leads to blind spots in tracing it.
For example, it wasn’t until late April that researchers in Argentina had the necessary ingredients to sequence and share the first genomes of the virus there, said Goya, the Buenos Aires virologist. Bogner said GISAID is working with scientists in Tehran to help the country begin sequencing the genome and sharing it.
For Happi, the scientist in Nigeria, another question looms: Who benefits when these sequences lead to an effective vaccine or treatment?
“The companies that are developing tools and diagnostics and vaccines should understand that because we shared the data that we should share in terms of the benefit,” he said.
Happi worries that communities vital to the effort may not have equal access to lifesaving treatment once genomic data is used to successfully develop it, or that the treatment might be too expensive. Scientists and policymakers haven’t solved that problem, at least not yet.