SARS-CoV-2 is probably one of the most sought-after genome sequences on the planet. This sequence, roughly 30,000 letters long, is the genome of the virus that causes Covid-19: the genetic instructions needed to build the virus. This genomic data can help scientists track mutations and new variants across the globe, and many scientists are asking for it to be shared more freely. So why are some scientists hesitant to do so?
GISAID (Global Initiative on Sharing Avian Influenza Data) was formed in 2008 and encourages scientists to share genomic data on influenza strains. Thirteen months after the first complete SARS-CoV-2 genome was sequenced, around 360,000 SARS-CoV-2 genomes are on the platform. These genomes help detect new mutations of the coronavirus and estimate how fast the virus is mutating (around once or twice a month). The UK sequences around 10% of all positive Covid tests and has uploaded almost half of all the sequenced genomes. These genomes help identify new variants such as the Kent variant, known within the scientific community as B.1.1.7.
Rolf Apweiler, co-director of the European Bioinformatics Institute (EBI), says: “the openness of SARS-CoV-2 sequence data is crucial for the rapid response against the biggest health threat to humankind in a very, very long time.” GISAID, however, does not currently allow sequences to be shared publicly.
After confirming their identity, scientists can use the data provided on the GISAID platform. However, the underlying data cannot be published alongside the studies that use it, which makes those studies harder for reviewers and readers to check; instead, readers are advised to register with GISAID to see the original data for themselves. Scientists cannot republish the platform’s genomic data without the data provider’s permission, and this safeguard within GISAID exists for a reason.
Researchers can upload their genomic data to GISAID without worrying that other scientists will use it without permission or credit. In the past, for example, nations such as Indonesia worried that if they shared avian flu data on open platforms, pharmaceutical firms could use it without permission and with no direct benefit to Indonesia. The current GISAID platform protects scientists’ work from being exploited. In fact, Sebastian Maurer-Stroh, a bioinformatician at Singapore’s Agency for Science, Technology and Research, says that “the reason so many labs have provided SARS-CoV-2 genomes to GISAID is precisely because of the data-access agreement that restricts public resharing”.
A push towards more open genomic-sharing platforms is of particular concern for scientists working in low- and middle-income countries. A platform where researchers no longer need to ask permission to use data makes it easier for wealthy countries to use that data without first consulting, or even crediting, its source. This might encourage the original data providers to analyse their data themselves before uploading it, or not to upload it at all. Senjuti Saha, a microbiologist at the Child Health Research Foundation in Dhaka, says: “We really want to share our data, but it is heart-breaking and demotivating when we know we worked so hard to generate data, but we don’t get the credit for it.”
Relaxing GISAID’s access terms does not seem an appropriate way forward, as it would likely damage confidence in the platform among the many scientists who have felt protected by their rights as data providers. Before Covid-19, GISAID had collected important genomic sequences for influenza strains such as the H7N9 avian flu, whose sequences China published in March 2013. Guy Cochrane, head of the EBI’s European Nucleotide Archive (ENA), confirms that the EBI is actively looking at ways to allow data to be used more freely while also empowering the countries most likely to be harmed by a more open data platform.
In the meantime, a letter signed by over 500 scientists is pressuring platforms to “remove barriers that restrain effective data sharing.” The letter encourages scientists to share their genomic data on platforms with unrestricted distribution policies, such as the databases that make up the International Nucleotide Sequence Database Collaboration (INSDC).
During the Covid-19 pandemic, this may be the best solution. However, as vaccine researcher Marie-Paule Kieny explains, the letter “seems to me like an initiative from European and high-income countries not fully informed on the critical need to ensure that low-resource countries accept to share sequences freely.”
Some scientists’ hesitation to publish their data freely deserves more thought than a letter urging them to share it. After all, this pandemic has brought worldwide research collaboration to the forefront. Greater data sharing is desirable in the long term; for now, however, we should celebrate the wide range of countries providing data online, not dissuade them from doing so.