Life in Research

Minimizing wheel reinvention: creating an open data culture via infrastructure in India

Last month the Global Biodata Coalition and the University of Delhi jointly organized the 1st Indo-GBC virtual seminar, titled "Data Sharing at a Global Level: Evolving perspectives amidst challenges".

Published in Research Data

Nov 24, 2020

Varsha Khodiyar, Ph.D

The "Data Sharing at a Global Level: Evolving perspectives amidst challenges" seminar was organized by Saurabh Raghuvanshi (Associate Professor, University of Delhi) and Chuck Cook (Program Manager, Global Biodata Coalition). The aims of this event were twofold:

to discuss global data conservation and sharing models with a view to developing a robust data sharing ecosystem at a national level for India.
to encourage participation by Indian funding agencies in international coordinating activities such as the GBC.

Introducing the seminar, Raghuvanshi noted that India is emerging as one of the largest global producers of life science data; consequently, extensive efforts are underway to develop a robust life science data collection, interpretation and curation framework across the nation. Concerns about data curation and data stewardship capacity were reiterated several times during the course of the seminar.

In her message to seminar attendees, Renu Swarup (Secretary, Department of Biotechnology, Govt. of India) noted the active efforts of Indian funding agencies in developing a robust ecosystem for efficient storage and sharing of life science data generated in India. She also highlighted the need for national efforts to sync with global agency efforts on research data sharing.

Eric Green kicked off the seminar with an introduction and overview of the Global Biodata Coalition (GBC). The GBC is a coalition of funding agencies with the explicit aim of coordinating funding for biodata (life science and medical science data) repositories, to ensure the longevity and sustainability of the core resources which underpin much of life sciences and biomedical research. Green spoke of the need to act across borders, as although research data are generated nationally they are used internationally. The GBC’s main aim is to encourage research funders to work together on tackling data science challenges. Green noted that there are around 3,000 biodata resources globally, and that around 100 of these are considered ‘core’. These biodata resources are associated with a budget of around $500m. Biodata resources are highly interconnected, which is ideal for data discovery but also means that they are susceptible to issues caused by the failure of weak links.

Alongside the exponential growth in biodata generation, we are seeing an increase in the emergence of open data policies from publishers, funders and other research stakeholders. Green described how this is increasing the demands on biodata repositories, and how current funding of these resources is fragmentary, fragile and haphazard. Green described how poor international coordination is leading to duplication, waste and lack of sustainability planning for biodata resources. There is also the growing threat of biodata resources retreating behind subscription firewalls for sustainability, with The Arabidopsis Information Resource mentioned as an example of this.

Niklas Blomberg presented an overview of ELIXIR, which brings together European biodata resources. ELIXIR is an example of an internationally collaborative approach for data resources, which brings together 23 nodes, 55 commissioned projects and 397 teams. Blomberg noted the philosophy behind ELIXIR as being “the rising tide lifts all the boats”. The ELIXIR project has identified a set of core resources, which are of fundamental importance to the broader life science community, and act for the long-term preservation of biological data. These core resources are considered to be fundamental research infrastructure, and the long-term sustainability of the resources is therefore vital for bioscience research. The intention is for the core resources to be funded differently to the usual grant-based academic projects. In order to be considered as a core resource, the key requirement is for the data to be completely open to use by all. Blomberg showed that the top three countries accessing EMBL-EBI resources are the USA, China and India, and that the majority of these resources represent international collaborations.

Blomberg described some of the international collaborations which have been established in the bioscience discipline, for example the International Nucleotide Sequence Database Collaboration (INSDC), which has been key in developing and implementing regular data exchange, common data standards, and open and unrestricted access to data on an international level. Blomberg called for Indian funding agencies to participate in these types of scientific collaborations, which are not research projects but research infrastructure initiatives. Blomberg shared his experience that funding bodies need to take a decadal view in considering investment and governance for data infrastructure. Blomberg’s view is that the complexity of these data and the accompanying requirement for skilled data professionals mean that a national focus for biodata resources is unlikely to be sustainable over the longer term.

I was invited to speak about the importance of research data repositories and how the use of repositories can enable data to be FAIR (findable, accessible, interoperable and reusable). There are essentially two types of repositories: those which are discipline-specific and those which are generalist. Discipline-specific repositories are the ideal location for data, as these repositories usually implement data standards, data tools and data visualizations best suited for their holdings. Discipline-specific repositories are also more likely to be staffed by specialists for that disciplinary area, who are able to provide technical guidance for, and validation of, deposited datasets. All of this means that discipline-specific repositories are able to maximize the findability, accessibility, interoperability and reusability of their data holdings. For data types and disciplines which do not have specialist repositories, generalist repositories are vital for enabling researchers to ensure their data are maximally findable and accessible.

In the panel discussion, Anurag Agarwal (CSIR-Institute of Genomics and Integrative Biology) discussed the challenges of sharing human-derived data in an ethical way, while Akhilesh K. Tyagi (University of Delhi) highlighted the need for controlled access repositories in view of emerging biodiversity-related bio-economy issues. The need for policies and data access frameworks developed by a range of Indian stakeholders (public and private) was also discussed.

India is taking important steps to embed an open data culture for its research community, and pulling in international expertise to minimize wheel reinvention is certainly a sensible way forward.

Photo by Shalvi Raj on Unsplash

Varsha Khodiyar, Ph.D

As part of the Research Data team working on research data publishing initiatives at Springer Nature, Varsha leads the curation team and contributes to the design, development and delivery of Springer Nature’s research data training workshops. She is also responsible for curating and maintaining the Scientific Data and Springer Nature recommended repository lists. Varsha is an Executive Advisor of FAIRsharing.org, a member of CODATA’s International Data Policy committee, programme chair for the Better Research through Better Data conference series, and a co-author of the TRUST principles.

Nagendra Kumar Singh

over 5 years ago

Controlled data sharing is very important for innovation.
