#SciData19 Writing Competition: Winning Entry #4

We are proud to publish the fourth of the four winning entries in this year's Better Science through Better Data writing competition. Congratulations to Thu (Mi) Nguyen!
Published in Research Data

Better Science through Better Data 2019

In ‘Better Science through Better Data’ (#scidata19) Springer Nature and The Wellcome Trust partner to bring together researchers to discuss innovative approaches to data sharing, open science, and reproducible research, together with demonstrations of exemplary projects and tools. If you are a researcher, this event will give you the chance to learn more about how research data skills can aid career progression, including how good practice in data sharing can enable you to publish stronger peer-reviewed publications. Tickets for the event have now sold out - but you can register for the live stream to watch our keynote talks as they happen from wherever you are in the world.

Keynote speakers

Shelley Stall
Senior Director, Data Leadership
American Geophysical Union (AGU)

Shelley Stall is the Senior Director for the American Geophysical Union’s Data Leadership Program. She works with AGU’s members, their organizations, and the broader research community to improve data and digital object practices with the ultimate goal of elevating how research data is managed and valued. Better data management results in better science. Shelley’s diverse experience working as a program and project manager, software architect, database architect, performance and optimization analyst, data product provider, and data integration architect for international communities, both non-profit and commercial, provides her with a core capability to guide development of practical and sustainable data policies and practices ready for adoption and adapting by the broad research community.

Shelley’s recent work includes the Enabling FAIR Data project, engaging over 300 stakeholders in the Earth, space, and environmental sciences to make data open and FAIR, targeting the publishing and repository communities to change practices by no longer archiving data in the supplemental information of a paper but instead depositing the data supporting the research into a trusted repository where it can be discovered, managed, and preserved.

Her talk is entitled: Your Digital Presence

Mikko Tolonen
Assistant Professor
Faculty of Arts at the University of Helsinki

Mikko Tolonen is an assistant professor of Digital Humanities at the University of Helsinki. He is the PI of Helsinki Computational History Group (COMHIS). In 2015-17 he also worked in the National Library of Finland on digitized newspapers as professor of research on digital resources. He is the chair of Digital Humanities in the Nordic Countries (DHN). His current main research focus is on an integrated study of early modern public discourse and knowledge production that combines bibliographic metadata and full-text sources. In 2016, he was awarded an Open Science and Research Award by the Finnish Ministry of Education and Culture.

His talk is entitled: Integrating Open Science in the Humanities: the Case of Computational History

David Stillwell
Lecturer in Big Data Analytics and Quantitative Social Science
Judge Business School, University of Cambridge

David is Lecturer in Big Data Analytics and Quantitative Social Science at Cambridge University’s Judge Business School. David’s research uses big data to understand psychology. He published papers using social media data from millions of consenting individuals to show that the computer can predict a user’s personality as accurately as their spouse can. This research has important public policy implications. How should consumers’ data be used to target them? Should regulators step in, and if so how?

David has spoken at workshops at the EU Parliament and to UK government regulators. David has also published research using various big data sources such as from credit card data and textual data to show that spending money on products that match one’s personality leads to greater life satisfaction, that people tend to date others whose personality is similar, and that people who swear seem to be more honest.

His talk is entitled: Getting Big Data: Social scientists must strive to be autonomous from corporate charity.

Tomas Knapen
Assistant Professor
Vrije Universiteit Amsterdam - Cognitive Psychology

Tomas is a cognitive neuroscientist whose research focuses on the role sensory topographies (visual retinotopy, auditory tonotopy and bodily somatotopy) play in the detailed organization of the human brain and cognition. For this work, Tomas uses state of the art 7-Tesla MRI techniques. Early-career experiences where he ‘failed to replicate’ previous findings have impressed upon him the need to make research reproducible from top to bottom. Because of this, his lab uses only open methods and puts all their data and methods online. Having invested in these methods, Tomas is convinced that, in the end, it is not a burden to perform open science, rather it provides researchers with great opportunities for ground-breaking science.

His talk is entitled: How I learned to stop worrying and love Open Science

See the event programme. Meet the Programme Committee. Register for the live stream.

Question: How should Findable, Accessible, Interoperable and Reusable (FAIR) data work in practice?


Thu Nguyen - University of Illinois

The concept of FAIR (findability, accessibility, interoperability, and reusability) data was first introduced in 2016 [2], but its popularity has been slow to grow. The main challenges for FAIR data in practice, in my opinion, are the limited public knowledge of the topic and the idea of interoperability itself. Jon Brock, writing in Nature Index, commented on the State of Open Data survey, which found that only about 19.2% of researchers are familiar with the FAIR principles and about 30.7% have only heard of them. Because FAIR aims to incorporate and integrate data from different fields of research, it cannot be implemented efficiently by a small group of people. The more people are familiar with the idea, the more willing they will be to try it, and the more people can contribute their effort to implementing FAIR.

Additionally, significant effort is needed to define “interoperable.” In the same survey, 42% of respondents said the I in FAIR is unclear to them. Does interoperable mean that all fields of research must be able to cross-talk? How do we pick a common data type for all research fields? Even within the field of mass spectrometry, different vendors already have different data types and software, which most of the time do not cross-talk. I imagine it would be hard, and quite unrealistic, for all scientists to come to a consensus. Understanding each data type from a different subfield can be a challenge in itself, which brings up a different yet related challenge that the implementation of FAIR faces. Kate LeMay, senior research data specialist at the Australian Research Data Commons, commented in an interview with Nature Index that each field of science has a “different culture and requirement for data and metadata.” On the other hand, Wilkinson et al. noted in the original article that it is unsustainable to create a computer parser for every data type.
Given how far the current situation is from the ideal machine-interoperable future, uniting all the different fields of science sounds almost impossible. However, interoperability within each subfield of research may be more realistic. Given how many new data repositories are being created, I think many of them are close to achieving the FAIR criteria. Take the field of mass spectrometry of natural products, for example. An accessible, interoperable, and reusable database would be the Global Natural Products Social Networking (GNPS), which functions as an open-access tandem mass spectrometry data resource for natural product scientists. Information from various laboratories, various studies, and targeted subjects can all be incorporated for continuous identification of published compounds. Additionally, an entire study dataset can be published with the Mass Spectrometry Interactive Virtual Environment (MassIVE) data repository, which allows users to browse and reanalyze published datasets.

It would be hard to change the language or data file format of this whole field so that it is compatible with data from microarray or NGS. In fact, I would argue that it would not be useful. So instead of putting time into advocating for a common data file format across all research, we can focus on bringing each field closer to the ideal FAIR repository in its own way. Then related fields can find a common way to communicate their findings. For instance, genomics and proteomics can communicate through gene ontology pathway analysis. In conclusion, to implement FAIR data, I would keep the discussion about these requirements going at conferences, which would create the chance for each research society to achieve its own FAIR repository, or to come together to solve the problem of interoperable data.

1. Landry, J.J.M., et al., The genomic and transcriptomic landscape of a HeLa cell line. G3 (Bethesda, Md.), 2013. 3(8): p. 1213-24. doi:10.1534/g3.113.005777

2. Wilkinson, M.D., et al., The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 2016. 3: p. 160018.

Don't forget to register for the live stream of Better Science through Better Data 

Meet the other writing competition winners here.
