The lot of any epidemiologist

Managing the reporting and validation of metadata together with data sharing was time consuming – which is the unavoidable lot of any epidemiologist.

Published in Research Data


The database for reporting metadata was defined as a simple Microsoft Excel sheet. Excel allows the categories for a variable to be standardized through its “Data validation” function, which was applied to all variables. Colleagues from different institutions had to download the metadata sheet from the COMPARE share site, enter their metadata and then upload the sheet to the share site again. Information was not always added to the same version of the sheet, and the format and standardized categories defined for the variables were often violated because partners used different software versions.

Many working hours were therefore spent combining the metadata into one sheet and validating it. Validation included simple corrections such as spelling every primary source identically (because many computer programs distinguish between, for example, upper- and lower-case letters), harmonizing the date of sampling, which can be reported in many different formats, and ensuring that at least information about time, place and source was reported through the variables data provider, country of sampling origin, date of sampling, Salmonella serotype, travel information, and whether a Salmonella case was travel related or part of an outbreak.

Managing, sharing and storing data in Microsoft Excel files is not sustainable, and the risk of mistakenly introducing errors is high. Setting up the metadata in a sustainable, internet-based database would have simplified metadata reporting and validation to some extent. Here, sustainable refers to a database in which:

1. information can be added by different partners at the same time

2. the database can combine information added by different partners into one version automatically

3. dropdown menus of possible categories are linked to every variable in the metadata

4. information about date is divided into three separate variables: year, month and day

5. open text fields are only included if strictly necessary

6. new variables can be added if needed

7. subsets of data can be extracted easily, without having to extract the entire metadata sheet

8. the user can filter on more than one variable at a time

9. a unique identifier links metadata and sequence

10. data is easily available to all involved partners

11. notifications are sent to partners when the database is updated
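To make some of these criteria concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the database used in the project: the table names, column names and category values are all hypothetical and chosen for illustration only. It shows a controlled vocabulary standing in for a dropdown menu (criterion 3), the date split into year, month and day (criterion 4), a unique identifier per sample (criterion 9), and a subset extracted by filtering on two variables (criteria 7 and 8).

```python
import sqlite3

# Hypothetical schema: all table names, column names and categories are
# illustrative assumptions, not taken from the project.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE source_category (name TEXT PRIMARY KEY);  -- allowed values, like a dropdown
INSERT INTO source_category VALUES ('human'), ('food'), ('animal');

CREATE TABLE sample_metadata (
    sample_id     TEXT PRIMARY KEY,   -- unique identifier linking metadata and sequence
    data_provider TEXT NOT NULL,
    country       TEXT NOT NULL,
    year          INTEGER NOT NULL,   -- date stored as three separate variables
    month         INTEGER CHECK (month BETWEEN 1 AND 12),
    day           INTEGER CHECK (day BETWEEN 1 AND 31),
    source        TEXT NOT NULL REFERENCES source_category(name)
);
""")

INSERT_SQL = "INSERT INTO sample_metadata VALUES (?, ?, ?, ?, ?, ?, ?)"

rows = [
    ("S001", "Institute A", "Denmark", 2015, 6, 3, "human"),
    ("S002", "Institute B", "Denmark", 2016, 1, 20, "food"),
    ("S003", "Institute A", "Germany", 2015, 11, None, "human"),  # day unknown
]
conn.executemany(INSERT_SQL, rows)


def try_insert(row):
    """Insert one row; return False if the database rejects it."""
    try:
        conn.execute(INSERT_SQL, row)
        return True
    except sqlite3.IntegrityError:
        return False


# A source spelt differently from the controlled vocabulary is refused.
accepted = try_insert(("S004", "Institute B", "Denmark", 2016, 2, 1, "Human "))

# Extract a subset by filtering on two variables at once.
subset = conn.execute(
    "SELECT sample_id FROM sample_metadata "
    "WHERE country = ? AND year = ? ORDER BY sample_id",
    ("Denmark", 2015),
).fetchall()

total = conn.execute("SELECT COUNT(*) FROM sample_metadata").fetchone()[0]
```

An in-memory SQLite file does not address simultaneous entry by several partners, shared access or update notifications (criteria 1, 2, 10 and 11); those would need a server-based, internet-accessible database.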

We recommend that future projects of this type incorporate a sustainable, internet-based database and a dedicated data curator.
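The manual validation steps described earlier (spelling sources identically, harmonizing sampling dates and checking that the minimum variables are present) could in part be automated by a curator. The following is a minimal sketch in plain Python under stated assumptions: the field names, the list of required fields and the accepted date formats are hypothetical, and slash-separated dates are assumed to be day-first, which real data may not respect.

```python
from datetime import datetime

# Assumed field names and date formats; adjust to the actual metadata sheet.
REQUIRED = ("data_provider", "country", "sampling_date", "source")
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")  # day-first is an assumption


def normalise_source(value):
    """Collapse case and stray whitespace so 'Human ' and 'human' match."""
    return " ".join(value.split()).lower()


def parse_date(value):
    """Try each assumed format in turn; return (year, month, day) or None."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
        return parsed.year, parsed.month, parsed.day
    return None


def validate(record):
    """Return (cleaned record, list of problems) for one metadata row."""
    problems = [
        f"missing {field}"
        for field in REQUIRED
        if not str(record.get(field) or "").strip()
    ]
    cleaned = dict(record)
    if record.get("source"):
        cleaned["source"] = normalise_source(record["source"])
    if record.get("sampling_date"):
        parsed = parse_date(record["sampling_date"])
        if parsed is None:
            # Flag for manual review rather than guessing the format.
            problems.append(f"unparseable date: {record['sampling_date']!r}")
        else:
            cleaned["year"], cleaned["month"], cleaned["day"] = parsed
    return cleaned, problems


good, good_problems = validate({
    "data_provider": "Institute A",
    "country": "Denmark",
    "sampling_date": "03/06/2015",
    "source": " Human ",
})
bad, bad_problems = validate({
    "data_provider": "",
    "country": "Denmark",
    "sampling_date": "June 2015",
    "source": "food",
})
```

Rows that come back with a non-empty list of problems would still need a person to resolve them; the sketch only narrows down where to look.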
