The lot of any epidemiologist

Managing the reporting and validation of metadata, together with data sharing, was time-consuming, which is the unavoidable lot of any epidemiologist.
Published in Research Data

The format of the database for reporting metadata was defined as a simple Microsoft Excel sheet. Microsoft Excel allows the categories for each variable to be standardized through its “Data validation” function, which was applied to all variables. Colleagues from different institutions had to download the metadata sheet from the COMPARE share site, enter their metadata into the sheet and then upload it to the COMPARE share site again. Information was not always added to the same version of the metadata sheet, and the format and the standardized categories defined for the different variables were often violated because different software versions were used. Many working hours were therefore spent combining metadata into one sheet and validating it.

Validation included simple corrections such as making sure every primary source was spelled identically (because many computer programs distinguish between, e.g., upper- and lower-case letters), streamlining the date of sampling, because this can be reported in many different formats, and ensuring that at least information about time, place and source was reported through the variables: data provider, country of sampling origin, date of sampling, Salmonella serotype, travel information and whether a Salmonella case was travel related or part of an outbreak or not.

Managing, sharing and storing data in Microsoft Excel files is not sustainable, and the risk of mistakenly introducing errors is high. Setting up the metadata in a sustainable and internet-based database format would have simplified the metadata reporting and validation to some extent. Here, sustainable refers to a database in which:

1. information can be added by different partners at the same time

2. information added by different partners is combined into one version automatically

3. dropdown menus of possible categories are linked to every variable in the metadata

4. information about date is divided into three separate variables: year, month and day

5. open text fields are only included if strictly necessary

6. new variables can be added if needed

7. subsets of data can be extracted easily, without having to extract the entire metadata sheet

8. users can filter on more than one variable at a time

9. a unique identifier links metadata and sequence

10. data is easily available to all involved partners

11. notifications are sent to partners when the database is updated
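
To make a few of these requirements concrete, here is a minimal sketch using Python's built-in sqlite3 module. It only illustrates items 3, 4, 8 and 9 (fixed categories, a date split into year, month and day, filtering on several variables, and a unique identifier). The table name, column names and the list of source categories are assumptions made for this example, not the COMPARE schema, and a local SQLite file is not an internet-based, multi-partner database in itself.

```python
import sqlite3


def create_metadata_table(conn):
    """Create a hypothetical metadata table with constrained categories."""
    conn.execute(
        """
        CREATE TABLE metadata (
            sample_id     TEXT PRIMARY KEY,  -- unique identifier linking metadata and sequence
            data_provider TEXT NOT NULL,
            country       TEXT NOT NULL,
            -- assumed category list, playing the role of a dropdown menu
            source        TEXT NOT NULL
                          CHECK (source IN ('human', 'animal', 'food', 'environment')),
            year          INTEGER NOT NULL,
            month         INTEGER CHECK (month BETWEEN 1 AND 12),
            day           INTEGER CHECK (day BETWEEN 1 AND 31),
            serotype      TEXT
        )
        """
    )


def add_record(conn, record):
    """Insert one record; the database rejects values outside the allowed categories."""
    conn.execute(
        "INSERT INTO metadata "
        "(sample_id, data_provider, country, source, year, month, day, serotype) "
        "VALUES (:sample_id, :data_provider, :country, :source, "
        ":year, :month, :day, :serotype)",
        record,
    )


FILTERABLE = {"data_provider", "country", "source", "year", "month", "day", "serotype"}


def filter_records(conn, **criteria):
    """Return the sample_ids matching all given column=value criteria."""
    unknown = set(criteria) - FILTERABLE
    if unknown:
        raise ValueError(f"unknown variables: {sorted(unknown)}")
    sql = "SELECT sample_id FROM metadata"
    if criteria:
        # Column names were checked against FILTERABLE; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in criteria)
    sql += " ORDER BY sample_id"
    return [row[0] for row in conn.execute(sql, tuple(criteria.values()))]
```

With such a table, a subset can be pulled with a call like `filter_records(conn, country="Denmark", year=2015)` without exporting the whole sheet, and a misspelled category such as "Human" is refused at entry instead of being corrected by hand afterwards.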

We recommend that future projects of this type incorporate a sustainable, internet-based database and a data curator.
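
The manual validation steps described earlier in this post (spelling every source identically, streamlining the sampling date, and checking that the minimum information is present) could in part be automated. The following is a minimal sketch in Python, not the procedure that was actually used: the field names, the list of required fields and the accepted date formats are assumptions made for illustration, and ambiguous dates are read day-first.

```python
from datetime import datetime

# Assumed input formats; day-first is an assumption, not something stated in the post.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

# Hypothetical field names covering time, place and source.
REQUIRED_FIELDS = ("data_provider", "country", "sampling_date", "source")


def normalize_source(value):
    """Collapse whitespace and lower-case so 'Human ' and 'human' compare equal."""
    return " ".join(value.split()).lower()


def parse_sampling_date(value):
    """Try each accepted format in turn; return a date, or None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def validate_record(record):
    """Return a cleaned copy of the record and a list of problems found."""
    cleaned = dict(record)
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            problems.append(f"missing {field}")
    if cleaned.get("source"):
        cleaned["source"] = normalize_source(cleaned["source"])
    raw_date = record.get("sampling_date")
    if raw_date:
        parsed = parse_sampling_date(raw_date)
        if parsed is None:
            problems.append(f"unparseable sampling_date: {raw_date!r}")
        else:
            # Store the date as three separate variables.
            cleaned["year"] = parsed.year
            cleaned["month"] = parsed.month
            cleaned["day"] = parsed.day
    return cleaned, problems
```

Records that come back with a non-empty list of problems would still need a person, for example a data curator, to follow up with the data provider.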
