Behind the Paper

Unifying clinical data to reveal influenza imprint on immune system

Every exposure leaves an imprint on our immune system. Machine learning can help identify the unique imprints of pathogens or vaccines. In this blog post, we share our approach to unifying data acquired from individuals undergoing influenza vaccination and to generating the FluPRINT database.

Published in Research Data

Oct 21, 2019

Adriana Tomic

Marie Curie Fellow, University of Oxford

Just as a burglar leaves fingerprints at a crime scene, viruses leave distinct imprints on our immune system. To recognize such patterns and identify the clues left behind, we need to collect a variety of immune measurements that capture the parameters specific to the pathogen in question. In recent years, the amount of publicly available influenza data has grown, mostly thanks to technological advances and open data initiatives. However, to obtain a systemic view of the influenza virus imprint, it is necessary to combine datasets across clinical studies.

The goal of our research seemed quite straightforward: we wanted to apply machine learning to identify patterns in the data gathered at the Human Immune Monitoring Center at Stanford University and determine why some individuals fail to mount an antibody response after influenza vaccination. At least, it seemed straightforward in the beginning, but things soon became more and more complicated.

The data were generated using different immune assays and were spread across hundreds of files. Running machine learning algorithms directly on the collected data was simply not possible. The data had to be preprocessed, merged, cleaned, standardized and fully integrated before any machine learning could begin. This required us to build an automated pipeline that transforms the data into a database, so that they are easily searchable and therefore usable.
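To make the merging step concrete, here is a minimal sketch in Python. It is not the actual FluPRINT pipeline: the assay names, column names and values are hypothetical, and the two small in-memory CSV snippets stand in for the hundreds of real files. It only illustrates the idea of bringing differently formatted assay exports into one table with a shared layout.

```python
import csv
import io

# Hypothetical per-assay exports (assumed for illustration only).
# Note that the two "files" spell their column headers differently.
ASSAY_FILES = {
    "hai": "donor_id,analyte,value\nD1,H1N1 titer,40\nD2,H1N1 titer,10\n",
    "cytokines": "Donor ID,Analyte,Value\nD1,IL-6,3.2\nD2,IL-6,\n",
}


def normalize_header(name):
    """Map header variants such as 'Donor ID' and 'donor_id' to one form."""
    return name.strip().lower().replace(" ", "_")


def merge_assays(files):
    """Combine all assay exports into one list of uniformly shaped rows."""
    merged = []
    for assay, text in files.items():
        for raw in csv.DictReader(io.StringIO(text)):
            row = {normalize_header(k): v.strip() for k, v in raw.items()}
            if row["value"] == "":
                continue  # skip measurements with no recorded value
            merged.append({
                "donor_id": row["donor_id"],
                "assay": assay,
                "analyte": row["analyte"],
                "value": float(row["value"]),
            })
    return merged


merged = merge_assays(ASSAY_FILES)
for row in merged:
    print(row)
```

Storing one measurement per row, with the assay recorded as a column, is one possible design; it keeps the table searchable by donor, assay or analyte regardless of which file a value came from.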

Only good-quality, standardized and cleaned data yield useful insights. The most important step in obtaining such data is for researchers and clinicians to work closely with informaticians and data scientists before the project starts. There are issues that we, as researchers, do not think about but that significantly affect data quality. For example, if the name of the same vaccine is written even slightly differently, the computer will treat the two spellings as two different vaccines (e.g. Fluzone with a capital letter is different from fluzone in lowercase). For this reason, standardization is one of the most important steps when designing a study.
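The Fluzone example can be shown in a few lines of Python. This is a sketch under stated assumptions, not the procedure used for FluPRINT: the `CANONICAL_NAMES` vocabulary and the helper names are hypothetical, and a real study would agree on its own controlled list of vaccine names up front.

```python
# To a computer, these are simply two different strings.
print("Fluzone" == "fluzone")  # False

# Hypothetical controlled vocabulary (assumed for illustration only).
CANONICAL_NAMES = {
    "fluzone": "Fluzone",
    "flumist": "FluMist",
}


def normalization_key(name):
    """Collapse whitespace and ignore case so spelling variants match."""
    return " ".join(name.split()).casefold()


def standardize_vaccine(name):
    """Return the canonical vaccine name, or fail loudly if it is unknown."""
    key = normalization_key(name)
    if key not in CANONICAL_NAMES:
        raise ValueError(f"Unrecognized vaccine name: {name!r}")
    return CANONICAL_NAMES[key]


for variant in ["Fluzone", "fluzone", "  FLUZONE "]:
    print(repr(variant), "->", standardize_vaccine(variant))
```

Raising an error for unknown names, rather than passing them through silently, is a deliberate choice in this sketch: it surfaces new spelling variants at data entry instead of months later during analysis.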

By releasing our database, we want to encourage other researchers to open their data as well, since high-quality data that can be used “off the shelf”, without spending months on cleaning and standardization, are rarely available. We, as researchers, have a responsibility to openly share high-quality data since, in most cases, we are the ones who understand such data best and are therefore best placed to prepare them well. We therefore hope to see more and more standardized data published.

To learn more about how the FluPRINT database was built, please read our recently published manuscript and visit the project website at fluprint.com. If you would like to know how you can take advantage of the data, please check out SIMON, our machine learning pipeline, and join the open source community dedicated to making SIMON free for everyone at genular.com.

Adriana Tomic

Marie Curie Fellow in systems immunology, applying machine learning to understand how vaccines work, specifically focused on human immunology and data-driven research. Co-developer of SIMON, an open source platform for the application of machine learning to biological and clinical data.
