Behind the Paper

Unifying clinical data to reveal influenza imprint on immune system

Every exposure leaves an imprint on our immune system. Machine learning can identify the unique imprints of pathogens or vaccines. In this blog post, we share our approach to unifying data acquired from individuals undergoing influenza vaccination and to generating the FluPRINT database.

Published in Research Data

Oct 21, 2019

Adriana Tomic

Marie Curie Fellow, University of Oxford

Just as a burglar leaves fingerprints at a crime scene, viruses leave distinct imprints in our immune system. To recognize such patterns and identify the clues left behind, we need to collect a variety of immune system measurements that capture the parameters specific to the pathogen in question. In recent years, we have witnessed an increase in publicly available influenza data, mostly due to new technological advancements and open data initiatives. However, to obtain a systemic view of the influenza virus imprint, it is necessary to combine datasets across clinical studies.

The goal of our research was quite straightforward: we wanted to apply machine learning to identify patterns in the data gathered at the Human Immune Monitoring Center at Stanford University, in order to determine why some individuals fail to mount an antibody response after influenza vaccination. Well, it seemed quite straightforward in the beginning, but things soon became more and more complicated.

The data were generated using different immune assays and were spread across hundreds of different files. Simply taking the collected data and running machine learning algorithms on them right away was not possible. The data had to be preprocessed, merged, cleaned, standardized and fully integrated before the machine learning process could start. This required us to build an automated pipeline that transforms the data into a database, so that they are easily searchable and therefore usable.
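To make the merging step more concrete, here is a minimal sketch of how files from different assays might be brought into one common table. It is not the FluPRINT pipeline itself: the file contents, the column names, the `COLUMN_MAP` mapping and the `unify` function are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical raw exports: two assays with inconsistent column names.
# Real data would be read from files; in-memory text keeps the sketch self-contained.
RAW_FILES = {
    "hai": "Donor,Strain,Titer\n101,H1N1,40\n102,H1N1,\n",
    "cytokine": "donor_id,analyte,mfi\n101,IL-6,12.5\n102,IL-6,8.0\n",
}

# Hypothetical per-assay mapping from source columns to the unified fields.
COLUMN_MAP = {
    "hai": {"donor": "Donor", "feature": "Strain", "value": "Titer"},
    "cytokine": {"donor": "donor_id", "feature": "analyte", "value": "mfi"},
}


def unify(raw_files, column_map):
    """Merge all assay exports into one list of rows with a shared schema."""
    rows = []
    for assay, text in raw_files.items():
        cols = column_map[assay]
        for rec in csv.DictReader(io.StringIO(text)):
            value = rec[cols["value"]].strip()
            if not value:
                continue  # cleaning: skip rows with a missing measurement
            rows.append({
                "donor_id": int(rec[cols["donor"]]),
                "assay": assay,
                "feature": rec[cols["feature"]].strip(),
                "value": float(value),
            })
    return rows


unified = unify(RAW_FILES, COLUMN_MAP)
for row in unified:
    print(row)
```

Once every measurement sits in the same long format (one row per donor, assay and feature), it can be loaded into a database and queried uniformly, regardless of which assay produced it.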

Only good quality, standardized and cleaned data can yield useful insights. The most important step in obtaining good quality data is for researchers and clinicians to work closely with informaticians and data scientists before the project starts. There are some issues that we, as researchers, do not think about but that significantly affect data quality. For example, if the same type of vaccine is written differently, even in a small way, the computer will treat it as two different vaccines (e.g. Fluzone with a capital letter is different from fluzone in lowercase). For this reason, standardization is one of the most important steps when designing a study.
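The Fluzone example can be shown in a few lines of Python. This is only a sketch of one possible normalization rule (trim whitespace and ignore letter case); the function name `normalize_vaccine_name` and the sample entries are hypothetical, and a real study would more likely map names onto a controlled vocabulary.

```python
def normalize_vaccine_name(raw: str) -> str:
    """Collapse whitespace and letter-case differences in a free-text vaccine name."""
    return " ".join(raw.split()).casefold()


# Hypothetical entries as they might be typed by different people.
records = ["Fluzone", "fluzone", " FLUZONE ", "Fluzone  High-Dose"]

# Without normalization, every spelling counts as a separate vaccine.
raw_distinct = set(records)

# With normalization, spellings of the same vaccine collapse into one value.
normalized_distinct = {normalize_vaccine_name(r) for r in records}

print(len(raw_distinct), "distinct raw values")
print(sorted(normalized_distinct))
```

Agreeing on such a rule before data collection starts is much cheaper than reconciling spellings afterwards.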

By releasing our database, we want to encourage other researchers to open their data as well, since high quality data that can be used “off the shelf”, without months spent on cleaning and standardization, are rarely available. We, as researchers, have a responsibility to openly share high quality data since, in most cases, we are the ones who understand the data best and can most easily prepare them to a high standard. We therefore hope to see more and more standardized data published.

To read more about how the FluPRINT database was built, please read our recently published manuscript and check out the website dedicated to the project at fluprint.com. If you are interested in how you can take advantage of the data, please check out SIMON, our machine learning pipeline, and take part in the open source community dedicated to making SIMON free for everyone at genular.com.

Adriana Tomic

Marie Curie Fellow, University of Oxford

Marie Curie Fellow in systems immunology, applying machine learning to understand how vaccines work, specifically focused on human immunology and data-driven research. Co-developer of SIMON, an open source platform for the application of machine learning to biological and clinical data.
