A community-based protocol for the statistical analysis of non-targeted metabolomics data

Non-targeted metabolomics is rapidly advancing towards the goal of characterizing the vast array of small molecules that play critical roles in biological systems. However, the complexity of data from such experiments presents significant challenges, particularly for less experienced researchers.
Published in Chemistry and Protocols & Methods

Non-targeted metabolomics is distinguished from its targeted counterpart by its exploratory nature, which aims to capture the entire spectrum of small molecules present in a sample. This approach typically generates large, complex datasets that require sophisticated analysis tools to identify and interpret the relevant chemical signatures that reflect underlying biological processes. Several tools and platforms have been developed to aid in this process, notably Feature-based Molecular Networking (FBMN) within the Global Natural Products Social Molecular Networking (GNPS) metabolomics cloud ecosystem (https://gnps2.org/). FBMN has become a cornerstone in metabolomics research, enabling researchers to annotate and connect features across samples. However, the subsequent statistical analysis of these features has remained a significant roadblock, particularly for those who are not experts in computational methods. The fragmented nature of available tools, scattered across different platforms and requiring customized scripts, adds to the challenge, especially for newcomers to the field. The need for a comprehensive, user-friendly guide that integrates multiple statistical approaches into a cohesive analysis pipeline became increasingly apparent.

To address these challenges, we developed a detailed protocol that guides researchers through the entire process of analyzing FBMN results. This protocol, designed to be an end-to-end solution, begins with feature detection and continues through data clean-up, statistical analysis, and spectrum annotation. By providing ready-made code for the popular statistical platforms R and Python, as well as a graphical user interface (GUI), we aimed to make the toolkit accessible to a wide range of users. The protocol is fully integrated with FBMN, and the input files can be loaded directly from GNPS, ensuring seamless workflow compatibility. For users who prefer a more interactive approach, we developed a web application with a GUI, available both online (https://fbmn-statsguide.gnps2.org/) and as a downloadable application (https://www.functional-metabolomics.com/resources). This tool is designed not only for experienced researchers but also for educational purposes, making it an ideal resource for students and early-career scientists.
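To give a feel for what the data clean-up stage can involve, here is a minimal sketch in plain Python. It is not the protocol's own code: the function names, the toy feature table, and the blank cutoff of 0.3 are all assumptions chosen for illustration, and the three steps shown (blank removal, zero imputation, total-intensity normalization) are common choices in metabolomics rather than a statement of what the protocol prescribes.

```python
from statistics import mean

# Toy feature table (hypothetical): feature ID -> intensity per column.
# "blank" is a solvent blank; "s1" and "s2" are biological samples.
table = {
    "F1": {"blank": 10.0, "s1": 1000.0, "s2": 800.0},
    "F2": {"blank": 500.0, "s1": 600.0, "s2": 400.0},
    "F3": {"blank": 0.0, "s1": 0.0, "s2": 200.0},
}

def remove_blank_features(table, blank_cols, sample_cols, cutoff=0.3):
    """Keep features whose mean blank intensity is at most
    cutoff * mean sample intensity (cutoff is an illustrative value)."""
    kept = {}
    for feature, row in table.items():
        blank_mean = mean(row[c] for c in blank_cols)
        sample_mean = mean(row[c] for c in sample_cols)
        if sample_mean > 0 and blank_mean / sample_mean <= cutoff:
            # Drop the blank columns once they have served their purpose.
            kept[feature] = {c: row[c] for c in sample_cols}
    return kept

def impute_zeros(table):
    """Replace zeros with the smallest non-zero intensity in the table."""
    floor = min(v for row in table.values() for v in row.values() if v > 0)
    return {
        f: {c: (v if v > 0 else floor) for c, v in row.items()}
        for f, row in table.items()
    }

def normalize_by_total(table):
    """Divide each sample column by its summed intensity,
    so that every column sums to 1."""
    cols = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in cols}
    return {
        f: {c: row[c] / totals[c] for c in cols}
        for f, row in table.items()
    }

cleaned = remove_blank_features(table, ["blank"], ["s1", "s2"])
imputed = impute_zeros(cleaned)
normalized = normalize_by_total(imputed)

for feature, row in normalized.items():
    print(feature, {c: round(v, 3) for c, v in row.items()})
```

In this toy example, F2 is dropped because its blank intensity matches its mean sample intensity, and the zero in F3 is replaced before normalization. Real feature tables are far larger and are usually handled with data-frame libraries in R or Python.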

This protocol was developed with the support of the Virtual Multiomics Lab (VMOL), a community-driven, open-access virtual laboratory (https://vmol.org/). Initiated in 2022, this project aims to democratize access to non-targeted metabolomics analysis strategies, workflows, and expertise, making computational mass spectrometry accessible to researchers worldwide, regardless of their background or resources.

The Role of Virtual Labs in Democratizing Computational Metabolomics

The development of this protocol began during a summer school on non-targeted metabolomics at the University of Tuebingen in 2022, for which we had developed a series of R notebooks for the statistical analysis of metabolomics results. During this summer school we also launched a virtual working group, which we called the Virtual Multiomics Lab. VMOL is an open initiative that seeks to break down barriers to scientific collaboration and education, and it is where this protocol, and ultimately this paper, were further developed.

VMOL is open to everyone and connects computational biologists, chemists, and bioinformaticians from around the world in a virtual research group. By removing physical and economic barriers, VMOL provides training in computational mass spectrometry, bioinformatics, and data science, and launches virtual research projects as a new form of collaborative science. A central component of VMOL is its mission to train any interested researcher in mass spectrometry and computational metabolomics, regardless of background, circumstances, or geographical location. This emphasis on inclusivity and accessibility reflects a recognition that, while diversity is valued in the scientific community, economic barriers often prevent researchers from participating in critical events such as conferences and workshops. These events are vital for exchanging ideas, fostering collaborations, and creating opportunities, but they remain out of reach for many due to the costs involved.

The VMOL Seminar series, a key component of this initiative, offers interactive, hands-on training in mass spectrometry and data analysis. These seminars are designed to start from basic principles and quickly bring participants to a level where they can independently execute various mass spectrometry data analysis tasks. The content of the seminars is dynamic, evolving based on the interests and needs of the participants. Guest lectures from experts in the field and participant-led presentations ensure that the seminars remain relevant and engaging.

The COVID-19 pandemic dramatically altered the landscape of scientific communication. With in-person meetings and conferences canceled, researchers turned to online platforms to share their work. Surprisingly, this shift to remote communication was well received and highlighted the potential for more inclusive and accessible scientific interactions. VMOL capitalized on this shift by organizing online workshops and seminars free of economic or social barriers to entry. These events reach a global audience, including many researchers who would not have been able to attend in-person conferences due to distance or funding limitations. The success of these online events is reflected in the more than a thousand researchers who have subscribed to our mailing list and YouTube channel (www.youtube.com/@functionalmetabolomics) and in more than a hundred thousand views of the freely available recordings. We are convinced that these events demonstrate the tremendous potential of remote training resources to democratize science, making it more accessible to a broader audience.

Towards a More Inclusive Scientific Community

The development of this protocol for the statistical analysis of FBMN results from non-targeted metabolomics, as part of a community-driven effort, represents a significant step forward in making complex analytical tools more accessible to the global scientific community. By providing a comprehensive, user-friendly educational resource, we aim to lower the barriers to entry for researchers new to the field, enabling them to make use of and contribute to the rapidly advancing field of metabolomics. Our community aims to provide open access to our knowledge, protocols, and findings (e.g. through GitHub, ChemRxiv, and YouTube), and we are committed to democratizing science and fostering a more diverse and innovative research community.

As the scientific community continues to leverage and adapt to the opportunities presented by the digital age, we hope that initiatives like VMOL will spark new ways of scientific engagement that benefit scientific progress and make it more equitable around the world.
