Behind the Paper

Computational Approaches to Support Identification of Chemicals in the Environment

Co-authored by Andrew D. McEachran and Antony J. Williams

Published in Research Data

Aug 02, 2019

Andrew McEachran

Application Scientist, Agilent Technologies


The number of chemicals detected in the environment continues to increase. These range from expected pollutants such as pesticides and pharmaceuticals (for example, opioids and cannabinoids) to metabolites and degradants. Rapid identification of small molecules in environmental monitoring studies generally relies on high-resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) techniques. NTA typically combines the acquisition of HRMS spectral signatures for hundreds to thousands of chemicals with informatics approaches that search databases of “known” chemicals.

Freely available public online databases can contain tens of millions of chemicals (for example, the PubChem and ChemSpider databases contained 96 million and 74 million substances, respectively, as of August 2019). While these large databases are useful for broad chemical searching, more focused databases are better suited to identifying chemicals in the environment. At the US EPA we have been building a more focused data collection to support our computational toxicology research for almost 20 years (the DSSTox Database [https://www.sciencedirect.com/science/article/pii/S2468111319300234]), and it now contains over 875,000 substances (as of August 2019). The “CompTox Chemicals Dashboard” (https://comptox.epa.gov/dashboard) is a freely available web interface to the data contained in DSSTox, with specific functionality that supports our mass spectrometry analyses and the identification of “known unknowns” (https://pubs.acs.org/doi/abs/10.1021/acs.est.7b01908).

When attempting to identify an unknown chemical in an environmental sample, most search techniques use either a generated molecular formula or an observed molecular mass to retrieve potential candidate chemicals for that unknown. In many cases tens to hundreds of chemicals in the database can match a single molecular formula or mass. For example, the molecular formula of Bisphenol A (BPA, which many of us will know from the emphasis on “BPA-free” in commerce) corresponds to over 200 chemicals in the collection of 875,000 substances (https://comptox.epa.gov/dashboard/dsstoxdb/multiple_results?inputs=C15H16O2&input_type=exact_formula&no_filters=true). The challenge is to identify which of these chemicals are the more likely candidates. One approach that has proven valuable to date is “metadata ranking” (https://link.springer.com/article/10.1007/s00216-016-0139-z), which uses available data, such as the number of consumer products containing the chemical or the number of scientific articles in PubMed mentioning the chemical, to prioritize the candidates.
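As a rough illustration of the two steps described above (a mass search followed by metadata ranking), here is a minimal Python sketch. It is not the Dashboard's implementation: the candidate records, the field names `n_sources` and `n_pubmed`, the counts, and the 5 ppm tolerance are all hypothetical, and the observed mass is assumed to be a neutral monoisotopic mass with any adduct already removed.

```python
import re
from dataclasses import dataclass
from typing import List

# Monoisotopic masses (Da) of the most abundant isotopes of a few elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a simple formula such as 'C15H16O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


@dataclass
class Candidate:
    name: str
    formula: str
    n_sources: int  # hypothetical: number of products/data sources listing it
    n_pubmed: int   # hypothetical: number of PubMed articles mentioning it


def within_ppm(observed: float, theoretical: float, ppm: float) -> bool:
    """True if the observed mass lies within +/- ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


def rank_candidates(observed_mass: float, candidates: List[Candidate],
                    ppm: float = 5.0) -> List[Candidate]:
    """Keep candidates inside the mass window, then sort by metadata counts."""
    hits = [c for c in candidates
            if within_ppm(observed_mass, monoisotopic_mass(c.formula), ppm)]
    return sorted(hits, key=lambda c: (c.n_sources, c.n_pubmed), reverse=True)


# Hypothetical candidate list; names and counts are made up for illustration.
CANDIDATES = [
    Candidate("candidate_B", "C15H16O2", n_sources=3, n_pubmed=12),
    Candidate("candidate_A", "C15H16O2", n_sources=40, n_pubmed=900),
    Candidate("candidate_C", "C14H12O3", n_sources=25, n_pubmed=300),
]

if __name__ == "__main__":
    for c in rank_candidates(228.1152, CANDIDATES):
        print(c.name, c.formula, c.n_sources, c.n_pubmed)
```

In this sketch the two isomers sharing the formula C15H16O2 pass the mass filter and are ordered by their metadata counts, while the third candidate falls outside the mass window and is dropped. A real workflow would likely weight several metadata fields rather than sort on them lexicographically.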

To increase confidence in an identification beyond what metadata can provide, researchers use spectral “fragmentation patterns” (how a chemical structure breaks apart in a high-energy collision) to match what was observed on an analytical instrument against what has previously been observed for that same structure. These data, when available, can boost confidence in identifying chemicals, and a growing number of spectral databases are freely available online (for example, MassBank (https://massbank.eu/MassBank/)). Overall, however, fragmentation data are available for relatively few chemicals, which limits generalized high-throughput application in routine identifications. The goal of our reported work (https://www.nature.com/articles/s41597-019-0145-z) was to fill this crucial gap by predicting and storing fragmentation patterns for the entirety of the EPA’s DSSTox database. This gives easy access to both the rich metadata and the fragmentation patterns for broad, high-throughput use to boost confidence in chemical identifications. We hope that individuals, research groups, and analytical chemistry vendors will find the data valuable, informative, and effective.
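To make the idea of matching an observed fragmentation pattern against stored predictions concrete, the sketch below scores an observed spectrum against each candidate's predicted spectrum using a simple cosine similarity. This is a generic, commonly used scoring approach and not necessarily the one used in the reported work; the peak lists, candidate names, and the 0.01 m/z tolerance are made-up values for illustration only.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def cosine_score(observed: List[Peak], predicted: List[Peak],
                 tol: float = 0.01) -> float:
    """Cosine similarity after greedy one-to-one peak matching within tol (m/z)."""
    used = set()
    dot = 0.0
    # Match the most intense observed peaks first.
    for mz_o, int_o in sorted(observed, key=lambda p: -p[1]):
        best = None
        for j, (mz_p, int_p) in enumerate(predicted):
            if j in used or abs(mz_o - mz_p) > tol:
                continue
            if best is None or int_p > predicted[best][1]:
                best = j
        if best is not None:
            used.add(best)
            dot += int_o * predicted[best][1]
    norm_o = math.sqrt(sum(i * i for _, i in observed))
    norm_p = math.sqrt(sum(i * i for _, i in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0
    return dot / (norm_o * norm_p)


def best_match(observed: List[Peak], library: Dict[str, List[Peak]],
               tol: float = 0.01) -> Tuple[str, float]:
    """Return the (name, score) of the highest-scoring predicted spectrum."""
    scores = {name: cosine_score(observed, spec, tol)
              for name, spec in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical observed spectrum and predicted library (made-up numbers).
OBSERVED = [(135.08, 100.0), (213.09, 60.0), (93.03, 20.0)]
PREDICTED_LIBRARY = {
    "candidate_A": [(135.081, 90.0), (213.092, 50.0)],
    "candidate_B": [(107.05, 100.0), (93.034, 30.0)],
}

if __name__ == "__main__":
    name, score = best_match(OBSERVED, PREDICTED_LIBRARY)
    print(name, round(score, 3))
```

A score like this could be combined with the metadata counts to order candidates that share a formula, although how the two should be weighted is left open here.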

Disclaimer: The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.
