Behind the Paper

The story behind AMAISE (A Machine learning Approach to Index-free Sequence Enrichment)

Why we decided to develop a machine learning-based tool to computationally remove host sequences from samples

Published in Bioengineering & Biotechnology

Jun 09, 2022

Meera Krishnamoorthy

Graduate Student, University of Michigan

In 2017, Robert Dickson and other collaborators published work on rapidly identifying bacterial pneumonia using real-time DNA sequencing technology and a metagenomic classification method. Despite this breakthrough, bedside clinical diagnostics based on DNA sequencing technologies and metagenomic classification methods remain uncommon. One reason is that current real-time sequencing technologies output long-read data, which are often noisy, and existing metagenomic classification methods applied to these noisy data are memory-intensive and inaccurate.

In 2019, my thesis advisor Jenna Wiens and I started working with Robert Dickson, John Erb-Downward, and Piyush Ranjan to determine how machine learning (ML) could improve the accuracy and memory efficiency of the metagenomic classification pipeline. Recognizing that ML methods can perform classification without relying on large reference databases, we first looked into replacing existing metagenomic classification methods with ML methods. Past work has developed ML methods to classify both long-read and short-read data. However, outperforming existing non-ML-based approaches (e.g., Kraken2, Centrifuge, Minimap2, and Bowtie2) in memory efficiency and accuracy on the multi-class classification task proved challenging.

Ultimately, our exploration led us to a realization: much of the inefficiency of existing metagenomic pipelines stems from the need to classify (i.e., remove) large amounts of host DNA, since many microbiome samples are dominated by host sequences. Thus, instead of entirely replacing existing metagenomic classification methods, we focused on the gains that could be achieved by computationally eliminating host sequences from samples before passing them to metagenomic classification methods. To this end, we developed AMAISE (A Machine learning Approach to Index-free Sequence Enrichment), an ML-based pre-processing method that computationally removes host data and, in turn, improves the accuracy and memory efficiency of the metagenomic classification pipeline.

Existing metagenomic classification methods work by identifying exact matches between k-mers within sequences and their reference databases. In contrast, ML methods learn more general patterns. This flexibility is an advantage on noisier data such as long reads, so we focused on long-read classification. Given a long-read sequence, AMAISE outputs a classification label indicating whether it comes from a host or a microbe (0 for microbe, 1 for host). When used to augment existing metagenomic pipelines, AMAISE improved accuracy and memory efficiency by over 10% and 14%, respectively, while running at speeds comparable to, if not better than, the original pipelines. AMAISE was also designed so that it can be applied before the time-consuming quality-control steps typically used in current metagenomic classification pipelines.
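To make the contrast in the paragraph above more concrete, here is a minimal Python sketch under simplifying assumptions. The first part shows why exact k-mer matching struggles with noisy reads: a single substitution in a toy read removes every k-mer that overlaps it from the set of exact hits. This is a simplified illustration only, not how Kraken2, Centrifuge, or any other specific tool builds or queries its index. The second part shows where a host/microbe classifier slots into the pipeline: reads labelled 1 (host) are dropped, and only reads labelled 0 (microbe) are passed on to downstream metagenomic classification. The functions `kmers`, `exact_kmer_hits`, and `deplete_host`, and the `classify` callback, are hypothetical names chosen for illustration; `classify` stands in for a trained model and is not AMAISE's actual interface.

```python
from typing import Callable, Iterable, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exact_kmer_hits(read: str, reference_kmers: Set[str], k: int) -> int:
    """Count distinct k-mers of the read that occur exactly in the reference set."""
    return len(kmers(read, k) & reference_kmers)


def deplete_host(reads: Iterable[Tuple[str, str]],
                 classify: Callable[[str], int]) -> List[Tuple[str, str]]:
    """Keep reads labelled 0 (microbe) and drop reads labelled 1 (host).

    `classify` is a hypothetical stand-in for a trained host/microbe model.
    """
    return [(read_id, seq) for read_id, seq in reads if classify(seq) == 0]


if __name__ == "__main__":
    k = 4
    reference = "ACGTACGGTCA"           # toy "reference genome"
    ref_index = kmers(reference, k)      # 8 distinct 4-mers
    clean_read = reference               # error-free read
    noisy_read = "ACGTATGGTCA"           # one substitution (C -> T at index 5)
    print(exact_kmer_hits(clean_read, ref_index, k))  # 8
    print(exact_kmer_hits(noisy_read, ref_index, k))  # 4: the error breaks every 4-mer covering it

    # Toy labels standing in for model output: 1 = host, 0 = microbe.
    toy_labels = {"ACGTACGGTCA": 1, "TTGACCA": 0}
    reads = [("read1", "ACGTACGGTCA"), ("read2", "TTGACCA")]
    print(deplete_host(reads, lambda s: toy_labels[s]))  # [('read2', 'TTGACCA')]
```

Passing the classifier in as a callback keeps the filtering step independent of any particular model, which mirrors the idea of using host depletion as a pre-processing stage in front of an unchanged downstream classifier.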

Accuracy, speed, and memory efficiency are all important in developing bedside clinical diagnostic technologies. High accuracy is critical to ensuring that the correct pathogens are identified and the appropriate treatments selected. Speed ensures that treatment can be given in a timely manner. Memory efficiency matters for accessibility, since tools with lower memory requirements can run on less powerful hardware. Given the improvements AMAISE provides when added to the current metagenomic classification pipeline, we believe this work moves us one step closer to bedside clinical diagnostics based on DNA sequencing.

Communications Biology
