A synthesis of bacterial and archaeal phenotypic trait data

Open-access code that merges 26 data sources, reconciles conflicting data and condenses multiple records into a single record per species.
Published in Research Data


The field of “trait ecology” has emerged over the past two decades. It grew mainly out of plant ecology, beginning from an interest in understanding the spread of ecological strategies across species. Up to the 1990s, discussion of ecological strategies revolved around concepts such as stress-tolerance and ability to compete. But such concepts proved hard to define and therefore also to measure. For example, there was no way to ask whether plant species in Cape Province tended to be more stress-tolerant than those in northern England, in the absence of an agreed method for measuring stress tolerance. Trait ecology solved this by positioning species along measurable axes such as seed mass, leaf mass per area (LMA) and potential height: measures that can be made on any species at any location, enabling direct comparison of species via common measures. The data synthesis described in our paper is a starting point for exploring trait axes in bacteria and archaea.

Global comparisons via quantitative traits have been notably productive. Major dimensions of variation have been characterized, and model species positioned against the constellation of variation (Díaz et al. 2016; McWilliam et al. 2018). Traits have been used as predictors for decomposition rates (Cornwell et al. 2008) and for growth and response to competition (Gibert et al. 2016; Kunstler et al. 2016). The communal TRY plant trait database (www.try-db.org) has found use in 291 publications so far, many from large collaborations with 10 or more authors from multiple research groups. An Open Traits Network (opentraits.org) aims to broaden this collaborative research style across all taxa (Gallagher et al. 2020).

Bacteria and archaea are different from plants in many ways, but they do have this in common: ecological strategies are largely discussed by reference to more or less abstract concepts such as the oligotrophy-copiotrophy spectrum. Such concepts are hard to define and measure, and thus present a problem similar to that of plant ecology in the 1990s. Accordingly we initiated a part-time project within the framework of a small Macquarie University collaborative network called the Species Spectrum Research Centre. The project aimed to gather as much trait information on bacteria and archaea as we could find and consolidate this information into a data frame that could easily be probed to explore different ecological questions.

Workshop at Macquarie University, November 2018. From left: Jennifer Martiny, Sasha Tetu, Phil Hugenholtz, Josh Madin, Frank the Bear, Daniel A Nielsen, TBK Reddy, Michael Gillings, Jemma Geoghegan, Mark Westoby and Andrew Bissett.

The core group consisted of Westoby (background in plant trait ecology), Gillings, Moore, Paulsen, Tetu and Nielsen (all microbiologists), and Madin, who is a data scientist and coral biologist. The group met weekly over several years, progressively discovering data sources and discussing issues, papers and preliminary analyses. Because many decisions had to be made in order to combine and process the data, it was decided early that the data synthesis should take the form of open-access code. Data was imported and stored as it came from original sources, and the various decisions made subsequently are all recorded in the code. These include reconciling units, merging columns that described the same trait but with different words, correcting or removing observations that were assessed to be errors for one reason or another, and condensing multiple records into a single summary record per species. Because the code is accessible, users can add further data sources or modify any of these decisions as they see fit, and also add new trait information as it becomes available.
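To make these processing steps concrete, here is a minimal Python sketch of the general idea. It is not the project's actual code: the column names (`d1_um`, `gram_stain`), the synonym table, the plausibility range, the choice of median and majority vote as summaries, and the toy records are all assumptions invented for illustration.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical synonyms: different sources name the same trait differently.
COLUMN_SYNONYMS = {
    "cell_diameter": "d1_um",
    "diameter": "d1_um",
    "gram": "gram_stain",
}

# Hypothetical unit conversions to micrometres.
TO_MICROMETRES = {"um": 1.0, "nm": 0.001}


def harmonise(record):
    """Rename synonymous columns and convert diameters to micrometres."""
    out = {"species": record["species"]}
    unit = record.get("unit", "um")
    for key, value in record.items():
        if key in ("species", "unit"):
            continue
        name = COLUMN_SYNONYMS.get(key, key)
        if name == "d1_um":
            value = float(value) * TO_MICROMETRES[unit]
        out[name] = value
    return out


def is_plausible(record):
    """Drop records whose diameter falls outside an assumed sane range."""
    d = record.get("d1_um")
    return d is None or 0.05 <= d <= 1000


def condense(records):
    """Collapse many records into one summary row per species."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["species"]].append(r)
    summary = {}
    for species, rows in grouped.items():
        diameters = [r["d1_um"] for r in rows if "d1_um" in r]
        stains = [r["gram_stain"] for r in rows if "gram_stain" in r]
        summary[species] = {
            # Numeric trait: median of the retained observations.
            "d1_um": median(diameters) if diameters else None,
            # Categorical trait: most frequently reported value.
            "gram_stain": Counter(stains).most_common(1)[0][0] if stains else None,
        }
    return summary


# Toy records with made-up values, standing in for rows from different sources.
raw = [
    {"species": "Species A", "cell_diameter": 0.5, "unit": "um", "gram": "negative"},
    {"species": "Species A", "diameter": 700, "unit": "nm", "gram_stain": "negative"},
    {"species": "Species A", "diameter": 5000, "unit": "um"},  # assumed to be an error
    {"species": "Species B", "cell_diameter": 0.8, "unit": "um", "gram": "positive"},
]

clean = [r for r in map(harmonise, raw) if is_plausible(r)]
species_table = condense(clean)
```

Keeping each decision in its own small function mirrors the point made above: a user who disagrees with, say, the plausibility range or the choice of summary statistic can change that one step and rerun the whole merge.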

To date, the data frame contains mostly phenotypic traits such as cell diameter and length, maximum growth rate, oxygen use, Gram stain, growth temperature etc., but also some basic genome-related traits such as genome size, number of coding genes and GC content (Madin et al. 2020). In total, the data frame covers 23 traits for over 15,000 unique species, although most species do not have a record for all of the traits. We decided not to include traits that were deduced through genome analyses, since this is a fast-developing field and annotations are continuing to improve rapidly.
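Because coverage is uneven, a useful first check on any species-by-trait table is the fraction of species with a value for each trait. The sketch below assumes a simple dictionary layout and hypothetical trait names and values; it does not reflect the actual structure or contents of the published data frame.

```python
def trait_coverage(table, traits):
    """Return the fraction of species with a non-missing value for each trait."""
    n_species = len(table)
    return {
        trait: sum(1 for row in table.values() if row.get(trait) is not None) / n_species
        for trait in traits
    }


# Toy table with made-up entries; None marks a missing trait value.
toy_table = {
    "Species A": {"genome_size": 4600000, "gram_stain": "negative", "d1_um": None},
    "Species B": {"genome_size": 4200000, "gram_stain": None, "d1_um": 0.8},
    "Species C": {"genome_size": None, "gram_stain": None, "d1_um": None},
    "Species D": {"genome_size": 2100000, "gram_stain": "positive", "d1_um": None},
}

coverage = trait_coverage(toy_table, ["genome_size", "gram_stain", "d1_um"])
```

In this toy table `genome_size` is recorded for three of the four species, `gram_stain` for two and `d1_um` for one, so any analysis that combines traits is limited to the species for which all of them are present.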

Our group’s aim was not only to build a dataset of species traits, but to answer a variety of ecological questions from it. Those questions will be addressed in separate papers, we hope. But we think other research groups will also find this data merger useful. Ideally it may continue to develop as a community resource.
