When I began my PhD, I never imagined that one of its most impactful outcomes would be a data paper. Published in Scientific Data, this paper presents the first metagenomics-based genetic catalogue of estuarine sediments and condenses years of laboratory work and sequence data analysis into a few paragraphs. It was important to us that this resource be designed and disseminated not only to support our own research, but also to be openly available to the global scientific community.
Why a Data Paper?
The idea of publishing a data paper came late in my PhD. Over the course of the work, we collected 92 sediment samples from 12 estuaries along the Basque coast and sequenced 3.35 terabases of eDNA. The result: over 108 million unique genes from bacteria, archaea, eukaryotes, and viruses, as well as 471 reconstructed microbial genomes (metagenome-assembled genomes, or MAGs). When we looked at the vast amount of metagenomic data we had generated, it dawned on us that the best course was to document it properly and make it accessible and reusable, both for others and for our own future studies. We hope that this catalogue can be a fundamental resource for future research in microbial ecology, environmental monitoring, and bioprospecting. It could help scientists explore ecological functions, detect environmental stress indicators, and even identify potential biotechnological applications in an understudied and complex environment such as the estuarine benthic microbial community.
A Personal Anecdote
One unexpected outcome of this project was a shift in perspective for my thesis advisor. As we worked through the complexities of organizing and analysing such a massive dataset in a department mainly focused on metabarcoding, he remarked, "Now I understand why TARA Oceans took so long to publish their data!" For me, that moment underscored the scale and challenge of metagenomics, and the importance to open science of making data available in a structured, meaningful way.
The Broader Impact
While the catalogue itself is a significant scientific achievement, its immediate impact on conservation policy may be limited. Environmental DNA (eDNA) techniques, though powerful, are still not widely accepted in routine monitoring frameworks. Nevertheless, we believe that resources like this catalogue can help bridge the gap between cutting-edge science and practical application. By making our data openly available, we hope to accelerate the adoption of molecular tools in ecosystem management.
Looking Ahead
This is just the beginning. The catalogue lays the groundwork for future studies on microbial dynamics, estuarine health, and the role of coastal and estuarine sediments in global biogeochemical cycles and in processes such as pollutant bioremediation and carbon sequestration. It also opens doors for interdisciplinary research, combining genomics with network theory, machine learning, and ecological modeling.
We invite researchers from all fields to explore the dataset, reuse it, and build upon it. Open science thrives when data is shared, and we’re proud to contribute to that vision.
Visual Highlights from the Study
Figure 1. Sampling Stations Map
Map showing the 29 sampling stations across 12 estuaries along the Basque coast. These locations span a gradient of salinity and anthropogenic pressures, providing a diverse ecological context for the study.
Figure 2. Functional and Taxonomic Classification of Genes
Bar plots and pie charts illustrating the distribution of over 108 million genes across functional databases (COG, KEGG, PFAM) and taxonomic groups (bacteria, archaea, eukaryotes, viruses).
Fieldwork in Action
Sampling estuarine sediments is a muddy, meticulous process. Here, members of our team collect samples during low tide, ensuring consistency across sites and seasons.
Article link: https://rdcu.be/eS72r