Wolfset: A High-Quality Underwater Acoustic Dataset for Algorithm Development and Analysis
Published in Earth & Environment and Physics
General Notes:
Collecting underwater acoustic data is costly and time-consuming, so we built Wolfset as a ready-to-use benchmark: about 1.5 GB of audio in 168 WAV files, totaling roughly 5 hours of recordings. Every recording was analyzed and validated for consistency and quality before being added to the final dataset, as illustrated in Figure 1.
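As a quick sanity check on these figures, the sketch below estimates the uncompressed size of 5 hours of audio in the 44.1 kHz, 16-bit format described under Instrumentation Chain. It assumes mono PCM files (a single hydrophone), which this post does not state explicitly, and the helper name `pcm_size_bytes` is purely illustrative.

```python
# Back-of-the-envelope check of the dataset size.
# Assumption (not stated in the post): recordings are mono, uncompressed PCM.
SAMPLE_RATE_HZ = 44_100   # sampling rate quoted for the sound card
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # assumed: one hydrophone, one channel
TOTAL_HOURS = 5           # approximate total duration
N_FILES = 168             # number of WAV files


def pcm_size_bytes(duration_s, sample_rate=SAMPLE_RATE_HZ,
                   bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Size of raw PCM audio of the given duration, ignoring WAV headers."""
    return int(duration_s * sample_rate * bytes_per_sample * channels)


total_seconds = TOTAL_HOURS * 3600
total_bytes = pcm_size_bytes(total_seconds)
mean_clip_s = total_seconds / N_FILES

print(f"Estimated size: {total_bytes / 1e9:.2f} GB")
print(f"Mean clip length: {mean_clip_s:.1f} s")
```

Under those assumptions the estimate comes out near 1.6 GB, in line with the quoted "about 1.5 GB", and the average clip is a little under two minutes long.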
Controlled Facility:
All sounds were recorded inside the anechoic tank at Lisbon Naval Base, as illustrated in Figure 2. The tank, built in 1976, measures 8 m × 5 m × 5 m, is lined with cork-rubber absorbent panels, and is equipped with two movable bridges that position sensors and sources anywhere within the water volume.
Instrumentation Chain:
Signals were recorded with a calibrated Brüel & Kjær 8104 hydrophone (0.1 Hz – 200 kHz, with constant directivity up to 20 kHz) placed 2.5 m deep at the center of the tank, as illustrated in Figure 3. The hydrophone output fed a two-stage, adjustable-gain Brüel & Kjær 2636 amplifier with its low-pass filter set to 22.4 kHz, followed by a 16-bit sound card sampling at 44.1 kHz. Levels were monitored with an HP oscilloscope and spectrum analyzer.
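Before feeding files into a pipeline, it can help to confirm that each one matches the acquisition format quoted above. The following is a minimal sketch using Python's standard `wave` module; the helper names `describe_wav` and `make_demo_wav` are hypothetical, and the demo file is a synthetic silent clip so the example runs without the dataset. The mono layout of the demo file is an assumption, not something the post specifies.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Read basic format fields from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": 8 * wf.getsampwidth(),
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / rate,
            # Highest frequency representable at this sampling rate.
            "nyquist_hz": rate / 2,
        }


def make_demo_wav(path, seconds=1, rate=44_100):
    """Write a silent 16-bit mono file (stand-in for a real recording)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # assumed mono
        wf.setsampwidth(2)   # 16-bit
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * (seconds * rate))


demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
make_demo_wav(demo_path)
info = describe_wav(demo_path)
print(info)
```

For a file sampled at 44.1 kHz the reported Nyquist frequency is 22.05 kHz, which is the upper bound on usable spectral content regardless of the hydrophone's wider bandwidth.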
Target Sources:
Five propulsion units were tested: four Mercury outboard engines rated at 3.6, 4.5, 8, and 18 horsepower (Figure 4), along with an electric motor from a radio-controlled ship model (Figure 5). Each unit was recorded under several operating conditions, ranging from idle with the drive disengaged to medium forward speed, with additional cyclic accelerations where specified.
Background Noise and Transients:
To mimic coastal clutter, we introduced controlled disturbances: intense and mild compressed-air bubbling, low- and high-flow water hoses, water-bucket pours, metallic-tube impacts with a mallet or hammer, and discrete air-rifle shots, as illustrated in Figures 6 and 7. Pure-noise and pure-transient segments were recorded separately to support data augmentation and detection tasks.
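One common way to use separately recorded noise for augmentation is to add it to a clean clip at a chosen signal-to-noise ratio. The sketch below illustrates that idea in plain Python; it is not the authors' procedure, the function names `rms` and `mix_at_snr` are hypothetical, and the sine wave and Gaussian noise are synthetic stand-ins for a motor clip and a bubbling-noise clip.

```python
import math
import random


def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the signal-to-noise ratio equals `snr_db`, then add it.

    Returns the mixture and the gain applied to the noise.
    """
    if len(noise) < len(signal):
        raise ValueError("noise clip must be at least as long as the signal")
    noise = noise[:len(signal)]
    # SNR_dB = 20*log10(rms(signal) / rms(gain*noise)), solved for gain.
    gain = rms(signal) / (rms(noise) * 10 ** (snr_db / 20))
    mixed = [s + gain * n for s, n in zip(signal, noise)]
    return mixed, gain


# Synthetic stand-ins: a 100 Hz tone and white Gaussian noise, 0.1 s at 44.1 kHz.
rng = random.Random(0)
n = 4410
signal = [math.sin(2 * math.pi * 100 * i / 44_100) for i in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

mixed, gain = mix_at_snr(signal, noise, snr_db=6.0)
achieved_db = 20 * math.log10(rms(signal) / rms([gain * v for v in noise]))
print(f"noise gain = {gain:.3f}, achieved SNR = {achieved_db:.2f} dB")
```

Sweeping `snr_db` over a range of values would yield mixtures spanning pristine to heavily cluttered conditions from the same pair of source clips.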
Extended Scenarios and Use Cases:
Wolfset includes all ten pairwise motor combinations plus one triple-motor case, recorded without added noise or transients to isolate interaction tones. These controlled mixes, along with the single-source clips, support research in source separation, classification, and domain adaptation between pristine and cluttered underwater environments. The strict control of space, hardware, and metadata makes Wolfset a reproducible reference for benchmarking modern machine learning models in underwater acoustics.
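The count of ten pairs follows directly from choosing 2 of the 5 propulsion units. A short sketch makes the enumeration explicit; the source labels are hypothetical placeholders, not the dataset's actual file or class names.

```python
from itertools import combinations

# Hypothetical labels for the five propulsion units described above.
sources = [
    "mercury_3.6hp",
    "mercury_4.5hp",
    "mercury_8hp",
    "mercury_18hp",
    "electric_rc_motor",
]

pairs = list(combinations(sources, 2))    # every unordered pair of units
triples = list(combinations(sources, 3))  # every possible three-unit mix

for a, b in pairs:
    print(f"{a} + {b}")
print(f"{len(pairs)} pairs, {len(triples)} possible triples")
```

There are also ten possible three-unit mixes; the dataset includes one of them, and this post does not say which.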