Behind the Paper

Wolfset: A High-Quality Underwater Acoustic Dataset for Algorithm Development and Analysis

Wolfset is a high-quality acoustic dataset recorded in an anechoic tank using a Brüel & Kjær 8104 hydrophone. It contains recordings of several outboard engines and an electric motor, combined with realistic noise sources, to support the development and testing of underwater sound classification algorithms.

General Notes:
Collecting underwater acoustic data is costly and time-consuming, so we built Wolfset as a ready-to-use benchmark: about 1.5 GB of audio in 168 WAV files, totaling roughly 5 hours of recordings. All data were analyzed and validated for consistency and quality before being added to the final dataset, as illustrated in Figure 1.
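Because the collection was validated before release, users may want to repeat a basic format check after downloading it. The sketch below uses only Python's standard `wave` module to count the WAV files, sum their durations, and flag any file whose sample rate or sample width differs from the 44.1 kHz, 16-bit format described in the instrumentation chain. The folder layout and the function names (`inspect_wav`, `summarize_folder`) are our own assumptions rather than part of the dataset, and `wave` can only read plain uncompressed PCM files.

```python
import wave
from pathlib import Path

# Format stated for the recording chain: 16-bit samples at 44.1 kHz.
EXPECTED_RATE_HZ = 44_100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM


def inspect_wav(path):
    """Return (sample_rate, sample_width_bytes, channels, duration_s) for one WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()
        channels = wf.getnchannels()
        duration_s = wf.getnframes() / rate
    return rate, width, channels, duration_s


def summarize_folder(folder):
    """Scan a folder tree for .wav files; report count, total hours, and format mismatches."""
    files = sorted(
        p for p in Path(folder).rglob("*") if p.is_file() and p.suffix.lower() == ".wav"
    )
    total_s = 0.0
    mismatches = []
    for path in files:
        rate, width, _channels, duration_s = inspect_wav(path)
        total_s += duration_s
        if rate != EXPECTED_RATE_HZ or width != EXPECTED_SAMPLE_WIDTH_BYTES:
            mismatches.append((path.name, rate, width))
    return {"files": len(files), "hours": total_s / 3600, "mismatches": mismatches}
```

For example, `summarize_folder("wolfset")` (a hypothetical local path) should report 168 files, roughly 5 hours, and an empty mismatch list if the download is complete and every file follows the stated format.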

Controlled Facility:
All sounds were recorded inside the anechoic tank at Lisbon Naval Base, as illustrated in Figure 2. The tank, built in 1976, measures 8 m × 5 m × 5 m, is lined with cork-rubber absorbent panels, and is equipped with two movable bridges that position sensors and sources anywhere within the water volume.

Instrumentation Chain:
Signals were recorded using a calibrated Brüel & Kjær 8104 hydrophone (0.1 Hz – 200 kHz, with constant directivity up to 20 kHz), placed at a depth of 2.5 m at the center of the tank, as illustrated in Figure 3. The hydrophone output fed a two-stage, adjustable-gain Brüel & Kjær 2636 amplifier configured with a 22.4 kHz low-pass filter, followed by a 16-bit sound card sampling at 44.1 kHz. Levels were monitored with an HP oscilloscope and spectrum analyzer.
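The stated sampling parameters can be cross-checked against the dataset size given in the General Notes. Assuming single-channel (mono) 16-bit PCM, which the setup implies but does not state explicitly, 5 hours at 44.1 kHz comes to about 1.59 × 10⁹ bytes, which is consistent with the quoted 1.5 GB. The same parameters also give the Nyquist frequency and the textbook quantization SNR of an ideal 16-bit converter (6.02·N + 1.76 dB for a full-scale sine). These are back-of-the-envelope figures, not measured properties of the recordings, and the helper names are ours.

```python
# Back-of-the-envelope figures implied by the recording chain (not measured values).


def raw_pcm_bytes(hours, sample_rate_hz=44_100, bits_per_sample=16, channels=1):
    """Size of uncompressed PCM audio in bytes, excluding WAV headers."""
    seconds = hours * 3600
    return int(seconds * sample_rate_hz * (bits_per_sample // 8) * channels)


def ideal_quantization_snr_db(bits):
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine wave."""
    return 6.02 * bits + 1.76


total_bytes = raw_pcm_bytes(hours=5)     # "roughly 5 hours"; mono is an assumption
nyquist_hz = 44_100 / 2                  # highest frequency representable at 44.1 kHz
snr_16bit_db = ideal_quantization_snr_db(16)

print(f"{total_bytes / 1e9:.2f} GB")                # 1.59 GB
print(f"Nyquist: {nyquist_hz:.0f} Hz")              # Nyquist: 22050 Hz
print(f"16-bit ideal SNR: {snr_16bit_db:.1f} dB")   # 16-bit ideal SNR: 98.1 dB
```

In practice, the usable dynamic range of the recordings will be lower than this ideal figure, since the amplifier and sound card add their own noise.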

Target sources:
Five propulsion units were tested: four Mercury outboard engines rated at 3.6, 4.5, 8, and 18 horsepower (Figure 4), and an electric motor from a radio-controlled ship model (Figure 5). Each unit was recorded under various operating conditions, ranging from disengaged idle to medium forward, with additional cyclic accelerations where specified.

Background noise and transients:
To mimic coastal clutter, we introduced controlled disturbances: intense and mild compressed-air bubbling, low- and high-flow water hoses, water poured from a bucket, metallic tubes struck with a mallet or hammer, and discrete air-rifle shots, as illustrated in Figures 6 and 7. Pure-noise and pure-transient segments were also recorded separately to support data augmentation and detection tasks.
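Because the noise and transient clips were recorded separately, noisy training examples can be built at a controlled signal-to-noise ratio. The NumPy sketch below takes an excerpt of a noise clip, scales it so that the ratio of clean-motor power to noise power matches a target SNR in dB, and adds it to the motor clip. This is one common augmentation recipe and not necessarily the one the authors used. The function name `mix_at_snr` and the random-excerpt policy are our assumptions, and both clips are assumed to be mono float arrays at the same 44.1 kHz sample rate.

```python
import numpy as np


def mix_at_snr(signal, noise, snr_db, rng=None):
    """Add a scaled excerpt of `noise` to `signal` so the mixture has the requested SNR (dB).

    Both inputs are 1-D arrays at the same sample rate. If the noise clip is longer
    than the signal, a random excerpt of matching length is used.
    """
    signal = np.asarray(signal, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if len(noise) < len(signal):
        raise ValueError("noise clip must be at least as long as the signal")

    # Pick a random start so that the excerpt fits entirely inside the noise clip.
    rng = np.random.default_rng() if rng is None else rng
    start = rng.integers(0, len(noise) - len(signal) + 1)
    excerpt = noise[start:start + len(signal)]

    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(excerpt ** 2)
    if p_noise == 0:
        raise ValueError("noise excerpt is silent")

    # Choose gain so that 10*log10(p_signal / (gain**2 * p_noise)) == snr_db.
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * excerpt
```

For instance, `mix_at_snr(motor, bubbles, snr_db=0)` would produce a mixture in which the motor and the bubbling noise carry equal average power. Because the sum can exceed the original amplitude range, rescale or clip it before writing it back to a 16-bit WAV file.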

Extended scenarios and use cases:
Wolfset includes all ten pairwise motor combinations plus one triple-motor case, recorded without added noise or transients to isolate interaction tones. These controlled mixtures, together with the single-source clips, support research in source separation, classification, and domain adaptation between pristine and cluttered underwater environments. Strict control over the recording space, hardware, and metadata makes Wolfset a reproducible reference for benchmarking modern machine learning models in underwater acoustics.
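The mixture coverage follows from simple counting. Five units give C(5, 2) = 10 distinct pairs, so ten pairwise recordings cover every pair, while the single triple-motor case is one of C(5, 3) = 10 possible triples. For source separation or multi-label classification, each mixture can be labeled with a multi-hot vector over the five units. The label strings and the `multi_hot` helper below are hypothetical names chosen for illustration; the dataset's own file naming and metadata may differ.

```python
from itertools import combinations
from math import comb

# Hypothetical labels for the five propulsion units described in the article.
UNITS = ["mercury_3.6hp", "mercury_4.5hp", "mercury_8hp", "mercury_18hp", "electric_rc"]

# Every unordered pair of distinct units, i.e. the set of possible pairwise mixtures.
pairs = list(combinations(UNITS, 2))
assert len(pairs) == comb(len(UNITS), 2) == 10

# Number of possible three-motor mixtures, of which Wolfset records one.
n_possible_triples = comb(len(UNITS), 3)


def multi_hot(active_units):
    """Return a 0/1 vector in UNITS order marking which units are running."""
    unknown = set(active_units) - set(UNITS)
    if unknown:
        raise ValueError(f"unknown unit label(s): {sorted(unknown)}")
    return [1 if unit in active_units else 0 for unit in UNITS]


print(len(pairs), n_possible_triples)             # 10 10
print(multi_hot({"mercury_8hp", "electric_rc"}))  # [0, 0, 1, 0, 1]
```

A multi-hot target lets the same classifier handle single-source clips (one active entry) and mixtures (two or three active entries) without a separate class for every combination.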