A deep learning framework based on structured space model for detecting small objects in complex underwater environments

Underwater target detection plays a crucial role in monitoring the marine ecological environment. In this paper, we propose a deep learning framework that combines a structured state space model (SSM) with a CNN and is designed specifically for detecting small targets in complex underwater environments.
Published in Sustainability

Research Background

Regular monitoring of marine life is crucial for maintaining the stability of marine ecosystems, and effective monitoring depends on accurately identifying the species present and counting the individuals of each. Underwater target detection algorithms are therefore an important tool for assessing the stability of marine ecosystems. However, current underwater target detection algorithms face three main challenges:

  1. Underwater scenes are often affected by a blue-green color shift, which makes targets hard to distinguish from the background and increases the difficulty of detection.
  2. The underwater environment contains numerous small targets that tend to overlap and occlude one another, making it difficult for existing detection algorithms to identify them accurately.
  3. Underwater robots are the main tools for ocean exploration and documentation, and their onboard hardware offers limited computational capacity, so underwater target detection algorithms must be lightweight to meet real-time detection requirements.

Why Consider Applying the Mamba Model to Object Detection Tasks?

The Mamba model is based on the structured state space model (SSM). Its core advantage is global modeling, which addresses the limited local receptive fields of traditional convolutional neural networks (CNNs). Transformer models also offer global modeling, but their computational complexity grows quadratically with the length of the input sequence, which places high demands on hardware resources. In contrast, the Mamba model uses a selective scanning mechanism whose computational complexity is linear in the sequence length, making it particularly well suited to resource-constrained environments.
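To make the linear-cost claim concrete, the following is a minimal sketch of a selective scan over a 1-D sequence. It is a toy with a scalar hidden state, not Mamba's actual implementation: the function name and the parameters `a`, `b`, `c`, and `w_delta` are assumptions introduced for illustration, and only the step size is made input-dependent here, whereas a full selective SSM also makes its input and output projections depend on the input.

```python
import math


def selective_scan(xs, a=-1.0, b=1.0, c=1.0, w_delta=0.5):
    """Toy selective state space scan with a scalar hidden state.

    Hypothetical parameters (not taken from the paper):
      a        continuous-time decay rate (negative for stability)
      b, c     fixed input and output weights
      w_delta  weight that makes the step size depend on the input
    """
    h = 0.0
    ys = []
    for x in xs:  # one constant-cost update per token: O(n) overall
        # Softplus keeps the input-dependent step size positive.
        delta = math.log1p(math.exp(w_delta * x))
        # Discretize: the state decay lies in (0, 1) because a < 0.
        a_bar = math.exp(delta * a)
        # Recurrence: keep part of the old state, add the new input.
        h = a_bar * h + delta * b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    print(selective_scan([0.5, -0.2, 1.0, 0.0]))
```

The loop touches each token once, so a 64 x 64 grid of patches (4,096 tokens) needs 4,096 state updates, whereas full self-attention over the same tokens compares 4,096 x 4,096 = 16,777,216 pairs. The loop also makes the causal structure visible: each output depends only on the current and earlier inputs.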

Despite Mamba's strong global modeling capabilities, we found that relying solely on the SSM for feature extraction in object detection did not achieve the desired results. As a causal modeling method similar to RNNs, the Mamba model processes image blocks one at a time in sequence, which leaves it less sensitive to dependencies between non-adjacent pixels. To address this issue, we combined the SSM with CNNs: convolutional layers first enrich the features with local spatial information, and the SSM then processes this enhanced representation.
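The idea of enriching features with a convolution before the sequential scan can be sketched as follows. This is an illustrative toy under stated assumptions, not UWNet's architecture: the helper names, the fixed 3x3 kernel, and the exponential-decay scan standing in for the SSM are all hypothetical. As in most deep learning libraries, the "convolution" here is implemented as cross-correlation with zero padding.

```python
def local_conv3x3(grid, kernel):
    """Apply a 3x3 zero-padded convolution to a 2-D list of floats."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di + 1][dj + 1] * grid[ii][jj]
            out[i][j] = acc
    return out


def linear_scan(seq, decay=0.9):
    """Causal recurrence standing in for the SSM (hypothetical stand-in)."""
    state = 0.0
    out = []
    for x in seq:
        state = decay * state + (1.0 - decay) * x
        out.append(state)
    return out


def conv_then_scan(grid, kernel, decay=0.9):
    """Mix local neighbours first, then scan the flattened sequence."""
    local = local_conv3x3(grid, kernel)
    seq = [v for row in local for v in row]  # row-major flattening
    scanned = linear_scan(seq, decay)
    w = len(grid[0])
    return [scanned[r * w:(r + 1) * w] for r in range(len(grid))]


if __name__ == "__main__":
    mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
    image = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    for row in conv_then_scan(image, mean_kernel):
        print(row)
```

In a row-major flattening, a pixel and the one directly above it sit a full row apart in the sequence. After the convolution, each token already carries information from its vertical and diagonal neighbours, so the scan does not have to recover that relationship on its own.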

Summary and Future Directions

This paper proposes UWNet, an underwater small target detection method that combines the Mamba model with the YOLO framework. By introducing the Mamba model and a multi-scale implicit feature fusion module, we significantly improve detection accuracy for small underwater targets, particularly in complex underwater scenes, where the method is more robust than traditional detection algorithms. Experimental results show that UWNet outperforms existing object detection methods across several test sets.

Although UWNet has achieved good accuracy in underwater target detection, there is still room to optimize the method in future research. First, model pruning and knowledge distillation can be employed to further reduce computational cost and model complexity, improving real-time detection. Second, underwater image enhancement techniques can be applied to improve image clarity and reduce the impact of color distortion on detection results. Third, diffusion models can be used to augment underwater datasets by generating underwater images in various scenes and styles, increasing data diversity and improving the model's generalization ability.
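As one simple illustration of how color distortion might be reduced before detection, the sketch below applies gray-world white balancing, a classic technique swapped in here as an example. It is not the enhancement method of the paper, and the function name and pixel representation (a list of RGB tuples with values in 0 to 255) are assumptions made for illustration.

```python
def gray_world(pixels):
    """Gray-world white balance on a list of (r, g, b) float tuples.

    Each channel is scaled so that its mean matches the average of the
    three channel means, which counteracts a global color cast.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    # Guard against an all-zero channel to avoid division by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [
        tuple(min(255.0, p[c] * gains[c]) for c in range(3))
        for p in pixels
    ]


if __name__ == "__main__":
    # Two pixels with a blue-green cast: weak red, strong green and blue.
    cast = [(20.0, 120.0, 160.0), (40.0, 140.0, 180.0)]
    for pixel in gray_world(cast):
        print(pixel)
```

Gray-world assumes the scene is neutral gray on average, which often fails underwater, so it is best read as a baseline that dedicated underwater enhancement methods would improve upon.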
