CFMD: Dynamic Cross-layer Feature Fusion for Salient Object Detection
Jin Lian, Zhongyu Wan, Ming Gao, JunFeng Chen
TL;DR
CFMD addresses two key issues in salient object detection: boundary degradation caused by upsampling and inefficient multi-scale feature fusion. It introduces CFLMA, a Mamba-based context-aware aggregation module that weights cross-layer features dynamically, and CFLMD, a dynamic upsampling unit that uses content-informed offsets to preserve spatial details during resolution recovery. Together, these modules form a two-stage, architecture-agnostic framework that improves pixel-level accuracy and boundary segmentation, with experiments on three standard benchmarks and three backbones showing the largest gains in complex scenes. The results suggest practical benefits for real-time and robust saliency detection, while future work points to an RGB-D extension, mobile adaptation, and 3D extensions of the long-range dependency modeling.
Abstract
Cross-layer feature pyramid networks (CFPNs) have achieved notable progress in multi-scale feature fusion and boundary detail preservation for salient object detection. However, traditional CFPNs still suffer from two core limitations: (1) a computational bottleneck caused by complex feature weighting operations, and (2) degraded boundary accuracy due to feature blurring in the upsampling process. To address these challenges, we propose CFMD, a novel cross-layer feature pyramid network that introduces two key innovations. First, we design a context-aware feature aggregation module (CFLMA), which incorporates the state-of-the-art Mamba architecture to construct a dynamic weight distribution mechanism. This module adaptively adjusts feature importance based on image context, significantly improving both representation efficiency and generalization. Second, we introduce an adaptive dynamic upsampling unit (CFLMD) that preserves spatial details during resolution recovery. By adjusting the upsampling range dynamically and initializing with a bilinear strategy, the module effectively reduces feature overlap and maintains fine-grained boundary structures. Extensive experiments on three standard benchmarks using three mainstream backbone networks demonstrate that CFMD achieves substantial improvements in pixel-level accuracy and boundary segmentation quality, especially in complex scenes. The results validate the effectiveness of CFMD in jointly enhancing computational efficiency and segmentation performance, highlighting its strong potential in salient object detection tasks.
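The offset-based upsampling idea described above can be illustrated with a minimal one-dimensional sketch. This is not the authors' CFLMD implementation: the function names, the half-pixel sampling grid, and the hand-set offsets are all assumptions made for illustration, and in a learned module the offsets would be predicted from the feature content. The sketch only shows that zero offsets reduce to plain linear (bilinear in 2D) interpolation, which is why a bilinear initialization is a natural starting point, and that small offsets pointing away from an edge can keep it sharp.

```python
def linear_sample(signal, pos):
    """Linearly interpolate `signal` at fractional position `pos`.

    Positions outside the signal are clamped to the nearest valid index.
    """
    pos = min(max(pos, 0.0), len(signal) - 1.0)
    left = int(pos)                          # floor, since pos >= 0
    right = min(left + 1, len(signal) - 1)
    frac = pos - left
    return (1.0 - frac) * signal[left] + frac * signal[right]


def dynamic_upsample(signal, scale, offsets=None):
    """Upsample `signal` by an integer `scale` with per-output sampling offsets.

    Hypothetical helper: each output sample reads the input at
    (standard linear-upsampling position + offset). With all offsets at
    zero this is ordinary linear interpolation.
    """
    n_out = len(signal) * scale
    if offsets is None:
        offsets = [0.0] * n_out              # "bilinear" initialization
    out = []
    for i in range(n_out):
        base = (i + 0.5) / scale - 0.5       # half-pixel-aligned base grid
        out.append(linear_sample(signal, base + offsets[i]))
    return out


# A step edge in a low-resolution feature row.
edge = [0.0, 0.0, 1.0, 1.0]

# Zero offsets: plain linear upsampling smears the edge over two samples.
blurred = dynamic_upsample(edge, 2)

# Hand-set offsets (assumed here, learned in practice) move the two samples
# that straddle the edge back onto their own side of it.
offsets = [0.0] * 8
offsets[3] = -0.25
offsets[4] = 0.25
sharp = dynamic_upsample(edge, 2, offsets)
```

In this toy case `blurred` contains intermediate values around the edge, while `sharp` keeps a clean step. A two-dimensional version would apply the same idea with an (x, y) offset per output pixel.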
