FIPGNet: Pyramid grafting network with feature interaction strategies
Ziyi Ding, Like Xin
TL;DR
This work addresses inaccurate salient-object localization in pyramid grafting networks for salient object detection (SOD) by introducing FIPGNet, a pyramid grafting network augmented with an attention-based feature interaction strategy (FIA). FIA uses Spatial Agent Cross Attention (SACA) for multi-scale spatial interaction and a Channel Agent Cross Attention Module (CCM) to align backbone features with the SACA-processed features, closing the cross-scale correlation gap left by simple feature aggregation. The approach combines token generation, an MLP residual refinement, and the CCM to keep feature fusion consistent, which improves salient-region localization. Experiments on six datasets against twelve state-of-the-art methods show that FIPGNet outperforms them on four evaluation metrics, highlighting the practical value of cross-scale, attention-driven feature interaction for robust SOD.
Abstract
Salient object detection aims to identify the objects in an image that attract the most visual attention. Currently, the most advanced salient object detection methods adopt a pyramid grafting network architecture. However, this architecture still fails to locate salient objects accurately. We observe that this is mainly because current salient object detection methods simply aggregate features at different scales and ignore the correlation between them. To overcome this problem, we propose a new salient object detection framework, FIPGNet, which is a pyramid grafting network with feature interaction strategies. Specifically, we propose an attention-based feature interaction strategy (FIA) that introduces Spatial Agent Cross Attention (SACA) to achieve multi-level feature interaction, highlighting important regions from a spatial perspective and thereby enhancing salient regions. We also propose a Channel Agent Cross Attention Module (CCM), which connects the features extracted by the backbone network with the features processed by the spatial agent cross attention module and eliminates inconsistencies between them. Together, these two modules solve the salient object localization problem in current pyramid grafting network models. Experimental results on six challenging datasets show that the proposed method outperforms 12 current salient object detection methods on four evaluation metrics.
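To make the idea of agent-based cross attention between two feature scales more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: it follows the generic agent-attention pattern (a small set of agent tokens first aggregates information from the keys and values, then broadcasts it back to the queries), and the function name `agent_cross_attention`, the average-pooling rule used to build agent tokens, and the omission of learned projections are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def agent_cross_attention(q_tokens, kv_tokens, num_agents):
    """Hypothetical agent-style cross attention between two feature scales.

    q_tokens:  (Nq, d) tokens from one scale (queries).
    kv_tokens: (Nk, d) tokens from another scale (keys and values).
    Learned Q/K/V projections are omitted for brevity (assumption).
    """
    n_q, d = q_tokens.shape
    if not 1 <= num_agents <= n_q:
        raise ValueError("num_agents must be between 1 and the number of queries")

    # Assumption: agent tokens are built by average-pooling groups of queries.
    groups = np.array_split(np.arange(n_q), num_agents)
    agents = np.stack([q_tokens[g].mean(axis=0) for g in groups])  # (A, d)

    scale = 1.0 / np.sqrt(d)
    # Step 1 (aggregation): each agent attends over the other scale's tokens.
    agent_values = softmax(agents @ kv_tokens.T * scale) @ kv_tokens  # (A, d)
    # Step 2 (broadcast): each query attends over the agents.
    return softmax(q_tokens @ agents.T * scale) @ agent_values  # (Nq, d)


rng = np.random.default_rng(0)
coarse = rng.standard_normal((64, 32))   # e.g. a flattened 8x8 feature map
fine = rng.standard_normal((256, 32))    # e.g. a flattened 16x16 feature map
fused = agent_cross_attention(fine, coarse, num_agents=16)
print(fused.shape)  # (256, 32)
```

With A agent tokens, the two attention steps cost on the order of (Nq + Nk) · A · d operations instead of the Nq · Nk · d of full cross attention, which is the usual motivation for routing cross-scale interaction through agents.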
