KAN-SAM: Kolmogorov-Arnold Network Guided Segment Anything Model for RGB-T Salient Object Detection
Xingyuan Li, Ruichao Hou, Tongwei Ren, Gangshan Wu
TL;DR
This work tackles RGB-T salient object detection under data-limited conditions with a prompt-learning approach that extends the Segment Anything Model to multimodal inputs. It introduces KAN-SAM, which uses Kolmogorov-Arnold Network adapters to inject thermal prompts into SAM2 and employs a mutually exclusive random masking strategy to reduce over-reliance on RGB. The method achieves state-of-the-art performance on the VT5000, VT1000, and VT821 datasets while keeping most SAM2 components frozen to maintain efficiency. The results suggest that visual foundation models, augmented with lightweight, interpretable adapters and targeted masking, can robustly fuse RGB and thermal information for salient object detection.
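To make the adapter idea concrete, the following is a minimal sketch of a KAN-style layer used as a residual thermal prompt. It is an illustration under stated assumptions, not the authors' implementation: the class `RBFKANLayer`, the function `inject_thermal_prompt`, the Gaussian basis (the original KAN formulation uses B-splines), the additive fusion, and all dimensions are hypothetical.

```python
import numpy as np


class RBFKANLayer:
    """KAN-style layer: every input-output edge has its own learnable 1-D function.

    Assumption: each edge function is a weighted sum of fixed Gaussian bumps,
    chosen for brevity in place of the B-splines used in the original KAN.
    """

    def __init__(self, in_dim, out_dim, num_basis=8, x_min=-2.0, x_max=2.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.centers = np.linspace(x_min, x_max, num_basis)  # shape (K,)
        self.width = (x_max - x_min) / (num_basis - 1)
        # One coefficient vector per (input, output) edge.
        self.coef = rng.normal(0.0, 0.1, size=(in_dim, out_dim, num_basis))

    def __call__(self, x):
        # x: (N, in_dim) -> basis activations: (N, in_dim, K)
        basis = np.exp(-(((x[..., None] - self.centers) / self.width) ** 2))
        # Sum edge functions over inputs: y[n, o] = sum_i phi_{i,o}(x[n, i])
        return np.einsum("nik,iok->no", basis, self.coef)


def inject_thermal_prompt(rgb_tokens, thermal_tokens, adapter):
    """Hypothetical fusion: add adapter(thermal) to frozen RGB tokens as a residual."""
    return rgb_tokens + adapter(thermal_tokens)


# Usage with made-up token counts and feature widths.
rng = np.random.default_rng(0)
adapter = RBFKANLayer(in_dim=32, out_dim=32, rng=rng)
rgb_tokens = rng.normal(size=(196, 32))
thermal_tokens = rng.normal(size=(196, 32))
fused_tokens = inject_thermal_prompt(rgb_tokens, thermal_tokens, adapter)
```

In this sketch only the adapter coefficients would be trained, which mirrors the idea of keeping most of the backbone frozen.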
Abstract
Existing RGB-thermal salient object detection (RGB-T SOD) methods aim to identify visually significant objects by leveraging both RGB and thermal modalities, enabling robust performance in complex scenarios. However, they often generalize poorly because available datasets lack diversity and multi-modal representations are constructed inefficiently. In this paper, we propose a novel prompt learning-based RGB-T SOD method, named KAN-SAM, which reveals the potential of visual foundation models for RGB-T SOD tasks. Specifically, we extend Segment Anything Model 2 (SAM2) for RGB-T SOD by introducing thermal features as guiding prompts through efficient and accurate Kolmogorov-Arnold Network (KAN) adapters, which effectively enhance RGB representations and improve robustness. Furthermore, we introduce a mutually exclusive random masking strategy to reduce reliance on RGB data and improve generalization. Experimental results on benchmark datasets demonstrate superior performance over state-of-the-art methods.
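The masking strategy can be illustrated with a short sketch. The abstract does not give its exact form, so this code assumes one plausible reading: random patches are hidden in either the RGB or the thermal input, but never in both at the same location. The function names, the patch size, and the masking ratio are hypothetical.

```python
import numpy as np


def mutually_exclusive_masks(height, width, patch=16, ratio=0.5, rng=None):
    """Return boolean keep-masks (rgb_keep, thermal_keep), each of shape (height, width).

    Assumption: a patch may be hidden in RGB or in thermal, never in both.
    """
    rng = np.random.default_rng() if rng is None else rng
    grid_h = -(-height // patch)  # ceiling division
    grid_w = -(-width // patch)
    # Pick the patches that get masked in one of the two modalities.
    masked = rng.random((grid_h, grid_w)) < ratio
    # For each masked patch, decide which modality loses it.
    hide_rgb = masked & (rng.random((grid_h, grid_w)) < 0.5)
    hide_thermal = masked & ~hide_rgb

    def to_pixels(grid):
        # Expand the patch grid to pixel resolution and crop to the image size.
        full = grid.repeat(patch, axis=0).repeat(patch, axis=1)
        return full[:height, :width]

    return ~to_pixels(hide_rgb), ~to_pixels(hide_thermal)


def apply_masks(rgb, thermal, rgb_keep, thermal_keep):
    """Zero out hidden pixels; inputs are (H, W, C) arrays."""
    return rgb * rgb_keep[..., None], thermal * thermal_keep[..., None]
```

Under this reading, every pixel stays visible in at least one modality, so the model cannot rely on RGB alone for masked regions and has to draw on the thermal input there.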
