KAN-SAM: Kolmogorov-Arnold Network Guided Segment Anything Model for RGB-T Salient Object Detection

Xingyuan Li, Ruichao Hou, Tongwei Ren, Gangshan Wu

TL;DR

This work tackles RGB-T salient object detection under data-limited conditions with a prompt-learning approach that extends the Segment Anything Model to multimodal inputs. It introduces KAN-SAM, which uses Kolmogorov-Arnold Network adapters to inject thermal prompts into SAM2 and applies a mutually exclusive random masking strategy to reduce over-reliance on RGB. The method achieves state-of-the-art performance on the VT5000, VT1000, and VT821 datasets while keeping most SAM2 components frozen for efficiency. The results suggest that visual foundation models, augmented with lightweight, interpretable adapters and targeted masking, can robustly fuse RGB and thermal information for salient object detection.

Abstract

Existing RGB-thermal salient object detection (RGB-T SOD) methods aim to identify visually significant objects by leveraging both RGB and thermal modalities to enable robust performance in complex scenarios, but they often suffer from limited generalization due to the constrained diversity of available datasets and the inefficiencies in constructing multi-modal representations. In this paper, we propose a novel prompt learning-based RGB-T SOD method, named KAN-SAM, which reveals the potential of visual foundational models for RGB-T SOD tasks. Specifically, we extend Segment Anything Model 2 (SAM2) for RGB-T SOD by introducing thermal features as guiding prompts through efficient and accurate Kolmogorov-Arnold Network (KAN) adapters, which effectively enhance RGB representations and improve robustness. Furthermore, we introduce a mutually exclusive random masking strategy to reduce reliance on RGB data and improve generalization. Experimental results on benchmarks demonstrate superior performance over the state-of-the-art methods.
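
The abstract does not spell out how the mutually exclusive random masking works, so the following is only a minimal sketch under one plausible reading: random patches are hidden in each modality, and no patch is hidden in both the RGB and the thermal image at once. The function names, the patch grid, and the `mask_ratio` parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mutually_exclusive_masks(grid_h, grid_w, mask_ratio=0.3, rng=None):
    """Return two boolean patch masks (True = masked) that never overlap.

    Assumption: each modality hides the same fraction of patches, and the
    two hidden sets are disjoint. The paper may define this differently.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = grid_h * grid_w
    k = int(n * mask_ratio)
    if 2 * k > n:
        raise ValueError("mask_ratio must be at most 0.5 for disjoint masks")
    perm = rng.permutation(n)            # random ordering of patch indices
    rgb_mask = np.zeros(n, dtype=bool)
    thermal_mask = np.zeros(n, dtype=bool)
    rgb_mask[perm[:k]] = True            # first k patches hidden in RGB
    thermal_mask[perm[k:2 * k]] = True   # next k patches hidden in thermal
    return rgb_mask.reshape(grid_h, grid_w), thermal_mask.reshape(grid_h, grid_w)

def apply_patch_mask(image, patch_mask, patch_size):
    """Zero out the masked patches of an (H, W, C) image."""
    # Expand the patch-level mask to pixel resolution.
    pixel_mask = np.repeat(np.repeat(patch_mask, patch_size, axis=0),
                           patch_size, axis=1)
    out = image.copy()
    out[pixel_mask] = 0
    return out

# Usage: a 64x64 image pair split into an 8x8 grid of 8x8-pixel patches.
rgb = np.ones((64, 64, 3), dtype=np.float32)
thermal = np.ones((64, 64, 1), dtype=np.float32)
rgb_m, t_m = mutually_exclusive_masks(8, 8, mask_ratio=0.25)
rgb_masked = apply_patch_mask(rgb, rgb_m, 8)
thermal_masked = apply_patch_mask(thermal, t_m, 8)
```

Under this reading, every spatial location remains visible in at least one modality, so a model trained on the masked pair has to draw on thermal cues wherever the RGB patch is hidden.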

Paper Structure

This paper contains 16 sections, 4 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Comparison of frameworks between existing multi-modal methods and our KAN-SAM method.
  • Figure 2: The framework of our KAN-SAM, which consists of the mutually exclusive random masking strategy, the KAN adapters and the SAM2.
  • Figure 3: Detailed design of the KAN adapter.
  • Figure 4: Visualization comparison of KAN-SAM with the state-of-the-art methods.