Hybrid Mamba for Few-Shot Segmentation

Qianxiong Xu, Xuanyi Liu, Lanyun Zhu, Guosheng Lin, Cheng Long, Ziyue Li, Rui Zhao

TL;DR

A cross (attention-like) Mamba is devised to capture inter-sequence dependencies for FSS. The proposed hybrid Mamba network (HMNet) includes a support recapped Mamba that periodically recaps the support features while scanning the query, so the hidden state always contains rich support information.

Abstract

Many few-shot segmentation (FSS) methods use cross attention to fuse support foreground (FG) into query features, despite its quadratic complexity. Mamba, a recent advance, can also capture intra-sequence dependencies well, yet its complexity is only linear. Hence, we aim to devise a cross (attention-like) Mamba to capture inter-sequence dependencies for FSS. A simple idea is to scan the support features to selectively compress them into the hidden state, which is then used as the initial hidden state to sequentially scan the query features. Nevertheless, this suffers from two issues: (1) support forgetting: query features are also gradually compressed into the hidden state as they are scanned, so the share of support features in the hidden state keeps shrinking, and many query pixels cannot fuse sufficient support features; (2) intra-class gap: query FG is essentially more similar to itself than to support FG, i.e., query pixels may prefer to fuse their own features from the hidden state rather than support features, yet the success of FSS relies on the effective use of support information. To tackle them, we design a hybrid Mamba network (HMNet), including (1) a support recapped Mamba that periodically recaps the support features while scanning the query, so the hidden state always contains rich support information; (2) a query intercepted Mamba that forbids mutual interactions among query pixels and encourages them to fuse more support features from the hidden state. Consequently, the support information is better utilized, leading to better performance. Extensive experiments on two public benchmarks show the superiority of HMNet. The code is available at https://github.com/Sam1224/HMNet.
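
The three scanning schemes described in the abstract can be illustrated with a toy recurrence. The following is a minimal sketch, not the HMNet implementation: it assumes a fixed scalar decay (`DECAY`) in place of Mamba's input-dependent selective parameters, and the function names `scan`, `naive_cross`, `support_recapped`, and `query_intercepted`, as well as the `period` argument, are hypothetical. Support tokens are encoded as `[1, 0]` and query tokens as `[0, 1]`, so the first output channel can be read as the amount of support information each query position receives.

```python
import numpy as np

DECAY = 0.9  # assumed fixed decay; real Mamba uses input-dependent parameters


def scan(h, xs, decay=DECAY):
    """Toy recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t.

    Returns the per-step hidden states and the final hidden state.
    """
    outs = []
    for x in xs:
        h = decay * h + (1.0 - decay) * x
        outs.append(h)
    return np.stack(outs), h


def naive_cross(support, query):
    """Scan support once, then scan query starting from that hidden state."""
    _, h_s = scan(np.zeros(support.shape[1]), support)
    outs, _ = scan(h_s, query)
    return outs


def support_recapped(support, query, period=4):
    """Re-scan the support before every chunk of `period` query tokens."""
    h = np.zeros(support.shape[1])
    outs = []
    for start in range(0, len(query), period):
        _, h = scan(h, support)  # recap the support features
        chunk_out, h = scan(h, query[start:start + period])
        outs.append(chunk_out)
    return np.concatenate(outs)


def query_intercepted(support, query, decay=DECAY):
    """Every query token starts from the same support state, in parallel.

    No query token updates the state seen by another query token.
    """
    _, h_s = scan(np.zeros(support.shape[1]), support)
    return decay * h_s + (1.0 - decay) * query


# Toy data: channel 0 marks support content, channel 1 marks query content.
support = np.tile([1.0, 0.0], (8, 1))
query = np.tile([0.0, 1.0], (16, 1))

for name, fn in [("naive", naive_cross),
                 ("recapped", support_recapped),
                 ("intercepted", query_intercepted)]:
    share = fn(support, query)[:, 0]
    print(f"{name:12s} first={share[0]:.3f} last={share[-1]:.3f}")
```

In this toy setting, the support channel decays steadily under the naive scan, is replenished at every recap under the recapped scan, and stays constant under the intercepted scan, which mirrors the support forgetting issue and the two remedies at a qualitative level only.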

Paper Structure (26 sections, 13 equations, 10 figures, 9 tables)

Figures (10)

  • Figure 1: Illustrations of (a) existing cross Mamba, (b) our support recapped Mamba (SRM), and (c) our query intercepted Mamba (QIM). In (a), the support features are first scanned and selectively compressed into the hidden state, which is expected to be fused into query FG. Nevertheless, (1) as the query is scanned, the compressed support FG is gradually reduced, and (2) query FG is essentially more similar to itself than to support FG. Thus, the support FG cannot enhance the query FG features well. In (b) and (c), we design (1) an SRM to periodically re-scan the support FG, so the hidden state always contains sufficient support features, and (2) a QIM to intercept the mutual interactions among query pixels, so they are forced to fuse with support features.
  • Figure 2: Overview of HMNet. The Mamba blocks consist of alternating self Mamba blocks (SMB) and hybrid Mamba blocks (HMB). Self Mamba aims at capturing the intra-sequence correlations, while hybrid Mamba attempts to capture the support-query inter-sequence dependencies. Hybrid Mamba further includes a support recapped Mamba (SRM) and a query intercepted Mamba (QIM) to address the support forgetting and intra-class gap issues.
  • Figure 3: Illustration of HMB. (1) Following the different scanning directions of [liu2024vmamba], SRM arranges support and query features into 4 sequences of alternating support and query patches, which are sequentially scanned with 4 sets of parameters $\Theta$. (2) After the support features are scanned for the first time in SRM, the 4 hidden states are averaged into $H_S$. In QIM, $H_S$ is used to scan the query features in parallel. Note that QIM's parameters are shared with the first SRM.
  • Figure 4: Qualitative comparisons with HDMNet [peng2023hierarchical] on PASCAL-5$^i$ and COCO-20$^i$.
  • Figure 5: Changes of the hidden state in SRM over time, taking the first SRM as an example.
  • ...and 5 more figures