Mamba Capsule Routing Towards Part-Whole Relational Camouflaged Object Detection

Dingwen Zhang; Liangbo Cheng; Yi Liu; Xinggang Wang; Junwei Han

TL;DR

This paper extracts the implicit latent state in mamba as capsule vectors, abstracting type-level capsules from their pixel-level versions. This greatly reduces the computation and parameters that pixel-level capsule routing requires for exploring part-whole relationships.
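
To make the claimed saving concrete, the following is a rough counting sketch, not the paper's own complexity analysis. It assumes a dense convolutional routing layer in which every capsule inside a kernel window votes for every output capsule type at every spatial position, and compares that with routing once per pair of capsule types. The function names and the sizes used (a 24 x 24 map, a 3 x 3 kernel, 8 capsule types) are hypothetical.

```python
def pixel_level_votes(height, width, kernel, types_in, types_out):
    """Votes in dense pixel-level routing (assumed layout).

    Each output position collects votes from a kernel x kernel window,
    from every input capsule type, for every output capsule type.
    """
    return height * width * kernel * kernel * types_in * types_out


def type_level_votes(types_in, types_out):
    """Votes when each capsule type is a single capsule (no spatial axis)."""
    return types_in * types_out


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    dense = pixel_level_votes(24, 24, 3, 8, 8)
    compact = type_level_votes(8, 8)
    print("pixel-level votes:", dense)
    print("type-level votes:", compact)
    print("ratio:", dense // compact)
```

Under these assumptions the ratio is simply the number of spatial positions times the kernel area, which is why removing the spatial axis from routing shrinks the cost so sharply.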

Abstract

The part-whole relational property endowed by Capsule Networks (CapsNets) is known to benefit camouflaged object detection (COD) because it promotes segmentation integrity. However, the Expectation Maximization (EM) capsule routing algorithm used previously, with its heavy computation and large parameter count, hinders this line of work. The primary cause lies in pixel-level capsule routing. In this paper, we instead propose a novel mamba capsule routing at the type level. Specifically, we first extract the implicit latent state in mamba as capsule vectors, which abstract type-level capsules from their pixel-level versions. These type-level mamba capsules are fed into the EM routing algorithm to obtain the high-layer mamba capsules, which greatly reduces the computation and parameters incurred by pixel-level capsule routing for part-whole relationship exploration. In addition, to retrieve the pixel-level capsule features needed for the final camouflaged prediction, we build on the low-layer pixel-level capsules, guided by the correlations between adjacent-layer type-level mamba capsules. Extensive experiments on three widely used COD benchmark datasets demonstrate that our method significantly outperforms state-of-the-art methods. Code is available at https://github.com/Liangbo-Cheng/mamba_capsule.
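
The idea of treating a latent state as a capsule vector can be illustrated with a toy recurrence. The sketch below is not taken from the released code and is not a selective SSM: it assumes a fixed, elementwise linear recurrence h_t = a * h_(t-1) + b * x_t, whereas mamba makes its parameters depend on the input. The names `final_latent_state`, `decay`, and `input_gain` are hypothetical. What it shows is only that a sequence of pixel-level capsule features of any length collapses into one fixed-size vector, the final state.

```python
def final_latent_state(sequence, decay, input_gain):
    """Scan a sequence and return the last hidden state.

    sequence   : list of feature vectors, one per pixel-level capsule
    decay      : per-dimension state decay (assumed fixed, not selective)
    input_gain : per-dimension input weight (assumed fixed, not selective)
    """
    state = [0.0] * len(decay)
    for features in sequence:
        # Elementwise update: h_t = decay * h_(t-1) + input_gain * x_t
        state = [a * h + b * x
                 for a, h, b, x in zip(decay, state, input_gain, features)]
    return state


if __name__ == "__main__":
    # Two pixel-level capsules with 2-dimensional features (toy values).
    pixels = [[1.0, 2.0], [3.0, 4.0]]
    capsule = final_latent_state(pixels, decay=[0.5, 0.5], input_gain=[1.0, 1.0])
    print(capsule)  # one vector summarizing the whole sequence
```

The returned vector has the same size whether the sequence holds two pixels or several thousand, which is the property that lets routing operate per capsule type instead of per pixel.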

Paper Structure (19 sections, 23 equations, 7 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Camouflaged Object Detection
Capsule Network
Vision Mamba
Preliminaries
Proposed Method
Overview
Transformer Encoder
Mamba Capsule Generation
Capsules Spatial Details Retrieval
Transformer Decoder
Loss Function
Experiment and Analysis
Experimental Settings
...and 4 more sections

Figures (7)

Figure 1: Different capsule routings for part-whole relational camouflaged object detection. The original EM routing [hinton2018matrix] involves a large number of parameters and high routing complexity because it routes densely at the pixel level. In contrast, the proposed MCRNet compresses spatial pixel-level capsules into type-level capsules, so routing at the type level is substantially less complex. The capsules spatial details retrieval is then used to learn the spatial details of the mamba capsules for camouflaged object detection.
Figure 2: The overall architecture of MCRNet. The long-range context from the Swin Transformer [liu20219992] is first fed into the designed Mamba Capsule Generation (MCG) module. In MCG, each type of constructed primary capsules is scanned in four directions, and the resulting sequences are input into the selective SSM [gu20242312] module to obtain the implicit latent state, which is treated as the type-level mamba capsules for subsequent routing to learn the high-layer mamba capsules. The proposed Capsules Spatial Details Retrieval (CSDR) module then retrieves the spatial details of the mamba capsules for the final camouflaged prediction. To learn primitive object edges, the object boundary label is also used during training.
Figure 3: Details of MCG. The generated primary capsules are scanned in different directions into capsule sequences, which are input to the selective SSM [gu20242312] module. The final latent state is taken as the mamba capsule vectors.
Figure 4: Details of CSDR. The correlation between adjacent-layer mamba capsules is computed and integrated with the low-layer pixel-level capsules to obtain the spatial details of the high-layer mamba capsules. The bottom part shows that the four scanning directions are transformed into a uniform direction so that the spatial details of the high-layer mamba capsules from different scanning directions can be fused through multi-head self-attention.
Figure 5: Visual comparisons of the proposed MCRNet and other popular SOTA methods. The proposed MCRNet segments camouflaged objects well in challenging scenes, including small objects, large objects, objects with uncertain boundaries, occluded objects, and concealed persons.
...and 2 more figures
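
The four-direction scanning mentioned in the captions for Figures 2 and 3 can be sketched as follows. The captions do not say which four directions are used, so this example assumes the common choice of row-major and column-major traversals plus their reverses. The function name and dictionary keys are hypothetical.

```python
def four_direction_scans(grid):
    """Flatten a 2D grid of capsules into four 1D sequences.

    Assumed directions: row-major, column-major, and the reverse of each.
    `grid` is a non-empty list of equally long rows.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    row_major = [grid[r][c] for r in range(n_rows) for c in range(n_cols)]
    col_major = [grid[r][c] for c in range(n_cols) for r in range(n_rows)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


if __name__ == "__main__":
    # A 2 x 2 map of capsule identifiers for one capsule type.
    for name, order in four_direction_scans([[1, 2], [3, 4]]).items():
        print(name, order)
```

Each sequence visits the same capsules in a different order, so a recurrent scan over each one ends with a different final state. This is why the Figure 4 caption describes mapping the four directions back to a uniform direction before fusing them.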
