Nested Unfolding Network for Real-World Concealed Object Segmentation

Chunming He, Rihan Zhang, Dingming Zhang, Fengyang Xiao, Deng-Ping Fan, Sina Farsiu

TL;DR

Concealed object segmentation (COS) under real‑world degradations requires both robust restoration and precise segmentation. The paper introduces NUN, a degradation‑robust nested unfolding network that embeds a degradation-resistant unfolding network (DeRUN) inside each stage of a segmentation-oriented unfolding network (SODUN), guided by a vision‑language model and reinforced by bi‑directional interaction and cross‑stage consistency. The method optimizes segmentation and restoration in a unified, multi‑stage framework with a selection mechanism based on image-quality assessment (IQA), and reports state‑of‑the‑art results on both clean and degraded COS benchmarks. The work indicates that decoupled yet jointly optimized restoration and segmentation can improve robustness and efficiency in real‑world visual perception tasks.

Abstract

Deep unfolding networks (DUNs) have recently advanced concealed object segmentation (COS) by modeling segmentation as iterative foreground-background separation. However, existing DUN-based methods (e.g., RUN) inherently couple background estimation with image restoration, leading to conflicting objectives and requiring pre-defined degradation types, which are unrealistic in real-world scenarios. To address this, we propose the nested unfolding network (NUN), a unified framework for real-world COS. NUN adopts a DUN-in-DUN design, embedding a degradation-resistant unfolding network (DeRUN) within each stage of a segmentation-oriented unfolding network (SODUN). This design decouples restoration from segmentation while allowing mutual refinement. Guided by a vision-language model (VLM), DeRUN dynamically infers degradation semantics and restores high-quality images without explicit priors, whereas SODUN performs reversible estimation to refine foreground and background. Leveraging the multi-stage nature of unfolding, NUN employs image-quality assessment to select the best DeRUN outputs for subsequent stages, naturally introducing a self-consistency loss that enhances robustness. Extensive experiments show that NUN achieves leading performance on both clean and degraded benchmarks. Code will be released.
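
The DUN-in-DUN control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `toy_restore`, `toy_iqa`, `toy_segment`, and `nested_unfold` are hypothetical, the learned DeRUN and SODUN modules are replaced by hand-written placeholder updates on a 1-D signal, the VLM guidance is omitted, and the consistency term is only one plausible reading of "self-consistency loss". It shows just the nesting: an inner restoration loop runs inside each outer segmentation stage, a quality score picks the best inner output, and that output feeds the next stage.

```python
from statistics import mean

def toy_restore(x, y, steps=3, lr=0.5, mu=0.2):
    """Placeholder for the inner (restoration) loop; returns one candidate per step."""
    n = len(x)
    candidates = []
    for _ in range(steps):
        # Neighbor average with wrap-around, used as a simple smoothness prior.
        smooth = [(x[i - 1] + x[(i + 1) % n]) / 2 for i in range(n)]
        # Step toward the observation y, plus a pull toward the smoothed signal.
        x = [xi - lr * (xi - yi) + mu * (si - xi)
             for xi, yi, si in zip(x, y, smooth)]
        candidates.append(list(x))
    return candidates

def toy_iqa(x):
    """Placeholder quality score (assumption): mean absolute neighbor difference."""
    return mean(abs(a - b) for a, b in zip(x, x[1:]))

def toy_segment(x, mask):
    """Placeholder for one outer (segmentation) stage: blend mask with a threshold map."""
    thr = mean(x)
    return [0.5 * m + 0.5 * (1.0 if xi > thr else 0.0) for m, xi in zip(mask, x)]

def nested_unfold(y, stages=3, inner_steps=3):
    x = list(y)
    mask = [0.5] * len(y)            # uninformative initial mask
    selected = []
    for _ in range(stages):
        candidates = toy_restore(x, y, steps=inner_steps)
        best = max(candidates, key=toy_iqa)   # quality-based selection
        selected.append(best)
        mask = toy_segment(best, mask)        # segmentation uses the restored signal
        x = best                              # selected output feeds the next stage
    # Toy consistency term: mean squared gap between each stage's pick and the last one.
    final = selected[-1]
    consistency = mean(
        mean((a - b) ** 2 for a, b in zip(s, final)) for s in selected
    )
    return mask, x, consistency

if __name__ == "__main__":
    observed = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85]
    mask, restored, consistency = nested_unfold(observed)
    print(mask, consistency)
```

In a trained model each placeholder would be a learned network stage and the consistency term would enter the training loss; here it is only computed to show where it would attach.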

Paper Structure

This paper contains 17 sections, 27 equations, 6 figures, 14 tables, 1 algorithm.

Figures (6)

  • Figure 1: Performance on clean and degraded COS data. Left: Concealed object masks are highlighted in red and purple, overlaid on the original clean data for visual clarity. "-C" and "-D" mean segmentation on clean and degraded samples. Right: The radar chart compares NUN with SOTAs over 12 COS tasks, where TOP1–TOP3 are composite baselines with the top metric scores per task. SD and RD denote synthetic and real-world degradation. Our NUN attains consistently superior results, especially under degradation.
  • Figure 2: Conceptual motivation of the proposed NUN framework. The blue and purple lines represent restoration‑oriented and segmentation‑oriented unfolding networks, which correspondingly suffer from task inconsistency and slow convergence. In contrast, the red line indicates our jointly optimized NUN, which bridges the task gap and achieves faster, more stable convergence.
  • Figure 3:
  • Figure 4: Visualization on COS tasks with real-world and synthetic degradation.
  • Figure 5: Segmentation performance in unseen degradation types.
  • ...and 1 more figure