Deep Learning in Concealed Dense Prediction

Pancheng Zhao, Deng-Ping Fan, Shupeng Cheng, Salman Khan, Fahad Shahbaz Khan, David Clifton, Peng Xu, Jufeng Yang

TL;DR

This paper defines Concealed Dense Prediction (CDP) as a class of dense vision tasks where targets are concealed, demanding fine-grained representations and reasoning. It provides a taxonomy of concealment mechanisms (biological, optical, artificial) and surveys 25 state-of-the-art methods across 12 concealed datasets, evaluating CDP across segmentation, detection, and edge estimation tasks. The authors propose a unified, multimodal direction with CvpINST and CvpAgent to enable instruction-tuned, cross-task concealed perception, and outline six research directions to advance data, models, and evaluation. The work highlights practical applications in industry, agriculture, medicine, and safety, and argues for integrated large-model frameworks to drive progress toward general concealed perception. Overall, the paper offers a structured landscape and actionable roadmap for advancing CDP in the era of large multimodal models.

Abstract

Deep learning is developing rapidly and now handles common computer vision tasks well. As model size, knowledge, and reasoning capabilities continue to improve, it is time to turn to more complex vision tasks. In this paper, we introduce and review a family of such tasks, termed Concealed Dense Prediction (CDP), which has great value in agriculture, industry, and other fields. CDP's intrinsic trait is that the targets are concealed in their surroundings, so fully perceiving them requires, among other things, fine-grained representations, prior knowledge, and auxiliary reasoning. The contributions of this review are three-fold: (i) We introduce the scope, characteristics, and challenges specific to CDP tasks and emphasize their essential differences from generic vision tasks. (ii) We develop a taxonomy based on how methods counteract concealment to summarize deep learning efforts in CDP, and through experiments on three tasks we compare 25 state-of-the-art methods across 12 widely used concealed datasets. (iii) We discuss the potential applications of CDP in the large model era and summarize 6 potential research directions. We offer perspectives for the future development of CDP by constructing a large-scale multimodal instruction fine-tuning dataset, CvpINST, and a concealed visual perception agent, CvpAgent.

Paper Structure

This paper contains 38 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Representative concealed scenarios. Targets are concealed in their surroundings, so they may remain unnoticed even when directly observed.
  • Figure 2: The sample gallery and taxonomy of concealment. The left shows samples from the different concealment categories described in the paper's taxonomy section. The right shows a hierarchical classification of these concealed categories.
  • Figure 3: A simplified chronicle of CDP, including three major waves in the development of CDP and some of the milestones that have advanced the field. COD: camouflaged object detection. CIR: concealed instance ranking. COL: concealed object localization. COS: concealed object segmentation. IOC: indiscernible object counting. CIG: camouflaged image generation.
  • Figure 4: A tree diagram of the CDP method taxonomy. CDP methods are categorized into three classes according to the strategy used against concealment: extracting detailed features, countering distraction strategies, and mining cues from motion.
  • Figure 5: Samples of annotations and additional cues used in CDP methods. Left to right: original image, object annotation [fan2020COD10K], rank annotation [lv2021simultaneously], edge annotation [fan2020COD10K], texture annotations [zhu2021inferring, li2022findnet], fixation annotations [lv2021simultaneously], discriminative annotation [jia2022segment], scribble annotation [he2023weakly], depth, and frequency.
  • ...and 4 more figures