LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion

Pancheng Zhao, Peng Xu, Pengda Qin, Deng-Ping Fan, Zhicheng Zhang, Guoli Jia, Bowen Zhou, Jufeng Yang

TL;DR

LAKE-RED tackles camouflaged image generation without human-specified backgrounds by introducing a knowledge retrieval-augmented diffusion framework. It decouples background retrieval from reasoning using three components: Localized Masked Pooling (LMP) builds a richer foreground representation, the Background Knowledge Retrieval Module (BKRM) uses those foreground features to retrieve background cues from a codebook, and the Reasoning-Driven Condition Enhancement Module (RCEM) guides background reconstruction via a reconstructed feature loss. The approach demonstrates superior realism and camouflage quality over state-of-the-art methods on camouflage datasets, supported by quantitative metrics (FID/KID), qualitative comparisons, and a user study, with minimal computational overhead. Overall, the method broadens camouflaged vision perception by enabling scalable generation that requires no background input, across diverse foregrounds and domains.

Abstract

Camouflaged vision perception is an important vision task with numerous practical applications. Because collecting and labeling data is expensive, the community faces a major bottleneck: its datasets cover only a small number of object species. Existing camouflaged generation methods, however, require the background to be specified manually, and thus fail to extend the diversity of camouflaged samples at low cost. In this paper, we propose a Latent Background Knowledge Retrieval-Augmented Diffusion (LAKE-RED) for camouflaged image generation. To our knowledge, our contributions mainly include: (1) For the first time, we propose a camouflaged generation paradigm that does not need to receive any background inputs. (2) Our LAKE-RED is the first knowledge retrieval-augmented method with interpretability for camouflaged generation, in which we explicitly separate knowledge retrieval from reasoning enhancement to alleviate the task-specific challenges. Moreover, our method is not restricted to specific foreground targets or backgrounds, offering the potential to extend camouflaged vision perception to more diverse domains. (3) Experimental results demonstrate that our method outperforms existing approaches, generating more realistic camouflaged images.

Paper Structure (13 sections, 10 equations, 6 figures, 2 tables)

This paper contains 13 sections, 10 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: LAKE-RED synthesizes realistic camouflaged images for a given foreground object by a knowledge retrieval-augmented diffusion model. Without any human-specified background, the model automatically generates a background sufficient to conceal the foreground object. (b) and (c) show the image generation process of our method in two application scenarios.
  • Figure 2: Comparison of Frameworks for Camouflage Image Generation. Existing methods rely on manually specified backgrounds, which not only limits diversity and scope to what a human can conceive but also makes large-scale image generation expensive. Without changing its own texture, the same target can be camouflaged to different degrees in different environments. Inspired by this observation, we synthesize camouflaged images through a background inpainting stream, hiding the object by automatically choosing a suitable background for it.
  • Figure 3: The pipeline of our camouflaged image generation framework, LAKE-RED. The framework mainly includes three steps: (1) Localized Masked Pooling (LMP) extracts visual representations of foreground areas. (2) The Background Knowledge Retrieval Module (BKRM) retrieves background-related features from the codebook. (3) The Reasoning-Driven Condition Enhancement Module (RCEM) allows the model to learn foreground-to-background reasoning through background reconstruction.
  • Figure 4: Comparison with existing methods in transferring general images into camouflaged images. The first two columns are the input images and we provide camouflaged images generated by nine methods for the comparison. Note that the methods in columns 3 to 7 additionally share a randomly sampled background image as input.
  • Figure 5: User study of subjective ratings of the camouflaged images generated by 9 different methods. Our method is considered to produce the most natural results and the ones visually closest to real camouflaged images.
  • ...and 1 more figures
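
The first two steps named in the Figure 3 caption (pooling foreground features, then retrieving background-related features from a codebook) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The grid-based split used to make the pooling "localized", the softmax-attention retrieval, and the names localized_masked_pooling, retrieve_background, and grid are all assumptions made for illustration; the summary above does not specify these details.

```python
import numpy as np

def localized_masked_pooling(features, mask, grid=2):
    """Masked average pooling per grid cell (hypothetical reading of LMP).

    features: (C, H, W) feature map; mask: (H, W) binary foreground mask.
    Returns (K, C): one pooled vector per grid cell that contains foreground.
    """
    _, H, W = features.shape
    pooled = []
    for rows in np.array_split(np.arange(H), grid):
        for cols in np.array_split(np.arange(W), grid):
            m = mask[np.ix_(rows, cols)]           # (h, w) mask patch
            if m.sum() == 0:
                continue                           # no foreground in this cell
            f = features[:, rows][:, :, cols]      # (C, h, w) feature patch
            pooled.append((f * m).sum(axis=(1, 2)) / m.sum())
    if not pooled:
        raise ValueError("mask contains no foreground pixels")
    return np.stack(pooled)

def retrieve_background(queries, codebook):
    """Softmax-attention lookup (assumed retrieval rule, not from the paper).

    queries: (K, C) foreground vectors; codebook: (N, C) entries.
    Returns the retrieved (K, C) features and the (K, N) attention weights.
    """
    scores = queries @ codebook.T / np.sqrt(queries.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ codebook, weights

# Toy usage with random data standing in for encoder features and a codebook.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8, 8))
fg_mask = np.zeros((8, 8))
fg_mask[2:6, 2:6] = 1.0
fg_vectors = localized_masked_pooling(feats, fg_mask, grid=2)
bg_features, attn = retrieve_background(fg_vectors, rng.normal(size=(16, 4)))
```

In this sketch the attention weights give a rough view of which codebook entries each foreground region draws on, which is one way the "interpretability" mentioned in the abstract could be inspected; the third step (RCEM and its reconstruction loss) is left out because the summary gives no detail on it.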