Do Not Leave a Gap: Hallucination-Free Object Concealment in Vision-Language Models

Amira Guesmi, Muhammad Shafique

Abstract

Vision-language models (VLMs) have recently shown remarkable capabilities in visual understanding and generation, but remain vulnerable to adversarial manipulations of visual content. Prior object-hiding attacks primarily rely on suppressing or blocking region-specific representations, often creating semantic gaps that inadvertently induce hallucination, where models invent plausible but incorrect objects. In this work, we demonstrate that hallucination arises not from object absence per se, but from semantic discontinuity introduced by such suppression-based attacks. We propose a new class of background-consistent object concealment attacks, which hide target objects by re-encoding their visual representations to be statistically and semantically consistent with surrounding background regions. Crucially, our approach preserves token structure and attention flow, avoiding representational voids that trigger hallucination. We present a pixel-level optimization framework that enforces background-consistent re-encoding across multiple transformer layers while preserving global scene semantics. Extensive experiments on state-of-the-art vision-language models show that our method effectively conceals target objects while preserving up to $86\%$ of non-target objects and reducing grounded hallucination by up to $3\times$ compared to attention-suppression-based attacks.

Paper Structure

This paper contains 26 sections, 14 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Overview of Background-Consistent Re-encoding (BCR). A pixel-level perturbation $\delta$ is optimized to produce an adversarial image $\mathbf{x}^{adv}$ from a clean image $\mathbf{x}$ with a specified ROI. A frozen vision encoder extracts layer-wise hidden states, from which ROI and background tokens are identified via patch embedding. BCR enforces semantic continuity by (i) aligning ROI and background statistics, (ii) softly projecting ROI features onto background representations, and (iii) preserving background tokens between clean and adversarial images. A total variation regularizer encourages smooth perturbations. The objective is optimized across multiple transformer layers, while the language model remains frozen and is used only for evaluation.
  • Figure 2: Qualitative comparison of object concealment attacks on InstructBLIP.
  • Figure 3: Failure cases of pixel-space obfuscation methods. When the target object is masked or blurred, the resulting visual artifacts introduce ambiguous signals that vision-language models attempt to explain. Despite the absence of the original object, the model generates hallucinated descriptions containing unrelated entities such as dogs or cows.
  • Figure 4: Sensitivity analysis of targeted transformer layers. Early-layer optimization leaves object semantics largely intact, allowing the model to still recognize the target object. In contrast, targeting deeper layers removes the object semantics while preserving the surrounding scene context, resulting in successful concealment.
  • Figure 5: Additional qualitative comparisons between suppression-based attacks and BCR. While suppression-based methods often introduce hallucinated or semantically unrelated objects, BCR consistently removes the target object while preserving the overall scene structure and contextual elements.
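
The four ingredients named in the Figure 1 caption (statistics alignment, soft projection, background preservation, and total variation) can be illustrated with a small sketch. This is not the authors' implementation: the exact loss forms, the weights, the temperature, and every function name below are assumptions made for illustration, and the paper's own equations should be taken as authoritative. Plain Python lists stand in for tensors so the example runs without dependencies; a real attack would compute the same quantities with an autodiff framework and backpropagate to the pixel perturbation $\delta$. Tokens are assumed to be in row-major patch order with no class token.

```python
import math

def roi_token_indices(image_size, patch_size, box):
    """Patch-token indices overlapping a pixel box (x0, y0, x1, y1), x1/y1 exclusive."""
    grid = image_size // patch_size
    x0, y0, x1, y1 = box
    cols = range(x0 // patch_size, min(grid, (x1 - 1) // patch_size + 1))
    rows = range(y0 // patch_size, min(grid, (y1 - 1) // patch_size + 1))
    return [r * grid + c for r in rows for c in cols]

def mean_vec(tokens):
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(len(tokens[0]))]

def var_vec(tokens, mu):
    return [sum((t[d] - mu[d]) ** 2 for t in tokens) / len(tokens) for d in range(len(mu))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def stats_alignment(roi, bg):
    """(i) Match per-dimension mean and variance of ROI tokens to the background."""
    mu_r, mu_b = mean_vec(roi), mean_vec(bg)
    return sq_dist(mu_r, mu_b) + sq_dist(var_vec(roi, mu_r), var_vec(bg, mu_b))

def soft_projection(roi, bg, temperature=1.0):
    """(ii) Pull each ROI token toward a softmax-weighted mix of background tokens."""
    total = 0.0
    for r in roi:
        logits = [sum(x * y for x, y in zip(r, b)) / temperature for b in bg]
        peak = max(logits)  # subtract the max for numerical stability
        w = [math.exp(l - peak) for l in logits]
        z = sum(w)
        proj = [sum(w[j] * bg[j][d] for j in range(len(bg))) / z for d in range(len(r))]
        total += sq_dist(r, proj)
    return total / len(roi)

def background_preservation(bg_adv, bg_clean):
    """(iii) Keep background tokens close to their clean-image values."""
    return sum(sq_dist(a, c) for a, c in zip(bg_adv, bg_clean)) / len(bg_adv)

def total_variation(delta):
    """Anisotropic total variation of a 2D perturbation given as a list of rows."""
    h, w = len(delta), len(delta[0])
    tv = sum(abs(delta[i + 1][j] - delta[i][j]) for i in range(h - 1) for j in range(w))
    tv += sum(abs(delta[i][j + 1] - delta[i][j]) for i in range(h) for j in range(w - 1))
    return tv

def bcr_objective(layers_adv, layers_clean, roi_idx, delta, weights=(1.0, 1.0, 1.0, 0.1)):
    """Sum the three feature terms over the chosen layers, then add the TV term."""
    roi_set = set(roi_idx)
    loss = 0.0
    for h_adv, h_clean in zip(layers_adv, layers_clean):
        bg_ids = [i for i in range(len(h_adv)) if i not in roi_set]
        roi = [h_adv[i] for i in roi_idx]
        bg_adv = [h_adv[i] for i in bg_ids]
        bg_clean = [h_clean[i] for i in bg_ids]
        loss += (weights[0] * stats_alignment(roi, bg_adv)
                 + weights[1] * soft_projection(roi, bg_adv, temperature=1.0)
                 + weights[2] * background_preservation(bg_adv, bg_clean))
    return loss + weights[3] * total_variation(delta)
```

In this sketch the objective is zero when the ROI tokens are indistinguishable from the background, the background is unchanged, and the perturbation is flat; it grows as the ROI tokens drift away from the background distribution. The list of layers passed to `bcr_objective` corresponds to the choice examined in Figure 4, where targeting deeper layers is reported to work better than targeting early ones.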