GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs

Aryan Yazdan Parast, Parsa Hosseini, Hesam Asadollahzadeh, Arshia Soltani Moakhar, Basim Azam, Soheil Feizi, Naveed Akhtar

TL;DR

The paper tackles object hallucination in multimodal LLMs by introducing GHOST, a fully automatic pipeline that generates hallucination-inducing images. GHOST optimizes a CLIP image embedding, bridges it to the target model through a mapper $\boldsymbol{\Pi}$, and then guides a diffusion model conditioned on the optimized embedding. The method balances inducing the target belief against preserving image realism and keeping the object absent, combining the losses $\mathcal{L}_{\text{adv}}$, $\mathcal{L}_{\text{clip}}$, and $\mathcal{L}_{\text{reg}}$ in a total objective $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{adv}} + \mathcal{L}_{\text{clip}} + \mathcal{L}_{\text{reg}}$. Experiments across Qwen2.5-VL, LLaVA, and GLM-4.1V-Thinking show hallucination success rates around 29%, far above the roughly 1% of prior data-driven methods. The images also transfer: those optimized for Qwen2.5-VL induce hallucinations in GPT-4o at a 66.5% rate. Fine-tuning on the generated images mitigates hallucination, so GHOST serves as both a diagnostic and a corrective tool, revealing shared vulnerabilities across models and enabling targeted defenses. The approach advances stress-testing of MLLMs and informs the robust design of future multimodal systems.
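To make the three-term objective concrete, here is a minimal toy sketch of embedding-space optimization. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the MLLM is replaced by a logistic probe (`w`, `b`), the adversarial term is taken to be the negative log-probability of "Yes", the CLIP term is a squared hinge on a plain dot-product similarity to a hypothetical object embedding `t_obj` with margin `tau`, and the regularizer is the squared distance to the original embedding `z0`. The terms are summed without weights, as in the summary above.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def p_yes(z, w, b):
    """Toy stand-in (assumption) for the MLLM's "Yes" probability given embedding z."""
    return 1.0 / (1.0 + math.exp(-(dot(w, z) + b)))


def total_loss(z, z0, w, b, t_obj, tau):
    # Assumed adversarial term: push the model toward answering "Yes".
    l_adv = -math.log(p_yes(z, w, b))
    # Assumed CLIP term: penalize similarity to the object embedding above a margin.
    l_clip = max(0.0, dot(z, t_obj) - tau) ** 2
    # Assumed regularizer: stay close to the original image embedding.
    l_reg = sum((zi - z0i) ** 2 for zi, z0i in zip(z, z0))
    return l_adv + l_clip + l_reg


def grad_total(z, z0, w, b, t_obj, tau):
    """Analytic gradient of total_loss with respect to z."""
    p = p_yes(z, w, b)
    hinge = max(0.0, dot(z, t_obj) - tau)
    return [
        -(1.0 - p) * wi + 2.0 * hinge * ti + 2.0 * (zi - z0i)
        for zi, z0i, wi, ti in zip(z, z0, w, t_obj)
    ]


def optimize(z0, w, b, t_obj, tau=0.3, steps=200, lr=0.05):
    """Plain gradient descent on the embedding only; the probe stays fixed."""
    z = list(z0)
    for _ in range(steps):
        g = grad_total(z, z0, w, b, t_obj, tau)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z


if __name__ == "__main__":
    z0 = [1.0, 0.0, 0.0]      # hypothetical original embedding
    w = [-1.0, 2.0, 0.0]      # hypothetical probe weights
    t_obj = [0.0, 1.0, 0.0]   # hypothetical object embedding
    z = optimize(z0, w, 0.0, t_obj)
    print("P(Yes) before:", p_yes(z0, w, 0.0))
    print("P(Yes) after: ", p_yes(z, w, 0.0))
    print("object similarity after:", dot(z, t_obj))
```

In this toy setting the probe's "Yes" direction overlaps with the object direction, so the adversarial term and the CLIP term pull against each other. That tension is the balance the summary describes: raise the model's belief in the object without moving the embedding toward the object itself.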

Abstract

Object hallucination in Multimodal Large Language Models (MLLMs) is a persistent failure mode that causes the model to perceive objects absent in the image. This weakness of MLLMs is currently studied using static benchmarks with fixed visual scenarios, which precludes uncovering model-specific or unanticipated hallucination vulnerabilities. We introduce GHOST (Generating Hallucinations via Optimizing Stealth Tokens), a method designed to stress-test MLLMs by actively generating images that induce hallucination. GHOST is fully automatic and requires no human supervision or prior knowledge. It operates by optimizing in the image embedding space to mislead the model while keeping the target object absent, and then guiding a diffusion model conditioned on the embedding to generate natural-looking images. The resulting images remain visually natural and close to the original input, yet introduce subtle misleading cues that cause the model to hallucinate. We evaluate our method across a range of models, including reasoning models like GLM-4.1V-Thinking, and achieve a hallucination success rate exceeding 28%, compared to around 1% in prior data-driven discovery methods. We confirm that the generated images are both high-quality and object-free through quantitative metrics and human evaluation. GHOST also uncovers transferable vulnerabilities: images optimized for Qwen2.5-VL induce hallucinations in GPT-4o at a 66.5% rate. Finally, we show that fine-tuning on our images mitigates hallucination, positioning GHOST as both a diagnostic and corrective tool for building more reliable multimodal systems.

Paper Structure

This paper contains 38 sections, 13 equations, 21 figures, 29 tables.

Figures (21)

  • Figure 1: (Left) All models correctly answer "No" when asked if there is a knife in the image. (Right) GHOST introduces subtle cues, and all models now hallucinate the presence of a knife.
  • Figure 1: GHOST and DASH (Augustin et al., 2025) results on COCO. “Samples” reflects the size of the input pool each method operates over.
  • Figure 2: (Top) Input images: the MLLM does not hallucinate the target object. (Bottom) GHOST images: the MLLM hallucinates the object, despite its absence being clear to a human observer.
  • Figure 3: a) Overview of GHOST. We optimize only the CLIP embedding, then condition unCLIP on it; see (c) for decoding details. b) Training setup for the MLP, which aligns CLIP embeddings with the MLLM vision encoder using an MSE loss. c) A partially noised latent of the original image is denoised conditioned on the optimized embedding.
  • Figure 4: Optimization steps toward hallucination. We show the model’s Yes/No probabilities for the optimized embedding at each step. Images are then generated by conditioning diffusion on that embedding. As “Yes” confidence increases, misleading cues (e.g., vase-like structures) emerge. Samples flagged by OWLv2 are discarded.
  • ...and 16 more figures
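
The Figure 3(b) caption describes a mapper trained with an MSE loss to align CLIP embeddings with the MLLM vision encoder. The sketch below illustrates only that training signal under simplifying assumptions: a single linear layer stands in for the MLP, the paired embeddings are synthetic, and the names `train_mapper`, `W_true`, and the dimensions are hypothetical rather than taken from the paper.

```python
import random


def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def mse(W, pairs):
    """Mean squared error between mapped source embeddings and target features."""
    total, count = 0.0, 0
    for x, y in pairs:
        pred = matvec(W, x)
        total += sum((p - t) ** 2 for p, t in zip(pred, y))
        count += len(y)
    return total / count


def train_mapper(pairs, d_in, d_out, epochs=200, lr=0.1):
    """Per-sample SGD on the MSE loss for a linear mapper (a stand-in for the MLP)."""
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(epochs):
        for x, y in pairs:
            pred = matvec(W, x)
            for i in range(d_out):
                err = pred[i] - y[i]
                for j in range(d_in):
                    # Gradient of mean((W x - y)^2) with respect to W[i][j].
                    W[i][j] -= lr * 2.0 * err * x[j] / d_out
    return W


def make_pairs(n=50, seed=0):
    """Synthetic (source embedding, target feature) pairs from a hypothetical true map."""
    rng = random.Random(seed)
    W_true = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7]]
    pairs = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        pairs.append((x, matvec(W_true, x)))
    return pairs


if __name__ == "__main__":
    pairs = make_pairs()
    W0 = [[0.0] * 3 for _ in range(2)]
    print("MSE before training:", mse(W0, pairs))
    W = train_mapper(pairs, d_in=3, d_out=2)
    print("MSE after training: ", mse(W, pairs))
```

Once such a mapper is trained, an embedding edited in the source space can be projected into the space the target model reads, which is the bridging role the overview assigns to the mapper.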