TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs

Kejia Zhang, Keda Tao, Zhiming Luo, Chang Liu, Jiasheng Tang, Huan Wang

Abstract

Multimodal large language models (MLLMs) are prone to hallucinations, generating plausible but visually ungrounded outputs, partly because direct preference optimization (DPO) overfits to superficial linguistic cues under static preference supervision. We propose TARS, a token-adaptive preference strategy that reformulates DPO as a principled min-max optimization problem. The inner maximization selectively perturbs visual-agnostic tokens to induce worst-case distributional shifts, while the outer minimization enforces alignment with causal visual signals rather than surface-level patterns. A novel spectral alignment loss further regularizes hidden representations in the frequency domain via the Fast Fourier Transform (FFT), preserving global semantic structure without rigid token-level correspondence. We evaluate TARS across multiple hallucination benchmarks. Using only 4.8k preference samples without expert feedback, TARS reduces hallucination rates from 26.4\% to 13.2\% and cognition scores from 2.5 to 0.4, outperforming standard DPO by a large margin. Notably, TARS surpasses $5\times$ LLM-based data augmentation trained on 28.8k samples (Hal-Rate: 16.0\% vs.\ 13.2\%), demonstrating that reshaping the optimization landscape via adversarial token perturbation is fundamentally more effective than scaling training data. TARS further narrows the gap with GPT-4o on key metrics.
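
To make the two ingredients named in the abstract concrete, here is a minimal NumPy sketch. The abstract does not give the exact formulation, so this is an illustration under stated assumptions rather than the authors' implementation. `dpo_loss` is the standard DPO objective on sequence log-probabilities. `spectral_alignment` is a hypothetical stand-in for a frequency-domain regularizer: it compares FFT magnitude spectra taken along the token axis, and both the function name and the choice of a squared-magnitude gap are assumptions.

```python
import numpy as np

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    margin = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); logaddexp computes it stably.
    return float(np.mean(np.logaddexp(0.0, -margin)))

def spectral_alignment(hidden_a, hidden_b):
    """Hypothetical spectral distance (an assumption, not the paper's loss).

    Takes two (tokens, dim) hidden-state arrays of equal shape, applies a real
    FFT along the token axis, and returns the mean squared gap between the
    magnitude spectra.
    """
    spec_a = np.abs(np.fft.rfft(hidden_a, axis=0))
    spec_b = np.abs(np.fft.rfft(hidden_b, axis=0))
    return float(np.mean((spec_a - spec_b) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(16, 8))
    # A circular shift along the token axis changes only the FFT phase,
    # so the magnitude-based distance stays at numerical zero.
    print(spectral_alignment(h, np.roll(h, 3, axis=0)))
    print(dpo_loss(np.array([-1.0]), np.array([-3.0]),
                   np.array([-2.0]), np.array([-2.0])))
```

Comparing magnitude spectra ignores where along the sequence a pattern occurs, which is one way to read "without rigid token-level correspondence". The inner maximization over token perturbations is not sketched here, because the abstract does not specify how the perturbed tokens are chosen.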

Paper Structure

This paper contains 39 sections, 19 equations, 9 figures, 15 tables, 1 algorithm.

Figures (9)

  • Figure 1: Left: We present TARS, a token-adaptive preference strategy for mitigating hallucinations in MLLMs. TARS reformulates direct preference optimization (DPO) as a principled min-max optimization objective that (1) minimizes behavioral misalignment via structured preference feedback supervision and (2) maximizes distributional adaptability through controlled perturbations of visual-agnostic tokens. Right: Comprehensive evaluation of LLaVA-v1.5-13B with preference optimization (PO) llava_origin and various state-of-the-art MLLMs on the AMBER benchmark wang2023amber shows that TARS consistently surpasses PO baselines, yielding results competitive with GPT-4o hurst2024gpt.
  • Figure 2: Motivation illustration for TARS. (a) and (b) illustrate standard DPO and our token-adaptive perturbation strategy. (c) shows a VQA example where DPO hallucinates, while TARS effectively avoids ungrounded output. (d) and (e) visualize token-to-query attention maps during autoregressive decoding. DPO over-attends to spurious tokens, while TARS attends to causally grounded visual-semantic cues.
  • Figure 2: Ablation of token-level perturbation (TP), cross-modal alignment score (CAS), and spectral preference alignment (SPA).
  • Figure 3: Overview of TARS. TARS reformulates preference optimization as a Min--Max problem: (1) The maximization branch perturbs visual-agnostic tokens to simulate semantically shifted contexts (red dashed box); (2) The minimization branch fine-tunes the model to align with human preferences via the DPO objective (purple dashed box). TARS encourages the model to attend to causally grounded visual signals rather than spurious correlations, thereby reducing hallucinations.
  • Figure 4: Comparison of average scores across question categories on MMHal. TARS achieves consistently higher scores, demonstrating stronger visual grounding.
  • ...and 4 more figures