Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution
Weiming Ren, Raghav Goyal, Zhiming Hu, Tristan Ty Aumentado-Armstrong, Iqbal Mohomed, Alex Levinshtein
TL;DR
This work takes advantage of multimodal large language models by constructing a prompt that assesses hallucinatory visual elements and generates a "Hallucination Score", which is found that HS is closely aligned with human evaluations, and also provides complementary insights to prior image metrics used for super-resolution (SR) models.
Abstract
Generative super-resolution (GSR) currently sets the state-of-the-art in terms of perceptual image quality, overcoming the "regression-to-the-mean" blur of prior non-generative models. However, from a human perspective, such models do not fully conform to the optimal balance between quality and fidelity. Instead, a different class of artifacts, in which generated details fail to perceptually match the low resolution image (LRI) or ground-truth image (GTI), is a critical but under-studied issue in GSR, limiting its practical deployment. In this work, we focus on measuring, analyzing, and mitigating these artifacts (i.e., "hallucinations"). We observe that hallucinations are not well-characterized with existing image metrics or quality models, as they are orthogonal to both exact fidelity and no-reference quality. Instead, we take advantage of multimodal large language models (MLLMs) by constructing a prompt that assesses hallucinatory visual elements and generates a "Hallucination Score" (HS). We find that HS is closely aligned with human evaluations, and also provides complementary insights to prior image metrics used for super-resolution (SR) models. Finally, we propose a few efficient HS proxies and demonstrate how diffusion-based GSR models can be fine-tuned to mitigate hallucinations, leveraging HS proxies as differentiable reward functions.
