
LargePiG: Your Large Language Model is Secretly a Pointer Generator

Zhongxiang Sun, Zihua Si, Xiaoxue Zang, Kai Zheng, Yang Song, Xiao Zhang, Jun Xu

TL;DR

The paper proposes an effective way to separate content from form in LLM-generated queries: it preserves the factual knowledge extracted and integrated from the inputs, while the syntactic structure, including function words, is compiled using the powerful linguistic capabilities of the LLM.

Abstract

Recent research on query generation has focused on using Large Language Models (LLMs), which, despite bringing state-of-the-art performance, also introduce hallucinations into the generated queries. In this work, we introduce relevance hallucination and factuality hallucination as a new typology for the hallucination problems arising in LLM-based query generation. We propose an effective way to separate content from form in LLM-generated queries: it preserves the factual knowledge extracted and integrated from the inputs, while compiling the syntactic structure, including function words, using the powerful linguistic capabilities of the LLM. Specifically, we introduce a model-agnostic and training-free method that turns the Large Language Model into a Pointer-Generator (LargePiG), where the pointer attention distribution leverages the LLM's inherent attention weights, and the copy probability is derived from the difference between the vocabulary distributions of the model's high layers and its last layer. To validate the effectiveness of LargePiG, we constructed two datasets for assessing hallucination problems in query generation, covering both document and video scenarios. Empirical studies on various LLMs demonstrated the superiority of LargePiG on both datasets. Additional experiments also verified that LargePiG can reduce hallucination in large vision-language models and improve the accuracy of document-based question answering and factuality evaluation tasks.
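The pointer-generator combination described above can be illustrated with a small sketch. It is not the authors' implementation and rests on stated assumptions: the attention weights of the current decoding step are available per head, the pointer attention distribution is their head average renormalized over the input (source) positions, and the final output follows the standard pointer-generator mixture (1 - p_copy) * P_vocab + p_copy * P_pointer. The helper names `source_attention`, `pointer_distribution`, and `mix` are hypothetical.

```python
from typing import List, Sequence


def source_attention(
    head_weights: Sequence[Sequence[float]], source_positions: Sequence[int]
) -> List[float]:
    """Average the current step's per-head attention and renormalize it over
    the source (input) positions only. Hypothetical aggregation choice."""
    n_heads = len(head_weights)
    if n_heads == 0:
        raise ValueError("need at least one attention head")
    avg = [sum(h[p] for h in head_weights) / n_heads for p in source_positions]
    total = sum(avg)
    if total <= 0.0:
        raise ValueError("no attention mass on the source positions")
    return [a / total for a in avg]


def pointer_distribution(
    source_ids: Sequence[int], attention: Sequence[float], vocab_size: int
) -> List[float]:
    """Scatter attention over source positions onto vocabulary ids; a token
    that occurs several times in the input accumulates all of its weights."""
    if len(source_ids) != len(attention):
        raise ValueError("one attention weight per source token is required")
    dist = [0.0] * vocab_size
    for tok, weight in zip(source_ids, attention):
        dist[tok] += weight
    return dist


def mix(
    p_vocab: Sequence[float], p_pointer: Sequence[float], p_copy: float
) -> List[float]:
    """Pointer-generator mixture: (1 - p_copy) * P_vocab + p_copy * P_pointer."""
    if not 0.0 <= p_copy <= 1.0:
        raise ValueError("p_copy must lie in [0, 1]")
    if len(p_vocab) != len(p_pointer):
        raise ValueError("distributions must share the same vocabulary")
    return [(1.0 - p_copy) * v + p_copy * a for v, a in zip(p_vocab, p_pointer)]
```

For example, with a five-token vocabulary, source ids [2, 4, 2] and attention [0.5, 0.3, 0.2], token 2 receives pointer mass 0.7; with p_copy = 0.5 its final probability rises from 0.2 under P_vocab to 0.45. Because both inputs are distributions and the weights sum to 1, the mixture remains a valid distribution.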

Paper Structure

This paper contains 27 sections, 10 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: The architecture of the proposed plug-in and training-free method LargePiG: the Pointer Attention Distribution comes from the LLM's self-attention weights, the Vocabulary Distribution from the output of the original LLM, and the Copy Probability from the difference between the vocabulary distributions of the model's high layers and its last layer.
  • Figure 2: Left: Semantic similarity win rate of Qwen1.5-7B-Chat with LargePiG vs without LargePiG on TruthfulVQG. Right: Performance of Qwen1.5-7B-Chat vs Qwen1.5-7B-Chat + LargePiG on SQuAD.
  • Figure 3: Examples of query generation in real applications across different short video platforms, each of which has hundreds of millions of users.
  • Figure 4: An example of two video covers mapped to tokens, where we have ignored other irrelevant words and the "_" character before some tokens.
  • Figure 5: Results of LLaMA2-7B-Chat without LargePiG vs with LargePiG on TruthfulDQG. Left: Overall semantic similarity scores. Right: Win rate with LargePiG compared against without LargePiG.
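The copy-probability path in Figure 1 compares vocabulary distributions across layers. The sketch below is one plausible instantiation under explicit assumptions, not the paper's formula: it assumes each high layer's hidden state has already been projected to vocabulary logits (for example with the model's own output head), measures the difference with the Jensen-Shannon divergence, averages it over the chosen high layers, and rescales it by ln 2 so that it lies in [0, 1]. `softmax`, `js_divergence`, and `copy_probability` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of vocabulary logits."""
    mx = max(logits)
    exps = [math.exp(x - mx) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in nats, bounded above by ln 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def copy_probability(
    high_layer_dists: Sequence[Sequence[float]], last_layer_dist: Sequence[float]
) -> float:
    """Hypothetical copy probability: mean JSD between each high-layer
    vocabulary distribution and the last-layer one, scaled to [0, 1]."""
    if not high_layer_dists:
        raise ValueError("need at least one high-layer distribution")
    mean_jsd = sum(
        js_divergence(p, last_layer_dist) for p in high_layer_dists
    ) / len(high_layer_dists)
    # Clamp tiny floating-point excursions outside [0, 1].
    return min(1.0, max(0.0, mean_jsd / math.log(2.0)))
```

Identical high-layer and last-layer distributions give a copy probability of 0, so the output falls back to the plain vocabulary distribution, while distributions with disjoint support give 1. The Jensen-Shannon divergence was picked here because it is symmetric and bounded; the direction of the mapping (larger difference, more copying) is likewise an assumption of this sketch.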