Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling

Yida Mu; Peizhen Bai; Kalina Bontcheva; Xingyi Song

TL;DR

The paper tackles granularity adherence and hallucination in LLM-based topic modelling by introducing TopicMistral, a Mistral-7B model fine-tuned with Direct Preference Optimisation (DPO). Instead of human preference labels, a reconstruction pipeline produces the accepted and rejected topics used for training. The paper shows that prompt-based control of granularity offers limited gains, whereas DPO fine-tuning on the reconstructed data substantially improves topic coherence, reduces near-duplicates, and decreases hallucinations, including under adversarial prompts. The authors provide a plug-in evaluation framework with metrics for unique topic counts, inter-topic similarity, and mutual information with human labels, and show that TopicMistral generally outperforms off-the-shelf LLMs and competitive baselines, with better generalisation to out-of-distribution data. The work highlights practical implications for robust, scalable LLM-driven topic modelling, offers dynamic seed topic strategies for adapting to new corpora, and includes an attention analysis that sheds light on the mechanisms behind hallucination mitigation.
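Two of the evaluation metrics named above, unique topic counts and mutual information with human labels, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the whitespace-only normalisation, and the use of natural-log mutual information are all assumptions made for illustration, and inter-topic similarity is left out because it depends on an embedding model that this summary does not specify.

```python
import math
from collections import Counter


def unique_topic_count(topics):
    """Count distinct topic strings after trimming surrounding whitespace.

    Assumption: case and spelling variants (e.g. 'Sports' vs 'sports')
    are counted separately, so inconsistent naming inflates the count.
    """
    return len({t.strip() for t in topics})


def mutual_information(predicted, gold):
    """Mutual information (in nats) between two equal-length label lists."""
    if len(predicted) != len(gold):
        raise ValueError("label lists must have the same length")
    n = len(predicted)
    joint = Counter(zip(predicted, gold))
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    mi = 0.0
    for (p, g), c in joint.items():
        # p(p, g) * log( p(p, g) / (p(p) * p(g)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (pred_counts[p] * gold_counts[g]))
    return mi
```

A lower unique topic count for the same corpus suggests fewer near-duplicate topic names, while a higher mutual information suggests that generated topics track the human labels more closely.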

Abstract

Large language models (LLMs) with their strong zero-shot topic extraction capabilities offer an alternative to probabilistic topic modelling and closed-set topic classification approaches. As zero-shot topic extractors, LLMs are expected to understand human instructions to generate relevant and non-hallucinated topics based on the given documents. However, LLM-based topic modelling approaches often face difficulties in generating topics with adherence to granularity as specified in human instructions, often resulting in many near-duplicate topics. Furthermore, methods for addressing hallucinated topics generated by LLMs have not yet been investigated. In this paper, we focus on addressing the issues of topic granularity and hallucinations for better LLM-based topic modelling. To this end, we introduce a novel approach that leverages Direct Preference Optimisation (DPO) to fine-tune open-source LLMs, such as Mistral-7B. Our approach does not rely on traditional human annotation to rank preferred answers but employs a reconstruction pipeline to modify raw topics generated by LLMs, thus enabling a fast and efficient training and inference framework. Comparative experiments show that our fine-tuning approach not only significantly improves the LLM's capability to produce more coherent, relevant, and precise topics, but also reduces the number of hallucinated topics.
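For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair, using the accepted and rejected topic outputs in place of human-ranked answers. It is a minimal illustration under stated assumptions, not the authors' training code: the function name, the argument names, and the default `beta=0.1` are hypothetical, and the inputs are assumed to be summed token log-probabilities of each output under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(logp_accepted, logp_rejected,
             ref_logp_accepted, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (accepted, rejected) pair.

    loss = -log(sigmoid(beta * ((logp_a - ref_a) - (logp_r - ref_r))))
    beta=0.1 is an illustrative default, not a value taken from the paper.
    """
    margin = beta * ((logp_accepted - ref_logp_accepted)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls below log 2 once the policy favours the accepted output more strongly than the reference model does, and rises above log 2 when it favours the rejected output instead.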

Paper Structure (44 sections, 5 equations, 4 figures, 6 tables, 1 algorithm)

Introduction
Related Work
LLM-Assisted Topic Modelling
LLM-based Topic Modelling
Methodology
Controlling Topic Granularity Through Prompts
Fine-tuning LLMs Towards Topic Modelling
DPO Fine-tuning Details
DPO Framework
Developing Accepted ($y_a$) and Rejected Topics ($y_r$)
Rejected Topics ($y_r$)
Experimental Settings
Datasets and Splits
Fine-tuning Sets
Test Sets for Topic Granularity
...and 29 more sections

Figures (4)

Figure 1: Four real-world examples, each consisting of the given document (grey), the user prompt (blue), and the issues associated with LLM-generated topics (see the colour-coded legend). Examples (a) and (b) demonstrate inconsistent naming, i.e., LLMs tend to generate topics in different formats. Moreover, when LLMs are prompted to generate topics related to 'hard disk' from an unrelated document (example (c)), they may generate either hallucinated topics (e.g., 'Harddisks') or unwanted topics (e.g., 'electronics'). Note that we prompt LLMs to return 'No related topics' if the given text contains no related topics.
Figure 2: An example prompt used in our work. Text enclosed by the special tokens '[INST]' and '[/INST]' denotes the user instruction; red and blue colours denote the Granularity Description and Seed Topics, respectively.
Figure 3: Our DPO fine-tuning framework. The 'reconstruction pipeline' (in purple) denotes the approaches we used to modify the original output from LLMs. The details are given in the paper's section on the reconstruction pipeline.
Figure 4: Average attention weight shift on Mistral-7B and TopicMistral. The x-axis represents the hidden layer number and the y-axis indicates the average attention weight from the topic instruction to the next predicted token.
