Knowledge Boundary Discovery for Large Language Models

Ziquan Wang, Zhongqi Lu

Abstract

We propose Knowledge Boundary Discovery (KBD), a reinforcement-learning-based framework for exploring the knowledge boundaries of Large Language Models (LLMs). We define the knowledge boundary by automatically generating two types of questions: (i) those the LLM can confidently answer (within-knowledge boundary) and (ii) those it cannot (beyond-knowledge boundary). Iteratively exploring and exploiting the LLM's responses to find these boundaries is challenging because of hallucination. We therefore model the search as the exploration of a partially observable environment: the agent generates a progressive question as its action, receives the LLM's response as its observation, uses entropy reduction as its reward, and updates its belief state. We demonstrate that KBD detects the knowledge boundaries of LLMs by automatically finding a set of non-trivial answerable and unanswerable questions. We validate KBD by comparing its generated knowledge boundaries with manually crafted LLM benchmark datasets. Experiments show that the KBD-generated question set is comparable to the human-generated datasets. Our approach opens a new way to evaluate LLMs.
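
To make the "entropy reduction as the reward" idea concrete, here is a minimal sketch. It assumes the uncertainty of a response can be summarized as a discrete probability distribution and that the reward is the drop in Shannon entropy between two consecutive rounds. The abstract does not specify how the paper computes entropy, so the function names and the exact formulation below are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution.

    Zero-probability outcomes contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reduction_reward(prev_probs: Sequence[float],
                             curr_probs: Sequence[float]) -> float:
    """Hypothetical reward: how much entropy dropped after the latest question.

    Positive when the new response is more certain than the previous one.
    This formulation is an assumption made for illustration only.
    """
    return shannon_entropy(prev_probs) - shannon_entropy(curr_probs)


# Example: a uniform distribution over four outcomes, then a fully certain one.
uncertain = [0.25, 0.25, 0.25, 0.25]
certain = [1.0, 0.0, 0.0, 0.0]
reward = entropy_reduction_reward(uncertain, certain)
```

In this toy case the reward equals the entropy of the uniform distribution, because the second distribution has zero entropy.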

Paper Structure

This paper contains 23 sections, 9 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Distribution of entropy values for over 2,000 responses generated by our KBD algorithm across various topics and parameter configurations. The histogram and KDE curve reveal two prominent peaks: one in the low-entropy region ($\leq 40$) and another in the high-entropy region ($\geq 170$). These correspond to the within-knowledge and beyond-knowledge boundaries, respectively. Only about 20% of the questions fall into the mid-entropy range ($40 < \text{entropy} < 170$), suggesting that the transition zone (i.e., where the model’s knowledge is ambiguous) is narrow.
  • Figure 2: t-SNE visualization of question embeddings. Blue: answerable questions (within-knowledge boundary) form a central cluster. Red: unanswerable questions (beyond-knowledge boundary) form a surrounding band. Green: random questions are diffusely scattered. This structure supports the non-triviality of KBD-generated samples.
  • Figure 3: Average entropy over 50 rounds across 1000 episodes for our KBD algorithm, the expert questioning baseline, and the random questioning baseline. Both the KBD algorithm and expert questioning effectively explore the LLM’s knowledge boundary, whereas the random questioning baseline fails to converge or reach the knowledge boundary.
  • Figure 4: The cumulative reward per episode increases with the number of training episodes and eventually converges, indicating that KBD consistently learns to improve its questioning strategy.
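
The entropy ranges reported in the Figure 1 caption can be read as a three-way labeling rule. The sketch below encodes the two values from the caption (40 and 170). The caption only describes where the histogram peaks lie, so treating these values as hard decision thresholds is an assumption, as are the function name and the label strings.

```python
# Values taken from the Figure 1 caption; using them as hard cut-offs
# is an illustrative assumption, not a rule stated in the paper.
LOW_ENTROPY_MAX = 40.0
HIGH_ENTROPY_MIN = 170.0


def classify_by_entropy(entropy: float) -> str:
    """Label a response by its entropy value (hypothetical helper)."""
    if entropy <= LOW_ENTROPY_MAX:
        return "within-knowledge"   # low-entropy peak
    if entropy >= HIGH_ENTROPY_MIN:
        return "beyond-knowledge"   # high-entropy peak
    return "ambiguous"              # mid-entropy transition zone


labels = [classify_by_entropy(e) for e in (12.5, 95.0, 210.0)]
```

The middle label corresponds to the narrow transition zone that, according to the caption, holds only about 20% of the questions.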