
UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing

Yijun Yang, Jie He, Pinzhen Chen, Víctor Gutiérrez-Basulto, Jeff Z. Pan

TL;DR

This work addresses biases in extracting factual knowledge from pretrained LMs by framing the probing objective probabilistically, which uncovers two core biases: object-likelihood bias and template-prior bias. It introduces UniArk, an adapter-based debiasing framework with two modules, max-entropy regularisation and self-data augmentation, that improve generalisation to unseen prompts while maintaining accuracy. To evaluate generalisation and bias, the authors construct ParaTrex, a large and diverse paraphrase dataset, and show that UniArk significantly improves out-of-domain performance and consistency across paraphrases, outperforming baselines such as plain adapters and MeCoD. Together, ParaTrex as a benchmarking resource and the scalable, modular debiasing approach offer a practical pathway to robust factual knowledge extraction in real-world, paraphrase-rich settings.
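To make the max-entropy idea concrete, here is a minimal sketch of one way such a regulariser could be attached to a probing loss. It assumes, hypothetically, that the debiasing term is the entropy of the model's distribution over candidate objects when the subject is masked out, so that maximising it flattens the template-only prior over objects. The function names, the `lam` weight and the masked-input construction are illustrative assumptions, not UniArk's published implementation.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def entropy(probs: np.ndarray) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    return float(-(probs * np.log(probs + 1e-12)).sum())


def debiased_loss(full_logits, gold_index, template_only_logits, lam=1.0):
    """Cross-entropy on the gold object minus a weighted entropy bonus.

    full_logits: scores over candidate objects for the prompt with the subject filled in.
    template_only_logits: scores for the same template with the subject masked out
        (hypothetical input construction; the paper's exact recipe may differ).
    lam: hypothetical weight of the max-entropy regulariser.
    """
    p_full = softmax(np.asarray(full_logits, dtype=float))
    ce = -np.log(p_full[gold_index] + 1e-12)
    p_template = softmax(np.asarray(template_only_logits, dtype=float))
    # Subtracting the entropy means that minimising the loss pushes the
    # template-only distribution toward uniform (i.e. maximises its entropy).
    return float(ce - lam * entropy(p_template))
```

A template whose masked-subject prediction is already near-uniform incurs almost no extra loss, while a template that strongly favours one object regardless of the subject raises the loss, which is the behaviour a debiasing term of this kind is meant to penalise.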

Abstract

Several recent papers have investigated the potential of language models as knowledge bases, as well as the existence of severe biases when extracting factual knowledge. In this work, we focus on factual probing performance over prompts unseen during tuning and, taking a probabilistic view, we show the inherent misalignment between the pre-training and downstream tuning objectives of language models for probing knowledge. We hypothesize that simultaneously debiasing these objectives can be the key to generalisation over unseen prompts. We propose an adapter-based framework, UniArk, for generalised and consistent factual knowledge extraction through simple methods that introduce no extra parameters. Extensive experiments show that UniArk can significantly improve the model's out-of-domain generalisation as well as its consistency under various prompts. Additionally, we construct ParaTrex, a large-scale and diverse dataset for measuring the inconsistency and out-of-domain generalisation of models. Further, ParaTrex offers a reference method for constructing paraphrased datasets using large language models.
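Consistency across paraphrases is commonly measured as pairwise agreement, as in ParaRel: for each fact, take the fraction of template pairs whose top-1 predictions match, then average over facts. The sketch below assumes ParaTrex-based evaluation uses this same definition, which is not confirmed here; the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_consistency(predictions: Sequence[str]) -> float:
    """Fraction of template pairs whose top-1 predictions agree for one fact.

    predictions[i] is the model's top-1 object for the i-th paraphrased template.
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        raise ValueError("need at least two paraphrases to measure consistency")
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


def mean_consistency(per_fact_predictions: Sequence[Sequence[str]]) -> float:
    """Average pairwise consistency over all facts (subject-object tuples)."""
    if not per_fact_predictions:
        raise ValueError("need at least one fact")
    scores = [pairwise_consistency(p) for p in per_fact_predictions]
    return sum(scores) / len(scores)
```

For example, predictions `["Paris", "Paris", "Lyon"]` from three paraphrases give three pairs, of which only one agrees, so the fact scores 1/3. Note that consistency ignores correctness: a model that always answers "Lyon" would be perfectly consistent.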

Paper Structure (30 sections, 7 equations, 6 figures, 12 tables)

Figures (6)

  • Figure 1: Illustration of the bias inherent in the objectives, arising from the template prior and template verbalization, with a comparison to our UniArk framework.
  • Figure 2: Average pairwise BLEU across all relations, compared with ParaRel. ParaTrex consistently scores lower than ParaRel, indicating that its templates are more lexically and syntactically diverse.
  • Figure 3: Cosine similarity between the embeddings of the grounding template and each paraphrased template. The boxplot compares random paraphrases sampled from other relations with the paraphrases in our dataset for 39 relations.
  • Figure 4: Workflow for generating paraphrased versions of prompt templates in ParaTrex, illustrated with the LAMA relation 'capital of'.
  • Figure 5: Scaling results for adapters and UniArk across different model sizes.
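Figure 2 uses average pairwise BLEU between templates as a diversity measure. A rough way to compute such a score is to average smoothed sentence-level BLEU over every ordered pair of templates for a relation; lower values mean the templates share fewer n-grams. This is a self-contained approximation: the add-one smoothing, whitespace tokenisation and function names are assumptions, since the exact BLEU configuration behind Figure 2 is not given here.

```python
import math
from collections import Counter
from itertools import permutations


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Add-one smoothed sentence BLEU of one tokenised hypothesis against one reference."""
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: each hypothesis n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precision_sum += math.log((matches + 1) / (total + 1))
    # Brevity penalty: hypotheses shorter than the reference are penalised.
    h, r = len(hypothesis), len(reference)
    bp = 1.0 if h > r else math.exp(1 - r / h)
    return bp * math.exp(log_precision_sum / max_n)


def average_pairwise_bleu(templates):
    """Mean BLEU over all ordered pairs of distinct templates of one relation."""
    tokenised = [t.lower().split() for t in templates]
    scores = [sentence_bleu(a, b) for a, b in permutations(tokenised, 2)]
    return sum(scores) / len(scores)
```

Because BLEU is asymmetric (both the brevity penalty and the precisions depend on which side is the hypothesis), averaging over ordered pairs rather than unordered ones treats both directions equally. Identical templates score 1.0, a near-copy such as "[X] is the capital city of [Y] ." scores well above a restructured paraphrase such as "The seat of government of [Y] is [X] .".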