Latent Paraphrasing: Perturbation on Layers Improves Knowledge Injection in Language Models

Minki Kang, Sung Ju Hwang, Gibbeum Lee, Jaewoong Cho

TL;DR

LaPael is a latent-level paraphrasing method that applies input-dependent noise to early LLM layers. It produces diverse, semantically consistent augmentations directly within the model and eliminates the recurring cost of generating paraphrases for each knowledge update.

Abstract

As Large Language Models (LLMs) are increasingly deployed in specialized domains with continuously evolving knowledge, the need for timely and precise knowledge injection has become essential. Fine-tuning with paraphrased data is a common approach to enhance knowledge injection, yet it faces two significant challenges: high computational costs due to repetitive external model usage and limited sample diversity. To this end, we introduce LaPael, a latent-level paraphrasing method that applies input-dependent noise to early LLM layers. This approach enables diverse and semantically consistent augmentations directly within the model. Furthermore, it eliminates the recurring costs of paraphrase generation for each knowledge update. Our extensive experiments on question-answering benchmarks demonstrate that LaPael improves knowledge injection over standard fine-tuning and existing noise-based approaches. Additionally, combining LaPael with data-level paraphrasing further enhances performance.

Paper Structure

This paper contains 46 sections, 16 equations, 9 figures, 16 tables.

Figures (9)

  • Figure 1: Effect of paraphrasing data in knowledge injection.
  • Figure 2: A conceptual illustration of the proposed approach. On the left, we show the existing method of knowledge injection by paraphrasing each document for data-level augmentation. On the right, we present the conceptual illustration of LaPael with trained latent paraphrasers. Unlike the method on the left, LaPael can eliminate the need for users to repeatedly paraphrase using LLMs once latent paraphrasers are trained.
  • Figure 3: (a) Illustration of the latent paraphraser. The linear layer embeds each token's latent feature ${\bm{h}}$ into $\bm \mu$. We then sample stochastic noise $\alpha$ from ${\mathcal{N}}(\bm \mu, {\bm{I}})$ and apply a mask $m_t$ to control the scale. (b) Training pipeline of LaPael. To train the latent paraphraser, we estimate the parameters of Gaussian distributions. We then minimize the KL divergence between these distributions to optimize the latent paraphrasers.
  • Figure 4: Effect of the Number of Paraphrases. Each plot shows the relationship between the number of paraphrases (x-axis) and F1 scores (y-axis) in knowledge injection. The F1 scores of both standard fine-tuning and our method improve as the number of paraphrases increases.
  • Figure 5: (a) We conduct experiments varying the size of ${\mathcal{D}}_{\sf train}$ on SQuAD-syn, where $100\%$ indicates 1,000 documents. We report mean and std. over three runs. (b) We conduct experiments on StreamingQA-syn varying the start position of latent paraphrasers where '# layers' denotes the number of latent paraphrasers.
  • ...and 4 more figures
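
The Figure 3 caption describes the latent paraphraser only in outline, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It maps each token's latent feature h to a mean mu with a linear layer, samples noise alpha from N(mu, I), and scales it by a per-token mask m_t. How the noise is combined with h is not given in the caption; the sketch assumes a simple additive perturbation. Which two distributions the KL term compares is likewise not stated, so `fit_gaussian` and `gaussian_kl` only illustrate the generic step of estimating diagonal Gaussian parameters and computing their closed-form KL divergence. All function names here are hypothetical.

```python
import math
import random


def linear(h, W, b):
    """Compute mu = W h + b for one token's latent feature h."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]


def latent_paraphrase(H, W, b, mask, rng):
    """Perturb token features H (T x d) with input-dependent Gaussian noise.

    ASSUMPTION: the noise is added to h and scaled by the mask m_t.
    The caption does not specify how alpha is combined with h.
    """
    out = []
    for h, m_t in zip(H, mask):
        mu = linear(h, W, b)                            # input-dependent mean
        alpha = [rng.gauss(mu_i, 1.0) for mu_i in mu]   # alpha ~ N(mu, I)
        out.append([h_i + m_t * a_i for h_i, a_i in zip(h, alpha)])
    return out


def fit_gaussian(samples):
    """Estimate per-dimension mean and variance from a list of feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    # Small constant keeps the variance strictly positive.
    var = [sum((s[i] - mean[i]) ** 2 for s in samples) / n + 1e-6 for i in range(d)]
    return mean, var


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two diagonal Gaussians."""
    return sum(
        0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


rng = random.Random(0)
H = [[1.0, 2.0], [3.0, 4.0]]          # two tokens, feature dimension 2
W = [[0.1, 0.0], [0.0, 0.1]]          # hypothetical linear-layer weights
b = [0.0, 0.0]
perturbed = latent_paraphrase(H, W, b, mask=[0.0, 1.0], rng=rng)
```

With a mask of 0 the first token passes through unchanged, while the second token receives noise whose mean depends on its own feature. In a training loop, a KL value such as the one returned by `gaussian_kl` would serve as the loss for the linear layer's parameters; this sketch omits the gradient step.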