Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models

Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang

TL;DR

This work addresses the challenge of updating large language models with new information while eliminating outdated or incorrect knowledge. It introduces F-Learning, a two-stage paradigm that first forgets old knowledge through parametric subtraction and then learns new knowledge via supervised fine-tuning. The procedure is formalized by three quantities: $\theta^{old}_{\vartriangle}$, the parameter change that encodes the old knowledge; $\theta'$, the parameters after that change has been subtracted; and $\theta^{*}$, the final parameters after learning the new knowledge. Empirically, F-Learning improves knowledge updating over strong baselines (Full-FT, LoRA, FT-c, ROME, MEMIT) on ZsRE and CounterFact, and shows that forgetting with LoRA can approximate full forgetting at lower cost. The method demonstrates favorable locality and adaptability, at the price of extra computation for the forgetting stage, and offers a practical route for continual learning and controlled unlearning in LLMs. Overall, F-Learning advances knowledge editing by using parametric arithmetic to resolve conflicts between old and new knowledge and to achieve more reliable updates.

Abstract

Recent advancements in Large Language Models (LLMs) have showcased their remarkable capabilities in text understanding and generation. However, even stronger LLMs are susceptible to acquiring erroneous or obsolete information from the training corpus. Directly fine-tuning a model a second time on data containing new knowledge may fail to update that knowledge, because the old and new knowledge conflict. In this paper, we propose a new fine-tuning paradigm called F-Learning (Forgetting before Learning), which employs parametric arithmetic to facilitate the forgetting of old knowledge and the learning of new knowledge. Experimental results on two publicly available datasets demonstrate that F-Learning clearly improves the knowledge updating performance of both full fine-tuning and LoRA fine-tuning, while also outperforming the existing baselines in most cases. Moreover, we find that forgetting old knowledge by subtracting the parameters of LoRA can yield a similar effect to subtracting the parameters of full fine-tuning, and occasionally even surpasses it significantly.
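
The two-stage procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scaling factor `lam`, the helper names (`subtract`, `forget`, `f_learning`), and the stand-in `toy_fine_tune` are all assumptions made for illustration, and parameters are plain lists of floats rather than real model weights. The sketch only shows the order of operations: fine-tune on old knowledge to obtain a parameter change, subtract that change to forget, then fine-tune on new knowledge.

```python
from typing import Callable, Dict, List

# Hypothetical parameter container: name -> flat list of weights.
Params = Dict[str, List[float]]
FineTune = Callable[[Params, Params], Params]


def subtract(a: Params, b: Params) -> Params:
    """Element-wise a - b over every named parameter."""
    return {k: [x - y for x, y in zip(a[k], b[k])] for k in a}


def forget(theta: Params, theta_ft_old: Params, lam: float) -> Params:
    """Stage 1: remove the change that fine-tuning on old knowledge induced.

    delta_old plays the role of theta^old_delta; the result plays the role
    of theta'. `lam` is an assumed scaling factor for the subtraction.
    """
    delta_old = subtract(theta_ft_old, theta)
    return {k: [t - lam * d for t, d in zip(theta[k], delta_old[k])]
            for k in theta}


def f_learning(theta: Params, fine_tune: FineTune,
               old_data: Params, new_data: Params, lam: float) -> Params:
    """Forget old knowledge first, then learn new knowledge (theta^*)."""
    theta_ft_old = fine_tune(theta, old_data)        # fine-tune on old knowledge
    theta_prime = forget(theta, theta_ft_old, lam)   # parametric subtraction
    return fine_tune(theta_prime, new_data)          # supervised fine-tuning


def toy_fine_tune(theta: Params, targets: Params) -> Params:
    """Stand-in for real fine-tuning: move each weight halfway to a target."""
    return {k: [w + 0.5 * (t - w) for w, t in zip(theta[k], targets[k])]
            for k in theta}


if __name__ == "__main__":
    theta = {"w": [1.0, 2.0]}
    old_knowledge = {"w": [3.0, 2.0]}
    new_knowledge = {"w": [0.5, 4.0]}
    print(f_learning(theta, toy_fine_tune, old_knowledge, new_knowledge, lam=0.5))
```

In a LoRA variant, the subtracted quantity would presumably be the adapter's contribution obtained by fine-tuning on old knowledge rather than a full-weight difference; the sketch does not model that distinction.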

Paper Structure

This paper contains 32 sections, 8 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Diagram for “Forgetting before Learning”.
  • Figure 2: Objectives of the knowledge updating in large language model.
  • Figure 3: Loss changes of loss maximization and parameter subtraction.
  • Figure 4: Parametric Analysis of Forgetting Old Knowledge by full fine-tuning.
  • Figure 5: Parametric Analysis of Forgetting Old Knowledge by LoRA fine-tuning.
  • ...and 2 more figures