MemEIC: A Step Toward Continual and Compositional Knowledge Editing

Jin Seong, Jiyun Park, Wencke Liermann, Hongseok Choi, Yoonji Nam, Hyun Kim, Soojong Lim, Namhoon Lee

TL;DR

MemEIC addresses the challenge of continual and compositional knowledge editing in vision-language models by combining modality-specific external memory with dual internal adapters and a brain-inspired Knowledge Connector that enables cross-modal fusion only when required. The authors introduce the CCKEB benchmark and a CompRel metric to evaluate sequential, multimodal edits, and demonstrate that MemEIC outperforms existing external- and internal-memory editors on both edit retention and compositional reasoning across two LVLM backbones. Key contributions include a practical external memory scheme (Mem-E), a separated internal memory design (Mem-I) to prevent interference, and a transformer-attention-based Knowledge Connector that brings multimodal integration close to oracle-level performance. The work suggests that careful memory separation plus selective cross-modal fusion enables robust, scalable continual editing in LVLMs, with strong implications for up-to-date, trustworthy multimodal AI systems.

Abstract

The dynamic nature of information necessitates continuously updating large vision-language models (LVLMs). While recent knowledge editing techniques hint at promising directions, they often focus on editing a single modality (vision or language) in isolation. This prevalent practice neglects the inherent multimodality of LVLMs and the continuous nature of knowledge updates, potentially leading to suboptimal editing outcomes when considering the interplay between modalities and the need for ongoing knowledge refinement. To address these limitations, we propose MemEIC, a novel method for Continual and Compositional Knowledge Editing (CCKE) in LVLMs. MemEIC enables compositional editing of both visual and textual knowledge sequentially. Our approach employs a hybrid external-internal editor featuring a dual external memory for cross-modal evidence retrieval and dual LoRA adapters that facilitate disentangled parameter updates for each modality. A key component is a brain-inspired knowledge connector, activated selectively for compositional reasoning, that integrates information across different modalities. Experiments demonstrate that MemEIC significantly improves performance on complex multimodal questions and effectively preserves prior edits, setting a new benchmark for CCKE in LVLMs.

Paper Structure

This paper contains 87 sections, 28 equations, 8 figures, and 17 tables.

Figures (8)

  • Figure 1: Description of Compositional Edit Task and Sequential Editing
  • Figure 2: Overall Structure of MemEIC for Continual and Compositional Edit
  • Figure 3: Training Stage and Testing
  • Figure 4: Compositional reliability of five editing variants on LLaVA-1.5 (7B) on the CCKEB test set.
  • Figure 5: Activation Visualization of Knowledge Separation
  • ...and 3 more figures