Localizing and Editing Knowledge in Large Audio-Language Models

Sung Kyun Chung; Jiaheng Dong; Qiuchi Hu; Gongping Huang; Hong Jia; Ting Dang

Abstract

Large Audio-Language Models (LALMs) have shown strong performance in speech understanding, making speech a natural interface for accessing factual information. Yet they are trained on static corpora and may encode incorrect facts. Existing model editing methods localize and update facts in text-only LLMs, but they do not account for continuous speech representations or for where knowledge is stored across the acoustic, language, and cross-modal modules. We construct the first audio benchmark for knowledge localization and editing in LALMs and propose a speech-driven locate-then-edit framework. We first use speech-aware causal tracing to localize the layers and modules that support factual retrieval, and then apply editing at the identified sites. Experiments show that factual knowledge is jointly encoded in audio and text modules, and that audio editing yields more effective updates than text editing or fine-tuning, enabling fine-grained knowledge control in speech AI systems.

Paper Structure (10 sections, 9 equations, 1 figure, 2 tables)

Method
Locate via Causal Tracing
Model Editing
Experimental Setup
Results
Causal Tracing
Performance Comparison of Model Editing
Layer and Word Selection Analysis
Conclusion
Generative AI Use Disclosure

Figures (1)

Figure 2: Layer-wise Average Indirect Effect (AIE) of hidden states (left), MLP layers (middle), and attention modules (right) for the CounterFact (top) and Known-1000 (bottom) datasets. The dashed vertical line separates the audio encoder layers (left) from the LLM text decoder layers (right).
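The Average Indirect Effect named in the caption can be illustrated with a small sketch. It assumes the usual causal-tracing definition from text-only model editing work: corrupt the subject input with noise, restore one clean hidden state at a given layer, and measure how much probability of the correct answer is recovered relative to the corrupted run, averaged over samples. The paper's speech-aware variant is not specified on this page, so the toy model below and every name in it (`forward`, `average_indirect_effect`, `subject_pos`, `noise_std`) are hypothetical stand-ins, not the authors' implementation.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(vec, mat):
    """Row vector (length D) times matrix (D x K) -> list of length K."""
    return [sum(v * row[k] for v, row in zip(vec, mat)) for k in range(len(mat[0]))]

def forward(x, weights, w_out, patch=None):
    """Toy stand-in for a layered model; x is a list of per-position vectors.

    patch = (layer, position, vector) overwrites one hidden state right
    after that layer. Returns (output probabilities, per-layer states).
    """
    h = [list(v) for v in x]
    states = []
    for layer, w in enumerate(weights):
        dim = len(h[0])
        # Crude cross-position mixing so corruption spreads between positions.
        mean = [sum(v[d] for v in h) / len(h) for d in range(dim)]
        h = [[math.tanh(a) for a in matvec([v[d] + mean[d] for d in range(dim)], w)]
             for v in h]
        if patch is not None and patch[0] == layer:
            h[patch[1]] = list(patch[2])
        states.append([list(v) for v in h])
    # Read the prediction out from the last position.
    return softmax(matvec(h[-1], w_out)), states

def average_indirect_effect(inputs, weights, w_out, subject_pos, noise_std, rng):
    """Per-layer mean of p(target | corrupted, restored) - p(target | corrupted)."""
    n_layers = len(weights)
    totals = [0.0] * n_layers
    for x in inputs:
        p_clean, clean_states = forward(x, weights, w_out)
        # Treat the clean prediction as the "known fact" being traced.
        target = max(range(len(p_clean)), key=p_clean.__getitem__)
        corrupted = [list(v) for v in x]
        corrupted[subject_pos] = [v + rng.gauss(0.0, noise_std)
                                  for v in corrupted[subject_pos]]
        p_corrupt, _ = forward(corrupted, weights, w_out)
        for layer in range(n_layers):
            patch = (layer, subject_pos, clean_states[layer][subject_pos])
            p_restored, _ = forward(corrupted, weights, w_out, patch)
            totals[layer] += p_restored[target] - p_corrupt[target]
    return [t / len(inputs) for t in totals]

# Assumed toy sizes and random weights, for illustration only.
rng = random.Random(0)
dim, n_pos, n_layers, n_classes = 4, 3, 5, 6
weights = [[[rng.gauss(0, 0.8) for _ in range(dim)] for _ in range(dim)]
           for _ in range(n_layers)]
w_out = [[rng.gauss(0, 1.0) for _ in range(n_classes)] for _ in range(dim)]
inputs = [[[rng.gauss(0, 1.0) for _ in range(dim)] for _ in range(n_pos)]
          for _ in range(8)]
aie = average_indirect_effect(inputs, weights, w_out,
                              subject_pos=0, noise_std=1.0, rng=rng)
print([round(v, 3) for v in aie])
```

In a real LALM the same loop would run over both audio encoder layers and text decoder layers, and separately for hidden states, MLP outputs, and attention outputs, which is what the three panels in the figure report.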
