Membership Inference Attacks against Large Audio Language Models

Jia-Kai Dong, Yu-Xiang Lin, Hung-Yi Lee

Abstract

We present the first systematic Membership Inference Attack (MIA) evaluation of Large Audio Language Models (LALMs). Because audio encodes non-semantic information, it induces severe train-test distribution shifts that can lead to spurious MIA performance. Using a multi-modal blind baseline built on textual, spectral, and prosodic features, we demonstrate that common speech datasets exhibit near-perfect train/test separability (AUC approximately 1.0) even without any model inference, and that standard MIA scores correlate strongly with these blind acoustic artifacts (correlation greater than 0.7). With this blind baseline, we identify distribution-matched datasets that enable reliable MIA evaluation without distribution-shift confounds. We benchmark multiple MIA methods and conduct modality disentanglement experiments on these datasets. The results reveal that LALM memorization is cross-modal, arising only from binding a speaker's vocal identity with the spoken text. These findings establish a principled standard for auditing LALMs beyond spurious correlations.
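
As a concrete illustration of the blind-baseline idea, the sketch below trains a simple classifier to separate a dataset's train and test splits using only acoustic summary features, with no queries to the target LALM. The specific features (MFCC statistics, pitch, duration), the logistic-regression classifier, and the helper names are illustrative assumptions, not the paper's exact setup; an AUC near 0.5 indicates distribution-matched splits, while an AUC near 1.0 flags the confound described above.

```python
# Minimal sketch of a model-agnostic "blind" baseline (assumed feature set):
# can a simple classifier separate a dataset's train and test splits using
# only acoustic summary statistics, without ever querying the target LALM?
# An AUC near 1.0 signals a distribution shift that can masquerade as MIA signal.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

def blind_features(wav_path, sr=16000):
    """Spectral (MFCC) and prosodic (pitch, duration) summary features for one utterance."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    f0, _, _ = librosa.pyin(y, fmin=50, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]                      # keep voiced frames only
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),    # spectral envelope statistics
        [f0.mean() if f0.size else 0.0,         # prosody: mean pitch
         f0.std() if f0.size else 0.0],         # prosody: pitch variability
        [len(y) / sr],                          # utterance duration in seconds
    ])

def blind_separability_auc(train_paths, test_paths):
    """Cross-validated AUC of a blind train-vs-test classifier (0.5 = matched splits)."""
    X = np.stack([blind_features(p) for p in train_paths + test_paths])
    y = np.array([1] * len(train_paths) + [0] * len(test_paths))
    prob = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                             cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, prob)
```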

Paper Structure

This paper contains 18 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of our proposed 3-phase LALM privacy auditing framework. (1) Multi-Modal Blind Baseline: A model-agnostic classifier first identifies and filters out samples with confounding distributional shifts to produce a "clean dataset." (2) MIA Auditing: The target LALM is then audited on this clean dataset using a suite of membership indicators derived from a 2-stage generation process. (3) Modality Disentanglement: Finally, we systematically probe any detected memorization to characterize its nature, specifically identifying cross-modal binding.
  • Figure 2: Cross-dataset membership inference heatmap. Diagonal cells represent standard intra-dataset MIA (Train vs. Test). Off-diagonal cells represent membership-neutral pairings (e.g., Dataset A Train vs. Dataset B Train), where near-perfect AUCs reflect domain shifts rather than memorization; the construction of these pairings is sketched below.
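
To make the off-diagonal check in Figure 2 concrete, the following sketch computes an AUC for every pairing of dataset splits from a generic per-sample membership score. The `mia_score` function is a placeholder for whichever membership indicator is being audited, and the pairing scheme (treating one split's samples as "members" and another's as "non-members") is an illustrative assumption rather than the paper's exact protocol.

```python
# Sketch of the cross-dataset check behind Figure 2. Every sample gets a
# per-sample membership score from `mia_score` (a placeholder for whichever
# indicator is being audited). Off-diagonal, membership-neutral pairings
# (Train_A as "members" vs. Train_B as "non-members") with near-perfect AUC
# expose domain shift rather than memorization.
import numpy as np
from sklearn.metrics import roc_auc_score

def cross_dataset_auc_grid(splits, mia_score):
    """splits: dict of split name -> list of samples; returns (names, AUC matrix)."""
    names = sorted(splits)
    grid = np.full((len(names), len(names)), np.nan)
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if a == b:
                continue  # diagonal (Train vs. Test of the same dataset) handled separately
            pos = [mia_score(s) for s in splits[a]]   # treated as members
            neg = [mia_score(s) for s in splits[b]]   # treated as non-members
            labels = [1] * len(pos) + [0] * len(neg)
            grid[i, j] = roc_auc_score(labels, pos + neg)
    return names, grid

# Example usage with a dummy score (illustrative only):
# names, grid = cross_dataset_auc_grid(
#     {"A_train": samples_a, "B_train": samples_b}, mia_score=lambda s: s["nll"])
```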