Learning from All: Concept Alignment for Autonomous Distillation from Multiple Drifting MLLMs

Xiaoyu Yang; Jie Lu; En Yu

TL;DR

This work addresses concept drift in distillation from multiple drifting MLLMs by establishing a theoretical link between concept drift and the non-stationary reasoning dynamics of multiple teachers in the knowledge distillation (KD) process. It proposes autonomous preference optimization (APO) within a learn-compare-critique framework, so that the student learns from all teachers while suppressing drift-induced biases; the framework includes a formal multi-stream drift model and a KL-based concept-alignment step. A new large-scale chest X-ray reasoning dataset, CXR-MAX, collects 170,982 reasoning trajectories from seven MLLMs on MIMIC-CXR to study multi-teacher dynamics. Empirical results show that APO improves consistency, robustness, and generalization, achieving a Top-1 accuracy of $0.76$ on MS-CXR-T (≈13% above the best baseline) and significant gains in diagnostic report generation metrics, while ablations confirm the centrality of APO in mitigating drift. The work advances drift-aware KD for domain-specific multimodal reasoning and provides public data and code to spur further research.

Abstract

This paper identifies a critical yet underexplored challenge in distilling from multimodal large language models (MLLMs): the reasoning trajectories generated by multiple drifting teachers exhibit concept drift, whereby their reasoning distributions evolve unpredictably and transmit biases to the student model, ultimately compromising its performance. To tackle this issue, we pioneer a theoretical connection between concept drift and knowledge distillation, casting the non-stationary reasoning dynamics from multiple MLLM teachers as next-token prediction of multi-stream reasoning trajectories. Guided by concept drift, we introduce the "learn, compare, critique" paradigm, culminating in autonomous preference optimization (APO). Under the active guidance of the teachers, the student model first learns and self-distils preferred thinking by comparing multiple teachers. It then engages in critical reflection over the drifting inference from teachers, performing concept alignment through APO, ultimately yielding a robust, consistent, and generalizable model. Extensive experiments demonstrate superior consistency, robustness, and generalization in knowledge distillation. We also contribute a large-scale dataset, CXR-MAX (Multi-teachers Alignment X-rays), comprising 170,982 distilled reasoning trajectories derived from publicly accessible MLLMs based on MIMIC-CXR. Our code and data are public at: https://anonymous.4open.science/r/Autonomous-Distillation/.
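To make the notion of drift between teachers concrete, the following is a minimal sketch, not the APO method itself. It assumes each teacher exposes a next-token probability vector over a shared vocabulary, and it scores each teacher by its KL divergence from the teachers' average distribution. The function names (`kl_divergence`, `teacher_drift_scores`), the averaging-based consensus, and the toy probabilities are all illustrative assumptions that do not come from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    eps guards against log(0); terms with p_i == 0 contribute nothing.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def teacher_drift_scores(teacher_probs):
    """Return KL(teacher || consensus) for each teacher.

    teacher_probs: list of next-token probability vectors, one per teacher.
    The consensus is the plain average of all teachers (an assumption made
    for illustration; the paper may define alignment differently).
    """
    n = len(teacher_probs)
    vocab_size = len(teacher_probs[0])
    consensus = [
        sum(t[i] for t in teacher_probs) / n for i in range(vocab_size)
    ]
    return [kl_divergence(t, consensus) for t in teacher_probs]


# Toy example: two teachers agree, the third has drifted toward another token.
teachers = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.10, 0.20, 0.70],
]
scores = teacher_drift_scores(teachers)
```

In this toy setup the third teacher receives the largest score, which is the kind of signal a student could use to down-weight a drifting trajectory when comparing teachers.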
