Multi-Scale Manifold Alignment for Interpreting Large Language Models: A Unified Information-Geometric Framework

Yukun Zhang, Qi Dong

TL;DR

MSMA presents an information-geometric framework that decomposes LLM representations into three semantic manifolds—local $\mathcal{M}_L$, intermediate $\mathcal{M}_I$, and global $\mathcal{M}_G$—and learns cross-scale mappings that preserve geometry and information. By formalizing mappings $f_{GI}$ and $f_{IL}$ under principles of geometric preservation, information fidelity, and curvature regularity, and optimizing $\mathcal{L}_{\text{total}}=\lambda_{geo}\mathcal{L}_{geo}+\lambda_{info}\mathcal{L}_{info}+\lambda_{curv}\mathcal{L}_{curv}$ with MINE-based mutual information estimates, MSMA achieves near-perfect alignment across GPT-2, BERT, RoBERTa, and T5 (e.g., $99\%$ KL reduction and $5$–$7\times$ MI gains). Empirically, MSMA reveals a robust three-scale hierarchy, shows architecture-dependent cross-scale effects when intervening at specific scales (altering lexical diversity, sentence structure, or discourse coherence), and enables targeted control for bias mitigation and robust generation. The framework integrates geometry and information theory to illuminate cross-scale information flow and offers practical knobs for controllable generation in transparent, trustworthy AI systems. The work advances interpretability by linking representational geometry to functional behavior across scales and provides a principled path toward scale-specific editing and safety enhancements.
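The weighted objective and the MINE-style estimate mentioned above can be illustrated with a minimal sketch. It assumes the three loss terms are already available as scalars and that a critic has produced scores on joint and shuffled (marginal) sample pairs. The names `LossWeights`, `dv_lower_bound`, and `total_loss`, and the default weights of 1.0, are hypothetical and do not come from the paper. The mutual-information bound shown is the standard Donsker-Varadhan form that MINE optimizes, not necessarily the exact estimator configuration the authors use.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical defaults: the source does not state the lambda values.
    geo: float = 1.0
    info: float = 1.0
    curv: float = 1.0


def dv_lower_bound(joint_scores, marginal_scores):
    """Donsker-Varadhan lower bound on mutual information.

    joint_scores:    critic outputs T(x, z) on paired samples
    marginal_scores: critic outputs T(x, z') on shuffled pairs
    Returns E_joint[T] - log E_marginal[exp(T)].
    """
    mean_joint = sum(joint_scores) / len(joint_scores)
    # Log-mean-exp, subtracting the max for numerical stability.
    m = max(marginal_scores)
    log_mean_exp = m + math.log(
        sum(math.exp(s - m) for s in marginal_scores) / len(marginal_scores)
    )
    return mean_joint - log_mean_exp


def total_loss(l_geo, l_info, l_curv, weights=LossWeights()):
    """Weighted sum: lambda_geo*L_geo + lambda_info*L_info + lambda_curv*L_curv."""
    return weights.geo * l_geo + weights.info * l_info + weights.curv * l_curv
```

In practice the critic scores would come from a trained network; plain lists of floats are used here only to keep the sketch self-contained. How the mutual-information estimate enters $\mathcal{L}_{info}$ (for example, as a negated bound) is not specified in this summary and is left open.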

Abstract

We present Multi-Scale Manifold Alignment (MSMA), an information-geometric framework that decomposes LLM representations into local, intermediate, and global manifolds and learns cross-scale mappings that preserve geometry and information. Across GPT-2, BERT, RoBERTa, and T5, we observe consistent hierarchical patterns and find that MSMA improves alignment metrics under multiple estimators (e.g., relative KL reduction and MI gains with statistical significance across seeds). Controlled interventions at different scales yield distinct and architecture-dependent effects on lexical diversity, sentence structure, and discourse coherence. While our theoretical analysis relies on idealized assumptions, the empirical results suggest that multi-objective alignment offers a practical lens for analyzing cross-scale information flow and guiding representation-level control.

Paper Structure

This paper contains 69 sections, 5 theorems, 28 equations, 4 figures, 8 tables.

Key Result

Theorem 3.1

Assume the mappings $f_{GI}$ and $f_{IL}$ are Lipschitz continuous with constants $L_1$ and $L_2$, respectively. If the geometric and information errors are bounded by $\varepsilon_{\text{geo}}$ and $\varepsilon_{\text{info}}$, then: where $C$ depends on the manifold dimension, the Lipschitz constants, and the curvature bounds.
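The Lipschitz assumption in the theorem can be probed empirically. The sketch below computes the largest observed ratio $\|f(x)-f(y)\| / \|x-y\|$ over all pairs in a finite sample. This is only a lower bound on the true Lipschitz constant, since unseen pairs may give a larger ratio. The function names `euclidean` and `empirical_lipschitz` are hypothetical, and the Euclidean metric is an assumption; the paper may measure distances on the manifolds differently.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def empirical_lipschitz(f, points):
    """Largest observed ratio ||f(x) - f(y)|| / ||x - y|| over sample pairs.

    This is a lower bound on the true Lipschitz constant of f, not the
    constant itself.
    """
    best = 0.0
    for x, y in combinations(points, 2):
        d_in = euclidean(x, y)
        if d_in == 0.0:
            continue  # skip duplicate points to avoid division by zero
        best = max(best, euclidean(f(x), f(y)) / d_in)
    return best


def double(v):
    """Toy mapping that scales every coordinate by 2."""
    return [2.0 * a for a in v]


sample = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
estimate = empirical_lipschitz(double, sample)
```

For the toy mapping `double`, every pairwise ratio equals 2 up to floating-point rounding, so `estimate` is approximately 2. For a learned mapping such as $f_{GI}$, the same routine would be applied to held-out representation vectors.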

Figures (4)

  • Figure 1: Multi-Scale Manifold Alignment Framework
  • Figure 2: Comprehensive attention profile analysis for four Transformer models.
  • Figure 3: Comparative analysis of information metrics.
  • Figure 4: Layerwise probing confirms specialization of scales.

Theorems & Definitions (9)

  • Theorem 3.1: Alignment Error Bound
  • Theorem 3.2: Information Bottleneck Property
  • Theorem 3.3: Local Convergence
  • Definition E.1.1: Statistical Manifold
  • Definition E.1.2: Fisher Information Matrix
  • Lemma E.1.1: KL-Fisher Relationship
  • proof
  • Theorem E.1: Alignment Error Bound
  • proof