Focus-LIME: Surgical Interpretation of Long-Context Large Language Models via Proxy-Based Neighborhood Selection
Junhao Liu, Haonan Yu, Zhenyu Yan, Xin Zhang
TL;DR
This work tackles the challenge of faithful, word-level explanations for long-context LLMs under realistic query budgets. It introduces Focus-LIME, a two-phase, proxy-guided framework that first scouts an active neighborhood with a cheaper proxy model and then performs fine-grained attribution on the target model within that subspace. The approach formalizes the cost-fidelity trade-off, defines an active neighborhood S, and uses a Phase I/Phase II pipeline to preserve fidelity while reducing the perturbation space. Empirical results on CUAD and Qasper show that Focus-LIME achieves higher faithfulness (AOPC) than standard LIME and proxy baselines, while remaining efficient and robust to proxy choice; alignment with human evidence is demonstrated via Recall@k on expert annotations. Overall, Focus-LIME enables practical, surgical explanations for long documents in domains like law and science, improving trust and verifiability in high-stakes applications.
Abstract
As Large Language Models (LLMs) scale to handle massive context windows, surgical feature-level interpretation becomes essential for high-stakes tasks such as legal auditing and code debugging. However, existing local model-agnostic explanation methods face a critical limitation in these scenarios: feature-based methods suffer from attribution dilution caused by high feature dimensionality and therefore fail to provide faithful explanations. In this paper, we propose Focus-LIME, a coarse-to-fine framework designed to restore the tractability of surgical interpretation. Focus-LIME uses a proxy model to curate the perturbation neighborhood, so that the target model performs fine-grained attribution exclusively within the optimized context. Empirical evaluations on long-context benchmarks demonstrate that our method makes surgical explanations practicable while keeping them faithful to the target model.
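The coarse-to-fine idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`score_units`, `focus_explain`), the toy `proxy` and `target` models, the sample budgets, and the choice to keep text outside the active neighborhood unperturbed in the second phase are all assumptions made for illustration. A simple mean-difference estimator also stands in for LIME's weighted linear surrogate.

```python
import random


def score_units(model, units, candidates, n_samples, rng):
    """Perturbation-based attribution restricted to `candidates`.

    Each sample randomly masks candidate units (all other units stay visible).
    A unit's score is mean(output | unit kept) - mean(output | unit masked).
    This mean-difference estimator is a stand-in for LIME's linear surrogate.
    """
    kept = {i: [0.0, 0] for i in candidates}    # [sum of outputs, count]
    masked = {i: [0.0, 0] for i in candidates}
    for _ in range(n_samples):
        keep = {i: rng.random() < 0.5 for i in candidates}
        words = [w for i, unit in enumerate(units)
                 if keep.get(i, True) for w in unit]
        y = model(words)
        for i in candidates:
            bucket = kept[i] if keep[i] else masked[i]
            bucket[0] += y
            bucket[1] += 1
    return {i: kept[i][0] / max(kept[i][1], 1)
               - masked[i][0] / max(masked[i][1], 1)
            for i in candidates}


def focus_explain(proxy, target, sentences, top_m, n_proxy, n_target, seed=0):
    rng = random.Random(seed)

    # Phase I (assumed form): the cheap proxy scores coarse units (sentences),
    # and the top-m sentences form the active neighborhood.
    sent_ids = list(range(len(sentences)))
    sent_scores = score_units(proxy, sentences, sent_ids, n_proxy, rng)
    active = sorted(sorted(sent_ids, key=sent_scores.get, reverse=True)[:top_m])

    # Phase II (assumed form): the target model is queried only with
    # word-level perturbations inside the active neighborhood.
    word_units, candidates = [], []
    for s_idx, sent in enumerate(sentences):
        for w in sent:
            if s_idx in active:
                candidates.append(len(word_units))
            word_units.append([w])
    word_scores = score_units(target, word_units, candidates, n_target, rng)
    return active, {i: (word_units[i][0], s) for i, s in word_scores.items()}


# Toy document and toy models (hypothetical, for illustration only).
sentences = [
    "this agreement is governed by state law".split(),
    "either party may terminate with written notice".split(),
    "fees are payable within thirty days".split(),
    "the parties agree to keep terms confidential".split(),
]


def target(words):
    # Stand-in for the expensive model: output depends on two words.
    return 1.0 * ("terminate" in words) + 0.5 * ("notice" in words)


def proxy(words):
    # Stand-in for a cheaper, imperfect approximation of the target.
    return 0.8 * ("terminate" in words) + 0.3 * ("notice" in words)


active, scores = focus_explain(proxy, target, sentences,
                               top_m=1, n_proxy=64, n_target=256)
top = sorted(scores.values(), key=lambda ws: ws[1], reverse=True)[:2]
print("active sentences:", active)
print("top words:", [w for w, _ in top])
```

In this toy setting the target model is queried 256 times over 7 candidate words rather than over all 28 words in the document, which is the kind of reduction in perturbation space the framework aims for; the proxy absorbs the cost of the coarse search.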
