RoPE-LIME: RoPE-Space Locality + Sparse-K Sampling for Efficient LLM Attribution

Isaac Picov, Ritesh Goru

TL;DR

RoPE-LIME tackles the challenge of explaining outputs from closed-source LLMs without incurring high API costs. It replaces repeated closed-model queries with a small open-source surrogate that uses probability-based targets, such as $\mathrm{NLL}$ and $\mathrm{KL}$ divergence, and a Sparse-K perturbation strategy guided by RWMD in RoPE space to preserve locality. The approach yields stable, scalable token-level attributions across long contexts and outperforms prior methods like gSMILE on benchmarks such as MMLU and HotpotQA, while substantially reducing API usage. This work enables faithful, practical interpretability for black-box LLMs in real-world settings.

Abstract

Explaining closed-source LLM outputs is challenging because API access prevents gradient-based attribution, while perturbation methods are costly and noisy when they depend on regenerated text. We introduce RoPE-LIME, an open-source extension of gSMILE that decouples reasoning from explanation: given a fixed output from a closed model, a smaller open-source surrogate computes token-level attributions from probability-based objectives (negative log-likelihood and divergence targets) under input perturbations. RoPE-LIME incorporates (i) a locality kernel based on Relaxed Word Mover's Distance computed in RoPE embedding space for stable similarity under masking, and (ii) Sparse-K sampling, an efficient perturbation strategy that improves interaction coverage under limited budgets. Experiments on HotpotQA (sentence features) and a hand-labeled MMLU subset (word features) show that RoPE-LIME produces more informative attributions than leave-one-out sampling and improves over gSMILE while substantially reducing closed-model API calls.
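
The locality kernel in (i) can be illustrated with a minimal sketch of Relaxed Word Mover's Distance between an original input and a masked perturbation. This is not the authors' implementation: the function names `rwmd` and `locality_weight`, the uniform token weights, the Euclidean ground distance, and the exponential kernel with a `width` parameter are all assumptions chosen for illustration, and the embeddings are assumed to have already been mapped into RoPE space by some upstream step that is not shown.

```python
import numpy as np


def rwmd(a: np.ndarray, b: np.ndarray) -> float:
    """Relaxed Word Mover's Distance between two sets of token embeddings.

    a has shape (n, d) and b has shape (m, d). Uniform token weights are
    assumed here; the paper may weight tokens differently.
    """
    # Pairwise Euclidean distances between every token in a and in b: (n, m).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Relaxation 1: each token in a sends all its mass to its nearest token in b.
    a_to_b = dists.min(axis=1).mean()
    # Relaxation 2: each token in b sends all its mass to its nearest token in a.
    b_to_a = dists.min(axis=0).mean()
    # Both relaxations lower-bound the full transport cost; keep the tighter one.
    return float(max(a_to_b, b_to_a))


def locality_weight(distance: float, width: float = 1.0) -> float:
    """Hypothetical LIME-style exponential kernel: closer perturbations weigh more."""
    return float(np.exp(-(distance ** 2) / (width ** 2)))


# Toy example: three token embeddings, with the last token masked out.
full = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
masked = full[:2]

d = rwmd(full, masked)
w = locality_weight(d)
```

In this toy case the masked token has to travel to its nearest surviving neighbour, so the distance is positive and the weight drops below 1, while an unperturbed input would receive distance 0 and weight 1.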

Paper Structure

This paper contains 14 sections, 10 equations, 2 figures, and 8 tables.

Figures (2)

  • Figure 1: gSMILE (gpt-4o-mini) vs RoPE-LIME (Qwen-8B)
  • Figure 2: RoPE-LIME evaluated on longer contexts