RoPE-LIME: RoPE-Space Locality + Sparse-K Sampling for Efficient LLM Attribution

Isaac Picov; Ritesh Goru

TL;DR

RoPE-LIME tackles the challenge of explaining outputs from closed-source LLMs without incurring high API costs. Instead of repeatedly querying the closed model, it computes attributions with a small open-source surrogate scored by probability-based targets, such as $\mathrm{NLL}$ and $\mathrm{KL}$ divergence. Perturbations are drawn with a Sparse-K sampling strategy and weighted by a locality kernel based on Relaxed Word Mover's Distance (RWMD) in RoPE embedding space. The approach yields stable, scalable token-level attributions across long contexts and outperforms prior methods such as gSMILE on benchmarks including MMLU and HotpotQA, while substantially reducing API usage. This work enables faithful, practical interpretability for black-box LLMs in real-world settings.
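
The two probability-based targets can be illustrated with a minimal sketch. It assumes, hypothetically, that the surrogate returns natural-log probabilities for each token of the closed model's fixed output under a given (possibly perturbed) prompt, and next-token distributions over a shared vocabulary. The names `nll_target` and `kl_target` are illustrative and are not taken from RoPE-LIME's code.

```python
import math
from typing import Sequence


def nll_target(token_logprobs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the fixed output tokens under one prompt.

    token_logprobs: natural-log probabilities the surrogate assigns to each
    token of the closed model's output (hypothetical input format).
    """
    if not token_logprobs:
        raise ValueError("need at least one output token")
    return -sum(token_logprobs) / len(token_logprobs)


def kl_target(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two next-token distributions over the same vocabulary.

    p: distribution under the original prompt; q: under the perturbed prompt.
    Terms with p_i = 0 contribute nothing; q_i is floored at eps to avoid log(0).
    """
    if len(p) != len(q):
        raise ValueError("distributions must share a vocabulary")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
```

For example, `nll_target([math.log(0.5), math.log(0.25)])` equals 1.5 ln 2 (about 1.04), and `kl_target` returns 0 when the perturbation leaves the distribution unchanged, so larger values signal that the removed input mattered more.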

Abstract

Explaining closed-source LLM outputs is challenging because API access prevents gradient-based attribution, while perturbation methods are costly and noisy when they depend on regenerated text. We introduce RoPE-LIME, an open-source extension of gSMILE that decouples reasoning from explanation: given a fixed output from a closed model, a smaller open-source surrogate computes token-level attributions from probability-based objectives (negative log-likelihood and divergence targets) under input perturbations. RoPE-LIME incorporates (i) a locality kernel based on Relaxed Word Mover's Distance computed in RoPE embedding space for stable similarity under masking, and (ii) Sparse-K sampling, an efficient perturbation strategy that improves interaction coverage under limited budgets. Experiments on HotpotQA (sentence features) and a hand-labeled MMLU subset (word features) show that RoPE-LIME produces more informative attributions than leave-one-out sampling and improves over gSMILE while substantially reducing closed-model API calls.
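
The abstract names three ingredients: an RWMD locality kernel, Sparse-K sampling, and a LIME-style surrogate fit. The sketch below shows one illustrative way they could fit together; it is not the authors' implementation. Assumptions: feature embeddings come from a caller-supplied `embed` function standing in for RoPE-space embeddings (no rotary encoding is computed here); RWMD uses uniform token weights and Euclidean distance; `sparse_k_masks` is a hypothetical reading of Sparse-K in which each perturbation removes between 1 and K features; and attribution scores are the coefficients of a weighted ridge regression. All function names and parameters (`width`, `lam`, `n_samples`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rwmd(doc_a: List[Vector], doc_b: List[Vector]) -> float:
    """Relaxed Word Mover's Distance with uniform token weights.

    Each vector sends all of its mass to its nearest neighbour in the other
    document; the larger of the two one-sided relaxations is returned.
    """
    if not doc_a or not doc_b:
        return math.inf  # an empty perturbation is treated as infinitely far away

    def one_sided(src: List[Vector], dst: List[Vector]) -> float:
        return sum(min(euclidean(s, d) for d in dst) for s in src) / len(src)

    return max(one_sided(doc_a, doc_b), one_sided(doc_b, doc_a))


def locality_weight(distance: float, width: float = 1.0) -> float:
    """LIME-style exponential kernel: weight 1 at distance 0, decaying with distance."""
    return math.exp(-(distance ** 2) / (width ** 2))


def sparse_k_masks(n_features: int, k: int, n_samples: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical Sparse-K sampler: each mask removes between 1 and k features (0 = removed)."""
    rng = random.Random(seed)
    masks = []
    for _ in range(n_samples):
        dropped = set(rng.sample(range(n_features), rng.randint(1, min(k, n_features))))
        masks.append([0 if i in dropped else 1 for i in range(n_features)])
    return masks


def weighted_ridge(X: List[List[int]], y: List[float], w: List[float], lam: float = 1e-3) -> List[float]:
    """Solve (A^T W A + lam*I) beta = A^T W y with A = [1 | X]; the intercept is not penalised."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    p = len(rows[0])
    M = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) + (lam if i == j and i > 0 else 0.0)
          for j in range(p)] + [sum(wi * r[i] * yi for r, wi, yi in zip(rows, w, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [A | b].
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]  # [intercept, beta_1, ..., beta_n]


def explain(features: List[str],
            embed: Callable[[List[str]], List[Vector]],
            target: Callable[[List[str]], float],
            k: int = 2, n_samples: int = 64, width: float = 1.0, seed: int = 0) -> List[float]:
    """Perturb features, weight each sample by RWMD locality, return one score per feature."""
    original = embed(features)
    masks = [[1] * len(features)] + sparse_k_masks(len(features), k, n_samples, seed)
    ys, ws = [], []
    for m in masks:
        kept = [f for f, keep in zip(features, m) if keep]
        ys.append(target(kept))  # e.g. NLL of the fixed closed-model output
        ws.append(locality_weight(rwmd(original, embed(kept)), width))
    return weighted_ridge(masks, ys, ws)[1:]  # drop the intercept
```

As a toy check, with three features embedded at `[0, 0]`, `[1, 0]` and `[0, 1]` and a target that adds 3 when "a" is kept and 1 when "b" is kept, `explain` should return scores close to `[3, 1, 0]`; the small ridge penalty only shifts them slightly. The locality kernel matters when the target is not linear: perturbations that move far from the original input in embedding space then contribute less to the fit.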

Paper Structure

This paper contains 14 sections, 10 equations, 2 figures, and 8 tables.

Introduction
Related Work
Method
RWMD + RoPE
Feature Representations
Sparse-K Sampling
RoPE-LIME Pipeline
Results
Evaluation against gSMILE
Evaluation on HotpotQA
Conclusion
Algorithms
Sparse-K parameter sweep
RoPE-LIME pipeline

Figures (2)

Figure 1: gSMILE (gpt-4o-mini) vs RoPE-LIME (Qwen-8B)
Figure 2: RoPE-LIME evaluated on longer contexts
