EXACT: Explicit Attribute-Guided Decoding-Time Personalization

Xin Yu, Hanwen Xing, Lingzhou Xue

TL;DR

This work introduces EXACT, a decoding-time personalization method that aligns generation with limited pairwise preference feedback using a predefined set of interpretable attributes, and establishes theoretical approximation guarantees for the proposed algorithm under mild assumptions.

Abstract

Achieving personalized alignment requires adapting large language models to each user's evolving context. While decoding-time personalization offers a scalable alternative to training-time methods, existing methods largely rely on implicit, less interpretable preference representations and impose a rigid, context-agnostic user representation, failing to account for how preferences shift across prompts. We introduce EXACT, a decoding-time personalization method that aligns generation with limited pairwise preference feedback using a predefined set of interpretable attributes. In the offline stage, EXACT identifies user-specific attribute subsets by maximizing the likelihood of preferred responses. At inference time, EXACT retrieves the attributes most semantically relevant to an incoming prompt and injects them into the context to steer generation. We establish theoretical approximation guarantees for the proposed algorithm under mild assumptions, and prove that our similarity-based retrieval mechanism mitigates contextual preference shifts, adapting to disparate tasks without pooling conflicting preferences. Extensive experiments on human-annotated preference datasets demonstrate that EXACT consistently outperforms strong baselines in both preference modeling accuracy and personalized generation quality.
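The online stage described in the abstract (retrieve relevant attributes, then inject them into the context) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy bag-of-words `embed` function stands in for a real sentence encoder, the `memory` of (past prompt, attributes) pairs stands in for the output of the offline stage, and the `Attributes: ...` suffix follows the prompt format mentioned in the Figure 2 caption.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (hypothetical stand-in for a sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_attributes(
    prompt: str, memory: List[Tuple[str, List[str]]], k: int = 3
) -> List[str]:
    """Return up to k attributes, drawn from the most similar past prompts first.

    `memory` is an assumed data layout: (past_prompt, selected_attributes) pairs.
    """
    query = embed(prompt)
    ranked = sorted(memory, key=lambda item: cosine(query, embed(item[0])), reverse=True)
    picked: List[str] = []
    for _, attrs in ranked:
        for attr in attrs:
            if len(picked) >= k:
                return picked
            if attr not in picked:
                picked.append(attr)
    return picked


def inject(prompt: str, attributes: List[str]) -> str:
    """Append the retrieved attributes to the prompt as a keyword line."""
    return f"{prompt}\nAttributes: {', '.join(attributes)}"
```

Because attributes are taken only from the most similar past prompts, preferences expressed on unrelated tasks (for example, poetry versus contract summaries) are not mixed into the same context, which is the intuition behind retrieval rather than pooling.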

Paper Structure (63 sections, 4 theorems, 52 equations, 7 figures, 8 tables, 2 algorithms)

Key Result

Theorem 5.1

Let $S_k$ be the output of greedy selection, and let $S^* \in \arg\max_{|A|\le k}F(A)$ be an optimal solution to the cardinality-constrained set-maximization problem. If $F$ is normalized and monotone and has submodularity ratio $\gamma$, then
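The displayed inequality is not included in this listing. For orientation, the classical guarantee for greedy selection under these same assumptions (normalized, monotone, submodularity ratio $\gamma$) is $F(S_k) \ge (1 - e^{-\gamma})\,F(S^*)$; the paper's exact statement may differ in form. The greedy procedure the theorem refers to can be sketched as follows. The coverage objective and the attribute names are hypothetical stand-ins chosen so the example runs on its own; in the paper, $F$ is a likelihood-based objective over preferred responses.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List


def greedy_select(
    candidates: Iterable[Hashable],
    F: Callable[[FrozenSet[Hashable]], float],
    k: int,
) -> List[Hashable]:
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    selected: List[Hashable] = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        base = F(frozenset(selected))
        # Marginal gain of adding candidate a to the current selection.
        best = max(remaining, key=lambda a: F(frozenset(selected + [a])) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: which preference pairs each attribute "explains".
COVERAGE = {
    "concise": {1, 2, 3, 4},
    "detailed": {1, 2},
    "formal": {4, 5, 6},
    "humorous": {7},
}


def coverage_objective(S: FrozenSet[Hashable]) -> float:
    """Normalized, monotone set function: number of preference pairs covered by S."""
    covered = set()
    for attr in S:
        covered |= COVERAGE[attr]
    return float(len(covered))


chosen = greedy_select(COVERAGE.keys(), coverage_objective, k=2)
```

With this toy objective, the first pick is the attribute covering the most pairs, and the second is the one adding the most pairs not yet covered, which is why "detailed" (fully overlapping with "concise") is never chosen at $k=2$.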

Figures (7)

  • Figure 1: EXACT with different $k$. We evaluate EXACT with varying attribute budget $k\in\{1,3,5,10\}$ on three open-weight instruction-tuned backbones (Llama-3.1-8B, Gemma-2-9B-it, and Qwen2.5-7B-Instruct), reporting pairwise accuracy (%). Overall, performance improves as $k$ increases from 1 to 5, indicating that incorporating a small set of retrieved attributes is beneficial. Gains saturate thereafter (and may slightly fluctuate at $k{=}10$), suggesting diminishing returns from adding more attributes.
  • Figure 2: Case 1 (PRISM). Attribute-guided prompt modification by appending keywords ("Attributes: $<...>$") to the user prompt, and an example win/lose pair used for pairwise preference evaluation.
  • Figure 3: Case 2 (PRISM). Attribute-guided prompt modification with an example win/lose pair.
  • Figure 4: Case 3 (PRISM). Attribute-guided prompt modification with an example win/lose pair.
  • Figure 5: Case 1 (Summarize from Human Feedback). Attribute-guided summarization prompt with a representative preferred vs. dispreferred summary.
  • ...and 2 more figures

Theorems & Definitions (7)

  • Theorem 5.1: Greedy guarantee under weak submodularity
  • Theorem 5.2: Retrieval mitigates contextual preference shifts
  • Proposition 1.1: Closed-form optimizer of KL-regularized RLHF rafailov2023direct
  • proof
  • Definition 2.1: Submodularity ratio (See Definition 2 JMLR:v19:16-534)
  • Theorem 2.2: Greedy guarantee under weak submodularity (restated)
  • proof
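
For readers who want the two cited background results at hand, they are commonly stated as below. This is a sketch in the notation of the cited references (the DPO paper for Proposition 1.1, and Definition 2 of the JMLR reference for Definition 2.1); the symbols $\pi_{\mathrm{ref}}$, $r$, $\beta$, $Z$, $U$, $L$, and $S$ are assumptions carried over from those references, and the paper's own notation may differ.

```latex
% Closed-form optimizer of the KL-regularized RLHF objective
% (reward r, reference policy \pi_ref, regularization strength \beta > 0):
\pi^*(y \mid x) = \frac{1}{Z(x)}\, \pi_{\mathrm{ref}}(y \mid x)\,
  \exp\!\Big(\tfrac{1}{\beta}\, r(x, y)\Big),
\qquad
Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x)\, \exp\!\Big(\tfrac{1}{\beta}\, r(x, y)\Big).

% Submodularity ratio of a set function F with respect to a set U and budget k:
\gamma_{U,k}(F) = \min_{\substack{L \subseteq U,\; S:\, |S| \le k,\\ S \cap L = \emptyset}}
  \frac{\sum_{x \in S} \big( F(L \cup \{x\}) - F(L) \big)}{F(L \cup S) - F(L)}.
```

A value $\gamma = 1$ is attained by submodular functions, and smaller values quantify how far $F$ is from submodular, which is what weakens the greedy guarantee.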