Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering

Kounianhua Du, Jianxing Liu, Kangning Zhang, Wenxiang Jiao, Yuan Lu, Jiarui Jin, Weiwen Liu, Yong Yu, Weinan Zhang

TL;DR

Fints introduces inference-time steering to personalize LLM outputs without gradient updates, addressing fast-changing user preferences and data sparsity. By constructing per-user steering vectors from contrastive prompts and hooking attention and MLP signals at a fine granularity, Fints achieves instant, instance-specific adaptation. An input-aware aggregation mechanism selects the top-K signals, and a Pulse-and-Re-Pulse injection at a middle layer yields personalized outputs while keeping the base model frozen. Across headline generation, abstract writing, and web function calling, Fints outperforms prompt-based and parametric baselines with strong data efficiency and modest overhead, highlighting its practical utility as a plug-in personalization component.

Abstract

The rapid evolution of large language models (LLMs) has intensified the demand for effective personalization techniques that can adapt model behavior to individual user preferences. Beyond non-parametric methods that exploit the in-context learning ability of LLMs, parametric adaptation methods have recently emerged, including personalized parameter-efficient fine-tuning and reward modeling. However, these methods struggle with dynamic user patterns and high data sparsity because of their low adaptability and data efficiency. To address these challenges, we propose a fine-grained and instance-tailored steering framework that dynamically generates sample-level interference vectors from user data and injects them into the model's forward pass for personalized adaptation. Our approach introduces two key technical innovations: a fine-grained steering component that captures nuanced signals by hooking activations from attention and MLP layers, and an input-aware aggregation module that synthesizes these signals into contextually relevant enhancements. The method demonstrates high flexibility and data efficiency, excelling in scenarios with fast-changing distributions and high data sparsity. In addition, the proposed method is orthogonal to existing methods and operates as a plug-in component compatible with different personalization techniques. Extensive experiments across diverse scenarios, including short-to-long text generation and web function calling, validate the effectiveness and compatibility of our approach. Results show that our method significantly enhances personalization performance in fast-shifting environments while maintaining robustness across varying interaction modes and context lengths. Implementation is available at https://github.com/KounianhuaDu/Fints.

Paper Structure

This paper contains 26 sections, 10 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Illustration of different methodologies. (a) Prompt-based methods retrieve relevant context and feed it with the target query into the large language models (LLMs) for personalized output. (b) Personalized parameter-efficient tuning methods adapt LLMs with user data and obtain personalized weights to offer personalization. (c) Steering-based methodology constructs contrastive prompts from user logs and obtains a personal interference vector to guide model behavior.
  • Figure 2: Overview of Fints. 1) Steering Vector Preparation. In this stage, we construct contrastive prompts from user logs: relevant context retrieved from the personal corpus is concatenated with the target query to form the positive sample, and irrelevant context sampled from other users is concatenated with the target query to form the negative sample. For each positive sample, we generate $K$ negative samples. Each pair is then fed into the LLM in teacher-forcing mode, during which the last-token representations of the attention and MLP blocks are hooked. We store the difference between the two activations of a pair as the steering vector, keyed by the text of that pair for convenient indexing. 2) Instance-Tailored Personalized Adaptation and Inference. In this stage, we sample from the target user's set of steering vectors to interfere with the model for personalized output. Concretely, we rank the sample pairs by the similarity between the query text and the pair text to select the top-k steering vectors, which are then attentively aggregated and injected into the LLM for personalized adaptation.
  • Figure 3: t-SNE visualization of the data distribution from which the heterogeneous test sets are sampled.
  • Figure 4: Illustration of the heterogeneous data.
  • Figure 5: Data efficiency analysis of different methods.
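
The two stages described in the Figure 2 caption can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It assumes that activations and pair-text keys are already available as plain lists of floats, that text similarity is cosine similarity over some embedding, that "attentive aggregation" is a softmax over the top-k similarities, and that injection is a scaled addition to a hidden state. The function names and the `temperature` and `alpha` parameters are hypothetical.

```python
import math

def steering_vector(pos_act, neg_act):
    """Difference between positive-sample and negative-sample activations."""
    return [p - n for p, n in zip(pos_act, neg_act)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def aggregate_top_k(query_emb, keys, vectors, k, temperature=1.0):
    """Select the k stored vectors whose keys best match the query and
    combine them with softmax weights (an assumed form of attention)."""
    sims = [cosine(query_emb, key) for key in keys]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    max_sim = max(sims[i] for i in top)  # subtract max for numerical stability
    weights = [math.exp((sims[i] - max_sim) / temperature) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(vectors[0])
    agg = [sum(w * vectors[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return agg, top, weights

def inject(hidden, steer, alpha=1.0):
    """Add the scaled steering vector to a hidden state (assumed injection rule)."""
    return [h + alpha * s for h, s in zip(hidden, steer)]

# Toy example: three stored pairs with 2-d keys and 3-d steering vectors.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
agg, top, weights = aggregate_top_k([1.0, 0.0], keys, vectors, k=2)
steered = inject([0.0, 0.0, 0.0], agg, alpha=1.0)
```

In a real model the activations would come from forward hooks on the attention and MLP blocks, and the injection would modify the hidden state at the chosen layer. The sketch only shows the vector arithmetic around those steps.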