Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering

Kounianhua Du; Jianxing Liu; Kangning Zhang; Wenxiang Jiao; Yuan Lu; Jiarui Jin; Weiwen Liu; Yong Yu; Weinan Zhang

TL;DR

Fints introduces inference-time steering to personalize LLM outputs without gradient updates, addressing fast-changing user preferences and data sparsity. By constructing per-user steering vectors from contrastive prompts and employing fine-grained hooking of attention and MLP signals, Fints achieves instant, instance-specific adaptation. An input-aware aggregation mechanism selects top-K signals and a Pulse-and-Re-Pulse injection at a mid-layer yields personalized outputs while keeping the base model frozen. Across headline generation, abstract writing, and web-function calling, Fints outperforms prompt-based and parametric baselines, with strong data-efficiency and modest overhead, highlighting its practical utility as a plug-in personalization component.

Abstract

The rapid evolution of large language models (LLMs) has intensified the demand for effective personalization techniques that can adapt model behavior to individual user preferences. Alongside non-parametric methods that exploit the in-context learning ability of LLMs, parametric adaptation methods have recently emerged, including personalized parameter-efficient fine-tuning and reward modeling. However, these methods struggle with dynamic user patterns and high data sparsity because of their low adaptability and data efficiency. To address these challenges, we propose a fine-grained, instance-tailored steering framework that dynamically generates sample-level interference vectors from user data and injects them into the model's forward pass for personalized adaptation. Our approach introduces two key technical innovations: a fine-grained steering component that captures nuanced signals by hooking activations from attention and MLP layers, and an input-aware aggregation module that synthesizes these signals into contextually relevant enhancements. The method is flexible and data-efficient, excelling under fast-changing distributions and high data sparsity. In addition, it is orthogonal to existing methods and operates as a plug-in component compatible with different personalization techniques. Extensive experiments across diverse scenarios, including short-to-long text generation and web function calling, validate the effectiveness and compatibility of our approach. Results show that our method significantly enhances personalization performance in fast-shifting environments while maintaining robustness across varying interaction modes and context lengths. Implementation is available at https://github.com/KounianhuaDu/Fints.
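
As a rough illustration of the pipeline described above, the following sketch builds steering vectors as differences between activations from contrastive prompts, selects the top-K most relevant ones for an input, and adds their weighted sum to a hidden state. It is a toy under stated assumptions, not the released implementation: activations are plain lists of floats rather than hooked attention or MLP outputs, relevance is assumed to be cosine similarity with softmax weighting, and all names (`build_steering_bank`, `aggregate_top_k`, `inject`, `alpha`) are hypothetical.

```python
import math


def sub(a, b):
    """Element-wise difference of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def build_steering_bank(history):
    """history: list of (key_embedding, act_personalized, act_neutral).

    Each steering vector is the activation difference between a
    personalized prompt and its neutral (contrastive) counterpart.
    """
    return [(key, sub(pos, neg)) for key, pos, neg in history]


def aggregate_top_k(query, bank, k):
    """Keep the k entries whose keys are most similar to the query and
    return their softmax-weighted sum (assumed aggregation rule)."""
    top = sorted(bank, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    sims = [cosine(query, key) for key, _ in top]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]  # shifted for numerical stability
    total = sum(exps)
    out = [0.0] * len(top[0][1])
    for weight, (_, vec) in zip(exps, top):
        for i, value in enumerate(vec):
            out[i] += (weight / total) * value
    return out


def inject(hidden, steer, alpha=1.0):
    """Add the scaled steering vector to a hidden state; no weights change."""
    return [h + alpha * s for h, s in zip(hidden, steer)]


# Toy user history: three past samples with 2-dimensional activations.
history = [
    ([1.0, 0.0], [2.0, 1.0], [1.0, 1.0]),
    ([0.0, 1.0], [0.0, 3.0], [0.0, 1.0]),
    ([-1.0, 0.0], [5.0, 5.0], [0.0, 0.0]),
]
bank = build_steering_bank(history)
steer = aggregate_top_k([1.0, 0.0], bank, k=1)
steered_hidden = inject([0.5, 0.5], steer, alpha=2.0)
```

In a real model the activation lists would come from forward hooks on a chosen layer and the injection would happen during the forward pass. The scale `alpha` and the choice of `k` are tuning knobs assumed here for illustration.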
