Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis

Ruiyang Qin, Jun Xia, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Peipei Zhou, Jingtong Hu, Yiyu Shi

TL;DR

The paper tackles on-device personalization of large language models under privacy and resource constraints by introducing a self-supervised data selection framework that maintains a compact, annotated buffer of dialogue sets. It combines three quality metrics (Embedding Entropy, Domain Specific Score, and In-Domain Dissimilarity) to select representative data from streaming, unlabeled user interactions, and augments this data with semantically similar pairs generated by the LLM itself. A fixed prompt and a ROUGE-1-based sanity check govern data synthesis, and fine-tuning is performed with LoRA on the selected and synthesized data. Experiments across six diverse datasets on a Llama-3B backbone show that the proposed approach yields ROUGE-1 improvements of up to around 38% over baselines while enabling faster on-device learning, demonstrating practical privacy-preserving personalization for edge devices.
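
To make the ROUGE-1-based sanity check more concrete, the following is a minimal sketch, not the authors' implementation. It computes ROUGE-1 as a unigram-overlap F1 score over lowercased, whitespace-split tokens and keeps a synthesized text only if its overlap with the original reaches a threshold. The function names, the tokenization, the `min_score` value, and the direction of the check (rejecting low-overlap generations) are all assumptions made for illustration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between two lowercased, whitespace-split texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection clips each unigram to its minimum count in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def passes_sanity_check(generated: str, original: str, min_score: float = 0.5) -> bool:
    """Keep a synthesized text only if it stays lexically close to the original.

    min_score is a hypothetical threshold, not a value taken from the paper.
    """
    return rouge1_f1(generated, original) >= min_score
```

For example, `passes_sanity_check("how do I treat a cold", "how should I treat a cold")` accepts the paraphrase (five of six unigrams match), while an unrelated generation with no shared unigrams scores 0.0 and is rejected.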

Abstract

After a large language model (LLM) is deployed on edge devices, it is desirable for these devices to learn from user-generated conversation data to generate user-specific and personalized responses in real-time. However, user-generated data usually contains sensitive and private information, and uploading such data to the cloud for annotation is not preferred if not prohibited. While it is possible to obtain annotation locally by directly asking users to provide preferred responses, such annotations have to be sparse to not affect user experience. In addition, the storage of edge devices is usually too limited to enable large-scale fine-tuning with full user-generated data. It remains an open question how to enable on-device LLM personalization, considering sparse annotation and limited on-device storage. In this paper, we propose a novel framework to select and store the most representative data online in a self-supervised way. Such data has a small memory footprint and allows infrequent requests of user annotations for further fine-tuning. To enhance fine-tuning quality, multiple semantically similar pairs of question texts and expected responses are generated using the LLM. Our experiments show that the proposed framework achieves the best user-specific content-generating capability (accuracy) and fine-tuning speed (performance) compared with vanilla baselines. To the best of our knowledge, this is the very first on-device LLM personalization framework.
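
The idea of selecting and storing representative data online under a fixed storage budget can be sketched as a bounded buffer over a data stream. The sketch below is a simplification under stated assumptions: it ranks each incoming dialogue set with a single hypothetical scalar `score_fn` and evicts the lowest-scoring entry when a better one arrives. The paper's actual selection metrics and replacement rule are not reproduced here; `SelectionBuffer`, `offer`, and `items` are illustrative names.

```python
import heapq
from itertools import count
from typing import Callable, List, Tuple


class SelectionBuffer:
    """Fixed-capacity buffer that keeps the highest-scoring items seen so far."""

    def __init__(self, capacity: int, score_fn: Callable[[str], float]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.score_fn = score_fn  # hypothetical stand-in for a quality metric
        self._heap: List[Tuple[float, int, str]] = []  # min-heap ordered by score
        self._tie = count()  # arrival index; breaks ties between equal scores

    def offer(self, dialogue: str) -> bool:
        """Consider one streamed item; return True if it was stored."""
        entry = (self.score_fn(dialogue), next(self._tie), dialogue)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if entry[0] > self._heap[0][0]:
            # The new item beats the weakest stored item, so replace it.
            heapq.heapreplace(self._heap, entry)
            return True
        return False

    def items(self) -> List[str]:
        """Stored items, highest score first."""
        return [text for _, _, text in sorted(self._heap, reverse=True)]
```

Each item is scored once on arrival and either stored or discarded, so memory stays bounded by `capacity` regardless of how long the stream runs. Only the retained items would then need user annotation.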

Paper Structure (16 sections, 5 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Overview of the framework. The LLM is fine-tuned on data produced by data selection and the subsequent data generation.
  • Figure 2: Learning curves of our proposed framework, Random Replace, FIFO Replace, and K-Center with a buffer size of 281 KB on the datasets (a) ALPACA, (b) DOLLY, (c) Prosocial-Dialog, (d) Empathetic-Dialog, (e) OPENORCA, and (f) MedDialog.
  • Figure 3: ROUGE-1/training time on the MedDialog dataset with different numbers of dialogue sets generated from each original set in the buffer.