MobileFineTuner: A Unified End-to-End Framework for Fine-Tuning LLMs on Mobile Phones

Jiaxiang Geng, Lunyu Zhao, Yiyi Lu, Bing Luo

TL;DR

MobileFineTuner introduces the first unified open-source framework for practical end-to-end LLM fine-tuning directly on commodity mobile phones. It combines a four-layer C++ architecture with on-device Full-FT and PEFT (LoRA) support, plus memory and energy optimizations such as ZeRO-inspired sharding, gradient accumulation, and an energy-aware scheduler. The framework demonstrates on-device fine-tuning across multiple models and tasks with results comparable to server-side baselines, and provides comprehensive ablations validating the effectiveness of the optimizations. This work paves the way for privacy-preserving, resource-aware on-device learning and lays groundwork for future federated and cross-device training on mobile hardware.

Abstract

Mobile phones are the most ubiquitous end devices, generating vast amounts of human-authored data and serving as the primary platform for end-side applications. As high-quality public data for large language models (LLMs) approaches exhaustion, on-device fine-tuning provides an opportunity to leverage private user data while preserving privacy. However, existing approaches are predominantly simulation-based or rely on IoT devices and PCs, leaving commodity mobile phones largely unexplored. A key gap is the absence of an open-source framework that enables practical LLM fine-tuning on mobile phones. We present MobileFineTuner, a unified open-source framework that enables end-to-end LLM fine-tuning directly on commodity mobile phones. MobileFineTuner is designed for efficiency, scalability, and usability, supporting full-parameter fine-tuning (Full-FT) and parameter-efficient fine-tuning (PEFT). To address the memory and energy limitations inherent to mobile phones, we introduce system-level optimizations including parameter sharding, gradient accumulation, and energy-aware computation scheduling. We demonstrate the practicality of MobileFineTuner by fine-tuning GPT-2, Gemma 3, and Qwen 2.5 on real mobile phones. Extensive experiments and ablation studies validate the effectiveness of the proposed optimizations and establish MobileFineTuner as a viable foundation for future research on on-device LLM training.

Paper Structure

This paper contains 30 sections, 10 figures, 7 tables.

Figures (10)

  • Figure 1: MobileFineTuner overview.
  • Figure 2: ZeRO-inspired parameter sharding.
  • Figure 3: Energy-aware dynamic computation scheduling.
  • Figure 4: Loss of Full-FT on GPT2-127M@WikiText-2.
  • Figure 5: Loss of PEFT on Different Tasks.
  • ...and 5 more figures