Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices

Congzheng Song, Xinyu Tang

TL;DR

This work tackles the challenge of fine-tuning large language models on resource-constrained mobile devices by introducing memory-efficient backpropagation (MeBP). Built on gradient checkpointing and augmented with lazy weight loading, decompression, and memory-mapped activations, MeBP computes exact gradients with a training memory footprint close to that of a single checkpoint. On an iPhone 15 Pro Max, MeBP enables sub-1GB on-device fine-tuning for LLMs with 0.5B to 4B parameters and shows substantially faster convergence and better utility than zeroth-order baselines such as MeZO, despite a modest per-step compute overhead. The approach is implemented in Swift for iOS and released as open source, pointing to a practical path toward private, on-device personalization of LLMs. Limitations include device-specific hardware requirements (A17 Pro+ equivalent) and bottlenecks at the final layer and with longer sequences, which motivate future work on fused kernels and longer-sequence optimization.

Abstract

Fine-tuning large language models (LLMs) with backpropagation, even for a subset of parameters such as LoRA, can be much more memory-consuming than inference and is often deemed impractical for resource-constrained mobile devices. Alternative methods, such as zeroth-order optimization (ZO), can greatly reduce the memory footprint but come at the cost of significantly slower model convergence (10× to 100× more steps than backpropagation). We propose a memory-efficient implementation of backpropagation (MeBP) on mobile devices that provides a better trade-off between memory usage and compute time, while converging faster and achieving better performance than the ZO baseline. We verify the effectiveness of MeBP on an iPhone 15 Pro Max and show that various LLMs, ranging from 0.5B to 4B parameters, can be fine-tuned using less than 1GB of memory. We release an example of the MeBP implementation at https://github.com/apple/ml-mebp.
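
For intuition about the convergence gap mentioned above, the Python sketch below compares a two-point, SPSA-style zeroth-order gradient estimate (the general kind of estimator used by methods such as MeZO) with the exact gradient on a toy quadratic loss. It is not the paper's implementation: the loss, the function names, the perturbation size, and the sample counts are all assumptions chosen only for illustration.

```python
import random


def loss(theta):
    # Toy quadratic loss with its minimum at theta = 0 (illustrative assumption).
    return 0.5 * sum(t * t for t in theta)


def exact_grad(theta):
    # For the quadratic above, the exact (backpropagation) gradient is theta itself.
    return list(theta)


def zo_grad(theta, rng, eps=1e-3):
    # Two-point estimate: perturb the parameters along a random Gaussian
    # direction z, measure the finite-difference slope with two loss
    # evaluations, and scale z by that slope. No backward pass is needed.
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = loss([t + eps * zi for t, zi in zip(theta, z)])
    minus = loss([t - eps * zi for t, zi in zip(theta, z)])
    slope = (plus - minus) / (2.0 * eps)
    return [slope * zi for zi in z]


def mean_zo_grad(theta, num_samples, rng):
    # Average many independent estimates to expose the estimator's mean.
    total = [0.0] * len(theta)
    for _ in range(num_samples):
        g = zo_grad(theta, rng)
        total = [acc + gi for acc, gi in zip(total, g)]
    return [acc / num_samples for acc in total]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0, -2.0, 0.5]
    print("exact gradient:     ", exact_grad(theta))
    print("single ZO estimate: ", zo_grad(theta, rng))
    print("mean of 20000 ZO:   ", mean_zo_grad(theta, 20000, rng))
```

A single estimate matches the exact gradient only on average and is noisy on any one draw, which is one common intuition for why zeroth-order methods tend to need many more optimization steps than backpropagation.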

Paper Structure

This paper contains 20 sections, 1 equation, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Convergence of Qwen2.5 (0.5B, 1.5B and 3B) and Gemma-3 (1B and 4B) fine-tuned with zeroth-order (ZO) and first-order (FO) optimization.
  • Figure 2: Per-layer memory footprint and wall-clock time. On the x-axis, emb stands for the embedding layer; layer names starting with f denote forward passes and those starting with b denote backward passes.
  • Figure 3: The performance of improved ZO methods (zoo, Malladi et al. 2023; kzoo, Qin et al. 2024; hizoo, Zhao et al. 2025; fzoo, Dang et al. 2025).