A New Frontier of AI: On-Device AI Training and Personalization

Ji Joong Moon, Hyun Suk Lee, Jiho Chu, Donghak Park, Seungbaek Hong, Hyungjun Seo, Donghyeon Jeong, Sungsik Kong, MyungJoo Ham

TL;DR

The paper tackles the challenge of on-device neural network training for user personalization under tight resource constraints. It introduces NNTrainer, a light-weight framework that leverages fine-grained execution-order scheduling, a memory planner, and proactive swapping to dramatically reduce peak memory while maintaining accuracy. Across multiple models and devices, NNTrainer achieves substantial memory reductions (up to 95% in some scenarios) and enables training on mobile/embedded hardware, demonstrated with Tacotron2 and Transformer-style architectures. The work offers a practical, cross-platform, open-source solution that lowers cloud dependence and enhances privacy by enabling continual personalization directly on consumer devices.

Abstract

Modern consumer electronic devices have started executing deep learning-based intelligence services on the device rather than on cloud servers, both to keep personal data on the device and to reduce network and cloud costs. We see this trend as an opportunity to personalize intelligence services by updating neural networks with user data without exposing the data outside the device: on-device training. However, the limited resources of devices incur significant difficulties. We propose a light-weight on-device training framework, NNTrainer, which provides highly memory-efficient neural network training techniques and proactive swapping based on fine-grained execution order analysis for neural networks. Moreover, its optimizations do not sacrifice accuracy and are transparent to training algorithms; thus, prior algorithmic studies may be implemented on top of NNTrainer. The evaluations show that NNTrainer can reduce memory consumption down to 1/20 (saving 95%!) and can effectively personalize intelligence services on devices. NNTrainer is cross-platform, practical open-source software that is being deployed to millions of mobile devices.
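To make the idea of memory planning from execution-order analysis more concrete, the following is a minimal sketch and not NNTrainer's actual implementation. It assumes each tensor is described by a size and the first and last execution order at which it is needed; the `Tensor` class, the function names, and the toy sizes are all hypothetical. It compares keeping every tensor allocated for the whole iteration against the peak of simultaneously live tensors, which is a lower bound on what any lifetime-aware planner could reach (fragmentation and swapping are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tensor:
    """Hypothetical tensor record; not an NNTrainer type."""
    name: str
    size: int       # size in arbitrary units (assumed values)
    first_use: int  # first execution order that touches the tensor
    last_use: int   # last execution order that touches the tensor


def naive_peak(tensors):
    """Memory needed if every tensor stays allocated for the whole iteration."""
    return sum(t.size for t in tensors)


def planned_peak(tensors):
    """Peak of simultaneously live tensors across all execution orders."""
    if not tensors:
        return 0
    last_order = max(t.last_use for t in tensors)
    return max(
        sum(t.size for t in tensors if t.first_use <= order <= t.last_use)
        for order in range(last_order + 1)
    )


def saved_fraction(baseline, reduced):
    """Fraction of memory saved, e.g. going from 20 units to 1 saves 0.95."""
    return 1.0 - reduced / baseline


# Toy three-layer model (assumed): orders 0-2 are forward passes of
# layers 0-2, orders 3-5 are the backward passes of layers 2, 1, 0.
toy = [
    Tensor("X0", 4, 0, 5),  # input, needed again for layer 0's gradient
    Tensor("X1", 4, 0, 4),  # layer 0 output, needed for layer 1's gradient
    Tensor("X2", 4, 1, 3),  # layer 1 output, needed for layer 2's gradient
    Tensor("Y", 4, 2, 3),   # layer 2 output, consumed by the loss
    Tensor("D2", 4, 3, 4),  # derivative passed from layer 2 to layer 1
    Tensor("D1", 4, 4, 5),  # derivative passed from layer 1 to layer 0
]

if __name__ == "__main__":
    print("naive peak:", naive_peak(toy))
    print("planned peak:", planned_peak(toy))
```

The toy numbers only illustrate the mechanism; they say nothing about the savings reported in the paper. The `saved_fraction` helper simply restates the abstract's arithmetic: reducing memory to 1/20 of the baseline corresponds to a 95% saving.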

Paper Structure

This paper contains 13 sections, 16 figures, 4 tables, and 2 algorithms.

Figures (16)

  • Figure 1: Memory buffer usage of forward and backward processes.
  • Figure 2: Different granularity of training procedures. A gradient needs to be computed before the derivative; otherwise, $X$ remains during derivative computation.
  • Figure 3: Different types of training procedures.
  • Figure 4: Abstract architecture of NNTrainer.
  • Figure 5: Execution orders and temporal-spatial relations of an example model, where only $X_0$, $X_1$, $D_3$, $\Delta W_0$, and $W_0$ are required. Refer to Figure \ref{FIG_LAYERS_FORWARDING_BACKWARDING} and Tables \ref{TBL_TENSOR_LIFE_SPAN} and \ref{TBL_TENSOR_SHARING} for the notations.
  • ...and 11 more figures