Optimizing Feature Extraction for On-device Model Inference with User Behavior Sequences

Chen Gong; Zhenzhe Zheng; Yiliu Chen; Sheng Wang; Fan Wu; Guihai Chen

Abstract

Machine learning models are widely integrated into modern mobile apps to analyze user behaviors and deliver personalized services. Ensuring low-latency on-device model execution is critical for maintaining a high-quality user experience. While prior research has primarily focused on accelerating model inference for given input features, we identify an overlooked bottleneck in real-world on-device model execution pipelines: extracting input features from raw application logs. In this work, we explore a new direction of feature extraction optimization by analyzing and eliminating redundant extraction operations across different model features and across consecutive model inferences. We then introduce AutoFeature, an automated feature extraction engine designed to accelerate the on-device feature extraction process without compromising model inference accuracy. AutoFeature comprises three core designs: (1) graph abstraction, which formulates the extraction workflows of different input features as one directed acyclic graph; (2) graph optimization, which identifies and fuses redundant operation nodes across different features within the graph; and (3) efficient caching, which minimizes operations on overlapping raw data between consecutive model inferences. We implement a system prototype of AutoFeature and integrate it into five industrial mobile services spanning the search, video, and e-commerce domains. Online evaluations show that AutoFeature reduces end-to-end on-device model execution latency by 1.33x-3.93x during daytime and 1.43x-4.53x at night.
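The two kinds of redundancy named in the abstract can be illustrated with a small sketch. It is not AutoFeature's implementation. The log schema (timestamp, event type, value), the names FeatureSpec and FusedExtractor, and the append-only log are all assumptions made for illustration. Features that read the same event type share one filtering pass (inter-feature reuse), and each inference scans only the log rows appended since the previous one (inter-inference reuse).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical log row: (timestamp_seconds, event_type, value).
Event = Tuple[int, str, float]


@dataclass
class FeatureSpec:
    """Illustrative feature definition: filter by type, window, then reduce."""
    name: str
    event_type: str
    window: int                              # look-back window in seconds
    reduce: Callable[[List[float]], float]   # e.g. len or sum


@dataclass
class FusedExtractor:
    specs: List[FeatureSpec]
    # Filtered (timestamp, value) pairs per event type, kept across inferences.
    cache: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)
    offset: int = 0        # number of log rows already consumed
    rows_scanned: int = 0  # counter that makes the saved work visible

    def extract(self, log: List[Event], now: int) -> Dict[str, float]:
        # Inter-inference reuse: the log is assumed append-only, so only
        # rows past the stored offset need to be scanned.
        fresh = log[self.offset:]
        self.offset = len(log)
        self.rows_scanned += len(fresh)

        # Inter-feature reuse: one filtering pass per event type, shared
        # by every feature that reads that type.
        needed = {s.event_type for s in self.specs}
        for ts, etype, value in fresh:
            if etype in needed:
                self.cache.setdefault(etype, []).append((ts, value))

        # Per-feature work is limited to the window cut and the reduction.
        features: Dict[str, float] = {}
        for s in self.specs:
            rows = self.cache.get(s.event_type, [])
            values = [v for ts, v in rows if now - s.window <= ts <= now]
            features[s.name] = s.reduce(values)
        return features


if __name__ == "__main__":
    log = [(0, "click", 1.0), (5, "play", 30.0), (8, "click", 1.0)]
    extractor = FusedExtractor([
        FeatureSpec("clicks_10s", "click", 10, len),
        FeatureSpec("clicks_5s", "click", 5, len),
        FeatureSpec("play_time_10s", "play", 10, sum),
    ])
    print(extractor.extract(log, now=10))
    log.append((12, "click", 1.0))
    print(extractor.extract(log, now=15), extractor.rows_scanned)
```

The sketch never evicts cached rows. A real engine would have to bound the cache, for example by dropping rows older than the longest window.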

Paper Structure (19 sections, 5 equations, 22 figures, 1 table)

Introduction
Background and Motivation
On-Device Model Execution Pipeline
Feature Extraction Bottleneck
Optimization Opportunities
AutoFeature Design
Overview
Graph Generator: Automated Redundancy Identification
Graph Optimizer: Inter-Feature Redundancy Elimination
Event Evaluator: Inter-Inference Redundancy Minimization
Evaluation
Experiment Setup
Overall Performance
Component-Wise Analysis
Sensitivity Analysis
...and 4 more sections

Figures (22)

Figure 1: Workflow of real-world mobile application services.
Figure 1: Behavior traces of testing users across three time periods (noon, evening and night). Each subfigure focuses on one video-related behavior and we plot the frequencies within each 10-minute segment of users with different activity levels (P90, P80, P70, P60, P50, and P30 traces).
Figure 2: A complete on-device model execution pipeline in industrial mobile apps.
Figure 3: Attribute number of mobile user behaviors.
Figure 4: Time breakdown of on-device model execution.
...and 17 more figures
