Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models

Mingyi Zhou, Xiang Gao, Pei Liu, John Grundy, Chunyang Chen, Xiao Chen, Li Li

TL;DR

The paper tackles the security risk of shipping explicit DL model representations on devices. It proposes CustomDLCoder, which automatically extracts backend DL code from a library (e.g., TFLite) and synthesizes a pure C++ executable that performs model inference without exposing the original graph or weights. The approach follows four steps (Model Parsing, Computing Unit Extraction, Configuring Data Analysis, and Dynamic Configuration) to assemble complete inference code that preserves accuracy while enabling strong obfuscation. Empirical results across 11 diverse models show that CustomDLCoder achieves comparable or better inference performance than existing deployment strategies, with memory reductions of up to 68.8% on x86-64 and 36.0% on ARM64 and speedups of up to 21.8% on x86-64 and 24.3% on ARM64, while substantially complicating model extraction by attackers. The work contributes a practical, automatic pathway to safer on-device DL deployment and offers insights for integrating code-centric inference into current toolchains, with open-source artifacts for reproducibility and extension.

Abstract

Recent studies show that deployed deep learning (DL) models, such as those of TensorFlow Lite (TFLite), can be easily extracted from real-world applications and devices by attackers, who can then mount many kinds of attacks, such as adversarial attacks. Although securing deployed on-device DL models has gained increasing attention, no existing method can fully prevent these threats. Traditional software protection techniques have been widely explored. If on-device models can be implemented in pure code, such as C++, it becomes possible to reuse those existing software protection techniques. However, due to the complexity of DL models, there is no automatic method that can translate DL models to pure code. To fill this gap, we propose a novel method, CustomDLCoder, to automatically extract the on-device model information and synthesize a customized executable program for a wide range of DL models. CustomDLCoder first parses the DL model, extracts its backend computing units, configures the computing units into a graph, and then generates customized code to implement and deploy the ML solution without an explicit model representation. The synthesized program hides model information in DL deployment environments because it does not need to retain an explicit model representation, which prevents many attacks on the DL model. In addition, it improves ML performance because the customized code removes the model parsing and preprocessing steps and retains only the data computing process. Our experimental results show that CustomDLCoder improves model security by disabling on-device model sniffing. Compared with the original on-device platform (i.e., TFLite), our method can accelerate model inference by 21.8% and 24.3% on x86-64 and ARM64 platforms, respectively. Most importantly, it can significantly reduce memory consumption by 68.8% and 36.0% on x86-64 and ARM64 platforms, respectively.
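To make the idea of "pure code without an explicit model representation" concrete, here is a minimal hand-written C++ sketch. It is not output of CustomDLCoder and does not use any TFLite API. The layer shape, the weight values, and the names kWeights, kBias, and RunInference are all assumptions chosen for illustration. The point is only that the weights and the computation are compiled into the program, so there is no model file to parse at run time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2-input, 2-output dense layer. In a model-file deployment these
// numbers would be stored in a separate model file; here they are compiled
// into the binary as ordinary constants (illustrative values only).
constexpr std::size_t kIn = 2;
constexpr std::size_t kOut = 2;
constexpr float kWeights[kOut][kIn] = {{1.0f, -2.0f}, {0.5f, 0.25f}};
constexpr float kBias[kOut] = {0.5f, -1.0f};

// Fixed computing unit: fully connected layer followed by ReLU.
// There is no graph parsing and no operator lookup; the shapes are known
// at compile time, so the caller must pass kIn inputs and kOut outputs.
void RunInference(const float* input, float* output) {
  assert(input != nullptr && output != nullptr);
  for (std::size_t o = 0; o < kOut; ++o) {
    float acc = kBias[o];
    for (std::size_t i = 0; i < kIn; ++i) {
      acc += kWeights[o][i] * input[i];
    }
    output[o] = acc > 0.0f ? acc : 0.0f;  // ReLU
  }
}
```

A caller would pass a two-element input buffer and a two-element output buffer. Because the whole network is ordinary C++, generic code protection tools such as obfuscators could in principle be applied to it, which is the reuse argument the abstract makes.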

Paper Structure (34 sections, 3 equations, 5 figures, 6 tables)


Figures (5)

  • Figure 1: The high-level idea of generating pure code to replace DL model representations. The red block shows the difference between the deployed DL components.
  • Figure 2: Design Overview of CustomDLCoder. The generated program is collected and confined by analyzing the inference process of the original TFLite library.
  • Figure 3: Structure of the operator source code. TFLite implements different code units for computing the output in different situations. It will parse the device and input information to choose the computing unit.
  • Figure 4: The pattern of memory allocation in different deployment methods on the Skin diagnosis model. A complete model inference process: load model $\rightarrow$ configure model $\rightarrow$ invoke (compute the output).
  • Figure 5: Meta-model of our method. Model representations, including computational graphs and weights, can be stored as a separate file or be integrated into the API library. Because AI platforms are usually open source, the source code of API libraries can be collected from the Internet.
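
The caption of Figure 3 notes that a library operator contains several code units and chooses one after parsing device and input information. The following C++ sketch illustrates that pattern and what removing the selection step could look like. It is an assumption-laden toy, not TFLite code: the element-wise add operator, the divisibility-by-four rule, and the names AddReference, AddUnrolled4, AddOperator, and AddForThisModel are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical computing unit 1: straightforward element-wise add.
void AddReference(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Hypothetical computing unit 2: same result, four elements per step
// (a stand-in for an optimized path). Requires n to be a multiple of 4.
void AddUnrolled4(const float* a, const float* b, float* out, std::size_t n) {
  assert(n % 4 == 0);
  for (std::size_t i = 0; i < n; i += 4) {
    out[i] = a[i] + b[i];
    out[i + 1] = a[i + 1] + b[i + 1];
    out[i + 2] = a[i + 2] + b[i + 2];
    out[i + 3] = a[i + 3] + b[i + 3];
  }
}

// Library-style entry point: inspects the input at run time and picks a unit.
void AddOperator(const float* a, const float* b, float* out, std::size_t n) {
  if (n % 4 == 0) {
    AddUnrolled4(a, b, out, n);
  } else {
    AddReference(a, b, out, n);
  }
}

// Extracted-style entry point for one specific model whose tensor length is
// assumed to be 8: the selection is resolved ahead of time, so only the one
// unit that this model actually needs is called.
constexpr std::size_t kLen = 8;
void AddForThisModel(const float* a, const float* b, float* out) {
  AddUnrolled4(a, b, out, kLen);
}
```

Both entry points produce the same output for an 8-element input; the second simply skips the run-time decision, which mirrors in spirit the abstract's point that the customized code retains only the data computing process.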