SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models

Hillel Ohayon, Daniel Gilkarov, Ran Dubin

TL;DR

This paper proposes a lightweight, machine-learning-based scanner that detects malicious Pickle-based files without policy generation or code instrumentation. The method is the only one to correctly parse and classify all 9 of the 9 evasive Hide-and-Seek malicious models, which were specially crafted to evade scanners.

Abstract

Model repositories such as Hugging Face increasingly distribute machine learning artifacts serialized with Python's pickle format, exposing users to remote code execution (RCE) risks during model loading. Recent defenses, such as PickleBall, rely on per-library policy synthesis that requires complex system setups and verified benign models, which limits scalability and generalization. In this work, we propose a lightweight, machine-learning-based scanner that detects malicious Pickle-based files without policy generation or code instrumentation. Our approach statically extracts structural and semantic features from Pickle bytecode and applies supervised and unsupervised models to classify files as benign or malicious. We construct and release a labeled dataset of 727 Pickle-based files from Hugging Face and evaluate our models on four datasets: our own, PickleBall (out-of-distribution), Hide-and-Seek (9 advanced evasive malicious models), and synthetic joblib files. Our method achieves 90.01% F1-score compared with 7.23%-62.75% achieved by the SOTA scanners (Modelscan, Fickling, ClamAV, VirusTotal) on our dataset. Furthermore, on the PickleBall data (OOD), it achieves 81.22% F1-score compared with 76.09% achieved by the PickleBall method, while remaining fully library-agnostic. Finally, we show that our method is the only one to correctly parse and classify 9/9 evasive Hide-and-Seek malicious models specially crafted to evade scanners. This demonstrates that data-driven detection can effectively and generically mitigate Pickle-based model file attacks.
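
The static feature extraction described above can be illustrated with a minimal sketch. It assumes the features are normalized opcode frequencies (as in the pipeline overview of Figure 5) and uses only the standard-library `pickletools` module. The function name `opcode_frequency_vector` and the vocabulary `OPCODE_NAMES` are hypothetical, and the paper's actual feature set may differ.

```python
import pickle
import pickletools
from collections import Counter

# Assumption: the feature vocabulary is every opcode name known to the
# running interpreter's pickletools, in a fixed (sorted) order.
OPCODE_NAMES = sorted({op.name for op in pickletools.opcodes})


def opcode_frequency_vector(data: bytes) -> list[float]:
    """Return a normalized opcode-frequency vector without unpickling `data`."""
    counts = Counter()
    try:
        # genops only disassembles the stream; it never executes it.
        for opcode, _arg, _pos in pickletools.genops(data):
            counts[opcode.name] += 1
    except Exception:
        # Malformed or truncated stream: keep whatever was parsed so far.
        pass
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(OPCODE_NAMES)
    return [counts[name] / total for name in OPCODE_NAMES]


# Usage: featurize a benign object; the vector could then be passed to any
# supervised or unsupervised classifier.
benign = pickle.dumps({"weights": [1, 2, 3]})
features = opcode_frequency_vector(benign)
```

Because the stream is only disassembled, a payload that relies on `REDUCE` or `GLOBAL` opcodes contributes to the feature vector without ever running.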

Paper Structure (35 sections, 1 equation, 7 figures, 1 table)

Figures (7)

  • Figure 1: Simple Python snippet that saves/loads a Python object using Pickle.
  • Figure 2: Visualization of Pickle model remote code execution. Step (1) illustrates how easy it is to create the attack, Step (2) is the hex view, Step (3) visualizes how Fickling prints the attack steps, and Step (4) is the execution of the ransomware.
  • Figure 3: Example of a Restricting Unpickler that only allows importing whitelisted functions.
  • Figure 4: Simple Python snippet that loads a pre-trained model from HuggingFace using the transformers Python API.
  • Figure 5: Overview of the proposed detection pipeline. In the first stage, the Pickle-based model file is parsed using Python's pickletools module, and the extracted opcodes are converted into normalized opcode-frequency vectors. In the second stage, these vectors are evaluated using supervised or unsupervised learning techniques to classify the file as benign or malicious, producing the final prediction.
  • ...and 2 more figures
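
The restricting unpickler mentioned in the caption of Figure 3 can be sketched as follows. This is not the paper's figure. It follows the standard pattern of overriding `pickle.Unpickler.find_class`, and the allowlist contents, `RestrictedUnpickler`, and `restricted_loads` are assumptions chosen for illustration.

```python
import builtins
import io
import pickle

# Assumption: only these harmless builtins may be imported by a pickle.
ALLOWED_GLOBALS = {
    ("builtins", "range"),
    ("builtins", "complex"),
    ("builtins", "set"),
    ("builtins", "frozenset"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not on the allowlist."""

    def find_class(self, module, name):
        # Called whenever the stream asks to import a global.
        if (module, name) in ALLOWED_GLOBALS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Analogue of pickle.loads() that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Usage: plain containers and allowlisted builtins load normally.
restored = restricted_loads(pickle.dumps({"steps": range(3)}))
```

A stream that references any other callable, for example through a crafted `__reduce__`, raises `pickle.UnpicklingError` before the callable is invoked. An allowlist of this kind has to be maintained per library, which is the scalability limitation the abstract attributes to policy-based defenses.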