
Unified Low-rank Compression Framework for Click-through Rate Prediction

Hao Yu, Minghao Fu, Jiandong Ding, Yusheng Zhou, Jianxin Wu

TL;DR

This work tackles the memory and compute bottlenecks of deep CTR models by introducing a unified low-rank decomposition framework that compresses both embedding tables and MLP layers. Central to the approach is Atomic Feature Mimicking (AFM), which focuses on mimicking output distributions via PCA-based low-rank approximations, enabling efficient, end-to-end compression with minimal performance loss. The method achieves 3-5x parameter reduction and faster inference while often improving AUC on multiple datasets, including an industrial benchmark, demonstrating practical impact for real-time recommendation systems. Although GPU speedups are limited by memory bandwidth and retrieval costs, the framework provides a versatile, plug-and-play path to deploy resource-efficient CTR models with strong predictive performance.

Abstract

Deep Click-Through Rate (CTR) prediction models play an important role in modern industrial recommendation scenarios. However, high memory overhead and computational costs limit their deployment in resource-constrained environments. Low-rank approximation is an effective compression method for computer vision and natural language processing models, but its application to CTR prediction models has been less explored. Because memory and computing resources are limited, compression of CTR prediction models confronts three fundamental challenges: (1) How to reduce model size to fit edge devices? (2) How to speed up CTR prediction model inference? (3) How to retain the capabilities of the original model after compression? Previous low-rank compression research mostly uses tensor decomposition, which can achieve a high parameter compression ratio but brings AUC degradation and additional computing overhead. To address these challenges, we propose a unified low-rank decomposition framework for compressing CTR prediction models. We find that even with classic SVD, the most basic matrix decomposition method, our framework can achieve better performance than the original model. To further improve the effectiveness of our framework, we locally compress the output features instead of compressing the model weights. Our unified low-rank compression framework can be applied to embedding tables and MLP layers in various CTR prediction models. Extensive experiments on two academic datasets and one real industrial benchmark demonstrate that, with 3-5x model size reduction, our compressed models can achieve both faster inference and higher AUC than the uncompressed original models. Our code is at https://github.com/yuhao318/Atomic_Feature_Mimicking.
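
To make the idea of compressing output features rather than weights concrete, the following is a minimal NumPy sketch of one way to factor a single linear layer using PCA of its outputs. It is an illustration under stated assumptions, not the authors' implementation (see the repository above for that): the function name `pca_factorize_linear`, the layer sizes, the rank `k`, and the synthetic calibration inputs are all hypothetical. The inputs are deliberately generated from 8 latent factors so that the layer outputs are exactly low-rank, which real CTR features will only approximately satisfy.

```python
import numpy as np

def pca_factorize_linear(W, b, X, k):
    """Split y = W x + b into two thinner layers via PCA of the layer outputs.

    Hypothetical helper for illustration only.
    W: (d_out, d_in) weights, b: (d_out,) bias,
    X: (n, d_in) calibration inputs, k: number of principal components kept.
    """
    Y = X @ W.T + b                         # layer outputs on calibration data
    mu = Y.mean(axis=0)                     # mean output feature
    cov = np.cov(Y - mu, rowvar=False)      # (d_out, d_out) output covariance
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :k]             # top-k principal directions, (d_out, k)
    # y ~= U U^T (y - mu) + mu, rewritten as two consecutive linear layers:
    W1, b1 = U.T @ W, U.T @ (b - mu)        # first layer:  d_in -> k
    W2, b2 = U, mu                          # second layer: k -> d_out
    return W1, b1, W2, b2

# Synthetic setup (assumption): inputs driven by 8 latent factors.
rng = np.random.default_rng(0)
n, d_in, d_out, k = 512, 64, 64, 8
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d_in))
W = rng.normal(size=(d_out, d_in))
b = rng.normal(size=d_out)

W1, b1, W2, b2 = pca_factorize_linear(W, b, X, k)

Y = X @ W.T + b                             # original layer output
Y_hat = (X @ W1.T + b1) @ W2.T + b2         # output of the two factored layers

params_before = W.size + b.size
params_after = W1.size + b1.size + W2.size + b2.size
print("max abs error:", np.abs(Y - Y_hat).max())
print("parameters:", params_before, "->", params_after)
```

Note that the weight matrix `W` here is a dense random matrix and is not itself low-rank; the factorization works because the outputs on the calibration data are. In this sketch the two factored layers hold 1096 parameters against 4160 for the original layer.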

Paper Structure (32 sections, 10 equations, 2 figures, 21 tables)