RQ-GMM: Residual Quantized Gaussian Mixture Model for Multimodal Semantic Discretization in CTR Prediction

Ziye Tong, Jiahao Liu, Weimin Zhang, Hongji Ruan, Derick Tang, Zhanpeng Zeng, Qinsong Zeng, Peng Zhang, Tun Lu, Ning Gu

TL;DR

This work addresses the mismatch between continuous multimodal embeddings and CTR modeling by discretizing the embeddings into semantic IDs with a probabilistic residual quantization framework. The proposed RQ-GMM fits Gaussian mixtures with soft assignment at each residual level to achieve high codebook utilization and accurate reconstruction, producing robust semantic IDs for downstream CTR models. Offline experiments show lower reconstruction RMSE and better CTR performance across datasets and backbones, and large-scale online A/B tests report a 1.502% gain in Advertiser Value, after which the method was fully deployed in production. The approach offers a practical, scalable way to stabilize training and improve recommendation quality in multimodal recommender systems.

Abstract

Multimodal content is crucial for click-through rate (CTR) prediction. However, directly incorporating continuous embeddings from pre-trained models into CTR models yields suboptimal results because the optimization objectives are misaligned and the components converge at inconsistent speeds during joint training. Discretizing embeddings into semantic IDs before feeding them into CTR models offers a more effective solution, yet existing methods suffer from limited codebook utilization, reconstruction accuracy, and semantic discriminability. We propose RQ-GMM (Residual Quantized Gaussian Mixture Model), which introduces probabilistic modeling to better capture the statistical structure of multimodal embedding spaces. By combining Gaussian Mixture Models with residual quantization, RQ-GMM achieves superior codebook utilization and reconstruction accuracy. Experiments on public datasets and online A/B tests on a large-scale short-video platform serving hundreds of millions of users demonstrate substantial improvements: RQ-GMM yields a 1.502% gain in Advertiser Value over strong baselines. The method has been fully deployed, serving daily recommendations for hundreds of millions of users.
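
To make the idea of combining Gaussian mixtures with residual quantization concrete, the following is a minimal sketch and not the authors' implementation. It assumes diagonal covariances, a fixed number of EM iterations, soft responsibilities during fitting, and a hard argmax over responsibilities when emitting each level's ID. All function names (`fit_gmm`, `responsibilities`, `fit_rq_gmm`, `encode`) and hyperparameters are hypothetical choices for illustration.

```python
import numpy as np

def responsibilities(x, weights, means, variances):
    """Soft assignment: posterior p(component | x) under a diagonal GMM."""
    diff = x[:, None, :] - means[None, :, :]                     # (n, k, d)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (k,)
    log_p = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2) + log_norm[None, :])
    log_p += np.log(weights)[None, :]
    log_p -= log_p.max(axis=1, keepdims=True)                    # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def fit_gmm(x, k, iters=20, seed=0):
    """Fit a diagonal-covariance GMM with plain EM (assumed setup)."""
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    means = x[rng.choice(n, size=k, replace=False)].copy()
    variances = np.tile(x.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        resp = responsibilities(x, weights, means, variances)    # E-step
        nk = resp.sum(axis=0) + 1e-12                            # M-step below
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        second_moment = (resp.T @ (x ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 1e-6)
    return weights, means, variances

def fit_rq_gmm(x, k, levels, iters=20, seed=0):
    """Fit one GMM per level, each on the residual left by the previous level."""
    codebooks, residual = [], x.copy()
    for level in range(levels):
        params = fit_gmm(residual, k, iters, seed + level)
        codes = responsibilities(residual, *params).argmax(axis=1)
        residual = residual - params[1][codes]                   # subtract chosen mean
        codebooks.append(params)
    return codebooks

def encode(x, codebooks):
    """Return (semantic IDs of shape (n, levels), reconstruction of x)."""
    residual, ids = x.copy(), []
    for params in codebooks:
        codes = responsibilities(residual, *params).argmax(axis=1)
        ids.append(codes)
        residual = residual - params[1][codes]
    return np.stack(ids, axis=1), x - residual

# Usage on synthetic 2-D "embeddings" drawn around four cluster centers.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 5.0], [-5.0, 5.0], [5.0, -5.0], [-5.0, -5.0]])
x = centers[rng.integers(0, 4, size=400)] + rng.normal(scale=0.5, size=(400, 2))
codebooks = fit_rq_gmm(x, k=4, levels=2)
ids, recon = encode(x, codebooks)
rmse = float(np.sqrt(np.mean((x - recon) ** 2)))
print(ids[:3], rmse)
```

Each row of `ids` is a tuple of per-level component indices that could serve as a semantic ID for a downstream CTR model, and `rmse` measures how well the sum of the selected component means reconstructs the original embedding.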

Paper Structure

This paper contains 23 sections, 14 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: RMSE convergence curves of RQ-GMM and RQ-KMeans over EM/K-means iterations on a proprietary industrial dataset.