Scaled Supervision is an Implicit Lipschitz Regularizer
Zhongyu Ouyang, Chunhui Zhang, Yaning Jia, Soroush Vosoughi
TL;DR
The paper addresses overfitting in click-through rate (CTR) models caused by label thresholding and rapidly changing online contexts. It proposes scaled, fine-grained supervision in the form of ratings to act as an implicit Lipschitz regularizer, deriving a bound $L_p(N) \le L_f / \sqrt{N}$ for temperature-scaled softmax outputs. Empirically, this approach improves predictive and ranking performance across multiple baselines and datasets, with minimal added latency and little hyperparameter tuning. The work demonstrates that increasing the supervision bandwidth enhances stability and generalization in CTR models, with broad implications for robust recommender systems and related architectures.
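As a rough numerical illustration of why temperature scaling limits output sensitivity, the sketch below estimates the largest Jacobian spectral norm of a temperature-scaled softmax over random logits. It is not the paper's derivation: this summary does not define $L_p$, $L_f$, or $N$, so the code only checks the standard fact that dividing logits by a temperature $T$ bounds the softmax Jacobian norm by $1/(2T)$. The function names, the logit distribution, and the temperatures are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Numerically stable softmax of z / temperature."""
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def softmax_jacobian(z, temperature=1.0):
    """Jacobian of softmax(z / T) with respect to z: (diag(p) - p p^T) / T."""
    p = softmax(z, temperature)
    return (np.diag(p) - np.outer(p, p)) / temperature

def max_sensitivity(temperature, n_classes=5, n_samples=2000, seed=0):
    """Largest Jacobian spectral norm seen over random logits (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_samples):
        z = rng.normal(scale=3.0, size=n_classes)
        best = max(best, np.linalg.norm(softmax_jacobian(z, temperature), 2))
    return best

# Illustrative temperatures only; they are not taken from the paper.
sensitivities = {t: max_sensitivity(t) for t in (1.0, 2.0, 4.0)}
for t, s in sensitivities.items():
    print(f"T={t}: max spectral norm {s:.4f}, bound 1/(2T) = {0.5 / t:.4f}")
```

The empirical maximum never exceeds $1/(2T)$, so a larger temperature gives a smaller local Lipschitz constant for the softmax output. How that connects to the stated $1/\sqrt{N}$ factor depends on definitions given in the full paper.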
Abstract
In modern social media, recommender systems (RecSys) rely on the click-through rate (CTR) as the standard metric to evaluate user engagement. CTR prediction is traditionally framed as a binary classification task to predict whether a user will interact with a given item. However, this approach overlooks the complexity of real-world social modeling, where the user, item, and their interactive features change dynamically in fast-paced online environments. This dynamic nature often leads to model instability, reflected in overfitting to short-term fluctuations rather than higher-level interactive patterns. While overfitting calls for more scaled and refined supervision, current solutions often rely on binary labels that oversimplify fine-grained user preferences through the thresholding process, which significantly reduces the richness of the supervision. Therefore, we aim to alleviate the overfitting problem by increasing the supervision bandwidth in CTR training. Specifically, (i) theoretically, we formulate the impact of fine-grained preferences on model stability as a Lipschitz constraint; (ii) empirically, we discover that scaling the supervision bandwidth can act as an implicit Lipschitz regularizer, stably optimizing existing CTR models to achieve better generalizability. Extensive experiments show that this scaled supervision significantly and consistently improves the optimization process and the performance of existing CTR models, even without the need for additional hyperparameter tuning.
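The loss of supervision richness through thresholding can be made concrete with a small sketch. The ratings, the 1-to-5 scale, and the threshold of 4 are hypothetical values chosen for illustration, not the datasets or settings used in the paper.

```python
import numpy as np

# Hypothetical 1-5 star ratings for six user-item pairs (illustrative data only).
ratings = np.array([1, 2, 3, 4, 5, 4])

# Assumed threshold: ratings >= 4 are treated as a positive (click) label.
THRESHOLD = 4
binary_labels = (ratings >= THRESHOLD).astype(int)

# Count the distinct supervision levels before and after thresholding.
n_rating_levels = len(np.unique(ratings))
n_binary_levels = len(np.unique(binary_labels))

print("binary labels:", binary_labels)
print("distinct levels:", n_rating_levels, "->", n_binary_levels)
```

After thresholding, ratings of 1, 2, and 3 become indistinguishable, as do ratings of 4 and 5. Training on the original ratings keeps those distinctions available as a supervision signal, which is the sense in which the supervision bandwidth is larger.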
