Addressing Information Loss and Interaction Collapse: A Dual Enhanced Attention Framework for Feature Interaction
Yi Xu, Zhiyuan Lu, Xiaochen Li, Jinxin Hu, Hong Wen, Zulong Chen, Yu Zhang, Jing Zhang
TL;DR
The paper tackles two core limitations of Transformer-based CTR models: information loss in feature interactions caused by inner-product attention, and interaction collapse caused by long-tailed feature distributions. It proposes a Dual Enhanced Attention framework with two components. Combo-ID Attention memorizes feature interactions in an independent memory codebook, using gated Siamese codebooks to mitigate collisions. Collapse-avoiding Attention uses dynamic thresholding to filter out low-information interactions. The two are fused through multiple schemes to yield robust attention scores for prediction. Key contributions include the memory-based Combo-ID mechanism, the dynamic thresholding strategy for long-tail features, and versatile fusion methods, all validated on a large industrial dataset where the method outperforms strong baselines in AUC and GAUC. The work is relevant to production CTR systems because it preserves rich interaction signals while avoiding the degradation caused by data sparsity.
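To make the codebook idea concrete, here is a minimal sketch of a memory that stores one value per hashed feature pair. It is not the authors' implementation: the class name `ComboIDMemory`, the salted CRC32 hashing, the scalar codebook entries, and the fixed sigmoid gate are all assumptions chosen for illustration. The summary above only states that interactions are memorized in a codebook and that gated Siamese codebooks mitigate collisions.

```python
import math
import random
import zlib


class ComboIDMemory:
    """Toy memory of pairwise interaction scores; all names are hypothetical."""

    def __init__(self, num_buckets=1024, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        # Two codebooks addressed by differently salted hashes (an assumption):
        # a pair that collides in one is unlikely to collide in the other.
        self.book_a = [rng.gauss(0.0, 0.1) for _ in range(num_buckets)]
        self.book_b = [rng.gauss(0.0, 0.1) for _ in range(num_buckets)]
        self.gate_logit = 0.0  # would be learned in a real model; fixed here

    def bucket(self, feat_i, feat_j, salt):
        # Deterministic hash of the ordered feature pair into a bucket index.
        key = f"{salt}|{feat_i}|{feat_j}".encode("utf-8")
        return zlib.crc32(key) % self.num_buckets

    def score(self, feat_i, feat_j):
        a = self.book_a[self.bucket(feat_i, feat_j, "a")]
        b = self.book_b[self.bucket(feat_i, feat_j, "b")]
        gate = 1.0 / (1.0 + math.exp(-self.gate_logit))  # sigmoid
        # Gated mix of the two lookups.
        return gate * a + (1.0 - gate) * b


memory = ComboIDMemory()
s = memory.score("user_city=NYC", "item_cat=shoes")
```

The score for a pair is looked up directly rather than computed from the two embeddings, which is the sense in which such a memory retains the interaction instead of compressing it into an inner product.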
Abstract
The Transformer has proven to be an effective approach to feature interaction for CTR prediction, achieving considerable success in previous work. However, it also faces two challenges in handling feature interactions. First, Transformers may lose information when capturing feature interactions: by relying on inner products to represent pairwise relationships, they compress raw interaction information, which can degrade fidelity. Second, because feature distributions are long-tailed, feature fields with low information-abundance embeddings constrain the information abundance of other fields, leading to collapsed embedding matrices. To tackle these issues, we propose a Dual Attention Framework for Enhanced Feature Interaction, known as Dual Enhanced Attention. This framework integrates two attention mechanisms: the Combo-ID attention mechanism and the collapse-avoiding attention mechanism. The Combo-ID attention mechanism directly retains feature interaction pairs to mitigate information loss, while the collapse-avoiding attention mechanism adaptively filters out low information-abundance interaction pairs to prevent interaction collapse. Extensive experiments on industrial datasets demonstrate the effectiveness of Dual Enhanced Attention.
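The filtering idea can be sketched as follows. This is a minimal illustration, not the paper's formulation: the function name `collapse_avoiding_attention` is hypothetical, and using each row's mean inner-product score as the adaptive threshold is an assumption, since the abstract does not specify how the threshold is computed. Each pairwise score here is a single inner product, which also shows how a pair of embedding vectors is reduced to one scalar.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def collapse_avoiding_attention(embs):
    """Row-wise softmax over inner-product scores that reach a per-row threshold."""
    weights = []
    for q in embs:
        scores = [dot(q, k) for k in embs]
        # Data-dependent threshold (assumption): the row mean, capped at the
        # row maximum so that at least one score always survives rounding.
        threshold = min(sum(scores) / len(scores), max(scores))
        kept = [s if s >= threshold else -math.inf for s in scores]
        m = max(kept)
        exps = [math.exp(s - m) for s in kept]  # exp(-inf) is 0.0
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights


# The third embedding has a very small norm, standing in for a long-tail
# field whose embedding carries little information.
embs = [[1.0, 0.0], [0.9, 0.1], [0.01, 0.0]]
weights = collapse_avoiding_attention(embs)
```

In this toy input, every row assigns zero weight to the third field, so its low-information embedding no longer contributes to the attended representation of the others.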
