Enhancing Multi-field B2B Cloud Solution Matching via Contrastive Pre-training
Haonan Chen, Zhicheng Dou, Xuetong Hao, Yunhao Tao, Shiren Song, Zhenli Sheng
TL;DR
This paper tackles B2B cloud solution matching and identifies two key challenges: modeling complex multi-field features and coping with limited, incomplete, and sparse transaction data. It introduces CAMA, a hierarchical multi-field matching framework with two BERT-based token encoders for description and attribute texts, field-aware embeddings, scale encoding, and a Transformer for field-level interactions, complemented by three data augmentation strategies and a contrastive pre-training objective. On a real-world Huawei Cloud dataset, CAMA outperforms strong baselines offline, and after deployment it improves Conversion Rate (CVR) by about 30% over the previous online model, underscoring its industrial value. The work is validated through extensive ablations and hyperparameter studies, and its unified model design suggests applicability to other B2B matching problems with similar data characteristics.
Abstract
Cloud solutions have gained significant popularity in the technology industry because they combine services and tools to tackle specific problems. Despite their widespread use, helping a solution provider's sales team identify suitable company customers for a given target solution remains a complex business problem that existing matching systems have yet to address adequately. In this work, we study the B2B solution matching problem and identify two main challenges of this scenario: (1) the modeling of complex multi-field features and (2) the limited, incomplete, and sparse transaction data. To tackle these challenges, we propose a framework, CAMA, which uses a hierarchical multi-field matching structure as its backbone and is supplemented by three data augmentation strategies and a contrastive pre-training objective that compensate for the imperfections in the available data. Through extensive experiments on a real-world dataset, we demonstrate that CAMA significantly outperforms several strong baseline matching models. Furthermore, we have deployed our matching framework in a Huawei Cloud system, where we observe an improvement of about 30% in Conversion Rate (CVR) over the previous online model, which demonstrates its business value.
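To make the idea of a contrastive pre-training objective concrete, the following is a minimal sketch only. The abstract does not specify the loss, so this assumes a generic InfoNCE objective with in-batch negatives, which may differ from what CAMA actually uses. The function name `info_nce_loss`, the `temperature` value, and the random vectors standing in for encoded company or solution representations and their augmented views are all hypothetical.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss with in-batch negatives (an assumed stand-in,
    not necessarily the objective used by CAMA).

    Row i of `positives` is treated as the positive for row i of `anchors`;
    every other row in the batch serves as a negative.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy against the diagonal (the matching pairs).
    return float(-np.mean(np.diag(log_probs)))

# Hypothetical data: random vectors stand in for encoder outputs.
rng = np.random.default_rng(0)
original = rng.normal(size=(8, 16))
# A lightly perturbed copy plays the role of an augmented view.
augmented = original + 0.01 * rng.normal(size=original.shape)
unrelated = rng.normal(size=(8, 16))

loss_aligned = info_nce_loss(original, augmented)
loss_random = info_nce_loss(original, unrelated)
print(f"aligned views: {loss_aligned:.4f}, unrelated views: {loss_random:.4f}")
```

In this toy setup the loss is lower when each representation is paired with its own augmented view than when it is paired with an unrelated vector, which is the behaviour a contrastive objective rewards during pre-training.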
