
Enhancing Multi-field B2B Cloud Solution Matching via Contrastive Pre-training

Haonan Chen, Zhicheng Dou, Xuetong Hao, Yunhao Tao, Shiren Song, Zhenli Sheng

TL;DR

This paper tackles B2B cloud solution matching by identifying two key challenges: modeling complex multi-field features and coping with limited, incomplete, and sparse transaction data. It introduces CAMA, a hierarchical multi-field matching framework with two BERT-based token encoders for description and attribute texts, field-aware embeddings, scale encoding, and a Transformer for field-level interactions, complemented by data augmentation and a contrastive pre-training objective. Empirical results on a real-world Huawei Cloud dataset show CAMA outperforms strong baselines offline and delivers approximately a 30% relative CVR improvement online, underscoring its industrial value. The work is validated through extensive ablations and hyperparameter studies, and its unified model design suggests applicability to other B2B matching problems with similar data characteristics.

Abstract

Cloud solutions have gained significant popularity in the technology industry because they combine services and tools to tackle specific problems. Despite their widespread use, identifying suitable company customers for a given target solution, on behalf of a solution provider's sales team, remains a complex business problem that existing matching systems have yet to address adequately. In this work, we study the B2B solution matching problem and identify two main challenges in this scenario: (1) modeling complex multi-field features and (2) coping with limited, incomplete, and sparse transaction data. To tackle these challenges, we propose CAMA, a framework built on a hierarchical multi-field matching structure and supplemented by three data augmentation strategies and a contrastive pre-training objective that compensate for imperfections in the available data. Through extensive experiments on a real-world dataset, we demonstrate that CAMA significantly outperforms several strong baseline matching models. Furthermore, we have deployed our matching framework on a Huawei Cloud system, where we observe an improvement of about 30% in Conversion Rate (CVR) over the previous online model, demonstrating its business value.

Paper Structure (31 sections, 5 equations, 3 figures, 5 tables)

This paper contains 31 sections, 5 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: The illustration of CAMA. The scale encoding module uses look-up embeddings and the AutoDis encoder to model categorical and numeric features, respectively. Two pre-trained BERT encoders, together with field-aware embeddings, capture token-level interactions within two distinct groups of text pairs. At a higher level, a Transformer encoder models field-level interactions among the feature groups.
  • Figure 2: The illustration of our data augmentation strategies and contrastive learning process. First, an original $(s,c)$ pair is augmented with two randomly selected strategies, yielding two similar pairs. The BERT-encoded representations of these two pairs are then pulled closer together by our contrastive loss function.
  • Figure 3: Performance of CAMA on the BSM dataset with different hyperparameters.
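
The contrastive step described in the Figure 2 caption (two augmented views of the same $(s,c)$ pair pulled closer together) can be illustrated with a small sketch. This section does not give the paper's actual loss, so the code below assumes a standard NT-Xent (InfoNCE) objective with cosine similarity as a stand-in. The function names `cosine` and `nt_xent_loss`, the `temperature` value, and the toy 2-dimensional vectors are all hypothetical; in CAMA the vectors would be BERT-encoded representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.1):
    """NT-Xent (InfoNCE) loss over a batch of paired views.

    Assumption: views_a[i] and views_b[i] are embeddings of two
    augmentations of the same (s, c) pair. Every other view in the
    batch is treated as a negative. This is a generic formulation,
    not necessarily the loss used in the paper.
    """
    n = len(views_a)
    all_views = views_a + views_b            # 2n embeddings in total
    total = 0.0
    for i, anchor in enumerate(all_views):
        partner = (i + n) % (2 * n)          # index of the other view of the same pair
        # Scaled similarities to every view except the anchor itself.
        logits = [
            cosine(anchor, other) / temperature
            for k, other in enumerate(all_views)
            if k != i
        ]
        pos_logit = cosine(anchor, all_views[partner]) / temperature
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


# Toy batch: two (s, c) pairs, each with two augmented views.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]     # each view close to its partner
shuffled_b = [[0.1, 0.9], [0.9, 0.1]]    # partners deliberately mismatched

loss_aligned = nt_xent_loss(views_a, aligned_b)
loss_shuffled = nt_xent_loss(views_a, shuffled_b)
```

In this toy batch the loss is lower when each view sits near its own partner (`aligned_b`) than when partners are mismatched (`shuffled_b`), which is the behaviour a contrastive pre-training objective rewards.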