Comateformer: Combined Attention Transformer for Semantic Sentence Matching

Bo Li, Di Liang, Zixin Zhang

TL;DR

Comateformer addresses the challenge of distinguishing fine-grained semantic differences between sentence pairs. It replaces vanilla Transformer attention with a combined attention mechanism that jointly models affinity and dissimilarity. It introduces a dual-affinity module (E for similarity, N for dissimilarity) and a compositional attention matrix M that omits softmax, which gives a broader receptive field and finer modeling of interactions. The method shows consistent improvements across 10 large semantic sentence matching (SSM) datasets and greater robustness in adversarial tests. Ablation results highlight the effectiveness of using $\tanh(E)$ and $\text{sigmoid}(N)$. The approach is compatible with both Transformer cores and pre-trained language models (PLMs), offering a practical path to more discriminative semantic matching in real-world applications.
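The summary names E, N, $\tanh(E)$, $\text{sigmoid}(N)$ and a softmax-free matrix M, but not their exact formulas. The sketch below is a minimal pure-Python version under stated assumptions. It uses a form in the style of compositional de-attention (CoDA), $M = \tanh(E) \odot \text{sigmoid}(N)$. Here E is a scaled dot product and N is a negative L1 distance. These definitions, the scales `alpha` and `beta`, and the function name `combined_attention` are hypothetical illustrations, not the authors' code.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combined_attention(a, b, alpha=1.0, beta=1.0):
    """Softmax-free quasi-attention of sentence a over sentence b (sketch).

    a: m token vectors of sentence A; b: n token vectors of sentence B,
    which also serve as the values. alpha and beta are hypothetical scales.
    Returns (M, out): M is m x n, out is m x d.
    """
    # E: similarity affinity, ASSUMED here to be a scaled dot product.
    E = [[alpha * dot(ai, bj) for bj in b] for ai in a]
    # N: dissimilarity (negative affinity), ASSUMED to be a negative L1 distance.
    N = [[-beta * sum(abs(x - y) for x, y in zip(ai, bj)) for bj in b] for ai in a]
    # M = tanh(E) * sigmoid(N) elementwise, with no softmax normalization.
    # tanh(E) is in (-1, 1), and sigmoid(N) is in (0, 0.5] when beta >= 0,
    # so every entry of M lies in (-0.5, 0.5) and may be negative.
    M = [[math.tanh(e) * sigmoid(n) for e, n in zip(e_row, n_row)]
         for e_row, n_row in zip(E, N)]
    # Output: each row of M weights the vectors of b (weights may be negative).
    d = len(b[0])
    out = [[sum(M[i][j] * b[j][k] for j in range(len(b))) for k in range(d)]
           for i in range(len(a))]
    return M, out


a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]]
M, out = combined_attention(a, b)
for row in M:
    print([round(x, 3) for x in row])
```

For `a[0] = [1, 0]`, the opposing token `b[1] = [-1, 0]` receives a negative weight. A softmax row cannot express this, because its weights are always positive.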

Abstract

Transformer-based models have made significant strides in semantic matching tasks by capturing connections between phrase pairs. However, assessing the relevance of a sentence pair requires more than examining the overall similarity between the sentences. It is also crucial to consider the subtle differences that distinguish them. Unfortunately, the softmax operation in Transformer attention tends to miss these subtle differences. To address this, we propose a novel semantic sentence matching model, the Combined Attention Network based on the Transformer model (Comateformer). In Comateformer, we design a novel Transformer-based quasi-attention mechanism with compositional properties. Traditional attention mechanisms merely adjust the weights of input tokens. In contrast, our method learns how to combine, subtract, or resize specific vectors when building a representation. Moreover, our approach builds on the intuition of similarity and dissimilarity (negative affinity) when computing dual affinity scores. This yields a more meaningful representation of the relationships between sentences. To evaluate the proposed model, we conducted extensive experiments on ten public real-world datasets, along with robustness tests. Experimental results show that our method achieves consistent improvements.
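The difference between reweighting tokens and combining, subtracting or resizing vectors can be shown with a small worked example. The weights below are hand-picked stand-ins for quasi-attention scores, not values produced by Comateformer. Softmax weights are positive and sum to 1, so the output remains a convex combination of the value vectors. Unnormalized weights in (-1, 1) can instead add, subtract, scale or zero out vectors.

```python
import math


def softmax(scores):
    """Standard softmax: positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, values):
    """Combine value vectors using one weight per vector."""
    dim = len(values[0])
    return [sum(w * v[k] for w, v in zip(weights, values)) for k in range(dim)]


# Two hypothetical value vectors (e.g. token representations).
values = [[2.0, 0.0], [0.0, 2.0]]

# Softmax attention can only reweight: the output is a convex combination,
# so it stays on the segment between the two value vectors.
soft_w = softmax([3.0, -3.0])
soft_out = weighted_sum(soft_w, values)

# Hand-picked quasi-attention weights in (-1, 1), not normalized.
add_out = weighted_sum([0.9, 0.9], values)     # combine: add both vectors
sub_out = weighted_sum([0.9, -0.9], values)    # subtract the second vector
scale_out = weighted_sum([0.25, 0.0], values)  # resize the first, drop the second

print(soft_out, add_out, sub_out, scale_out)
```

Only the quasi-attention outputs leave the segment between the two value vectors, for example `sub_out` has a negative coordinate.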

Paper Structure

This paper contains 24 sections, 7 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: An example of the Combined Attention Network for semantic sentence matching.
  • Figure 2: Difference between Softmax attention, Linear attention and Combined attention. Softmax attention computes the similarity between all Q-K pairs. Linear attention applies a mapping function $\Phi(\cdot)$ to Q and K separately. Our Combined attention models both global affinity and local difference information, achieving dual perception of affinity and non-affinity and an advantage in fine-grained discrimination.
  • Figure 3: The overall architecture for incorporating Comateformer into a Transformer model.
  • Figure 4: Performance of each BERT layer on the TextFlint-transformed dataset.
  • Figure 5: Robustness experiments on the QQP and QNLI datasets with BERT.
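The two baselines in the Figure 2 caption can be contrasted in a minimal pure-Python sketch. Softmax attention normalizes $\exp(QK^\top/\sqrt{d})$ row by row. Linear attention instead uses associativity to compute $\Phi(Q)(\Phi(K)^\top V)$ without forming the full query-key matrix. The caption does not specify $\Phi$. The feature map $\Phi(x) = \text{elu}(x) + 1$ used here is an assumption borrowed from common linear-attention practice (Katharopoulos et al., 2020). Both mechanisms yield non-negative weights per query. By contrast, the TL;DR describes Combined attention's matrix M as omitting softmax.

```python
import math


def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def softmax_attention(Q, K, V):
    """Scaled dot-product attention: every Q-K pair is scored, rows sum to 1."""
    d = len(Q[0])
    weights = []
    for row in matmul(Q, transpose(K)):
        scaled = [s / math.sqrt(d) for s in row]
        m = max(scaled)
        exps = [math.exp(s - m) for s in scaled]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights, matmul(weights, V)


def phi(x):
    """ASSUMED feature map elu(x) + 1, which keeps features strictly positive."""
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(Q, K, V):
    """Kernelized attention computed as phi(Q) (phi(K)^T V), linear in length."""
    Qf = [[phi(x) for x in row] for row in Q]
    Kf = [[phi(x) for x in row] for row in K]
    KV = matmul(transpose(Kf), V)           # d x d_v summary, built once
    k_sum = [sum(col) for col in zip(*Kf)]  # normalizer: sum of phi(k_j)
    out = []
    for q in Qf:
        num = [sum(q[t] * KV[t][j] for t in range(len(q))) for j in range(len(V[0]))]
        den = sum(qt * kt for qt, kt in zip(q, k_sum))
        out.append([x / den for x in num])
    return out


Q = [[0.5, -1.0], [1.0, 0.2]]
K = [[1.0, 0.0], [-0.5, 0.3], [0.2, 0.9]]
V = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
w, out_soft = softmax_attention(Q, K, V)
out_lin = linear_attention(Q, K, V)
print(out_soft)
print(out_lin)
```

The linear variant never materializes the n x n score matrix. It still normalizes each query's weights, so like softmax attention it can only reweight the value vectors.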