Beyond Self-learned Attention: Mitigating Attention Bias in Transformer-based Models Using Attention Guidance

Jiri Gesi, Iftekhar Ahmed

TL;DR

The paper identifies attention bias in fine-tuned Transformer-based language models for software engineering, showing that attention weights disproportionately focus on certain syntax tokens and AST elements during correct predictions. It introduces SyntaGuid, a syntax-pattern attention guiding mechanism that combines masked language modeling (MLM) with a dedicated SAG loss to steer attention toward critical code tokens and AST structures during fine-tuning. Empirical results across cloze tests, code clone detection, and code translation show that SyntaGuid improves overall performance by up to 3.25% and fixes up to 28.3% of previously incorrect predictions, outperforming both the baseline and existing attention-guiding methods. The work has practical implications for model interpretability and robustness in software engineering tasks and provides data and replication resources for further research.

Abstract

Transformer-based models have demonstrated considerable potential for source code modeling tasks in software engineering. However, they are limited by their sole reliance on automatically learned self-attention weights. Previous studies have shown that these models overemphasize delimiters added by tokenizers (e.g., [CLS], [SEP]), which may lead them to overlook essential information in the original input source code. To address this challenge, we introduce SyntaGuid, a novel approach that builds on the observation that attention weights tend to be biased towards specific source code syntax tokens and abstract syntax tree (AST) elements in fine-tuned language models when they make correct predictions. SyntaGuid guides attention-weight learning, leading to improved model performance on various software engineering tasks. We evaluate the effectiveness of SyntaGuid on multiple tasks and demonstrate that it outperforms existing state-of-the-art models in overall performance without requiring additional data. Experimental results show that SyntaGuid can improve overall performance by up to 3.25% and fix up to 28.3% of wrong predictions. Our work represents the first attempt to guide the attention of Transformer-based models towards critical source code tokens during fine-tuning, highlighting the potential for enhancing Transformer-based models in software engineering.

Paper Structure

This paper contains 32 sections, 9 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Illustration of attention guiding mechanism
  • Figure 2: Example attention guiding patterns for code snippet "<s> sum = num1 + num2; </s>", whose syntax type list is: [[CLS], identifier, operator, identifier, operator, identifier, separator, [SEP]].
  • Figure 3: Comparison of attention weights on syntax tokens: Correctly Predicted vs. Mis-predicted groups
  • Figure 4: Comparison of attention weights on AST statements: Correctly Predicted vs. Mis-predicted groups
  • Figure 5: Comparison of model accuracy based on Syntax attention weight: Low vs. High attention weights
  • ...and 2 more figures
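
To make the idea behind Figures 1 and 2 concrete, the following is a minimal sketch of how a guidance pattern could be derived from a syntax type list and compared against an attention map. Only the syntax type list comes from the Figure 2 caption. The pattern definition (every token attends uniformly to tokens of one syntax type), the mean-squared-error penalty, the weighting factor `alpha`, and all function names are assumptions made for illustration; the paper's actual patterns and SAG loss are not given in this summary and may differ.

```python
from typing import List

# Syntax type list copied from the Figure 2 caption.
SYNTAX_TYPES = ["[CLS]", "identifier", "operator", "identifier",
                "operator", "identifier", "separator", "[SEP]"]


def guidance_pattern(syntax_types: List[str], target: str) -> List[List[float]]:
    """Hypothetical pattern: each query token spreads its attention
    uniformly over the positions whose syntax type equals `target`."""
    cols = {i for i, t in enumerate(syntax_types) if t == target}
    if not cols:
        raise ValueError(f"no token of type {target!r}")
    n = len(syntax_types)
    weight = 1.0 / len(cols)
    return [[weight if j in cols else 0.0 for j in range(n)] for _ in range(n)]


def guidance_loss(attention: List[List[float]],
                  pattern: List[List[float]]) -> float:
    """Mean squared error between an attention map and a target pattern.
    This is an assumed stand-in, not the paper's SAG loss."""
    n = len(attention)
    total = sum((attention[i][j] - pattern[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)


def combined_loss(task_loss: float, attention: List[List[float]],
                  pattern: List[List[float]], alpha: float = 0.5) -> float:
    """Task loss plus a weighted guidance term; `alpha` is an assumed knob."""
    return task_loss + alpha * guidance_loss(attention, pattern)


if __name__ == "__main__":
    n = len(SYNTAX_TYPES)
    pattern = guidance_pattern(SYNTAX_TYPES, "identifier")
    uniform = [[1.0 / n] * n for _ in range(n)]  # an unguided attention map
    print(guidance_loss(uniform, pattern))
    print(combined_loss(1.0, uniform, pattern))
```

In a real fine-tuning setup the attention map would come from a chosen head of the model and the penalty would be computed on tensors so that gradients flow back into the attention weights; the plain-list version here only illustrates the shape of the computation.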