Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers

Patrik Zavoral, Dušan Variš, Ondřej Bojar

TL;DR

The models are hypothesized to learn the algorithmic aspects of the tasks simultaneously with their structural aspects, but when the two come into conflict, the Transformer unfortunately often prefers to adhere to the structural aspects.

Abstract

The Transformer model has a tendency to overfit various aspects of the training data, such as the overall sequence length. We study elementary string edit functions using a defined set of error indicators to interpret the behaviour of the sequence-to-sequence Transformer. We show that generalization to shorter sequences is often possible, but confirm that longer sequences are highly problematic, although partially correct answers are often obtained. Additionally, we find that other structural characteristics of the sequences, such as subsegment length, may be equally important. We hypothesize that the models learn algorithmic aspects of the tasks simultaneously with structural aspects, but when the two come into conflict, the Transformer unfortunately often prefers adhering to the structural aspects.
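To make the length-based setup concrete, the sketch below generates random strings, separates lengths inside the training range $(30, 40]$ (the range marked in Figure 4) from shorter and longer test lengths, and scores a hypothesis against a reference with two simple error indicators. This is a minimal sketch under stated assumptions: the two-symbol alphabet {a, b} is assumed from the flip task, and the indicator names and definitions (`length_error`, `position_accuracy`) are hypothetical illustrations, not the paper's actual indicator set.

```python
import random

# Assumption: strings over {a, b}, since flip swaps a and b.
# The paper's actual vocabulary may differ.
ALPHABET = "ab"


def random_string(length: int, rng: random.Random) -> str:
    """Uniformly random string of the given length over ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def in_training_range(length: int, low: int = 30, high: int = 40) -> bool:
    """Training lengths lie in the half-open interval (low, high]."""
    return low < length <= high


def length_error(reference: str, hypothesis: str) -> int:
    """Hypothetical indicator: signed length difference (hypothesis - reference)."""
    return len(hypothesis) - len(reference)


def position_accuracy(reference: str, hypothesis: str) -> float:
    """Hypothetical indicator: fraction of reference positions reproduced exactly.

    Reference positions with no counterpart in a too-short hypothesis count as errors.
    """
    if not reference:
        return 1.0 if not hypothesis else 0.0
    matches = sum(r == h for r, h in zip(reference, hypothesis))
    return matches / len(reference)


if __name__ == "__main__":
    rng = random.Random(0)
    lengths = range(1, 61)
    train = [random_string(n, rng) for n in lengths if in_training_range(n)]
    shorter = [random_string(n, rng) for n in lengths if n <= 30]
    longer = [random_string(n, rng) for n in lengths if n > 40]
    print(len(train), len(shorter), len(longer))  # 10 30 20

    # A hypothesis that stops at a training-like length is only partially correct.
    ref = "ab" * 25   # length 50, outside the training range
    hyp = ref[:40]    # truncated to length 40
    print(length_error(ref, hyp), position_accuracy(ref, hyp))  # -10 0.8
```

Keeping length mismatch and per-position correctness as separate indicators lets a truncated but otherwise correct output register as a partially correct answer rather than a plain failure.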

Paper Structure

This paper contains 19 sections, 3 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Python-style RASP programs for string reverse, copy, and flip (reverse a string, identical copy, and swap $a$ for $b$ and vice versa, respectively)
  • Figure 2: Source - target example for padded flip.
  • Figure 3: Reference - hypothesis target example for padded flip with two errors marked in purple and underlined.
  • Figure 4: Selected indicators after 400 epochs. Top row: left - simple tasks, right - padded tasks. Bottom row - simple tasks only. Copy (blue), flip (green), reverse (orange), copy/all (red), flip/all (brown), reverse/all (violet). Dashed vertical lines mark the training length range $(30, 40]$. Best viewed in color.
  • Figure 5: Padded copy: distributions (Count: total number of validation examples with a given Value) of hypothesis padding lengths $|\tilde{P}|$ (blue), reference padding lengths $|P|$ (orange), and the prior over training padding lengths $|P|$ (green).
  • ...and 3 more figures
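For readers unfamiliar with RASP, the input-output behaviour of the three elementary functions named in Figure 1 can be written as ordinary Python. This is a plain reference sketch, not a reproduction of the RASP programs in the figure; the function names are hypothetical, and leaving characters other than a and b unchanged in flip is an assumption.

```python
def copy_string(s: str) -> str:
    """Identical copy of the input."""
    return s


def reverse_string(s: str) -> str:
    """Input characters in reverse order."""
    return s[::-1]


def flip_string(s: str) -> str:
    """Swap 'a' for 'b' and vice versa; other characters unchanged (assumption)."""
    return s.translate(str.maketrans("ab", "ba"))


if __name__ == "__main__":
    src = "aabab"
    print(copy_string(src), reverse_string(src), flip_string(src))  # aabab babaa bbaba
```

Copy and flip are position-preserving (output position i depends only on input position i), whereas reverse reads input position n - 1 - i and therefore depends on the total sequence length n.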