
FourierKAN outperforms MLP on Text Classification Head Fine-tuning

Abdullah Al Imran, Md Farhan Ishmam

TL;DR

The paper demonstrates that Fourier-KAN (FR-KAN) can serve as a superior, more efficient text-classification head for linear probing on frozen transformer backbones, outperforming the traditional MLP and the original KAN across multiple datasets and backbones. By formulating the head with a Fourier-series representation, FR-KAN delivers smoother, globally controlled non-linearities that converge faster and use fewer parameters. Empirical results show an average accuracy boost of about 10% and F1 boost of about 11% over MLP heads, with RoBERTa benefiting the most and XLNet showing mixed results. The work highlights FR-KAN as a potentially universal, greener alternative to MLPs for NLP tasks, while acknowledging interpretability trade-offs and grid-size considerations as future directions.

Abstract

In resource-constrained settings, adaptation to downstream classification tasks involves fine-tuning the final layer of a classifier (i.e., the classification head) while keeping the rest of the model weights frozen. Multi-Layer Perceptron (MLP) heads fine-tuned with pre-trained transformer backbones have long been the de facto standard for text classification head fine-tuning. However, the fixed non-linearity of MLPs often struggles to fully capture the nuances of contextual embeddings produced by pre-trained models, while also being computationally expensive. In our work, we investigate the efficacy of KAN and its variant, Fourier KAN (FR-KAN), as alternative text classification heads. Our experiments reveal that FR-KAN significantly outperforms MLPs with an average improvement of 10% in accuracy and 11% in F1-score across seven pre-trained transformer models and four text classification tasks. Beyond performance gains, FR-KAN is more computationally efficient and trains faster with fewer parameters. These results underscore the potential of FR-KAN to serve as a lightweight classification head, with broader implications for advancing other Natural Language Processing (NLP) tasks.

Paper Structure

This paper contains 33 sections, 2 theorems, 15 equations, 3 figures, and 7 tables.

Key Result

Theorem 1

Assume that, with Fourier coefficients $a_{k}, b_{k}$ and grid size $G$, the Fourier series for the function $f(x)$ takes the form

$$f(x) = a_{0} + \sum_{k=1}^{G} \left( a_{k} \cos(kx) + b_{k} \sin(kx) \right).$$

Then the series converges to the corresponding univariate function over a finite interval $[a,b]$ as $G \rightarrow \infty$, given that the function is continuous.
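The convergence claim can be checked numerically on a small example. The sketch below is illustrative only and is not taken from the paper: it assumes the standard truncated form $a_0 + \sum_{k=1}^{G} (a_k \cos(kx) + b_k \sin(kx))$, picks $f(x) = |x|$ on $[-\pi, \pi]$ as an arbitrary continuous test function, and estimates the coefficients with a midpoint rule. The helper names (`fourier_coefficients`, `partial_sum`, `max_error`) are hypothetical.

```python
import math


def fourier_coefficients(f, G, n=2000):
    """Approximate a_0 and a_k, b_k (k = 1..G) of f on [-pi, pi] with the midpoint rule."""
    h = 2.0 * math.pi / n
    xs = [-math.pi + (i + 0.5) * h for i in range(n)]
    fx = [f(x) for x in xs]
    a0 = sum(fx) * h / (2.0 * math.pi)
    a = [sum(v * math.cos(k * x) for v, x in zip(fx, xs)) * h / math.pi
         for k in range(1, G + 1)]
    b = [sum(v * math.sin(k * x) for v, x in zip(fx, xs)) * h / math.pi
         for k in range(1, G + 1)]
    return a0, a, b


def partial_sum(x, a0, a, b):
    """Evaluate the truncated series a_0 + sum_k (a_k cos(kx) + b_k sin(kx))."""
    total = a0
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        total += ak * math.cos(k * x) + bk * math.sin(k * x)
    return total


def max_error(f, G, points=401):
    """Largest absolute gap between f and its G-term partial sum on a uniform grid."""
    a0, a, b = fourier_coefficients(f, G)
    xs = [-math.pi + 2.0 * math.pi * i / (points - 1) for i in range(points)]
    return max(abs(f(x) - partial_sum(x, a0, a, b)) for x in xs)


grid_sizes = (1, 4, 16)
errors = [max_error(abs, G) for G in grid_sizes]
for G, err in zip(grid_sizes, errors):
    print(f"G = {G:2d}  max |f - S_G| = {err:.4f}")
```

For this test function the maximum error shrinks as $G$ grows, which matches the behavior the theorem describes. A single example is a sanity check, not a proof.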

Figures (3)

  • Figure 1: Comparison of average accuracy of different classification heads.
  • Figure 2: Overview of the architecture with FR-KAN classification head -- following the standard tokenization and embedding, the input text is passed to a pre-trained transformer encoder. The FR-KAN layer maps the contextualized embedding produced by the transformer to the output classes.
  • Figure 3: Results of the DistilBERT model on the IMDb dataset. For different classification heads, (a)-(c) training and validation loss, (d) accuracy, and (e) F1 score. For the FR-KAN head, (f) accuracy and F1 score at varying grid sizes.
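The mapping described in Figure 2, from a contextualized embedding to class logits through a Fourier-parameterized layer, can be sketched in a few lines. This is a minimal dependency-free illustration, not the authors' implementation. It assumes a common FourierKAN-style parameterization in which every input-output edge carries its own truncated Fourier series with `grid_size` frequencies. The class name `FourierKANHead`, the initialization scale, and the toy dimensions are all hypothetical. A practical head would use a tensor library and be trained on embeddings from the frozen encoder.

```python
import math
import random


class FourierKANHead:
    """Toy Fourier-KAN layer: logit_j = bias_j + sum_i sum_k (a_jik cos(k x_i) + b_jik sin(k x_i))."""

    def __init__(self, in_dim, num_classes, grid_size, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim * grid_size)  # assumed init scale, not from the paper
        self.in_dim = in_dim
        self.num_classes = num_classes
        self.grid_size = grid_size
        # a[j][i][k] and b[j][i][k]: coefficients for output j, input i, frequency k + 1
        self.a = [[[rng.gauss(0.0, scale) for _ in range(grid_size)]
                   for _ in range(in_dim)] for _ in range(num_classes)]
        self.b = [[[rng.gauss(0.0, scale) for _ in range(grid_size)]
                   for _ in range(in_dim)] for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def forward(self, x):
        """Map one embedding vector (list of floats) to a list of class logits."""
        assert len(x) == self.in_dim
        # Compute the Fourier features once and reuse them for every output class.
        cos = [[math.cos(k * xi) for k in range(1, self.grid_size + 1)] for xi in x]
        sin = [[math.sin(k * xi) for k in range(1, self.grid_size + 1)] for xi in x]
        logits = []
        for j in range(self.num_classes):
            total = self.bias[j]
            for i in range(self.in_dim):
                for k in range(self.grid_size):
                    total += self.a[j][i][k] * cos[i][k] + self.b[j][i][k] * sin[i][k]
            logits.append(total)
        return logits

    def num_parameters(self):
        """Two coefficients per (output, input, frequency) triple, plus one bias per class."""
        return 2 * self.num_classes * self.in_dim * self.grid_size + self.num_classes


# Toy usage: a random 8-dimensional vector stands in for a transformer embedding.
rng = random.Random(1)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(8)]
head = FourierKANHead(in_dim=8, num_classes=2, grid_size=5)
logits = head.forward(embedding)
print(logits, head.num_parameters())
```

Under this parameterization the parameter count grows linearly with the grid size, so the grid size trades capacity against cost. Each edge function is also $2\pi$-periodic in its input, which is one reason input scaling matters for heads of this kind.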

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • Corollary 1