MLP-KAN: Unifying Deep Representation and Function Learning

Yunhong He; Yifeng Xie; Zhengqing Yuan; Lichao Sun

TL;DR

This work introduces MLP-KAN, a unified method designed to eliminate the need for manual model selection, and demonstrates its superior versatility, delivering competitive performance across both deep representation and function learning tasks.

Abstract

Recent advancements in both representation learning and function learning have demonstrated substantial promise across diverse domains of artificial intelligence. However, the effective integration of these paradigms poses a significant challenge, particularly in cases where users must manually decide whether to apply a representation learning or function learning model based on dataset characteristics. To address this issue, we introduce MLP-KAN, a unified method designed to eliminate the need for manual model selection. By integrating Multi-Layer Perceptrons (MLPs) for representation learning and Kolmogorov-Arnold Networks (KANs) for function learning within a Mixture-of-Experts (MoE) architecture, MLP-KAN dynamically adapts to the specific characteristics of the task at hand, ensuring optimal performance. Embedded within a transformer-based framework, our work achieves remarkable results on four widely used datasets across diverse domains. Extensive experimental evaluation demonstrates its superior versatility, delivering competitive performance across both deep representation and function learning tasks. These findings highlight the potential of MLP-KAN to simplify the model selection process, offering a comprehensive, adaptable solution across various domains. Our code and weights are available at https://github.com/DLYuanGod/MLP-KAN.

Paper Structure (28 sections, 15 equations, 3 figures, 8 tables)

This paper contains 28 sections, 15 equations, 3 figures, 8 tables.

Introduction
Related Work
Deep Representation Learning.
Deep Function Learning.
Preliminary
Methodology
MLP-KAN
Representation Expert.
Function Expert.
Gating Mechanism.
Architecture
Experiment
Experimental Setup
Datasets.
Training and Evaluation Details.
...and 13 more sections

Figures (3)

Figure 1: The comparison between the MLP, KAN, and our proposed MLP-KAN. In the domains of Computer Vision and Natural Language Processing, the goal is to achieve the highest accuracy possible. In contrast, for the Symbolic Formula Representation task, the objective is to minimize the root mean square error (RMSE). The numbers are the average values of the experimental results. MLP-KAN effectively combines the strengths of both, ensuring strong performance in representation and function learning, and eliminating the need for task-specific model selection.
Figure 2: The framework combines a soft mixture of experts (MoE) with a unification of MLPs and KANs, denoted as the MLP-KAN module, to dynamically select experts for each token. The input tokens are passed through a multi-headed self-attention mechanism followed by layer normalization. The routing process involves soft weighting of experts for each slot and token via linear combinations and a softmax layer per slot and token. MLP and KAN experts are arranged in parallel, and based on the input's characteristics, either MLP or KAN is selected for computation, enhancing the model's ability to handle diverse representations efficiently. The gating mechanism determines the most relevant expert for each token, improving overall computational efficiency. This architecture retains the residual connections of the traditional Transformer while expanding its capacity to model complex functional and representational data.
Figure 3: Architecture of the transformer encoder with MLP-KAN integration.
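
The soft routing described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: it follows the general soft mixture-of-experts pattern (a softmax over tokens for each slot to dispatch, a softmax over slots for each token to combine) and assumes one slot per expert. The names `soft_moe_layer`, `slot_params`, `mlp_expert`, and `kan_like_expert` are hypothetical, and both experts are fixed stand-in functions rather than trained MLP or KAN modules.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_moe_layer(tokens, slot_params, experts):
    """Soft routing: tokens -> slots -> experts -> tokens.

    tokens:      list of m token vectors, each of length d
    slot_params: list of s slot vectors of length d (hypothetical learned parameters)
    experts:     list of s callables, one per slot (assumption: one slot per expert)
    """
    m, s, d = len(tokens), len(slot_params), len(tokens[0])

    # logits[i][j] is the dot product of token i with slot parameter j.
    logits = [[sum(x * p for x, p in zip(tok, sp)) for sp in slot_params]
              for tok in tokens]

    # Dispatch weights: for each slot, a softmax over all tokens.
    dispatch = [softmax([logits[i][j] for i in range(m)]) for j in range(s)]

    # Each slot input is a weighted average of every token.
    slot_inputs = [
        [sum(dispatch[j][i] * tokens[i][k] for i in range(m)) for k in range(d)]
        for j in range(s)
    ]

    # Each expert processes its own slot.
    slot_outputs = [experts[j](slot_inputs[j]) for j in range(s)]

    # Combine weights: for each token, a softmax over all slots.
    combine = [softmax(logits[i]) for i in range(m)]

    # Each output token is a weighted average of every slot output.
    outputs = [
        [sum(combine[i][j] * slot_outputs[j][k] for j in range(s)) for k in range(d)]
        for i in range(m)
    ]
    return outputs, dispatch, combine


def mlp_expert(vec):
    """Stand-in representation expert: a fixed affine map followed by ReLU."""
    return [max(0.0, 2.0 * v - 0.1) for v in vec]


def kan_like_expert(vec):
    """Stand-in function expert: a fixed univariate function per coordinate."""
    return [math.sin(v) + 0.5 * v for v in vec]


tokens = [[0.2, -0.4, 1.0], [1.5, 0.3, -0.7], [-0.9, 0.8, 0.1], [0.0, 0.0, 0.5]]
slot_params = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.7]]
outputs, dispatch, combine = soft_moe_layer(
    tokens, slot_params, [mlp_expert, kan_like_expert]
)
```

In this sketch every token contributes to every slot and every slot contributes to every output token, so the weighting is soft rather than a hard choice of a single expert. The residual connection, attention, and layer normalization mentioned in the caption are left out to keep the example short.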
