GPT Meets Graphs and KAN Splines: Testing Novel Frameworks on Multitask Fine-Tuned GPT-2 with LoRA
Gabriel Bo, Marc Bernardino, Justin Gu
TL;DR
This work evaluates whether adding interpretable modules, namely Kolmogorov-Arnold Networks (KANs) and Graph Attention Networks (GATs), to GPT-2 improves multitask NLP performance under parameter-efficient LoRA fine-tuning. It compares two extensions, Hybrid KAN-LoRA and Graph-LoRA, against a LoRA-enhanced baseline on sentiment analysis, paraphrase detection, and sonnet generation. The LoRA-enhanced Transformer consistently outperforms the KAN and GAT variants, reaching 55.249% accuracy on the SST test set, 99.18% on the CFIMDB dev set, 89.9% paraphrase detection test accuracy, and a CHRF score of 42.097 on sonnet generation, while Graph-LoRA and Hybrid KAN-LoRA add substantial complexity without performance gains. The findings suggest that, for these multitask NLP tasks, efficient parameter adaptation via LoRA is more effective than the more complex interpretable architectures, and they underscore the value of strong baselines when assessing whether an architectural change helps in practice.
Abstract
We explore the potential of integrating learnable and interpretable modules, specifically Kolmogorov-Arnold Networks (KAN) and graph-based representations, within a pre-trained GPT-2 model to enhance multi-task learning accuracy. Motivated by the recent surge in the use of KAN and graph attention (GAT) architectures in chain-of-thought (CoT) models, and by debates over their benefits compared to simpler architectures such as MLPs, we begin by enhancing a standard self-attention transformer with Low-Rank Adaptation (LoRA), tuning hyperparameters, and incorporating L2 regularization. This approach yields significant improvements. To further improve interpretability and obtain richer representations, we develop two variants that attempt to improve on the standard KAN and GAT: Graph LoRA and Hybrid-KAN LoRA (Learnable GPT). However, systematic evaluations reveal that neither variant outperforms the optimized LoRA-enhanced transformer, which achieves 55.249% accuracy on the SST test set, 99.18% on the CFIMDB dev set, and 89.9% paraphrase detection test accuracy. On sonnet generation, it obtains a CHRF score of 42.097. These findings indicate that efficient parameter adaptation via LoRA remains the most effective strategy for our tasks: sentiment analysis, paraphrase detection, and sonnet generation.
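
As a rough illustration of the LoRA adaptation described above, the following pure-Python sketch shows a linear layer whose frozen weight W is augmented with a trainable low-rank update B A, scaled by alpha / r. The names `LoRALinear`, `matvec`, and `lora_param_counts`, the rank r = 8, the scaling alpha = 16, and the choice to apply the L2 penalty only to the adapter factors are assumptions made for illustration; none of them is taken from the authors' implementation.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def lora_param_counts(d_in, d_out, r):
    """Return (parameters in the full weight, parameters in the LoRA factors)."""
    return d_in * d_out, r * (d_in + d_out)


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical illustration only, not the paper's code.
    """

    def __init__(self, d_in, d_out, r, alpha, seed=0):
        rng = random.Random(seed)
        # Stand-in for a frozen pre-trained weight of shape (d_out, d_in).
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A has shape (r, d_in), B has shape (d_out, r).
        # B starts at zero, so the adapted layer initially equals the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def l2_penalty(self, weight_decay):
        """L2 regularization term over the adapter factors only (an assumption)."""
        squares = sum(v * v for row in self.A for v in row)
        squares += sum(v * v for row in self.B for v in row)
        return weight_decay * squares


# Small demo with illustrative sizes.
layer = LoRALinear(d_in=6, d_out=4, r=2, alpha=4)
x = [1.0] * 6
y_initial = layer.forward(x)  # equals W x because B is all zeros

# 768 is the hidden size of the smallest GPT-2; the model size used in the
# paper is not stated in this section.
full, lora = lora_param_counts(768, 768, 8)  # 589824 vs 12288
```

Starting B at zero is the usual LoRA design choice: fine-tuning begins from exactly the pre-trained behaviour, and only the small factors A and B receive gradient updates.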
