MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning

Shu Yang; Muhammad Asif Ali; Cheng-Long Wang; Lijie Hu; Di Wang

TL;DR

The paper introduces a new evaluation benchmark, Life Long Learning of LLM (5L-bench), comprising a newly curated dataset of question-answer pairs and a set of evaluation metrics for rigorously evaluating MoRAL in open-book and closed-book settings.

Abstract

Adapting large language models (LLMs) to new domains and tasks, and enabling them to be efficient lifelong learners, is a pivotal challenge. In this paper, we propose MoRAL, i.e., Mixture-of-Experts augmented Low-Rank Adaptation for Lifelong Learning. MoRAL combines the multi-tasking abilities of MoE with the fine-tuning abilities of LoRA for effective lifelong learning of LLMs. In contrast to conventional approaches that use factual triplets as inputs, MoRAL relies on simple question-answer pairs, which is a more practical and effective strategy for robust and efficient learning. Owing to the new data settings, we introduce a new evaluation benchmark, Life Long Learning of LLM (5L-bench), encompassing a newly curated dataset of question-answer pairs and a set of evaluation metrics for rigorous evaluation of MoRAL in open-book and closed-book settings. Experimental evaluation shows that (i) LLMs learn fast in open-book settings, with up to 30.15% improvement in "RA" for Phi-2-2.7B compared to closed-book settings (for models fine-tuned with MoRAL); (ii) MoRAL shows higher performance improvement for models with a greater number of parameters; and (iii) MoRAL is robust to catastrophic forgetting, offering better knowledge retention than the baselines.
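
To make the idea of combining MoE routing with LoRA adapters concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single linear layer with a frozen weight `W`, `n_experts` low-rank adapter pairs `(A_i, B_i)`, and a softmax router that weights every expert (dense routing); the paper's actual router and output equations may differ, for example by selecting only the top-k experts. All names here (`MoELoRALayer`, `W_router`, `alpha`, `rank`) are hypothetical and chosen for illustration.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]


def rand_matrix(rows, cols, rng, std=0.02):
    """Small random Gaussian matrix, used for initialisation."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class MoELoRALayer:
    """Illustrative layer: y = W x + scale * sum_i g_i(x) * B_i (A_i x)."""

    def __init__(self, d_in, d_out, n_experts=4, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Frozen pretrained weight (assumption: never updated during fine-tuning).
        self.W = rand_matrix(d_out, d_in, rng)
        # One LoRA pair per expert: A_i is (rank x d_in), B_i is (d_out x rank).
        self.A = [rand_matrix(rank, d_in, rng) for _ in range(n_experts)]
        # B starts at zero, so the adapters initially leave the output unchanged.
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_experts)]
        # Router: a linear map from the input to one score per expert.
        self.W_router = rand_matrix(n_experts, d_in, rng)
        self.scale = alpha / rank

    def gates(self, x):
        """Router weights over experts; non-negative and summing to 1."""
        return softmax(matvec(self.W_router, x))

    def forward(self, x):
        out = matvec(self.W, x)  # frozen base path
        for g, A, B in zip(self.gates(x), self.A, self.B):
            delta = matvec(B, matvec(A, x))  # low-rank update from one expert
            out = [o + self.scale * g * d for o, d in zip(out, delta)]
        return out


layer = MoELoRALayer(d_in=8, d_out=5, n_experts=3)
x = [0.1 * i for i in range(8)]
y = layer.forward(x)
```

In this sketch only the `A`, `B`, and `W_router` parameters would be trained, which is what keeps the adaptation parameter-efficient; dense routing is used purely for simplicity.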

Paper Structure (41 sections, 6 equations, 6 figures, 9 tables)

Introduction
Related Works
Preliminaries
"Open/Closed" book and Cross setting
Fact Triples vs Question-Answer Pairs
MoRAL for Lifelong LLMs
(a) Router Network.
(b) MoRAL Output.
5L-Bench (Evaluation Benchmark)
Arxiv Data Curation
Evaluation Metrics
(a) Open-book Settings.
(b) Closed-book Settings.
(c) Cross Settings.
Experimentation
...and 26 more sections

Figures (6)

Figure 1: An illustrative example: ChatGPT-4 is unable to provide accurate information about events that occurred after April 2023.
Figure 2: Illustration of the difference between the input data for conventional approaches and for MoRAL.
Figure 3: MoRAL architecture for lifelong learning of LLMs. We use $n$ experts. FFN in the figure denotes a Feed-Forward Network.
Figure 4: Overview of the 5L-Bench data curation and evaluation pipeline.
Figure 5: Performance comparison of MoRAL vs. LoRA for large models with varying numbers of parameters (best viewed in color). These results are computed using the Arxiv dataset.
...and 1 more figure
