Not Everything is All You Need: Toward Low-Redundant Optimization for Large Language Model Alignment

Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Jingyuan Wang, Ji-Rong Wen

TL;DR

An empirical study indicates that LLMs contain neurons that are redundant for alignment training. Based on this finding, the paper proposes a low-redundant alignment method named **ALLO**, which focuses on optimizing the most related neurons with the most useful supervised signals.

Abstract

Large language models (LLMs) still struggle to align with human preferences in complex tasks and scenarios. They are prone to overfitting to unexpected patterns or superficial styles in the training data. We conduct an empirical study that selects only the top-10% most updated parameters in LLMs for alignment training, and observe improvements in both the convergence process and the final performance. This indicates the existence of redundant neurons in LLMs for alignment training. To reduce their influence, we propose a low-redundant alignment method named **ALLO**, which focuses on optimizing the most related neurons with the most useful supervised signals. Concretely, we first identify the neurons related to the human preference data using a gradient-based strategy, and then use reward models to identify the alignment-related key tokens on which the loss is computed. We also decompose the alignment process into forgetting and learning stages: we first forget the tokens with unaligned knowledge and then learn aligned knowledge, updating a different ratio of neurons in each stage. Experimental results on 10 datasets show the effectiveness of ALLO. Our code and data are available at https://github.com/RUCAIBox/ALLO.
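
The idea of training only the most related parameters can be illustrated with a small sketch. It is not the authors' implementation: the scoring rule (accumulated absolute gradient), the helper names `accumulate_abs_gradients`, `top_fraction_mask`, and `masked_sgd_step`, and the plain SGD update are all assumptions made for illustration, and the paper's actual selection criterion may differ.

```python
from typing import List


def accumulate_abs_gradients(grad_steps: List[List[float]]) -> List[float]:
    """Sum |gradient| per parameter over all recorded steps (assumed score)."""
    totals = [0.0] * len(grad_steps[0])
    for step in grad_steps:
        for i, g in enumerate(step):
            totals[i] += abs(g)
    return totals


def top_fraction_mask(scores: List[float], fraction: float) -> List[bool]:
    """Return a mask that keeps the `fraction` of entries with the largest scores."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, round(len(scores) * fraction))
    # Rank indices by score, highest first; ties are broken by original index.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    kept = set(ranked[:keep])
    return [i in kept for i in range(len(scores))]


def masked_sgd_step(params: List[float], grads: List[float],
                    mask: List[bool], lr: float) -> List[float]:
    """Apply an SGD update only where the mask is True; other entries stay frozen."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]


# Hypothetical gradients for 10 parameters recorded over two steps.
grad_steps = [
    [0.1, -2.0, 0.3, 0.0, 0.5, 0.0, 0.1, 0.2, 0.0, 0.1],
    [0.2, 1.5, -0.1, 0.0, 0.4, 0.0, 0.1, 0.1, 0.0, 0.1],
]
scores = accumulate_abs_gradients(grad_steps)
mask = top_fraction_mask(scores, fraction=0.1)  # keep the top 10%
params = masked_sgd_step([1.0] * 10, grad_steps[1], mask, lr=0.1)
```

With ten parameters and a 10% ratio, only the single parameter with the largest accumulated gradient is updated; the other nine keep their original values. In a real model the same mask would be applied per weight tensor before the optimizer step.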

Paper Structure (16 sections, 9 equations, 4 figures, 7 tables, 1 algorithm)

Figures (4)

  • Figure 1: Training loss curve and benchmark performance on QA tasks using different trainable neurons in the LLM. We perform alignment training using DPO on ECQA and QASC. The top/last-10% related neurons are selected based on the accumulated gradients during DPO training.
  • Figure 2: The framework of our proposed alignment method ALLO. We first locate the key neurons in LLMs by computing the weight changes of the reference model. Then, based on the selected key neurons, we perform a fine-grained unlearning using NPO to help LLMs forget unaligned knowledge, and fine-grained learning using DPO to further align LLMs to human preference.
  • Figure 3: The experimental results of the influence of different warm-up methods on downstream tasks.
  • Figure 4: The experimental results of different neuron mask ratios on ECQA and AlpacaEval 2.0, reporting accuracy and win rate, respectively. In the evaluation, we keep the mask ratio of one stage fixed and vary the ratio of the other stage.
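
The Figure 2 caption refers to two objectives, NPO for forgetting and DPO for learning. As a point of reference, the sketch below writes out the commonly cited sequence-level forms of both losses for a single example. It is an assumption-laden illustration: the function names, the default `beta=0.1`, and the use of whole-sequence log-probabilities are choices made here, and the fine-grained, key-token variants described in the paper are not reproduced.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy and the
    frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


def npo_loss(logp_rejected: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard NPO loss for one dispreferred response.

    The loss shrinks as the policy assigns the response less probability
    than the reference model does.
    """
    log_ratio = logp_rejected - ref_logp_rejected
    return -(2.0 / beta) * log_sigmoid(-beta * log_ratio)


# Hypothetical log-probabilities: the policy starts equal to the reference,
# then moves probability away from the rejected response.
dpo_at_init = dpo_loss(-1.0, -2.0, -1.0, -2.0)
npo_at_init = npo_loss(-2.0, -2.0)
npo_after_forgetting = npo_loss(-5.0, -2.0)
```

When the policy equals the reference model, the DPO loss is log 2 and the NPO loss is (2/β) log 2; lowering the policy's log-probability of the rejected response reduces the NPO loss, which is the behavior a forgetting stage relies on.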