Consensus-Aligned Neuron Efficient Fine-Tuning Large Language Models for Multi-Domain Machine Translation

Shuting Jiang, Ran Song, Yuxin Huang, Yan Xiang, Yantuan Xian, Shengxiang Gao, Zhengtao Yu

TL;DR

This work addresses multi-domain machine translation (MDMT) with large language models by introducing CANEFT, a neuron-efficient fine-tuning framework that identifies and updates consensus-aligned neurons. It combines activation-gradient analysis to select task-relevant neurons, mutual-information-based cross-domain screening to form a robust MDMT consensus set, and masked gradient updates that fine-tune only these neurons, avoiding parameter interference. Across German-English and Chinese-English tasks over 10 domains and 3 backbones, CANEFT delivers consistent gains over strong PEFT baselines and achieves state-of-the-art performance on both seen and unseen domains while updating about $1\%$ of parameters. The results highlight the potential of neuron-level, cross-domain consensus in LLMs for robust, efficient MDMT, with practical impact on cross-domain translation systems (e.g., generalization to unseen domains).
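The masked-update idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy importance scores (standing in for an activation-gradient score), and the plain SGD step are all assumptions made for illustration. Each row of `weights` plays the role of one neuron.

```python
def top_k_neurons(scores, ratio):
    """Return the indices of the highest-scoring neurons (at least one)."""
    k = max(1, round(len(scores) * ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def masked_sgd_step(weights, grads, selected, lr):
    """Apply an SGD step only to rows (neurons) in `selected`.

    Rows outside `selected` are copied unchanged, i.e. their gradient is
    effectively masked to zero.
    """
    return [
        [w - lr * g for w, g in zip(w_row, g_row)] if i in selected else list(w_row)
        for i, (w_row, g_row) in enumerate(zip(weights, grads))
    ]


# Toy setup (hypothetical values): 4 neurons with 2 weights each.
scores = [0.1, 0.9, 0.3, 0.7]            # assumed per-neuron importance scores
selected = top_k_neurons(scores, ratio=0.5)
weights = [[1.0, 1.0] for _ in range(4)]
grads = [[0.5, 0.5] for _ in range(4)]
updated = masked_sgd_step(weights, grads, selected, lr=0.1)
```

In this toy run only neurons 1 and 3 move; the other rows keep their original values, which is the property that lets a method of this kind touch a small fraction of the parameters.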

Abstract

Multi-domain machine translation (MDMT) aims to build a unified model capable of translating content across diverse domains. Despite the impressive machine translation capabilities of large language models (LLMs), domain adaptation remains a challenge for them. Existing MDMT methods such as in-context learning and parameter-efficient fine-tuning often suffer from domain shift, parameter interference, and limited generalization. In this work, we propose a neuron-efficient fine-tuning framework for MDMT that identifies and updates consensus-aligned neurons within LLMs. These neurons are selected by maximizing the mutual information between neuron behavior and domain features, enabling LLMs to capture both generalizable translation patterns and domain-specific nuances. Our method then fine-tunes LLMs guided by these neurons, effectively mitigating parameter interference and domain-specific overfitting. Comprehensive experiments on three LLMs across ten German-English and Chinese-English translation domains show that our method consistently outperforms strong PEFT baselines on both seen and unseen domains, achieving state-of-the-art performance.
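As a rough illustration of what mutual information between neuron behavior and domain features measures, the sketch below uses a plug-in estimate over discrete samples. The binarized "neuron fired" signal, the domain labels, and the estimator itself are assumptions chosen for clarity; the abstract does not specify how the quantity is computed.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical data: whether a neuron fired (1/0) on samples from two domains.
domains = ["law", "law", "medical", "medical"]
tracks_domain = [1, 1, 0, 0]   # fires only on "law" samples
ignores_domain = [1, 0, 1, 0]  # fires equally often in both domains

mi_tracking = mutual_information(tracks_domain, domains)
mi_ignoring = mutual_information(ignores_domain, domains)
```

Here the neuron whose firing pattern follows the domain label carries one bit of information about the domain, while the neuron that fires equally often in both domains carries none.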

Paper Structure (18 sections, 12 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: (a) LoRA-based fine-tuning causes parameter interference, while (b) adapter-based methods introduce additional parameters. (c) Our proposed CANEFT addresses these issues by only updating consensus-aligned neurons.
  • Figure 2: The impact of fine-tuning different multi-domain consensus-aligned neuron ratios on BLEU (left) and COMET (right) values with LLaMA3.1-8B-Instruct. In both plots, the x-axis shows neuron ratios, and the y-axis shows evaluation scores.
  • Figure 3: Distribution of multi-domain consensus-aligned neurons across layers within the FFN's gate_proj, up_proj and down_proj modules of LLaMA3.1-8B-Instruct. In each plot, the x-axis denotes the layer number, and the y-axis corresponds to neuron index bins, derived by segmenting the full range of neuron indices into 15 divisions.
  • Figure 4: Gradient changes in consensus-aligned neurons (CANEFT) and randomly chosen neurons (RCN) within the FFN's gate_proj, up_proj and down_proj modules of LLaMA3.1-8B-Instruct. Layers are grouped by depth into lower, middle, and higher sections. In each bar, the lower segment represents the gradient changes of randomly chosen neurons, while the upper segment corresponds to those of consensus-aligned neurons.