One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

Amal Akli, Maxime Cordy, Mike Papadakis, Yves Le Traon

Abstract

Large language models have recently surpassed specialized systems on code generation, yet their effectiveness on other code-analysis tasks remains less clear. At the same time, multi-task learning offers a way to unify diverse objectives within a single model, but fully fine-tuning LLMs across tasks is computationally prohibitive. Parameter-efficient fine-tuning mitigates this cost by updating only a small fraction of weights. Although PEFT has proven effective in single-task settings, its potential for multi-task learning has not yet been systematically explored. We present the first comprehensive evaluation of multi-task PEFT for code analysis, comparing several methods across diverse tasks and model architectures. Our experiments show that a single PEFT module shared across tasks can match, and in some cases surpass, full multi-task fine-tuning, confirming that the benefits of PEFT extend beyond isolated tasks. When comparing single-task and multi-task setups, we find that multi-task PEFT achieves a favorable performance-efficiency trade-off: it delivers accuracy close to single-task fine-tuning while reducing storage requirements, cutting the number of trainable parameters by a factor of the task count, and lowering computation costs by as much as 85%. At the same time, multi-task gains remain sensitive to task grouping. Through task-pairing experiments, we identify key factors shaping outcomes: task stability, model architecture, task complementarity, asymmetry, and dataset quality determine the success of co-fine-tuning. Finally, we benchmark efficient multi-task PEFT against direct prompting of open-source general-purpose LLMs, including DeepSeek, Qwen, Mistral, CodeLlama, and StarCoder. Despite their strong performance in code generation, these models underperform on analysis tasks, where even a 1B-parameter model with multi-task PEFT achieves significantly better results.
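The claim that sharing one PEFT module cuts trainable parameters "by a factor of the task count" can be checked with simple arithmetic. The sketch below is illustrative only: it assumes LoRA as the PEFT method, and the hidden size, rank, layer count, adapted matrices per layer, and task count are hypothetical values not taken from the paper. It also ignores the task-specific output heads, which add a small per-task cost in both setups.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def trainable_totals(per_module: int, num_tasks: int) -> tuple[int, int]:
    """Return (single-task total, multi-task total) trainable parameters.

    Single-task PEFT trains one module per task; multi-task PEFT shares one.
    """
    return per_module * num_tasks, per_module


# Hypothetical configuration (assumptions for illustration only).
HIDDEN = 2048          # hidden size of the backbone
RANK = 8               # LoRA rank
LAYERS = 16            # number of Transformer blocks
ADAPTED_PER_LAYER = 2  # e.g. two projection matrices per block
NUM_TASKS = 4          # number of code-analysis tasks

per_module = LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)
single, multi = trainable_totals(per_module, NUM_TASKS)

print(f"per PEFT module : {per_module:,}")
print(f"single-task PEFT: {single:,}")
print(f"multi-task PEFT : {multi:,}")
print(f"reduction factor: {single // multi}")
```

Under these assumptions the reduction factor equals the number of tasks, because the only thing that changes between the two setups is how many copies of the PEFT module are trained and stored.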

Paper Structure (29 sections, 4 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Overview of four PEFT integration patterns in a Transformer block: serial adapters, parallel adapters, prefix-tuning, and LoRA. Colored components denote the added trainable modules, and dashed insets illustrate their internal layouts.
  • Figure 2: Data processing pipeline for our multi-task fine-tuning.
  • Figure 3: Overview of our PEFT-based multi-task fine-tuning pipeline. At every optimization step, we draw a mini-batch from each task and pass the inputs through a shared backbone whose original attention and feed-forward layers are frozen (blue) while the inserted PEFT modules remain trainable (orange). The shared representation is routed to task-specific output heads, producing one loss $\ell_{i}$ per task. A set of learnable weights $w_{i}$ balances these losses before they are summed and back-propagated through the PEFT blocks and the individual heads; backbone weights stay fixed.
  • Figure 4: Mean performance difference (PEFT - full fine-tuning) across four models, reported separately for each task–PEFT pair. Dots represent average differences, while horizontal bars indicate 95% confidence intervals. Colors denote the PEFT method: blue = Serial Adapter (SA), orange = Parallel Adapter (PA), green = LoRA, and red = Prefix. The x-axis shows differences in percentage points (pp), where positive values indicate superior performance of PEFT relative to full fine-tuning.
  • Figure 5: Average performance difference (PEFT - full fine-tuning) across tasks, grouped by model type and PEFT method. Bars show mean differences in percentage points (pp) for encoder-decoder models (blue) and decoder-only models (orange). Positive values indicate performance gains from PEFT relative to full fine-tuning, while negative values indicate losses.
  • ...and 4 more figures
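
The loss balancing described in the Figure 3 caption, where per-task losses $\ell_{i}$ are scaled by weights $w_{i}$ and then summed, can be sketched as a plain weighted sum. This is a minimal illustration, not the authors' implementation: the function name `combine_losses` and the numeric values are hypothetical. In actual training the weights would be trainable parameters updated by back-propagation alongside the PEFT modules and heads; how they are parameterized or constrained is not stated in this excerpt.

```python
from typing import Sequence


def combine_losses(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of per-task losses: sum_i w_i * l_i.

    `losses[i]` is the loss of task i on its own mini-batch and
    `weights[i]` is the balancing weight for that task.
    """
    if len(losses) != len(weights):
        raise ValueError("need exactly one weight per task loss")
    return sum(w * l for w, l in zip(weights, losses))


# Hypothetical per-task losses from one optimization step (three tasks).
task_losses = [0.5, 2.0, 1.0]
# Hypothetical current values of the balancing weights.
task_weights = [1.0, 0.25, 0.5]

total = combine_losses(task_losses, task_weights)
print(total)
```

With these values each task contributes the same amount to the total, which shows how the weights can keep a task with a large raw loss from dominating the shared update.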