Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models

Zhichao Sheng, Shilin Zhou, Chen Gong, Zhenghua Li

TL;DR

This work tackles efficiency and robustness in reasoning for Large Audio Language Models (LALMs) by adapting reasoning depth to problem difficulty. It analyzes SFT versus GRPO and explicit versus implicit prompting, finding that explicit reasoning via GRPO helps on hard problems but can produce redundant content on easy ones. The authors propose two difficulty-adaptive, length-based rewards, GRDR and GA2DR, which tie reasoning length to question difficulty through model-perspective metrics and an audio-attention-based measure. Extensive experiments on MMAU show that these rewards improve performance on hard items while substantially reducing reasoning length, and ablations confirm the benefit of adaptive rather than fixed length control. The study also offers insights into reasoning structure paradigms and practical guidance for training LALMs whose reasoning depth varies with task difficulty.

Abstract

Large Audio Language Models (LALMs), powered by the chain-of-thought (CoT) paradigm, have shown remarkable reasoning capabilities. Intuitively, different problems require different depths of reasoning. While some methods can determine whether to reason for a given problem, they typically lack a fine-grained mechanism to modulate how much to reason. This often results in a "one-size-fits-all" reasoning depth, which produces redundant overthinking for simple questions while failing to allocate sufficient thought to complex ones. In this paper, we conduct an in-depth analysis of LALMs and find that an effective and efficient LALM should reason smartly by adapting its reasoning depth to the problem's complexity. To achieve this, we propose a difficulty-adaptive reasoning method for LALMs. Specifically, we propose a reward function that dynamically links reasoning length to the model's perceived problem difficulty. This reward encourages short, concise reasoning for easy tasks and more elaborate, in-depth reasoning for complex ones. Extensive experiments demonstrate that our method is both effective and efficient, simultaneously improving task performance and significantly reducing the average reasoning length. Further analysis of reasoning structure paradigms offers valuable insights for future work.

Paper Structure

This paper contains 39 sections, 11 equations, 15 figures, 15 tables.

Figures (15)

  • Figure 1: Curves of GRDR and GA2DR with normalized length.
  • Figure 2: The trend of average length across different models on MMAU-Test-Mini, under both the human-perspective difficulty and model-perspective difficulty. The length is measured in tokens and is presented after applying a logarithmic transformation.
  • Figure 3: The trend of average reasoning length for direct GRPO and our two proposed methods on MMAU-Test-Mini, evaluated under both human-perspective and model-perspective difficulty. Length is measured directly in tokens without any logarithmic transformation.
  • Figure 4: An audio QA example from "Friends". The top-left shows the question and options (green indicates the correct one), the right side presents the audio dialogue, and the bottom-left shows the output of our proposed method on Qwen2.5-Omni-7B.
  • Figure 5: Different prompt templates for GRPO, where Prompt1 is the implicit prompt and Prompt2 is the explicit prompt.
  • ...and 10 more figures