MIND: Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models

Chuang Yu, Jinmiao Zhao, Mingxuan Zhao, Yunpeng Liu, Xiujun Shu, Yuanhao Feng, Bo Wang, Xiangyu Yue

TL;DR

MIND addresses the limitation of single-rationale supervision in multimodal large language models by introducing a multi-rationale discriminative framework. It combines RAD for diverse rationale generation, P2CL for progressive understanding and correction, and MCA for embedding-space discrimination to realize an Understand → Rethink → Correct reasoning cycle. The approach achieves state-of-the-art results across ScienceQA, A-OKVQA, and M3CoT, demonstrating improved reasoning robustness and interpretability. This work proposes a new paradigm for building cognitively capable MLLMs with active self-correction and discrimination capabilities.

Abstract

Recently, multimodal large language models (MLLMs) have been widely applied to reasoning tasks. However, they suffer from limited multi-rationale semantic modeling and insufficient logical robustness, and they are susceptible to misleading interpretations in complex scenarios. Therefore, we propose a Multi-rationale INtegrated Discriminative (MIND) reasoning framework, which is designed to endow MLLMs with the human-like cognitive abilities of "Understand → Rethink → Correct" and to move the paradigm from passive imitation-based reasoning to active discriminative reasoning. Specifically, we introduce a Rationale Augmentation and Discrimination (RAD) paradigm, which automatically and efficiently expands existing datasets by generating diverse rationales, providing a unified and extensible data foundation. Meanwhile, we design a Progressive Two-stage Correction Learning (P2CL) strategy. The first phase enhances multi-rationale positive learning, while the second phase enables active logic discrimination and correction. In addition, to mitigate representation entanglement in the multi-rationale semantic space, we propose a Multi-rationale Contrastive Alignment (MCA) optimization strategy, which achieves semantic aggregation of correct reasoning and boundary separation of incorrect reasoning. Extensive experiments demonstrate that the proposed MIND reasoning framework achieves state-of-the-art (SOTA) performance on multiple public datasets covering scientific, commonsense, and mathematical scenarios. It provides a new perspective for advancing MLLMs towards higher levels of cognitive intelligence. Our code is available at https://github.com/YuChuang1205/MIND
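As a rough illustration of what "semantic aggregation of correct reasoning and boundary separation of incorrect reasoning" can mean in an embedding space, the sketch below implements a generic margin-based contrastive loss over cosine similarities. It is not taken from the MIND code base: the function names, the toy 2-D embeddings, the use of a single anchor vector, and the default margin of 0.5 are all assumptions made for illustration, and the actual MCA objective may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def margin_contrastive_loss(anchor, positives, negatives, margin=0.5):
    """Generic triplet-style loss (hypothetical stand-in, not the paper's MCA).

    For every (correct, incorrect) rationale pair, the loss is zero once the
    correct rationale is closer to the anchor than the incorrect one by at
    least `margin` in cosine similarity; otherwise it grows linearly.
    """
    losses = []
    for pos in positives:
        for neg in negatives:
            gap = cosine(anchor, pos) - cosine(anchor, neg)
            losses.append(max(0.0, margin - gap))
    return sum(losses) / len(losses)


# Toy embeddings (assumed values): well separated vs. fully entangled.
anchor = [1.0, 0.0]
separated = margin_contrastive_loss(anchor, [[1.0, 0.0]], [[0.0, 1.0]])
entangled = margin_contrastive_loss(anchor, [[1.0, 0.0]], [[1.0, 0.0]])
```

With these toy inputs, the separated case incurs no penalty, while the entangled case (an incorrect rationale embedded exactly where a correct one sits) is penalized by the full margin. That is the kind of pressure a contrastive alignment term applies.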

Paper Structure

This paper contains 16 sections, 8 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Illustration of MIND’s "Understand → Rethink → Correct" paradigm. It consists of two phases: Rationale (reasoning chain) generation and Answer generation. In Phase I, the model focuses on understanding the essential elements of problem-solving. In Phase II, the model rethinks the generated rationales and corrects erroneous reasoning logic.
  • Figure 2: Overview of the MIND reasoning framework. The blue circle denotes the format of the supervision signal.
  • Figure 3: Overview of the RAD paradigm.
  • Figure 4: Performance analysis of Epochs and Caption generation methods on the ScienceQA dataset.
  • Figure S1: Exploring the hyperparameters of the MCA optimization strategy on the ScienceQA dataset. Left: $m$ and $\alpha$. Right: Sampled rationales and Top-k. Red bars denote the optimal setting.
  • ...and 6 more figures