MIND: Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models

Chuang Yu; Jinmiao Zhao; Mingxuan Zhao; Yunpeng Liu; Xiujun Shu; Yuanhao Feng; Bo Wang; Xiangyu Yue

TL;DR

MIND addresses the limitation of single-rationale supervision in multimodal large language models (MLLMs) by introducing a multi-rationale discriminative framework. It combines three components: Rationale Augmentation and Discrimination (RAD) for diverse rationale generation, Progressive Two-stage Correction Learning (P2CL) for progressive understanding and correction, and Multi-rationale Contrastive Alignment (MCA) for embedding-space discrimination. Together they realize an Understand → Rethink → Correct reasoning cycle. The approach achieves state-of-the-art results on ScienceQA, A-OKVQA, and M3CoT, demonstrating improved reasoning robustness and interpretability. This work proposes a new paradigm for building cognitively capable MLLMs with active self-correction and discrimination capabilities.

Abstract

Recently, multimodal large language models (MLLMs) have been widely applied to reasoning tasks. However, they suffer from limited multi-rationale semantic modeling and insufficient logical robustness, and they are susceptible to misleading interpretations in complex scenarios. Therefore, we propose a Multi-rationale INtegrated Discriminative (MIND) reasoning framework, which is designed to endow MLLMs with the human-like cognitive abilities of "Understand -> Rethink -> Correct" and achieves a paradigm evolution from passive imitation-based reasoning to active discriminative reasoning. Specifically, we introduce a Rationale Augmentation and Discrimination (RAD) paradigm, which automatically and efficiently expands existing datasets by generating diverse rationales, providing a unified and extensible data foundation. Meanwhile, we design a Progressive Two-stage Correction Learning (P2CL) strategy. The first stage enhances multi-rationale positive learning, while the second stage enables active logic discrimination and correction. In addition, to mitigate representation entanglement in the multi-rationale semantic space, we propose a Multi-rationale Contrastive Alignment (MCA) optimization strategy, which achieves semantic aggregation of correct reasoning and boundary separation of incorrect reasoning. Extensive experiments demonstrate that the proposed MIND reasoning framework achieves state-of-the-art (SOTA) performance on multiple public datasets covering scientific, commonsense, and mathematical scenarios. It provides a new perspective for advancing MLLMs towards higher levels of cognitive intelligence. Our code is available at https://github.com/YuChuang1205/MIND
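
The abstract describes MCA only at a high level: correct rationales are pulled together in embedding space and incorrect ones are pushed away. As a rough illustration of that idea, and not the paper's actual loss, the sketch below uses a generic InfoNCE-style contrastive objective with multiple positives. The function names, the use of cosine similarity, and the `temperature` parameter are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_positive_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss with several positives (hypothetical helper).

    anchor:    embedding of one correct rationale
    positives: embeddings of other correct rationales for the same question
    negatives: embeddings of incorrect rationales
    The loss is small when positives are close to the anchor and
    negatives are far from it.
    """
    pos_logits = [cosine(anchor, p) / temperature for p in positives]
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    all_logits = pos_logits + neg_logits

    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(all_logits)
    log_denom = max_logit + math.log(sum(math.exp(x - max_logit) for x in all_logits))

    # Average negative log-softmax probability over the positives.
    return sum(log_denom - x for x in pos_logits) / len(pos_logits)


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    correct = [[0.9, 0.1], [1.0, 0.2]]     # near the anchor
    incorrect = [[-1.0, 0.0], [0.0, -1.0]]  # far from the anchor

    separated = multi_positive_contrastive_loss(anchor, correct, incorrect)
    entangled = multi_positive_contrastive_loss(anchor, incorrect, correct)
    print("separated:", separated)
    print("entangled:", entangled)
```

In this toy setup the loss is lower when the correct rationales sit near the anchor than when the roles are swapped, which is the qualitative behaviour the abstract attributes to MCA. How the paper defines anchors, selects negatives, or weights this term against its other objectives is not stated here.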
