EmoAssist: Emotional Assistant for Visual Impairment Community

Xingyu Qi, He Li, Linjie Li, Zhenyu Wu

TL;DR

EmoAssist addresses the lack of emotional intelligence in assistive AI for visually impaired (VI) users by introducing the EmoAssist Benchmark and the EmoAssist Model. The framework combines an emotion-focused evaluation suite with a 1,000-sample EmoAssist Dataset and a fine-tuning strategy that uses LoRA and Direct Preference Optimization to align outputs with human emotional preferences. Experimental results show substantial gains in the Empathy and Suggestion metrics: EmoAssist outperforms both its pre-tuning LMM and strong baselines including GPT-4o, indicating better recognition of implicit emotions and user intents along with more actionable guidance. This work lays a foundation for emotion-aware VI assistance and points to future directions in multimodal and device-integrated solutions.

Abstract

The rapid advancement of large multi-modality models (LMMs) has significantly propelled the integration of artificial intelligence into practical applications. Visual Question Answering (VQA) systems, which can process multi-modal data including vision, text, and audio, hold great potential for assisting the Visual Impairment (VI) community in navigating complex and dynamic real-world environments. However, existing VI assistive LMMs overlook the emotional needs of VI individuals, and current benchmarks do not evaluate the emotional capabilities of these LMMs. To address these gaps, this paper introduces the EmoAssist Benchmark, a comprehensive benchmark designed to evaluate the assistive performance of LMMs for the VI community. To the best of our knowledge, this is the first benchmark that incorporates emotional intelligence as a key consideration. Furthermore, we propose the EmoAssist Model, an Emotion-Assistive LMM specifically designed for the VI community. The EmoAssist Model utilizes Direct Preference Optimization (DPO) to align outputs with human emotional preferences. Experimental results demonstrate that the EmoAssist Model significantly enhances the recognition of implicit emotions and intentions of VI users, delivers empathetic responses, and provides actionable guidance. Specifically, it improves the Empathy and Suggestion metrics on the EmoAssist Benchmark by 147.8% and 89.7%, respectively, compared to the pre-tuning LMM, and even outperforms state-of-the-art LLMs such as GPT-4o.
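The abstract names Direct Preference Optimization but does not spell out the objective. As a rough illustration only, the sketch below computes the standard per-pair DPO loss from summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response. The function name, argument names, and the default `beta=0.1` are assumptions made for this example and are not taken from the paper; the authors' actual training setup may differ.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the policy being tuned or the frozen reference model.
    `beta` is a hypothetical default, not a value reported by the paper.
    """
    # How much more (in log space) the policy likes each response
    # than the reference model does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin; positive means the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_ratio - rejected_ratio)

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the preferred (for example, more empathetic) response.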

Paper Structure

This paper contains 20 sections, 2 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Emotional intelligence performance of GPT-4o and the EmoAssist Model on VI assistance
  • Figure 2: LMMs Performance on the EmoAssist Benchmark
  • Figure 3: EmoAssist Benchmark
  • Figure 4: EmoAssist Model performance on VI individual queries compared with baseline LMMs