Deciphering the complaint aspects: Towards an aspect-based complaint identification model with video complaint dataset in finance

Sarmistha Das, Basha Mujavarsheik, R E Zera Lyngkhoi, Sriparna Saha, Alka Maurya

TL;DR

The paper addresses the challenge of extracting aspect-level complaints from multimodal financial reviews by introducing MulComp, a video-based dataset with five financial aspects and utterance-level annotations, and Solution 3.0, a CLIP-based multimodal model with an image segment encoder and contextual attention for robust fusion. The framework performs multitask, multilabel predictions across text, audio, and video, utilizing a three-phase pipeline: feature extraction with CLIP, multimodal fusion via ISEC, and a transformer-based multilabel classifier. Empirical results show Solution 3.0 achieving superior performance over strong baselines and SOTA models, with ablation studies highlighting the importance of audio modality, ISEC, and multimodal fusion. The work contributes a novel dataset and a scalable, generalizable model for financial complaint mining, with potential impact on customer care and regulatory analytics in finance.

Abstract

In today's competitive marketing landscape, effective complaint management is crucial for customer service and business success. Video complaints, integrating text and image content, offer invaluable insights by addressing customer grievances and delineating product benefits and drawbacks. However, comprehending nuanced complaint aspects within vast daily multimodal financial data remains a formidable challenge. Addressing this gap, we have curated a proprietary multimodal video complaint dataset comprising 433 publicly accessible instances. Each instance is meticulously annotated at the utterance level, encompassing five distinct categories of financial aspects and their associated complaint labels. To support this endeavour, we introduce Solution 3.0, a model designed for the multimodal aspect-based complaint identification task. Solution 3.0 is tailored to perform three key tasks: 1) handling multimodal features (audio and video), 2) facilitating multilabel aspect classification, and 3) multitasking, so that aspect classification and complaint identification are performed in parallel. Solution 3.0 utilizes a CLIP-based dual frozen encoder with an integrated image segment encoder for global feature fusion, enhanced by contextual attention (ISEC) to improve accuracy and efficiency. Our proposed framework surpasses current multimodal baselines, exhibiting superior performance across nearly all metrics. It opens new ways to strengthen customer care initiatives and to assist individuals in resolving their problems.
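To make the multitask, multilabel setup described above more concrete, the following is a minimal sketch of a shared prediction head, not the authors' implementation. It assumes a fused multimodal feature vector is already available (the CLIP and ISEC stages are not reproduced here), that each aspect is scored independently with a sigmoid, and that complaint identification is a single binary output per utterance. The class name `MultitaskHead`, the 0.5 threshold, and the unweighted sum of the two losses are all hypothetical choices made for illustration.

```python
import math
import random

NUM_ASPECTS = 5  # the abstract states five financial aspect categories


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one probability/label pair."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


class MultitaskHead:
    """Hypothetical linear head: 5 aspect outputs plus 1 complaint output."""

    def __init__(self, feature_dim, seed=0):
        rng = random.Random(seed)
        # One weight row per output; the last row is the complaint output.
        self.weights = [
            [rng.gauss(0.0, 0.02) for _ in range(feature_dim)]
            for _ in range(NUM_ASPECTS + 1)
        ]
        self.biases = [0.0] * (NUM_ASPECTS + 1)

    def forward(self, fused):
        """Return (aspect probabilities, complaint probability)."""
        probs = [
            sigmoid(sum(w * x for w, x in zip(row, fused)) + b)
            for row, b in zip(self.weights, self.biases)
        ]
        return probs[:NUM_ASPECTS], probs[NUM_ASPECTS]

    def predict(self, fused, threshold=0.5):
        """Multilabel decision: each aspect is thresholded independently."""
        aspect_probs, complaint_prob = self.forward(fused)
        return [p >= threshold for p in aspect_probs], complaint_prob >= threshold

    def loss(self, fused, aspect_targets, complaint_target):
        """Assumed joint loss: mean aspect BCE plus complaint BCE."""
        aspect_probs, complaint_prob = self.forward(fused)
        aspect_loss = sum(
            bce(p, t) for p, t in zip(aspect_probs, aspect_targets)
        ) / NUM_ASPECTS
        return aspect_loss + bce(complaint_prob, complaint_target)
```

Because each aspect has its own sigmoid rather than sharing a softmax, one utterance can be assigned several aspects at once, which is what distinguishes multilabel from multiclass classification. Both outputs are computed from the same fused features in a single forward pass, which is one simple way to read "in parallel".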

Paper Structure

This paper contains 13 sections, 2 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: An instance of the aspect-based complaint identification model vs. a traditional unimodal model.
  • Figure 2: A sample instance from the proposed MulComp dataset with two prime class labels.
  • Figure 3: Architectural view of our proposed Solution 3.0 model; video frames and audio transcripts are passed as input.
  • Figure 4: A sample instance of the aspect label classification task.