Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data
Puneet Kumar, Sarthak Malik, Balasubramanian Raman, Xiaobai Li
TL;DR
This work addresses generating sentiment-controlled feedback from multimodal inputs (text and images) by introducing the CMFeed dataset and a dual-branch feedback synthesis system with a controllability layer. The textual encoder uses a Transformer while the visual encoder relies on Faster R-CNN, and a KAAP-based interpretability framework reveals how features drive sentiment in generated feedback. Empirical results show a sentiment classification accuracy of 77.23% and improved semantic relevance and ranking (MRR = 0.3789) over baselines, with human evaluation validating sentiment alignment and relevance. The dataset and code are publicly available, enabling research in empathetic, context-aware feedback for education, healthcare, marketing, and customer service, along with transparent control signals to foster user trust.
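For readers unfamiliar with the ranking metric quoted above, the following is a minimal sketch of how mean reciprocal rank (MRR) is commonly computed. It assumes the standard definition: the average, over all queries, of the reciprocal of the 1-based rank at which the first relevant item appears. The function name `mean_reciprocal_rank`, the convention that a rank of 0 means "no relevant item found", and the example ranks are illustrative assumptions. They are not taken from the CMFeed code, and the paper's exact ranking protocol may differ.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Compute MRR from 1-based ranks of the first relevant item per query.

    A rank of 0 is treated (by assumption) as "no relevant item found"
    and contributes 0 to the average.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 0 for r in ranks):
        raise ValueError("ranks must be non-negative")
    # Reciprocal rank is 1/r for a hit at rank r, and 0 for a miss.
    reciprocal_ranks = [1.0 / r if r > 0 else 0.0 for r in ranks]
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Hypothetical example: four queries with first relevant items at
# ranks 1, 2, and 4, and one query with no relevant item.
example_ranks = [1, 2, 4, 0]
mrr = mean_reciprocal_rank(example_ranks)
print(f"MRR = {mrr:.4f}")
```

Under this definition, an MRR of 0.3789 would mean that the first relevant item appears, on average, somewhere between the second and third ranked position.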
Abstract
The ability to generate sentiment-controlled feedback in response to multimodal inputs comprising text and images addresses a critical gap in human-computer interaction. This capability allows systems to provide empathetic, accurate, and engaging responses, with applications in education, healthcare, marketing, and customer service. To this end, we have constructed a large-scale Controllable Multimodal Feedback Synthesis (CMFeed) dataset and proposed a controllable feedback synthesis system. The system comprises an encoder, a decoder, and a controllability block for textual and visual inputs. It extracts textual features with a transformer and visual features with a Faster R-CNN network, then combines them to generate feedback. The CMFeed dataset includes images, texts, reactions to the posts, human comments with relevance scores, and reactions to these comments. These reactions are used to train the model to produce feedback with specified sentiments, achieving a sentiment classification accuracy of 77.23%, which is 18.82% higher than the accuracy without controllability. The CMFeed dataset and the system's code are available at https://github.com/MIntelligence-Group/CMFeed.
