Disentangling Emotional Bases and Transient Fluctuations: A Low-Rank Sparse Decomposition Approach for Video Affective Analysis
Feng-Qi Cui, Jinyang Huang, Ziyu Jia, Xinyu Li, Xin Yan, Xiaokang Zhou, Meng Wang
TL;DR
This work addresses instability and representational entanglement in video-based affective computing by proposing LSEF, a hierarchical framework that uses a low-rank sparse decomposition to separate stable emotional bases from transient fluctuations. The method combines three plug-and-play modules, Stability Encoding (SEM), Dynamic Decoupling (DDM), and Consistency Integration (CIM), with Rank Aware Optimization (RAO), which balances smoothness and sensitivity. Results on DFEW, FERV39k, and VEATIC show state-of-the-art performance for both discrete emotion recognition and continuous valence-arousal estimation, with improved robustness and dynamic discrimination. The approach offers a principled, interpretable basis for modeling affective dynamics and may generalize to varied real-world conditions and multi-modal extensions.
Abstract
Video-based Affective Computing (VAC), vital for emotion analysis and human-computer interaction, suffers from model instability and representational degradation due to complex emotional dynamics. Because the meaning of an emotional fluctuation can differ across emotional contexts, the core limitation is the lack of a hierarchical structural mechanism for disentangling distinct affective components: emotional bases (the long-term emotional tone) and transient fluctuations (short-term emotional changes). To address this, we propose the Low-Rank Sparse Emotion Understanding Framework (LSEF), a unified model grounded in the Low-Rank Sparse Principle, which theoretically reframes affective dynamics as a hierarchical low-rank sparse compositional process. LSEF employs three plug-and-play modules: the Stability Encoding Module (SEM), which captures low-rank emotional bases; the Dynamic Decoupling Module (DDM), which isolates sparse transient signals; and the Consistency Integration Module (CIM), which reconstructs multi-scale coherence between stability and reactivity. The framework is optimized with a Rank Aware Optimization (RAO) strategy that adaptively balances gradient smoothness and sensitivity. Extensive experiments across multiple datasets confirm that LSEF significantly enhances robustness and dynamic discrimination, further validating the effectiveness and generality of hierarchical low-rank sparse modeling for understanding affective dynamics.
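
The abstract does not say how LSEF computes its decomposition, so the following is only a generic illustration of the low-rank sparse idea, not the SEM, DDM, or CIM modules themselves. It assumes a clip is represented as a frames-by-features matrix X and splits it as X ≈ L + S, where L is a low-rank part (standing in for the stable emotional base) and S is a sparse part (standing in for transient fluctuations). The alternating scheme (truncated SVD followed by soft thresholding) is a common heuristic substituted here for illustration; the function names, the `rank` and `tau` parameters, and the toy data are all hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every entry toward zero by tau; entries with |x| <= tau become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def low_rank_sparse_split(X, rank=1, tau=0.5, n_iter=50):
    """Heuristic split X ~ L + S with rank(L) <= rank and S sparse.

    Illustrative assumption, not the paper's algorithm: alternate between
    a truncated SVD of (X - S) and soft thresholding of (X - L).
    """
    L = np.zeros_like(X, dtype=float)
    S = np.zeros_like(X, dtype=float)
    for _ in range(n_iter):
        # Low-rank step: best rank-`rank` approximation of the de-spiked data.
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: keep only residual entries larger than tau in magnitude.
        S = soft_threshold(X - L, tau)
    return L, S


# Toy clip: 40 frames, 8 feature dimensions (made-up data).
tone = np.linspace(1.0, 2.0, 8)            # constant "emotional tone" per frame
X = np.outer(np.ones(40), tone)            # rank-1 base shared by all frames
X[10, 3] += 5.0                            # brief positive fluctuation
X[25, 6] -= 4.0                            # brief negative fluctuation

L, S = low_rank_sparse_split(X, rank=1, tau=0.5)
print("rank of L:", np.linalg.matrix_rank(L))
print("nonzero entries in S:", np.count_nonzero(S), "of", S.size)
```

In this sketch `tau` plays the role of a sensitivity knob: a larger value pushes more of the signal into the smooth base, while a smaller value lets more small deviations count as transient. By construction, every entry of the residual `X - L - S` is at most `tau` in magnitude.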
