
Synergistic Feature Fusion for Latent Lyrical Classification: A Gated Deep Learning Architecture

M. A. Gameiro

TL;DR

This work tackles the fusion of high-dimensional SBERT semantics with low-dimensional structural cues for lyric classification by introducing the Synergistic Fusion Layer (SFL), a gated fusion mechanism in which $F_{\text{SFL}} = F_{\text{deep}} \odot G$ with gating vector $G = \sigma(W_G F_{\text{struct}} + b)$. Relative to a concatenation-based Random Forest (RF) baseline, SFL delivers higher accuracy (0.9894 vs. 0.9868) and markedly better calibration, achieving $\text{ACC} = 0.9894$ and $\text{ECE} = 0.00351$: a 93% reduction in Expected Calibration Error and a 2.5× lower Log Loss. The results demonstrate the value of non-linear, context-aware fusion for multimodal lyric analysis and emphasize the practical importance of reliable probability estimates in deployment. The approach could be extended by integrating gating mechanisms more tightly into transformer-based architectures, enabling context-sensitive interpretation of structural metadata.
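The gating equation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the embedding width $d = 384$ (a common SBERT output size) is assumed, $k = 4$ follows the four structural features shown in Figure 3, and the helper name `synergistic_fusion` is hypothetical. Because rows here are samples, the batch form computes $F_{\text{struct}} W_G$ with $W_G$ stored as a $k \times d$ matrix, the transpose of the column-vector notation $W_G F_{\text{struct}}$. The classification head that consumes the fused vector is not shown.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def synergistic_fusion(f_deep, f_struct, w_g, b_g):
    """Gate deep embeddings with structural cues (hypothetical helper).

    f_deep:   (n, d) deep semantic embeddings (e.g. SBERT sentence vectors)
    f_struct: (n, k) low-dimensional structural features
    w_g:      (k, d) gate weights projecting structural cues to embedding width
    b_g:      (d,)   gate bias
    Returns the fused features F_deep * G and the gate G itself.
    """
    gate = sigmoid(f_struct @ w_g + b_g)  # (n, d), every entry in (0, 1)
    return f_deep * gate, gate            # element-wise (Hadamard) modulation


# Toy usage with random data; d = 384 and k = 4 are assumptions.
rng = np.random.default_rng(0)
n, d, k = 8, 384, 4
f_deep = rng.normal(size=(n, d))
f_struct = rng.normal(size=(n, k))
w_g = rng.normal(scale=0.1, size=(k, d))
b_g = np.zeros(d)

fused, gate = synergistic_fusion(f_deep, f_struct, w_g, b_g)
print(fused.shape)  # (8, 384)
```

With all-zero gate weights and bias every gate entry is 0.5, so the output is simply half of `f_deep`. Learned weights instead let each embedding dimension be attenuated or passed through depending on the structural features, which is the behavioural difference from appending those features as extra columns.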

Abstract

This study addresses the challenge of integrating complex, high-dimensional deep semantic features with simple, interpretable structural cues for lyrical content classification. We introduce a novel Synergistic Fusion Layer (SFL) architecture, a deep learning model that uses a gating mechanism to modulate Sentence-BERT embeddings ($F_{\text{deep}}$) with low-dimensional auxiliary features ($F_{\text{struct}}$). The task, derived from clustering UMAP-reduced lyrical embeddings, is reframed as binary classification that distinguishes a dominant, homogeneous cluster (Class 0) from all other content (Class 1). The SFL model achieved an accuracy of 0.9894 and a Macro F1 score of 0.9894, outperforming a comprehensive Random Forest (RF) baseline that used feature concatenation (Accuracy = 0.9868). Crucially, the SFL model demonstrated substantially better reliability and calibration, exhibiting a 93% reduction in Expected Calibration Error (ECE = 0.0035) and a 2.5× lower Log Loss (0.0304) compared to the RF baseline (ECE = 0.0500; Log Loss = 0.0772). This performance validates the architectural hypothesis that non-linear gating is superior to simple feature concatenation, establishing the SFL model as a robust and trustworthy system for complex multimodal lyrical analysis.
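To make the calibration figures concrete, the sketch below computes a binned Expected Calibration Error and binary Log Loss, then reproduces the relative improvements from the reported numbers. The binning (10 equal-width bins over the confidence of the predicted class) is an assumed, commonly used ECE variant; the exact binning behind the reported ECE is not stated here, and the function names are illustrative.

```python
import numpy as np


def expected_calibration_error(y_true, p_pos, n_bins=10):
    """Binned ECE for a binary classifier.

    y_true: labels in {0, 1}; p_pos: predicted probability of class 1.
    Confidence is the probability assigned to the predicted class. Samples are
    grouped into equal-width confidence bins, and ECE is the sample-weighted
    mean of |accuracy - mean confidence| over the non-empty bins.
    """
    y_true = np.asarray(y_true)
    p_pos = np.asarray(p_pos, dtype=float)
    y_pred = (p_pos >= 0.5).astype(int)
    conf = np.where(y_pred == 1, p_pos, 1.0 - p_pos)
    correct = (y_pred == y_true).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap  # weight = fraction of samples in bin
    return float(ece)


def log_loss(y_true, p_pos, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pos, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def relative_reduction(baseline, model):
    """Fractional reduction of a lower-is-better metric relative to a baseline."""
    return (baseline - model) / baseline


# Relative improvements computed from the reported RF vs. SFL numbers.
print(f"ECE reduction: {relative_reduction(0.0500, 0.0035):.2f}")  # 0.93
print(f"Log Loss ratio: {0.0772 / 0.0304:.2f}")                     # 2.54
```

The 93% figure is $(0.0500 - 0.0035)/0.0500$, and the 2.5× figure is the rounded ratio $0.0772 / 0.0304 \approx 2.54$.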

Paper Structure

This paper contains 15 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: UMAP Projection of Lyrics Embeddings. (a) shows the 11 intrinsic clusters identified by HDBSCAN. (b) shows the reframed binary target: Class 0 is the single largest intrinsic cluster (Dominant Archetype), and Class 1 comprises all other content, confirming the high topological separability of the final classification task.
  • Figure 2: Architecture of the Synergistic Fusion Layer (SFL) Model. The structural cues ($F_{\text{struct}}$) are used to generate a Gating Vector ($G$), which non-linearly modulates the deep semantic embeddings ($F_{\text{deep}}$) via element-wise multiplication ($\odot$) before final classification.
  • Figure 3: Feature Importance of Auxiliary Features (Random Forest Baseline). This plot quantifies the direct predictive contribution of the four structural features ($F_{\text{struct}}$) when concatenated with the deep embeddings in the RF Baseline model, providing context for the SFL's non-linear fusion strategy.
  • Figure 4: Confusion Matrix (SFL Model). The equal counts of False Negatives (105) and False Positives (105) indicate that errors are balanced across the two classes and confirm the model's high classification fidelity.
  • Figure 5: SFL Model Discriminative Performance. Both curves confirm the model's maximal discriminative power and high performance on the minority class.
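The binary reframing described for Figure 1 can be sketched as follows. It relies on the common HDBSCAN convention of labelling noise points -1, and further assumes (the caption does not say) that noise points are excluded when choosing the largest cluster and are assigned to Class 1 along with every other cluster. `dominant_cluster_target` is a hypothetical helper; ties in cluster size resolve to the lowest cluster id.

```python
import numpy as np


def dominant_cluster_target(cluster_labels, noise_label=-1):
    """Map cluster assignments to the binary Dominant-Archetype target.

    Class 0: members of the single largest (non-noise) cluster.
    Class 1: all other points, including points labelled as noise.
    """
    labels = np.asarray(cluster_labels)
    clustered = labels[labels != noise_label]
    if clustered.size == 0:
        raise ValueError("no non-noise cluster found")
    ids, counts = np.unique(clustered, return_counts=True)
    dominant = ids[np.argmax(counts)]  # argmax breaks ties by lowest id
    return np.where(labels == dominant, 0, 1), int(dominant)


# Toy labels: cluster 2 is the largest real cluster, even though noise (-1)
# is the most frequent label overall.
labels = [0, 0, 1, 2, 2, 2, -1, -1, -1, -1]
y, dominant = dominant_cluster_target(labels)
print(dominant)    # 2
print(y.tolist())  # [1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
```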