MoMoE: A Mixture of Expert Agent Model for Financial Sentiment Analysis

Peng Shu, Junhao Chen, Zhengliang Liu, Hanqi Jiang, Yi Pan, Khanh Nhu Nguyen, Zihao Wu, Huaqin Zhao, Yiwei Li, Enze Shi, ShaoChen Xu

TL;DR

MoMoE introduces a two-tier architecture that combines a sparsely gated Mixture of Experts (MoE) with a Mixture of Agents (MoA) framework on top of LLaMA-3.1-8B for financial sentiment analysis. Replacing the FFN in the final attention block with a four-layer MoE and adding a load-balancing loss gives the model both micro- and macro-level specialization, while integrating it with multiple agents allows cross-perspective refinement, yielding state-of-the-art results across diverse financial datasets. The approach shows consistent gains in accuracy and F1 score, and it also raises practical concerns: the robustness of the multi-agent setup and the risk that consensus among the intermediate agents biases the final decision. Overall, MoMoE offers a scalable, domain-tuned approach to expert-guided large language models in finance, with implications for sentiment-driven decision support.

Abstract

We present a novel approach called Mixture of Mixture of Expert (MoMoE) that combines the strengths of Mixture-of-Experts (MoE) architectures with collaborative multi-agent frameworks. By modifying the LLaMA 3.1 8B architecture to incorporate MoE layers in each agent of a layered collaborative structure, we create an ensemble of specialized expert agents that iteratively refine their outputs. Each agent leverages an MoE layer in its final attention block, enabling efficient task decomposition while maintaining computational feasibility. This hybrid approach creates specialized pathways through both the model architecture and the agent collaboration layers. Experimental results demonstrate significant improvements across multiple language understanding and generation benchmarks, highlighting the synergistic benefits of combining expert routing at both the neural and agent levels.

Paper Structure

This paper contains 19 sections, 6 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overall architecture of our LLaMoE model. We replace the FFN in the final attention block with an MoE layer. The router dynamically selects the top-k experts, each consisting of a three-layer structure with SwiGLU activations. The MoE output is then combined with the residual connection from the preceding Multi-Head Attention layer to ensure stable gradient flow and information retention.
  • Figure 2: Illustration of our single-layer agent system following the MoA structure. The input prompt is processed independently by three agents: LLaMoE, GPT-4o, and DeepSeek V3. Their intermediate outputs are concatenated with the original prompt and subsequently fed into a final decision-making agent (GPT-4o) to produce the final classification output.
  • Figure 3: An example of how the final agent overrides an incorrect intermediate prediction and arrives at the correct classification.
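
The routing described in the Figure 1 caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and function names (SwiGLUExpert, MoELayer, top_k_route) are hypothetical, the "three-layer" expert is assumed to mean the gate, up, and down projections of a LLaMA-style FFN, the gate weights are assumed to be a softmax over the selected top-k logits, and the sizes and random weights are placeholders. The load-balancing loss is omitted.

```python
import math
import random


def silu(x):
    # SiLU (swish) activation, the gating nonlinearity used inside SwiGLU.
    return x / (1.0 + math.exp(-x))


def matvec(weights, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def rand_matrix(rows, cols, rng):
    # Placeholder weights; a real model would load trained parameters.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class SwiGLUExpert:
    """One expert: gate, up, and down projections (assumed LLaMA-style FFN)."""

    def __init__(self, d_model, d_hidden, rng):
        self.w_gate = rand_matrix(d_hidden, d_model, rng)
        self.w_up = rand_matrix(d_hidden, d_model, rng)
        self.w_down = rand_matrix(d_model, d_hidden, rng)

    def __call__(self, x):
        gate = [silu(g) for g in matvec(self.w_gate, x)]
        up = matvec(self.w_up, x)
        return matvec(self.w_down, [g * u for g, u in zip(gate, up)])


def top_k_route(logits, k):
    """Pick the k largest router logits; return (expert_index, weight) pairs.

    Weights are a softmax over the selected logits only (an assumption).
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


class MoELayer:
    """Sparse MoE standing in for the FFN of one attention block."""

    def __init__(self, d_model, d_hidden, num_experts, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.router = rand_matrix(num_experts, d_model, rng)
        self.experts = [
            SwiGLUExpert(d_model, d_hidden, rng) for _ in range(num_experts)
        ]

    def __call__(self, x):
        # x is the per-token hidden state coming out of multi-head attention.
        routes = top_k_route(matvec(self.router, x), self.k)
        mixed = [0.0] * len(x)
        for idx, weight in routes:
            expert_out = self.experts[idx](x)
            mixed = [m + weight * o for m, o in zip(mixed, expert_out)]
        # Residual connection: add the MoE output back onto its input.
        return [xi + mi for xi, mi in zip(x, mixed)], routes


# Hypothetical sizes, chosen only to keep the example small.
layer = MoELayer(d_model=4, d_hidden=8, num_experts=4, k=2)
output, routes = layer([0.5, -1.0, 0.25, 2.0])
```

Only the k selected experts run for a given token, which is what keeps a sparse MoE cheaper than evaluating every expert. The returned routes list shows which experts were chosen and how their outputs were weighted.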