Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework

Bakary Badjie; José Cecílio; António Casimiro

TL;DR

The paper addresses noise and scalability problems in Mixture-of-Experts (MoE) models for image classification by introducing the DFCP-MoE framework, which combines double-stage feature-level clustering, pseudo-labeling, and conditional end-to-end training to improve expert specialization and reduce inference latency. Features are extracted with EfficientNet-B1 and clustered in two stages (K-means followed by a nearest-neighbor refinement step); a SiameseNet-based procedure then pseudo-labels the inputs, and a gate network assigns clusters to specialized experts, with all components trained jointly. On GTSRB, using 43 clusters and expert models, the approach reports state-of-the-art mAP and balanced accuracy while maintaining real-time inference, at the cost of increased training complexity. By partitioning the input space into subdomains and using labeled data to label unlabeled samples, the work may benefit real-world systems that require both accuracy and efficiency; extending it to video and improving robustness to outliers remain open directions.
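The two-stage clustering idea summarized above can be illustrated with a small sketch. It is not the authors' implementation: it assumes feature vectors have already been extracted (the paper uses EfficientNet-B1), uses a plain NumPy K-means for the first stage, and stands in for the refinement stage with a simple majority vote over each point's nearest neighbors. The function names `kmeans` and `refine_with_neighbors` and the parameter `n_neighbors` are hypothetical.

```python
import numpy as np


def kmeans(features, k, n_iter=50, seed=0):
    """Stage 1 (illustrative): plain K-means on precomputed feature vectors."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = features[assign == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    # Final assignment against the final centroids.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def refine_with_neighbors(features, assign, n_neighbors=5):
    """Stage 2 (assumed form): move each point to the cluster that holds
    the majority of its nearest neighbors, smoothing noisy assignments."""
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    nn_idx = np.argsort(dists, axis=1)[:, :n_neighbors]
    k = int(assign.max()) + 1
    refined = assign.copy()
    for i, idx in enumerate(nn_idx):
        votes = np.bincount(assign[idx], minlength=k)
        refined[i] = votes.argmax()
    return refined
```

The pairwise distance matrix makes this sketch quadratic in the number of samples, so it is only meant for small illustrative inputs; a real pipeline would likely use an approximate nearest-neighbor index.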

Abstract

The Mixture-of-Experts (MoE) model has proven successful in deep learning (DL). However, the advantages of its complex architecture over dense models in image classification remain unclear. In previous studies, MoE performance has often been degraded by noise and outliers in the input space. Some approaches incorporate input clustering when training MoE models, but most clustering algorithms lack access to labeled data, which limits their effectiveness. This paper introduces the Double-stage Feature-level Clustering and Pseudo-labeling-based Mixture of Experts (DFCP-MoE) framework, which consists of input feature extraction, feature-level clustering, and a computationally efficient pseudo-labeling strategy. This approach reduces the impact of noise and outliers while leveraging a small subset of labeled data to label a large portion of unlabeled inputs. We propose a conditional end-to-end joint training method that improves expert specialization by training the MoE model on well-labeled, clustered inputs. Unlike traditional MoE and dense models, the DFCP-MoE framework effectively captures input space diversity, leading to competitive inference results. We validate our approach on three benchmark datasets for multi-class classification tasks.
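To make the idea of labeling many unlabeled inputs from a small labeled subset concrete, the sketch below propagates labels within clusters. It is a deliberately simplified stand-in: the paper's pseudo-labeling strategy is SiameseNet-based, whereas this example uses a per-cluster majority vote. The sentinel `UNLABELED` and the function `propagate_labels` are hypothetical names introduced for illustration.

```python
import numpy as np

UNLABELED = -1  # hypothetical sentinel marking samples without a label


def propagate_labels(cluster_ids, labels):
    """Assign each unlabeled sample the majority label of the labeled
    samples in its cluster. Clusters with no labeled sample stay unlabeled.
    This majority vote is an assumption, not the paper's SiameseNet method."""
    cluster_ids = np.asarray(cluster_ids)
    pseudo = np.asarray(labels).copy()
    for c in np.unique(cluster_ids):
        in_cluster = cluster_ids == c
        known = pseudo[in_cluster & (pseudo != UNLABELED)]
        if known.size == 0:
            continue  # nothing to propagate in this cluster
        values, counts = np.unique(known, return_counts=True)
        majority = values[counts.argmax()]
        pseudo[in_cluster & (pseudo == UNLABELED)] = majority
    return pseudo


# Usage: one labeled sample each in clusters 0 and 1, none in cluster 2.
result = propagate_labels([0, 0, 0, 1, 1, 2], [7, -1, -1, 3, -1, -1])
print(result.tolist())  # [7, 7, 7, 3, 3, -1]
```

Leaving samples unlabeled when their cluster has no labeled member is one conservative design choice; it avoids guessing labels for regions of the feature space that the labeled subset does not cover.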
