Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node

Andreas Charalampopoulos, Nikolas Chatzis, Foivos Ntoulas-Panagiotopoulos, Charilaos Papaioannou, Alexandros Potamianos

TL;DR

The paper tackles training instability and leaf-utilization imbalance in Fast Feed Forward networks (FFFs) by introducing MoE-inspired load balancing and a Master Leaf to improve accuracy and stability. It formalizes an enhanced FFF (eFFF) with a load-balancing loss term and a Master Leaf whose output is blended with the FFF output via a trainable coefficient $k$. The training objective is $L' = L_{\text{pred}} + h L_{\text{harden}} + \alpha L_{\text{balance}}$, where $h$ and $\alpha$ weight the hardening and load-balancing terms. The authors provide detailed architectural definitions (tree-conditional activation), training dynamics, and the Master Leaf integration, accompanied by experiments on MNIST and FashionMNIST showing training-accuracy gains of up to 16.3 percentage points and test-accuracy gains of up to 3 percentage points, along with reduced variance across results. Overall, the work demonstrates that MoE-inspired techniques can enhance FFFs, offering more accurate and robust inference with potential applicability to broader datasets and larger models.
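As a rough illustration of how the three terms of the objective combine, here is a minimal sketch in plain Python. Only the overall form $L' = L_{\text{pred}} + h L_{\text{harden}} + \alpha L_{\text{balance}}$ comes from the summary above. The definitions of the individual terms are assumptions made for illustration: the hardening term is taken to be the mean Bernoulli entropy of the node decisions, and the balance term follows the familiar MoE auxiliary loss (number of leaves times the sum over leaves of routed fraction times mean routing probability). The paper's exact formulations may differ, and all function names are hypothetical.

```python
import math


def bernoulli_entropy(p, eps=1e-12):
    """Entropy (in nats) of a single binary node decision with probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def hardening_loss(node_probs):
    """ASSUMED form: mean decision entropy over a batch (rows) and tree nodes (columns)."""
    total = sum(bernoulli_entropy(p) for row in node_probs for p in row)
    count = sum(len(row) for row in node_probs)
    return total / count


def balance_loss(leaf_probs):
    """ASSUMED MoE-style form: n_leaves * sum_i f_i * P_i.

    f_i is the fraction of samples whose most likely leaf is i,
    P_i is the mean probability mass assigned to leaf i.
    """
    batch = len(leaf_probs)
    n_leaves = len(leaf_probs[0])
    f = [0.0] * n_leaves
    mean_p = [0.0] * n_leaves
    for row in leaf_probs:
        top = max(range(n_leaves), key=row.__getitem__)
        f[top] += 1.0 / batch
        for i, p in enumerate(row):
            mean_p[i] += p / batch
    return n_leaves * sum(fi * pi for fi, pi in zip(f, mean_p))


def total_loss(l_pred, node_probs, leaf_probs, h, alpha):
    """L' = L_pred + h * L_harden + alpha * L_balance."""
    return l_pred + h * hardening_loss(node_probs) + alpha * balance_loss(leaf_probs)
```

Under these assumed definitions, the balance term is smallest when samples are spread evenly over the leaves and grows as routing collapses onto a single leaf, which is the behaviour a load-balancing penalty is meant to encourage.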

Abstract

Fast feedforward networks (FFFs) are a class of neural networks that exploit the observation that different regions of the input space activate distinct subsets of neurons in wide networks. FFFs partition the input space into separate sections using a differentiable binary tree of neurons and, during inference, descend the binary tree in order to improve computational efficiency. Inspired by Mixture of Experts (MoE) research, we propose the incorporation of load balancing and Master Leaf techniques into the FFF architecture to improve performance and simplify the training process. We reproduce experiments found in the literature and present results on FFF models enhanced using these techniques. The proposed architecture and training recipe achieve absolute increases of up to 16.3% and 3% in training and test classification accuracy, respectively, compared to the original FFF architecture. Additionally, we observe a smaller variance in the results compared to those reported in prior research. These findings demonstrate the potential of integrating MoE-inspired techniques into FFFs for developing more accurate and efficient models.
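The inference-time descent described in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes a complete binary tree stored in heap order (root at index 0, children of node n at 2n+1 and 2n+2), one linear decision neuron per internal node, and a hard "go right if the sigmoid output is at least 0.5" rule, which is equivalent to checking whether the pre-activation is non-negative. The function name `descend` and its parameter names are hypothetical.

```python
def descend(x, node_weights, node_biases, depth):
    """Return the index of the leaf reached by hard descent through the tree.

    node_weights[n] and node_biases[n] parameterize the decision neuron of
    internal node n (heap order). Only `depth` neurons are evaluated per input,
    rather than every neuron in the layer.
    """
    node = 0
    for _ in range(depth):
        z = sum(w * xi for w, xi in zip(node_weights[node], x)) + node_biases[node]
        go_right = z >= 0.0  # same as sigmoid(z) >= 0.5
        node = 2 * node + 1 + int(go_right)
    # Leaves occupy heap positions 2**depth - 1 .. 2**(depth + 1) - 2.
    return node - (2 ** depth - 1)


# Depth-2 tree: the root splits on the first feature, both children on the second.
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
biases = [0.0, 0.0, 0.0]
leaf = descend([1.0, -1.0], weights, biases, depth=2)
```

The selected leaf would then be the only leaf network evaluated for that input, which is where the computational saving at inference comes from.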

Paper Structure (15 sections, 8 equations, 2 figures, 3 tables)