Conditional Information Gain Trellis

Ufuk Can Bicici, Tuna Han Salih Meral, Lale Akarun

TL;DR

The paper tackles the high computational cost of deep CNNs by introducing Conditional Information Gain Trellis (CIGT), a trellis-structured network that routes samples through selectively activated blocks using differentiable information gain-based routing. By formulating a global objective that combines a mixture-of-experts loss with layer-wise information gain terms and a balancing term, CIGT learns expert paths that group semantically similar classes, enabling efficient inference without sacrificing accuracy. Empirical results on MNIST, Fashion-MNIST, and CIFAR-10 show that CIGT achieves competitive or superior accuracy while using substantially fewer MACs and parameters compared to thick baselines and other conditional computation methods. The work demonstrates that trellis-based routing with information gain can deliver practical, edge-friendly deep learning performance and offers avenues for multi-path inference to further boost accuracy.

Abstract

Conditional computing processes an input using only part of the neural network's computational units. Learning to execute parts of a deep convolutional network by routing individual samples has several advantages: Reducing the computational burden is an obvious advantage. Furthermore, if similar classes are routed to the same path, that part of the network learns to discriminate between finer differences and better classification accuracies can be attained with fewer parameters. Recently, several papers have exploited this idea to take a particular child of a node in a tree-shaped network or to skip parts of a network. In this work, we follow a Trellis-based approach for generating specific execution paths in a deep convolutional neural network. We have designed routing mechanisms that use differentiable information gain-based cost functions to determine which subset of features in a convolutional layer will be executed. We call our method Conditional Information Gain Trellis (CIGT). We show that our conditional execution mechanism achieves comparable or better model performance compared to unconditional baselines, using only a fraction of the computational resources.
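The abstract refers to differentiable information gain-based cost functions without spelling them out. As a rough illustration only, the sketch below assumes the information gain at a router is the mutual information between class labels and routing decisions, estimated on a batch from the router's soft probabilities. The function name `information_gain` and its parameters are hypothetical, and the paper's actual objective (including its balancing term) may differ.

```python
import numpy as np

def information_gain(route_probs, labels, num_classes, eps=1e-12):
    """Batch estimate of I(class; route) = H(class) + H(route) - H(class, route).

    route_probs: (N, K) array; each row is a soft routing distribution p(route | x).
    labels:      (N,) integer class labels in [0, num_classes).
    NOTE: assumed formulation for illustration, not taken from the paper.
    """
    n = route_probs.shape[0]
    one_hot = np.eye(num_classes)[labels]      # (N, C) one-hot labels
    joint = one_hot.T @ route_probs / n        # (C, K) estimate of p(class, route)
    p_class = joint.sum(axis=1)                # marginal over routes
    p_route = joint.sum(axis=0)                # marginal over classes

    def entropy(p):
        # eps keeps log finite for zero-probability cells
        return -np.sum(p * np.log(p + eps))

    return entropy(p_class) + entropy(p_route) - entropy(joint)

# Four classes, two routing units: classes {0, 1} go left, {2, 3} go right.
labels = np.array([0, 1, 2, 3])
separating = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
uninformative = np.full((4, 2), 0.5)

print(information_gain(separating, labels, num_classes=4))
print(information_gain(uninformative, labels, num_classes=4))
```

Under this assumption, a router that cleanly separates groups of classes scores close to the entropy of the routing distribution, while a router that ignores the class scores close to zero. Because the estimate is built from soft probabilities, the same computation written in an autodiff framework could serve as a training signal when maximized.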

Paper Structure (17 sections, 11 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: An example of the proposed CIGT architecture. The output of the first layer $F_{0}$ is used as input to router $H_{0}$ and is then passed to the routing unit $k$ that $H_{0}$ selects. The output of layer $F_{1,k}$ is in turn used as input to router $H_{1}$ and is passed to the routing unit $j$ that $H_{1}$ selects. Finally, the output of $F_{2,j}$ passes through the layers $F_{3}$ and $F_{4}$.
  • Figure 2: The class distribution of Fashion-MNIST samples across CIGT routing units.
  • Figure 3: Fashion-MNIST test set sample distribution in each of the routing blocks.
  • Figure 4: The class distribution of CIFAR-10 samples across CIGT routing units.
  • Figure 5: CIFAR-10 test set sample distribution in each of the routing blocks.
  • ...and 2 more figures