Block-Operations: Using Modular Routing to Improve Compositional Generalization

Florian Dietz; Dietrich Klakow

TL;DR

The paper hypothesizes that neural networks struggle with compositional generalization because the routing of information across subnetworks is biased or static. The authors introduce block-operations, which split activation tensors into uniformly sized blocks of width $b$ and enforce Modular Representation-Preserving Mappings (MRPM) to enable dynamic, object-like routing. They then build the SMFR architecture as a stack of MFNNR modules comprising a Multiplexer and an FNNR. Empirical results on synthetic tasks (addition/multiplication, double-addition, algorithmic tasks) and on BPMNIST show that SMFRs generalize compositionally better than the baselines, reaching perfect or near-perfect OOD generalization on several tasks where FNNs and Transformers struggle. The work suggests block-operations as an inductive bias that can be integrated into other existing architectures to improve modular routing and generalization.

Abstract

We explore the hypothesis that poor compositional generalization in neural networks is caused by difficulties with learning effective routing. To solve this problem, we propose the concept of block-operations, which is based on splitting all activation tensors in the network into uniformly sized blocks and using an inductive bias to encourage modular routing and modification of these blocks. Based on this concept we introduce the Multiplexer, a new architectural component that enhances the Feed Forward Neural Network (FNN). We experimentally confirm that Multiplexers exhibit strong compositional generalization. On both a synthetic and a realistic task our model was able to learn the underlying process behind the task, whereas both FNNs and Transformers were only able to learn heuristic approximations. We propose as future work to use the principles of block-operations to improve other existing architectures.
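To make the idea of "uniformly sized blocks" and "modular routing" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the helper names `split_blocks` and `route_blocks` and the `source_index` argument are hypothetical, and the routing shown is a fixed hard copy rather than anything learned. It only illustrates that a block can be moved between positions without its contents being altered.

```python
from typing import List

Block = List[float]


def split_blocks(activations: List[float], block_width: int) -> List[Block]:
    """Split a flat activation vector into uniformly sized blocks."""
    if block_width <= 0 or len(activations) % block_width != 0:
        raise ValueError("activation length must be a positive multiple of block_width")
    return [activations[i:i + block_width]
            for i in range(0, len(activations), block_width)]


def route_blocks(blocks: List[Block], source_index: List[int]) -> List[Block]:
    """Hard routing (illustrative assumption): output block j is an
    unmodified copy of input block source_index[j]."""
    return [list(blocks[i]) for i in source_index]


# Six activations viewed as three blocks of width 2.
activations = [float(v) for v in range(6)]
blocks = split_blocks(activations, block_width=2)

# Send block 2 to position 0 and duplicate block 0 into positions 1 and 2.
routed = route_blocks(blocks, source_index=[2, 0, 0])
```

In this sketch each output block is either a whole input block or nothing; no individual neuron is mixed with neurons from another block, which is the property a dense layer does not give for free.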

Paper Structure (21 sections, 4 figures, 7 tables)


Introduction
Related Work
Methodology: Block-Operations
Modules
Experiments
Addition/Multiplication Experiments
Double-Addition Experiments
Algorithmic Experiments
BPMNIST Experiments
Discussion
Limitations
Conclusion
Appendix
Code
Experiment Details, General
...and 6 more sections

Figures (4)

Figure 1: Left: An example FNN receives a layer of 30 input neurons and maps it to a layer of 30 output neurons using densely connected layers. Right: An equivalent SMFR architecture instead views the input as 3 blocks of 10 neurons each and outputs another 3 blocks of 10 neurons each. In this example, the first output block is a copy of the first input block. The second output block is generated through an FNN based on all 3 input blocks (the FNN is a submodule inside the SMFR). The third output block is a linear interpolation of the 3 input blocks and an FNN output.
Figure 2: Left: A Multiplexer with $M=4$ and $N=3$. Right: An FNNR with $N=3$.
Figure 3: Each line shows the OOD accuracy of one SMFR experiment as training progresses.
Figure 4: Left: The average accuracy of different architectures and variants at different iterations. Right: The fraction of trials that achieved 100% accuracy, instead of the average accuracy. The zig-zag pattern is intended behavior: We train on 2 iterations, so odd-numbered iterations are OOD in a different way than even-numbered ones.
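The example described in the Figure 1 caption (30 neurons viewed as 3 blocks of 10) can be sketched roughly as follows. Several parts are assumptions made only for illustration: `toy_fnn` is a stand-in that averages the blocks rather than a trained network, the interpolation weights come from a softmax over hypothetical `logits`, and the function name `figure1_forward` is invented here. The caption itself only states that the third output block is a linear interpolation of the three input blocks and an FNN output.

```python
import math
from typing import List

Block = List[float]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; used here as an assumed way to get weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate(candidates: List[Block], weights: List[float]) -> Block:
    """Weighted sum of equally sized blocks."""
    width = len(candidates[0])
    return [sum(w * c[k] for w, c in zip(weights, candidates))
            for k in range(width)]


def toy_fnn(blocks: List[Block]) -> Block:
    """Stand-in for the FNN submodule: element-wise mean of all input blocks."""
    n = len(blocks)
    return [sum(b[k] for b in blocks) / n for k in range(len(blocks[0]))]


def figure1_forward(x: List[float], logits: List[float], width: int = 10) -> List[float]:
    blocks = [x[i:i + width] for i in range(0, len(x), width)]
    fnn_out = toy_fnn(blocks)
    out1 = list(blocks[0])                                   # copy of input block 1
    out2 = fnn_out                                           # FNN over all input blocks
    out3 = interpolate(blocks + [fnn_out], softmax(logits))  # mix of 3 blocks + FNN output
    return out1 + out2 + out3


# Three constant blocks make the routing easy to read off the output.
x = [1.0] * 10 + [2.0] * 10 + [6.0] * 10
y = figure1_forward(x, logits=[0.0, 0.0, 0.0, 0.0])
```

With all logits equal, the third block is the uniform average of its four candidates; pushing one logit far above the others would make it approach a plain copy of that candidate.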
