Deep Learning with Dynamic Computation Graphs
Moshe Looks, Marcello Herreshoff, DeLesley Hutchins, Peter Norvig
TL;DR
The paper tackles the challenge of efficiently training neural networks on dynamic computation graphs (DCGs) by introducing dynamic batching, which batches operations both across input graphs of different shapes and across nodes within a single graph. It presents TensorFlow Fold, a combinator library that enables concise construction of DCG models and automatically applies dynamic batching by emulating them with a static TensorFlow graph. The approach yields substantial speedups on GPUs and demonstrates state-of-the-art results on tree-structured and graph-structured tasks, including Tree-LSTM for sentiment analysis and graph convolutions for molecules. The work highlights practical benefits for practitioners who need scalable DCG models with rich architectures.
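To make the idea of building models from composable blocks concrete, here is a toy, hypothetical combinator library in Python. The classes `Block`, `Function`, `Compose`, `Map` and `Reduce` and the use of `>>` for composition are assumptions made for illustration; this is not TensorFlow Fold's API. The sketch evaluates each input eagerly and performs no batching, whereas the approach summarized above compiles blocks into a static graph that applies dynamic batching.

```python
from typing import Any, Callable, List


class Block:
    """Hypothetical compositional block: transforms one input value into one output."""

    def __call__(self, x: Any) -> Any:
        raise NotImplementedError

    def __rshift__(self, other: "Block") -> "Block":
        # `a >> b` feeds the output of block a into block b.
        return Compose(self, other)


class Function(Block):
    """Wraps an ordinary Python function as a block."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __call__(self, x: Any) -> Any:
        return self.fn(x)


class Compose(Block):
    """Sequential composition of two blocks."""

    def __init__(self, first: Block, second: Block):
        self.first, self.second = first, second

    def __call__(self, x: Any) -> Any:
        return self.second(self.first(x))


class Map(Block):
    """Applies a block to every element of a variable-length sequence."""

    def __init__(self, block: Block):
        self.block = block

    def __call__(self, xs: List[Any]) -> List[Any]:
        return [self.block(x) for x in xs]


class Reduce(Block):
    """Combines a non-empty sequence pairwise, level by level (a balanced tree)."""

    def __init__(self, combine: Callable[[Any, Any], Any]):
        self.combine = combine

    def __call__(self, xs: List[Any]) -> Any:
        level = list(xs)
        while len(level) > 1:
            nxt = [self.combine(level[i], level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
            if len(level) % 2 == 1:
                nxt.append(level[-1])  # carry an unpaired element to the next level
            level = nxt
        return level[0]


# Total character count of a variable-length list of words.
model = Map(Function(len)) >> Reduce(lambda a, b: a + b)
print(model(["deep", "learning", "fold"]))  # 16
```

Because each block only declares how a single input is transformed, the same `model` accepts lists of any length without padding; the balanced reduction is one possible choice, picked here because each level of pairwise combinations could in principle be evaluated as one batch.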
Abstract
Neural networks that compute over graph structures are a natural fit for problems in a variety of domains, including natural language (parse trees) and cheminformatics (molecular graphs). However, since the computation graph has a different shape and size for every input, such networks do not directly support batched training or inference. They are also difficult to implement in popular deep learning libraries, which are based on static data-flow graphs. We introduce a technique called dynamic batching, which not only batches together operations between different input graphs of dissimilar shape, but also between different nodes within a single input graph. The technique allows us to create static graphs, using popular libraries, that emulate dynamic computation graphs of arbitrary shape and size. We further present a high-level library of compositional blocks that simplifies the creation of dynamic graph models. Using the library, we demonstrate concise and batch-wise parallel implementations for a variety of models from the literature.
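The core of dynamic batching can be sketched in plain Python, assuming binary trees with one leaf operation and one combine operation. The sketch assigns every node a depth (leaves at 0, each internal node one more than its deepest child) and then evaluates all nodes at the same depth with a single batched call, regardless of which input tree they come from. The names `Node`, `embed_batch`, `combine_batch` and `dynamic_batch_eval` are hypothetical, and the list-based "batched" functions stand in for tensor operations; a real implementation would gather child results into tensors instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical binary-tree node: a leaf has a value, an internal node has two children."""
    value: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Counts how many batched calls each operation receives.
CALLS = {"embed": 0, "combine": 0}


def embed_batch(values: List[float]) -> List[float]:
    """Stand-in for one batched leaf operation (e.g. an embedding lookup)."""
    CALLS["embed"] += 1
    return [2.0 * v for v in values]


def combine_batch(lefts: List[float], rights: List[float]) -> List[float]:
    """Stand-in for one batched combine operation over many (left, right) pairs."""
    CALLS["combine"] += 1
    return [a + b for a, b in zip(lefts, rights)]


def dynamic_batch_eval(roots: List[Node]) -> List[float]:
    """Evaluates trees of arbitrary shapes with one batched call per depth."""
    depth: Dict[int, int] = {}
    by_depth: Dict[int, List[Node]] = defaultdict(list)

    def assign(node: Node) -> int:
        # Leaves get depth 0; an internal node (assumed to have both children)
        # is one deeper than its deepest child.
        if id(node) in depth:
            return depth[id(node)]
        if node.left is None:
            d = 0
        else:
            d = 1 + max(assign(node.left), assign(node.right))
        depth[id(node)] = d
        by_depth[d].append(node)
        return d

    for root in roots:
        assign(root)

    results: Dict[int, float] = {}
    # Leaves occur only at depth 0 here, so grouping by depth alone
    # also groups nodes by operation type.
    for d in sorted(by_depth):
        nodes = by_depth[d]
        if d == 0:
            outs = embed_batch([n.value for n in nodes])
        else:
            # Every child has a smaller depth, so its result is already computed.
            outs = combine_batch([results[id(n.left)] for n in nodes],
                                 [results[id(n.right)] for n in nodes])
        for n, out in zip(nodes, outs):
            results[id(n)] = out
    return [results[id(r)] for r in roots]


t1 = Node(left=Node(1.0), right=Node(2.0))
t2 = Node(left=Node(left=Node(3.0), right=Node(4.0)), right=Node(5.0))
print(dynamic_batch_eval([t1, t2]))  # [6.0, 24.0]
print(CALLS)  # {'embed': 1, 'combine': 2}: 3 batched calls instead of 8 per-node calls
```

The two trees have different shapes, yet the root of `t1` and the inner node of `t2` share one batch (batching across inputs), and all five leaves, three of which belong to `t2`, share another (batching within and across inputs). With several operation types, nodes would be grouped by (depth, operation) rather than by depth alone.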
