Learning Tree-Structured Composition of Data Augmentation

Dongyue Li, Kailai Chen, Predrag Radivojac, Hongyang R. Zhang

TL;DR

The paper tackles the combinatorial challenge of learning effective data augmentation by introducing a binary tree-structured composition that models augmentation as a depth-d tree with 2^d−1 nodes. A greedy, top-down search constructs the tree in O(2^d k) time, significantly faster than the traditional O(k^d) worst case, and a density-matching approach evaluates candidates without retraining. It further extends to heterogeneous subpopulations by learning one tree per group and combining them into a forest with weights learned via a bilevel optimization, enabling robust, group-aware augmentation. Across graph and image benchmarks—including a newly collected AlphaFold-based protein graph dataset—the method achieves substantial runtime reductions (e.g., 43%) and improved performance (up to 4.3%), while providing interpretable importance scores for each transformation. These results suggest practical, scalable augmentation search with interpretable structures and principled handling of group shifts in real-world data.
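The greedy, top-down search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each node scores all $k$ candidate transformations once, given the path of choices above it, and keeps the best. The two children are then built recursively: one for the branch where the transformation was applied, one for the branch where it was skipped. The scoring function toy_score is a hypothetical placeholder for the paper's density-matching criterion, and the transformation names are illustrative only. Counting evaluations shows where the $(2^d - 1)k = O(2^d k)$ cost comes from.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

# A path records, for each ancestor, the transformation chosen there and which
# branch ("applied" or "skipped") leads to the current node.
Path = Tuple[Tuple[str, str], ...]
ScoreFn = Callable[[Path, str], float]


@dataclass
class Node:
    transform: str
    children: List[Optional["Node"]] = field(default_factory=list)


def build_tree(
    transforms: Sequence[str], depth: int, score: ScoreFn, path: Path = ()
) -> Tuple[Optional[Node], int]:
    """Greedily build a full binary tree of the given depth, top-down.

    Returns the root and the number of score evaluations performed.
    Each node scores all k candidates once, so a depth-d tree costs
    (2**d - 1) * k evaluations.
    """
    if depth == 0:
        return None, 0
    # Greedy choice: keep the best-scoring candidate for this node (k evaluations).
    best = max(transforms, key=lambda t: score(path, t))
    evals = len(transforms)
    node = Node(best)
    # Recurse into both branches; each sees a different path, so a
    # branch-aware score may pick different transformations per child.
    for branch in ("applied", "skipped"):
        child, n = build_tree(transforms, depth - 1, score, path + ((best, branch),))
        node.children.append(child)
        evals += n
    return node, evals


def toy_score(path: Path, t: str) -> float:
    """Hypothetical stand-in for a real criterion: penalize repeating a transformation."""
    return -sum(1 for u, _ in path if u == t)


if __name__ == "__main__":
    ks = ["drop_edge", "subgraph", "mask_feature", "identity"]  # illustrative names
    root, n_evals = build_tree(ks, depth=3, score=toy_score)
    print(root.transform, n_evals)  # drop_edge 28
```

With $k = 4$ and $d = 3$ the sketch performs $7 \cdot 4 = 28$ evaluations, compared with $4^3 = 64$ length-3 sequences.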

Abstract

Data augmentation is widely used for training a neural network given little labeled data. A common practice in augmentation training is to apply a composition of multiple transformations sequentially to the data. Existing augmentation methods such as RandAugment randomly sample from a list of pre-selected transformations, while methods such as AutoAugment apply advanced search to optimize over an augmentation set of size $k^d$, which is the number of transformation sequences of length $d$, given a list of $k$ transformations. In this paper, we design efficient algorithms whose running time is provably much faster than the worst-case complexity of $O(k^d)$. We propose a new algorithm to search for a binary tree-structured composition of $k$ transformations, where each tree node corresponds to one transformation. The binary tree generalizes sequential augmentations, such as the SimCLR augmentation scheme for contrastive learning. Using a top-down, recursive search procedure, our algorithm achieves a runtime complexity of $O(2^d k)$, which is much faster than $O(k^d)$ as $k$ increases above $2$. We apply our algorithm to data distributions with heterogeneous subpopulations by searching for one tree in each subpopulation and then learning a weighted combination, resulting in a forest of trees. We validate our proposed algorithms on numerous graph and image datasets, including a multi-label graph classification dataset we collected. The dataset exhibits significant variations in the sizes of graphs and their average degrees, making it ideal for studying data augmentation. We show that our approach can reduce the computation cost by 43% over existing search methods while improving performance by 4.3%. The tree structures can be used to interpret the relative importance of each transformation, such as identifying the important transformations on small vs. large graphs.
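The counts behind this complexity comparison can be worked out directly, assuming (as a simplified cost model, not the paper's exact accounting) that the search scores each of the $k$ candidates once at every node of a full binary tree of depth $d$:

$$
\begin{aligned}
\text{nodes in a full binary tree of depth } d &= \sum_{i=0}^{d-1} 2^i = 2^d - 1,\\
\text{greedy evaluations} &= (2^d - 1)\,k = O(2^d k),\\
\text{length-}d\text{ sequences} &= k^d,\\
\frac{2^d k}{k^d} &= \frac{2^d}{k^{d-1}} = 2\left(\frac{2}{k}\right)^{d-1}.
\end{aligned}
$$

For example, with $k = 10$ and $d = 4$ the tree search scores $(2^4 - 1) \cdot 10 = 150$ candidates, versus $10^4 = 10{,}000$ sequences. The last line shows why the gain appears once $k > 2$: the ratio then shrinks geometrically in $d$, while at $k = 2$ it stays at $2$.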

Paper Structure (25 sections, 1 theorem, 39 equations, 7 figures, 8 tables, 2 algorithms)

Key Result

Theorem B.3

Let $\hat{\theta}$, $\bar{\theta}^{w}$, $\theta_0^\star$ be defined as in the preceding equations, for a fixed weight vector $w$. Suppose Assumptions assume_1 and assume_2 both hold. Let $\delta \in (0,1)$ be a fixed real number. Then, for any representation $\hat{\theta} \in \bar{\Theta}^{w}$ such that $\hat{\theta}$ i… where $N_{w} = \left( \sum_{g=1}^{m} \frac{w_g^2}{n_g} \right)^{-1}$, and $C_1$ and $C_2$ are two constants that do…
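The quantity $N_w$ in the theorem acts like an effective sample size for the weighted combination of $m$ groups, where group $g$ has $n_g$ samples and weight $w_g$. A minimal sketch of computing it, with the function name effective_sample_size chosen here for illustration:

```python
from typing import Sequence


def effective_sample_size(weights: Sequence[float], sizes: Sequence[int]) -> float:
    """Compute N_w = (sum_g w_g**2 / n_g) ** -1 over m groups."""
    if len(weights) != len(sizes):
        raise ValueError("expected one weight per group")
    if any(n <= 0 for n in sizes):
        raise ValueError("group sizes n_g must be positive")
    denom = sum(w * w / n for w, n in zip(weights, sizes))
    if denom == 0:
        raise ValueError("at least one weight must be nonzero")
    return 1.0 / denom


# Two groups with 100 and 300 samples.
print(effective_sample_size([0.5, 0.5], [100, 300]))    # about 300
print(effective_sample_size([0.25, 0.75], [100, 300]))  # about 400
```

When the weights sum to one, the Cauchy–Schwarz inequality gives $N_w \le N = \sum_g n_g$, with equality exactly when $w_g \propto n_g$ (the second call above). Uniform weights on unequal groups therefore lower $N_w$ below the total sample size.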

Figures (7)

  • Figure 1: We illustrate the overall procedure of our algorithms. The input consists of $k$ transformation functions, denoted $A_1, \dots, A_k$. Given a dataset, our algorithm constructs a probabilistic binary tree-structured composition of these transformations, as shown on the left. Given an input sample $x$, $A_1$ maps $x$ to $A_1(x)$ with probability $p_1$; otherwise, no transformation is applied and $x$ remains unchanged. Let $x'$ denote the output. In the next step, $A_2$ is applied to $x'$ with probability $p_2$, or $A_3$ with probability $1 - p_2$, and so on. The second algorithm first partitions the dataset into a few groups, e.g., by graph size, as illustrated above. The first algorithm is then applied to learn one tree per group, and these trees are weighted jointly to form a "forest" as the final augmentation scheme. A byproduct is that we can measure the importance of each transformation in each group's tree. For example, perturbing edges by randomly adding or deleting a fraction of them works best for small graphs, whereas for larger graphs, generating a subgraph by simulating a random walk works better. Note that if a node in an augmentation tree has only a single child, then when its transformation is not applied, the input is left unchanged and no further augmentation is used, which is the same as applying $A(x) = x$.
  • Figure 2: Illustrating a sequential augmentation scheme (chen2020simple).
  • Figure 3: Illustrating the binary tree returned by the tree-construction algorithm (alg_constructing_tree), run on CIFAR-10. On the right branch, no further transformation is applied after the identity mapping.
  • Figure 4: The augmentation trees found for different groups can vary dramatically. On a protein graph classification dataset, the augmentation tree for small graphs (left) involves fewer augmentation steps than the tree found for large graphs (right). We also report each augmentation's importance score, computed on the validation set. No further transformation is applied after the identity map, i.e., $A(x) = x$.
  • Figure 5: Illustrating the trees found for color images vs. black-and-white images on an image classification dataset. The tree on the left involves different transformations from the tree on the right.
  • (2 further figures not listed here)
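The probabilistic tree in Figure 1 can be applied to a sample by walking one root-to-leaf path. The sketch below assumes one plausible reading of the caption, not necessarily the authors' exact semantics: each node applies its transformation with probability p and then moves to its "applied" child; otherwise it leaves the input unchanged and moves to its "skipped" child, and a missing child ends the walk (equivalent to the identity map). The class AugNode and function apply_tree are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class AugNode:
    transform: Callable[[Any], Any]       # transformation A_i stored at this node
    p: float                              # probability of applying it
    applied: Optional["AugNode"] = None   # next node if the transformation was applied
    skipped: Optional["AugNode"] = None   # next node if it was skipped (None = stop)


def apply_tree(root: Optional[AugNode], x: Any, rng: random.Random) -> Any:
    """Walk one root-to-leaf path, applying each node's transformation with probability p."""
    node = root
    while node is not None:
        if rng.random() < node.p:
            x = node.transform(x)
            node = node.applied
        else:
            # Not applied: input unchanged; a missing child means no further augmentation.
            node = node.skipped
    return x


# Toy transformations on integers stand in for real image or graph augmentations.
tree = AugNode(lambda v: v + 1, p=0.7,
               applied=AugNode(lambda v: v * 2, p=0.5),
               skipped=AugNode(lambda v: v - 3, p=0.9))
rng = random.Random(0)
samples = [apply_tree(tree, 5, rng) for _ in range(5)]
```

Passing a seeded random.Random makes the sampled augmentations reproducible, and repeated calls draw a fresh path for each sample, as in standard stochastic augmentation.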

Theorems & Definitions (6)

  • Remark 3.1
  • Remark 3.2
  • Remark 4.1
  • Definition B.1: $(\rho, C_\rho)$-transferable
  • Theorem B.3
  • Proof