Reinforcement Learning for Graph Coloring: Understanding the Power and Limits of Non-Label Invariant Representations

Chase Cummins, Richard Veras

TL;DR

This work investigates reinforcement learning for graph coloring by casting register allocation as a $k$-coloring problem and evaluating model-free methods (DQN and PPO) within a GraphColoring Gym environment. It introduces a progression of reward designs, culminating in a PPO-based approach that can color small graphs and reveals a critical sensitivity to graph labeling, demonstrating that non-label invariant representations hinder consistent performance. The study shows that while PPO can achieve optimal or near-optimal coloring on small graphs, performance degrades with graph size and relabelings, highlighting the need for invariant graph representations, such as those offered by Graph Neural Networks. The findings have practical implications for compiler optimization pipelines and suggest a clear path for improving ML-based graph coloring through invariant representations and scalable architectures.

Abstract

Register allocation is one of the most important problems for modern compilers. With a practically unlimited number of user variables and a small number of CPU registers, assigning variables to registers without conflicts is a complex task. This work casts the register allocation problem as a graph coloring problem. Using PyTorch and OpenAI Gymnasium environments, we show that a Proximal Policy Optimization model can learn to solve the graph coloring problem. We also show that the labeling of a graph is critical to the model's performance: we permute the matrix representation of a graph, test the model on each permutation, and find that it is not effective when given a relabeling of the same graph. Our main contribution lies in showing that machine learning models need graph representations that are invariant to label reordering in order to achieve consistent performance.
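To make the relabeling experiment concrete, the following is a minimal sketch of permuting a graph's adjacency matrix, written with NumPy. The helper names `relabel` and `is_valid_coloring`, the 4-cycle example graph, and the chosen permutation are illustrative assumptions, not the authors' code. The sketch shows that a relabeled graph is structurally the same graph, yet its matrix (and therefore any input built directly from that matrix) looks different, so a coloring memorized for one labeling need not carry over.

```python
import numpy as np

def relabel(adj, perm):
    """Rename node i to perm[i] and return the new adjacency matrix.

    Hypothetical helper: entry (i, j) of the input moves to
    (perm[i], perm[j]) in the output.
    """
    out = np.zeros_like(adj)
    out[np.ix_(perm, perm)] = adj
    return out

def is_valid_coloring(adj, colors):
    """True if no edge joins two nodes with the same color."""
    rows, cols = np.nonzero(adj)
    return bool(np.all(colors[rows] != colors[cols]))

# Illustrative graph (an assumption): the 4-cycle 0-1-2-3-0.
adj = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
])
colors = np.array([0, 1, 0, 1])  # a valid 2-coloring of this labeling

# Arbitrary relabeling chosen for illustration: 0->2, 1->0, 2->3, 3->1.
perm = np.array([2, 0, 3, 1])
adj_relabeled = relabel(adj, perm)

# Carrying the colors along with the node names keeps the coloring valid.
colors_relabeled = np.empty_like(colors)
colors_relabeled[perm] = colors

print("matrices identical:", np.array_equal(adj, adj_relabeled))
print("permuted coloring valid:", is_valid_coloring(adj_relabeled, colors_relabeled))
print("original coloring valid on relabeled graph:", is_valid_coloring(adj_relabeled, colors))
```

In this example the original color assignment stops being valid once the nodes are renamed, even though the underlying graph is unchanged. A model that has only learned a mapping from one specific matrix to one specific sequence of color choices would face the same problem.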

Paper Structure (17 sections, 4 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: A plot showing the training of the DQN over 10,000 episodes on a graph of 5 nodes. Apart from an initial drop, the average number of steps needed stays nearly constant over the training period, showing that the DQN failed to improve past the 68-step average at which it finished.
  • Figure 2: A plot showing the number of steps required to solve graphs of 8, 12, and 16 nodes over 200k training time steps. The model learns to solve the 8-node graph in 8 steps, the minimum number required, but fails to reach the corresponding minimum on graphs of 12 and 16 nodes within 200k time steps. Additional training on the 16-node graph, up to 400k time steps, brought the average down to 50 steps from the 100-step average reached at 200k time steps. This points to a possible quadratic relationship between graph size and necessary training time.
  • Figure 3: A histogram of steps taken by a trained model to solve 1000 permutations of a graph. The model averaged 2,040 steps with a median of 534, showing that a model trained on one labeling of a graph is not well equipped to solve the same graph relabeled.
  • Figure 4: A plot showing the number of steps to solve over training time steps, for an initial training of 200k steps on the graph followed by 50k steps of training on each of 16 permutations of the graph. Each permutation is shown in a different color. As shown, the model often has to relearn how to solve the graph, and sometimes it performs poorly for the entire duration of training on a given permutation.
  • Figure 5: A histogram of steps taken to solve 1000 permutations after additional training on 16 relabelings for 50k steps each. Performance improved significantly with the additional training, with the median and mean steps decreasing to 89 and 119, respectively. The minimum number of steps increased to 10, though, implying that as the model gets better in general, its performance on previously trained permutations will most likely decrease.
  • ...and 1 more figures