DisCo-DSO: Coupling Discrete and Continuous Optimization for Efficient Generative Design in Hybrid Spaces
Jacob F. Pettit, Chak Shing Lee, Jiachen Yang, Alex Ho, Daniel Faissol, Brenden Petersen, Mikel Landajuela
TL;DR
DisCo-DSO tackles black-box optimization in hybrid discrete-continuous, variable-length spaces by training an autoregressive model to learn a joint distribution over complete designs. It extends discrete-continuous optimization to prefix-constrained, variable-length sequences and trains with a risk-seeking policy gradient, so each sampled solution requires only one objective evaluation. Across parameterized bitstrings, decision-tree policies for RL, and symbolic regression, it outperforms decoupled baselines and, in many cases, state-of-the-art methods, especially as problem complexity increases. The approach makes optimization in hybrid spaces with non-differentiable rewards more sample-efficient and scalable, with practical applications in RL with interpretable policies and in equation discovery.
Abstract
We consider the challenge of black-box optimization within hybrid discrete-continuous and variable-length spaces, a problem that arises in various applications, such as decision tree learning and symbolic regression. We propose DisCo-DSO (Discrete-Continuous Deep Symbolic Optimization), a novel approach that uses a generative model to learn a joint distribution over discrete and continuous design variables and to sample new hybrid designs from it. In contrast to standard decoupled approaches, in which the discrete and continuous variables are optimized separately, our joint optimization approach uses fewer objective function evaluations, is robust against non-differentiable objectives, and learns from prior samples to guide the search, leading to significant improvements in performance and sample efficiency. Our experiments on a diverse set of optimization tasks demonstrate that the advantages of DisCo-DSO become increasingly evident as the complexity of the problem increases. In particular, we show that DisCo-DSO outperforms state-of-the-art methods for interpretable reinforcement learning with decision trees.
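To make the joint-sampling idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical three-token vocabulary (`VOCAB`), a uniform stand-in for the learned autoregressive model, and a made-up objective (`reward`). It illustrates two points from the summary above: each continuous value is drawn in the same step as its discrete token, so a complete design is scored with a single objective evaluation, and a risk-seeking filter keeps only the top fraction of a batch, weighting each survivor by how far its reward exceeds the quantile threshold.

```python
import math
import random

# Hypothetical toy vocabulary: maps each token to whether it carries
# a continuous parameter. These names are illustrative assumptions.
VOCAB = {"const": True, "x": False, "stop": False}


def sample_design(rng, max_len=5):
    """Sample one variable-length hybrid design, token by token."""
    design = []
    for _ in range(max_len):
        # A trained model would condition this choice on the prefix;
        # uniform sampling is a stand-in here.
        token = rng.choice(list(VOCAB))
        if token == "stop":
            break
        # The continuous value is drawn together with its discrete token.
        value = rng.gauss(0.0, 1.0) if VOCAB[token] else None
        design.append((token, value))
    return design


def reward(design):
    """Toy black-box objective, called once per complete design."""
    penalty = sum((v - 0.5) ** 2 for _, v in design if v is not None)
    return -penalty - 0.1 * len(design)


def risk_seeking_weights(rewards, epsilon=0.2):
    """Weight each sample by (reward - threshold) if it reaches the
    (1 - epsilon) quantile of the batch, and by 0 otherwise."""
    ranked = sorted(rewards)
    k = min(len(ranked) - 1, int(math.floor((1.0 - epsilon) * len(ranked))))
    threshold = ranked[k]
    return [r - threshold if r >= threshold else 0.0 for r in rewards]


rng = random.Random(0)
batch = [sample_design(rng) for _ in range(10)]
rewards = [reward(d) for d in batch]      # one evaluation per design
weights = risk_seeking_weights(rewards)   # nonzero only for top samples
```

In a full training loop, these weights would multiply the gradient of each sample's log-likelihood under the model, so that only the best-performing designs in a batch shape the update. A decoupled approach would instead run an inner continuous optimizer for every discrete structure, costing many objective evaluations per design.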
