Normalization of ReLU Dual for Cut Generation in Stochastic Mixed-Integer Programs

Akul Bansal; Simge Küçükyavuz

TL;DR

The paper addresses weak cuts caused by dual degeneracy in multistage stochastic programs with mixed-integer state variables by normalizing the ReLU dual in an extended space. It proves that the resulting normalized cuts are tight and Pareto-optimal in the original state space, and shows that normalization can reproduce any cut obtained via regularization while offering greater flexibility. The authors connect the normalized ReLU dual to lifted Lagrangian cuts and report a computational study on DCAP and CLSP problems in which the normalized cuts are stronger and convergence is faster, especially when paired with an alternating cut strategy. An open-source implementation is released.

Abstract

We study the Rectified Linear Unit (ReLU) dual, an existing dual formulation for stochastic programs that reformulates non-anticipativity constraints using ReLU functions to generate tight, non-convex, and mixed-integer representable cuts. While this dual reformulation guarantees convergence with mixed-integer state variables, it admits multiple optimal solutions that can yield weak cuts. To address this issue, we propose normalizing the dual in the extended space to identify solutions that yield stronger cuts. We prove that the resulting normalized cuts are tight and Pareto-optimal in the original state space. We further compare normalization with existing regularization-based approaches for handling dual degeneracy and explain why normalization offers key advantages. In particular, we show that normalization can recover any cut obtained via regularization, whereas the converse does not hold. Computational experiments demonstrate that the proposed approach outperforms existing methods by consistently yielding stronger cuts and reducing solution times on harder instances.

Paper Structure (13 sections, 8 theorems, 56 equations, 2 figures, 4 tables)

Introduction
Normalization of the Dual Formulation
ReLU Dual and ReLU Cuts
Connection with the Original Lagrangian Cuts
Normalized ReLU Dual and Normalized ReLU cuts
Normalization Coefficients and their Impact on Cut Quality
Pareto-optimal Cuts
Tight Cuts
Normalization vs. Regularization
Computational Results
Conclusion
Proof of Proposition \ref{prop:lifted_lagrangian}
Proof of Proposition \ref{prop:tight_relu_alpha}

Key Result

Proposition 1

The ReLU Lagrangian cut (cut:ReLU), generated at $\hat{x}_{a(n)}$ for $\mathop{\mathrm{epi}}\nolimits_{Z_{a(n)}}\left(\underline{Q}_n\right)$, corresponds to the Lagrangian cut (cut:lifted_lagrn_00) generated at $\left(\mathbf{0}, \mathbf{0}\right)$ for the lifted set $\mathop{\mathrm{epi}}\nolimits_{Z^{lif

Figures (2)

Figure 1: The solid (grey) line depicts $\underline{Q}_n(x_{a(n)})$, a piecewise-linear function. The domain is $Z_{a(n)} = [0,4]$, and the shaded region shows the epigraph $\mathcal{H}_n$ at incumbent $\hat{x}_{a(n)}=1$. Two ReLU cuts are shown: $\theta_n = 3-(x_{a(n)}-1)^+ - (x_{a(n)}-1)^-$ (dotted line) and $\theta_n = 1+(x_{a(n)}-1)^+ + (x_{a(n)}-1)^-$ (dashed line). These cuts are obtained using different normalization coefficients, and their epigraph intersection equals $\mathcal{H}_n$. Observe that $\mathop{\mathrm{epi}}\nolimits(\underline{Q}_n)\subsetneq \mathcal{H}_n \subsetneq \mathop{\mathrm{epi}}\nolimits(\overline{\operatorname{co}}(\underline{Q}_n))$.
Figure 2: Left: $Q_n(\cdot)$ and the two cuts over the original domain $Z_{a(n)}$. Right: the corresponding representations in the lifted space $Z^{\mathrm{lift}}_{\hat{x}_{a(n)}}$. The value function is shown in black; the cuts are shown with dotted and dashed boundaries.
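
The two ReLU cuts quoted in the Figure 1 caption can be evaluated numerically to see how their intersection produces a non-convex lower bound. The sketch below is illustrative only and is not taken from the authors' implementation. It assumes the sign convention $(t)^+ = \max(t, 0)$ and $(t)^- = \max(-t, 0)$, which this page does not state; under a different convention the values would change. The function names are hypothetical.

```python
# Illustrative sketch of the two ReLU cuts from the Figure 1 caption.
# ASSUMPTION: (t)^+ = max(t, 0) and (t)^- = max(-t, 0); the page itself
# does not define the negative part, so this convention is a guess.

X_HAT = 1.0  # incumbent from the caption


def pos(t: float) -> float:
    """Positive part of t."""
    return max(t, 0.0)


def neg(t: float) -> float:
    """Negative part of t (assumed to be non-negative)."""
    return max(-t, 0.0)


def cut_dotted(x: float) -> float:
    """theta = 3 - (x - 1)^+ - (x - 1)^-  (dotted line in Figure 1)."""
    return 3.0 - pos(x - X_HAT) - neg(x - X_HAT)


def cut_dashed(x: float) -> float:
    """theta = 1 + (x - 1)^+ + (x - 1)^-  (dashed line in Figure 1)."""
    return 1.0 + pos(x - X_HAT) + neg(x - X_HAT)


def lower_bound(x: float) -> float:
    """Lower boundary of the intersection of the two cut epigraphs."""
    return max(cut_dotted(x), cut_dashed(x))


if __name__ == "__main__":
    # Sample the caption's domain [0, 4] at integer points.
    for x in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(x, cut_dotted(x), cut_dashed(x), lower_bound(x))
```

Under the assumed convention, the dotted cut peaks at the incumbent and the dashed cut bottoms out there, so the pointwise maximum is not convex, in line with the caption's remark that $\mathcal{H}_n$ is strictly contained in the epigraph of the closed convex envelope.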

Theorems & Definitions (16)

Proposition 1 (deng2024relu)
Proposition 2
proof
Definition 1: Pareto-optimal affine cut
Proposition 3
Proposition 4
proof
Definition 2: Pareto-optimal $h$-cut
Proposition 5
proof
...and 6 more
