Convergence of Some Convex Message Passing Algorithms to a Fixed Point
Vaclav Voracek, Tomas Werner
TL;DR
The paper addresses convergence questions for convex message-passing methods used in MAP inference, framing these methods as block-coordinate descent on a dual LP or Lagrangian relaxation. It analyzes coordinate descent on a piecewise-affine convex objective, proves that the iterates converge to a fixed point and that the method terminates within $\mathcal{O}(1/\varepsilon)$ iterations, and shows that prominent algorithms like Max-Sum Diffusion and Max-Marginal Averaging are special cases of this framework. A novel energy function is introduced to guarantee descent and non-cycling under boundedness, and a mid-point variant is shown to potentially cycle. The results provide rigorous fixed-point guarantees and iteration bounds for widely used convex message-passing approaches in MAP inference and related combinatorial problems, clarifying their theoretical behavior and scope of applicability.
Abstract
A popular approach to the MAP inference problem in graphical models is to minimize an upper bound obtained from a dual linear programming or Lagrangian relaxation by (block-)coordinate descent. This is also known as convex/convergent message passing; examples are max-sum diffusion and sequential tree-reweighted message passing (TRW-S). Convergence properties of these methods are currently not fully understood. They have been proved to converge to the set characterized by local consistency of active constraints, with unknown convergence rate; however, it was not clear if the iterates converge at all (to any point). We prove a stronger result (conjectured before but never proved): the iterates converge to a fixed point of the method. Moreover, we show that the algorithm terminates within $\mathcal{O}(1/\varepsilon)$ iterations. We first prove this for a version of coordinate descent applied to a general piecewise-affine convex objective. Then we show that several convex message passing methods are special cases of this method. Finally, we show that a slightly different version of coordinate descent can cycle.
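To make the setting concrete, here is a minimal sketch of coordinate descent on a piecewise-affine convex objective of the form $f(x) = \max_i (a_i^\top x + b_i)$. It is an illustration under stated assumptions, not the authors' algorithm: the names `A`, `b`, `coordinate_step`, and `coordinate_descent` are hypothetical, the toy data is made up, and the tie-breaking rule (when a coordinate has an interval of minimizers, take the one nearest the current value) is chosen only for simplicity. The paper's results concern its own specific update rule, which may differ. The sketch also assumes that $f$ is bounded below along every coordinate.

```python
import numpy as np

def objective(A, b, x):
    """f(x) = max_i (a_i . x + b_i), a piecewise-affine convex function."""
    return float(np.max(A @ x + b))

def coordinate_step(A, b, x, j, tol=1e-12):
    """Minimize f along coordinate j with all other coordinates fixed.

    Restricted to coordinate j, row i becomes the line offsets[i] + slopes[i] * t.
    A minimizer of the upper envelope of these lines (if one exists) lies at an
    intersection of two lines with different slopes, or the envelope is flat at
    the current point, so we test all intersections plus the current value.
    """
    slopes = A[:, j]
    offsets = A @ x + b - slopes * x[j]
    candidates = [x[j]]  # keeping the current value guarantees f never increases
    m = len(slopes)
    for i in range(m):
        for k in range(i + 1, m):
            if slopes[i] != slopes[k]:
                candidates.append((offsets[k] - offsets[i]) / (slopes[i] - slopes[k]))
    candidates = np.array(candidates, dtype=float)
    # Envelope value at each candidate t: max over all lines.
    values = np.max(offsets[:, None] + slopes[:, None] * candidates[None, :], axis=0)
    ties = candidates[values <= values.min() + tol]
    # ASSUMPTION (illustrative tie-break): pick the minimizer nearest the current value.
    t = ties[np.argmin(np.abs(ties - x[j]))]
    y = x.copy()
    y[j] = t
    return y

def coordinate_descent(A, b, x0, sweeps=20):
    """Cyclic coordinate descent; returns the final point and objective history."""
    x = np.asarray(x0, dtype=float)
    history = [objective(A, b, x)]
    for _ in range(sweeps):
        for j in range(len(x)):
            x = coordinate_step(A, b, x, j)
        history.append(objective(A, b, x))
    return x, history

# Toy instance (made up for illustration): f(x) = max(x0 + x1, -x0, -x1).
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.zeros(3)
x_final, history = coordinate_descent(A, b, [3.0, -2.0])
```

On this toy instance with the nearest-point tie-break, the iterates stop at $(3, -1.5)$ with objective value $1.5$, although the minimum of $f$ is $0$ at the origin. This shows why convergence to a fixed point of the method is a different statement from convergence to a minimizer of a nonsmooth objective.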
