An Algorithm for Computing the Capacity of Symmetrized KL Information for Discrete Channels
Haobo Chen, Gholamali Aminian, Yuheng Bu
TL;DR
The paper tackles the problem of computing the capacity defined by the symmetrized KL information $I_{\text{SKL}}(X;Y)$ for fixed discrete channels, which is challenging due to the non-concavity of Lautum information. It reformulates the problem as a discrete quadratic program $\max_{\boldsymbol{X} \in \Delta^{d-1}} \boldsymbol{X}^T \boldsymbol{C} \boldsymbol{X}$ with $C_{ij} = D(P_{Y|X=x_i} \| P_{Y|X=x_j})$ and introduces the Max-SKL algorithm that symmetrizes $\boldsymbol{C}$ to $\boldsymbol{C}_{\text{sym}}$ and updates $\boldsymbol{X}$ via a multiplicative, simplex-preserving rule, guaranteeing monotone improvement in $I_{\text{SKL}}$. The method is validated on the Binary Symmetric Channel and Binomial Channel, showing excellent agreement with theoretical $C_{\text{SKL}}$ values and revealing how the SKL capacity differs from mutual-information capacity. The framework is extended to Gibbs-channel learning, where $I_{\text{SKL}}(W;S)$ characterizes the worst-case generalization error and the Max-SKL procedure identifies adversarial data inputs that maximize this quantity. Together, these results advance capacity estimation under symmetrized divergences and offer data-dependent insights for learning outcomes, with ongoing work to handle continuous inputs via random-matrix/mean-field methods.
Abstract
Symmetrized Kullback-Leibler (KL) information (\(I_{\mathrm{SKL}}\)), which symmetrizes the traditional mutual information by adding Lautum information, has been shown to be a critical quantity in communication~\cite{aminian2015capacity} and learning theory~\cite{aminian2023information}. This paper considers the problem of computing the capacity in terms of \(I_{\mathrm{SKL}}\) for a fixed discrete channel. This maximization problem is reformulated as a discrete quadratic optimization with a simplex constraint. One major challenge is the non-concavity of Lautum information, which makes the objective non-concave in the input distribution. Our method symmetrizes the KL divergence matrix and applies iterative updates that never decrease the objective while keeping the input a valid probability distribution. We validate our algorithm on Binary Symmetric Channels and Binomial Channels, demonstrating its consistency with theoretical values. Additionally, we explore its application in machine learning through the Gibbs channel, showcasing the effectiveness of our algorithm in finding the worst-case data distributions.
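To make the quadratic reformulation concrete, the sketch below builds the pairwise KL divergence matrix for a small channel and runs a multiplicative update on the simplex. It is an illustration under stated assumptions, not the paper's Max-SKL procedure: the update used here is the classical replicator (Baum-Eagon) rule \(x_i \leftarrow x_i (\boldsymbol{C}_{\text{sym}} \boldsymbol{x})_i / (\boldsymbol{x}^T \boldsymbol{C}_{\text{sym}} \boldsymbol{x})\), which is known not to decrease a quadratic form with a symmetric nonnegative matrix, and which may differ from the exact rule in the paper. The function names `kl_matrix` and `max_quadratic_on_simplex` are hypothetical, and the code assumes every channel entry is strictly positive so that all divergences are finite.

```python
import numpy as np


def kl_matrix(channel):
    """Return C with C[i, j] = D(P_{Y|X=x_i} || P_{Y|X=x_j}) in nats.

    `channel` is a row-stochastic array whose i-th row is P_{Y|X=x_i}.
    Assumption: all entries are strictly positive.
    """
    P = np.asarray(channel, dtype=float)
    log_ratio = np.log(P[:, None, :]) - np.log(P[None, :, :])
    return np.sum(P[:, None, :] * log_ratio, axis=2)


def max_quadratic_on_simplex(C, x0, iters=200):
    """Replicator-style multiplicative update for max_x x^T C x on the simplex.

    This is a stand-in for the paper's update, not a reproduction of it.
    Entries of x that start at zero stay at zero, so x0 should be interior.
    """
    C_sym = 0.5 * (C + C.T)  # x^T C x equals x^T C_sym x for every x
    x = np.asarray(x0, dtype=float)
    values = []
    for _ in range(iters):
        grad = C_sym @ x
        value = float(x @ grad)
        values.append(value)
        if value <= 0.0:  # degenerate case: all rows of the channel coincide
            break
        x = x * grad / value  # stays nonnegative and sums to one
    return x, values


# Binary symmetric channel with crossover probability p (illustrative value).
p = 0.1
bsc = np.array([[1 - p, p], [p, 1 - p]])
x_star, history = max_quadratic_on_simplex(kl_matrix(bsc), x0=[0.3, 0.7])
print(x_star, history[-1])
```

For the binary symmetric channel the matrix has zero diagonal and off-diagonal entry \(d = (1-2p)\log\frac{1-p}{p}\), so the objective is \(2 x_1 x_2 d\), which is maximized by the uniform input with value \(d/2\). The sketch can be checked against that closed form.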
