A remark on conditional entropy
Adam Wang
TL;DR
The paper investigates conditional entropy for sequential data and demonstrates an approximate time-reversal invariance. It derives the key relation $H_p(S)-H_{\tilde{p}}(\tilde{S})=\log(p(\vec{x}_f))-\log(p(\vec{x}_l))\le C$, which implies an $O(1/N)$ convergence of the forward/backward entropy difference. It defines a practical learnability metric $\Delta H = \frac{1}{N}(H_M(S)-H_{\tilde{M}}(\tilde{S}))$ to quantify distributional shift and compare forward-versus-backward training. The note discusses extensions to continuous variables, potential non-sequential datasets, and the role of symmetric training in ensuring equality between forward and backward generators, with empirical validation on Enwik9 subsets.
Abstract
The following note proves that the conditional entropy of a sequence is almost time-reversal invariant: the forward and backward entropies differ by at most a small constant that depends only on the forward and backward models with respect to which they are computed. This gives rise to a numerical value that quantifies learnability, as well as a methodology to control for distributional shift between datasets. Rough guidelines are given for practitioners.
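As a numerical illustration of the forward/backward relation, the sketch below uses a toy first-order Markov chain rather than the models studied in the note. The transition matrix `P`, the helper names (`stationary`, `sample`, `entropy_gap`), and the reading of $\vec{x}_f$ and $\vec{x}_l$ as the first and last symbols are all assumptions made for this example. The backward model is taken to be the exact Bayes reversal of the forward one, so the gap between the two summed negative log-likelihoods should collapse to a boundary term, and dividing by $N$ gives a quantity analogous to $\Delta H$. Logarithms are natural, so values are in nats.

```python
import math
import random

# Hypothetical 3-state Markov chain; P[a][b] = p(next = b | current = a).
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]


def stationary(P, iters=2000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[a] * P[a][b] for a in range(n)) for b in range(n)]
    return pi


def sample(P, pi, length, rng):
    """Draw a sequence whose first symbol follows pi."""
    states = range(len(P))
    seq = [rng.choices(states, weights=pi)[0]]
    for _ in range(length - 1):
        seq.append(rng.choices(states, weights=P[seq[-1]])[0])
    return seq


def entropy_gap(P, pi, seq):
    """Return (forward, backward) negative log-likelihoods, conditional terms only."""
    pairs = list(zip(seq, seq[1:]))
    # Forward model: p(x_{i+1} | x_i) = P[a][b].
    fwd = -sum(math.log(P[a][b]) for a, b in pairs)
    # Backward model via Bayes' rule: q(x_i | x_{i+1}) = pi[a] * P[a][b] / pi[b].
    bwd = -sum(math.log(pi[a] * P[a][b] / pi[b]) for a, b in pairs)
    return fwd, bwd


rng = random.Random(0)
pi = stationary(P)
seq = sample(P, pi, 1000, rng)
fwd, bwd = entropy_gap(P, pi, seq)

# Boundary term: log p(first symbol) - log p(last symbol).
boundary = math.log(pi[seq[0]]) - math.log(pi[seq[-1]])
# Per-symbol gap, analogous to Delta H in the text.
delta_h = (fwd - bwd) / len(seq)
# Sequence-independent bound C on the boundary term.
C = max(math.log(p) for p in pi) - min(math.log(p) for p in pi)

print(f"forward - backward = {fwd - bwd:.6f}")
print(f"boundary term      = {boundary:.6f}")
print(f"delta_h            = {delta_h:.6e} (|delta_h| <= C/N = {C / len(seq):.6e})")
```

In this toy setting the interior terms telescope, so the gap depends only on the two end symbols and is bounded by a constant `C` that does not grow with the sequence length, which is what makes the per-symbol difference shrink like $1/N$. With learned forward and backward models the reversal is only approximate, so a nonzero $\Delta H$ beyond this boundary effect would reflect the models rather than the data.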
