Multi-answer Constrained Optimal Querying: Maximum Information Gain Coding
Zhefan Li, Pingyi Fan
TL;DR
The paper tackles multi-answer constrained querying by introducing Maximum Information Gain Coding (MIGC), a greedy $D$-ary partitioning approach that extends GBSC. It proves that the information gain of a query equals the partition entropy $H(S)$ and demonstrates that MIGC achieves near-optimal performance with per-symbol bounds, outperforming Shannon coding on several metrics. The authors validate MIGC via three practical scenarios—general discrete distributions ($D=3$), DNA detection, and battleship-style games—showing improved bits-per-symbol efficiency and near-brute-force performance. This work offers a principled, information-theoretic framework for constrained decision trees with potential applications in AI behavior trees and related learning frameworks.
Abstract
With the rapid development of artificial intelligence and machine learning, behavior tree design in multi-agent systems and AI games has become increasingly important. The behavior tree design problem is closely related to source coding in information theory. The "Twenty Questions" problem is a typical example of behavior tree design; it is commonly used to illustrate source coding in information theory and can be solved by Huffman coding. In some realistic scenarios, however, the questions that may be asked are constrained. For a general question set, finding the minimum expected querying length remains an open problem and is NP-hard. Recently, a coding scheme named greedy binary separation coding (GBSC) was proposed to provide a near-optimal solution for binary cases under certain constraints. In this work, we generalize it to D-ary cases and propose the maximum information gain coding (MIGC) approach to solve the multi-answer constrained querying problem. The optimality of the proposed MIGC is discussed theoretically. We then apply MIGC to three practical scenarios and show that it outperforms GBSC and Shannon coding in terms of bits per symbol.
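To make the greedy idea concrete, the following is a minimal sketch of a single selection step, assuming each allowed query is represented as a partition of the remaining candidates into at most $D$ answer blocks. It scores each query by the entropy of the answer distribution it induces and picks the largest. The function names `partition_entropy` and `greedy_query`, the data layout, and the toy distribution are illustrative assumptions, not the authors' implementation.

```python
import math

def partition_entropy(prior, partition):
    """Entropy in bits of the answer distribution induced by a partition.

    prior: dict mapping candidate -> probability (hypothetical layout).
    partition: list of blocks, each block a list of candidates.
    """
    total = sum(prior[x] for block in partition for x in block)
    h = 0.0
    for block in partition:
        p = sum(prior[x] for x in block) / total  # renormalized block mass
        if p > 0:
            h -= p * math.log2(p)
    return h

def greedy_query(prior, allowed_queries):
    """Return the allowed query whose partition has maximum entropy."""
    return max(allowed_queries, key=lambda q: partition_entropy(prior, q))

# Toy example with D = 3 answers per query (values chosen for illustration).
prior = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
q1 = [["a"], ["b"], ["c", "d"]]   # block masses 0.5, 0.25, 0.25
q2 = [["a", "b"], ["c"], ["d"]]   # block masses 0.75, 0.125, 0.125
best = greedy_query(prior, [q2, q1])
```

In this toy case `q1` is selected because its answer blocks are more evenly weighted. A full querying strategy would repeat the step on whichever block the answer identifies until a single candidate remains.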
