Multi-answer Constrained Optimal Querying: Maximum Information Gain Coding

Zhefan Li, Pingyi Fan

TL;DR

The paper tackles multi-answer constrained querying by introducing Maximum Information Gain Coding (MIGC), a greedy $D$-ary partitioning approach that extends GBSC. It proves that the information gain of a query equals the partition entropy $H(S)$ and demonstrates that MIGC achieves near-optimal performance with per-symbol bounds, outperforming Shannon coding on several metrics. The authors validate MIGC via three practical scenarios—general discrete distributions ($D=3$), DNA detection, and battleship-style games—showing improved bits-per-symbol efficiency and near-brute-force performance. This work offers a principled, information-theoretic framework for constrained decision trees with potential applications in AI behavior trees and related learning frameworks.
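The greedy step can be sketched as follows. A query's answer is a deterministic function of the hidden item, so the information gained from asking it equals the entropy of the partition it induces on the remaining candidates. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the names `partition_entropy`, `greedy_query`, and `allowed_queries` are hypothetical, and the constraint is modeled simply as a fixed dictionary of permitted answer functions.

```python
import math

def partition_entropy(probs, parts):
    """Entropy in bits of the answer to a query that splits the candidates into `parts`."""
    total = sum(probs[x] for part in parts for x in part)
    h = 0.0
    for part in parts:
        q = sum(probs[x] for x in part) / total  # renormalized mass of this answer
        if q > 0:
            h -= q * math.log2(q)
    return h

def greedy_query(probs, candidates, allowed_queries):
    """Return the allowed query whose induced partition has maximum entropy.

    `allowed_queries` maps a query name to a function item -> answer symbol
    (hypothetical representation of the question constraint).
    """
    best, best_h = None, -1.0
    for name, answer_fn in allowed_queries.items():
        groups = {}
        for x in candidates:
            groups.setdefault(answer_fn(x), []).append(x)
        h = partition_entropy(probs, list(groups.values()))
        if h > best_h:  # ties keep the first query seen (an arbitrary choice)
            best, best_h = name, h
    return best, best_h

# Toy example: six equally likely items and three permitted questions.
probs = {x: 1 / 6 for x in range(6)}
allowed_queries = {
    "is_zero": lambda x: x == 0,    # binary, very unbalanced
    "lower_half": lambda x: x < 3,  # binary, balanced
    "mod3": lambda x: x % 3,        # ternary, balanced
}
best, gain = greedy_query(probs, list(probs), allowed_queries)
print(best, gain)
```

In this toy case the balanced ternary question wins because its answer entropy is $\log_2 3$ bits, more than any binary question can provide. Repeating the selection on each resulting subset would grow a decision tree one level at a time.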

Abstract

With the rapid development of artificial intelligence and machine learning, behavior tree design in multi-agent systems and AI games has become more important. The behavior tree design problem is closely related to source coding in information theory. The "Twenty Questions" problem is a typical example of behavior tree design; it is commonly used to explain the application of source coding in information theory and can be solved by Huffman coding. In some realistic scenarios, there are constraints on the questions that may be asked. However, for a general question set, finding the minimum expected querying length remains an open problem and is NP-hard. Recently, a new coding scheme named greedy binary separation coding (GBSC) was proposed to provide a near-optimal solution for binary cases with some constraints. In this work, we generalize it to $D$-ary cases and propose the maximum information gain coding (MIGC) approach to solve the multi-answer decision constrained querying problem. The optimality of the proposed MIGC is discussed in theory. We then apply MIGC to three practical scenarios and show that MIGC performs better than GBSC and Shannon coding in terms of bits per symbol.

Paper Structure (13 sections, 4 theorems, 18 equations, 11 figures)

Key Result

Lemma 1

Figures (11)

  • Figure 1: Huffman decision tree of Example 1
  • Figure 2: GBSC decision tree of Example 1
  • Figure 3: MIGC decision tree of Example 2
  • Figure 4: Illustration for the proof of Lemma 2.
  • Figure 5: Performance of MIGC, Huffman coding, and Shannon coding. (a) shows how the expected code length changes with $N$. (b) shows the gap in code length between MIGC and Shannon coding when $N=10$. (c) shows the gap in code length between Huffman coding and MIGC when $N=10$.
  • ...and 6 more figures

Theorems & Definitions (14)

  • Example 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Example 2
  • Lemma 1
  • proof
  • Theorem 1
  • ...and 4 more