Learning Logical Rules using Minimum Message Length

Ruben Sharma, Sebastijan Dumančić, Ross D. King, Andrew Cropper

TL;DR

This work introduces a Bayesian inductive logic programming approach that learns minimum message length hypotheses from noisy data and significantly outperforms previous methods, notably those that learn minimum description length programs.

Abstract

Unifying probabilistic and logical learning is a key challenge in AI. We introduce a Bayesian inductive logic programming approach that learns minimum message length hypotheses from noisy data. Our approach balances hypothesis complexity and data fit through priors, which favour more general programs, and a likelihood, which favours accurate programs. Our experiments on several domains, including game playing and drug design, show that our method significantly outperforms previous methods, notably those that learn minimum description length programs. Our results also show that our approach is data-efficient and insensitive to example balance, including the ability to learn from exclusively positive examples.
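The trade-off between prior and likelihood described above can be illustrated with a generic two-part message length, -log2 P(H) - log2 P(D | H). The sketch below is not the paper's cost function: the function name `message_length`, the scalar `prior_prob`, and the independent label-noise model with rate `noise` are all assumptions made for illustration.

```python
import math


def message_length(prior_prob, predictions, labels, noise=0.1):
    """Two-part message length in bits: -log2 P(H) - log2 P(D | H).

    Hypothetical sketch: `prior_prob` stands in for the prior of a
    hypothesis, and each label is assumed to disagree with the
    hypothesis's prediction independently with probability `noise`.
    """
    # First part: bits needed to state the hypothesis under the prior.
    hypothesis_bits = -math.log2(prior_prob)
    # Second part: bits needed to state the labels given the hypothesis.
    data_bits = 0.0
    for predicted, actual in zip(predictions, labels):
        p = (1.0 - noise) if predicted == actual else noise
        data_bits += -math.log2(p)
    return hypothesis_bits + data_bits


labels = [True, True, True, False]
# A high-prior hypothesis that misclassifies one example.
simple = message_length(0.5, [True, True, True, True], labels)
# A low-prior hypothesis that fits every example.
complex_fit = message_length(0.001, labels, labels)
print(f"simple: {simple:.2f} bits, complex: {complex_fit:.2f} bits")
```

With these made-up numbers the high-prior hypothesis yields the shorter message despite its one error, because the extra bits needed to state the low-prior hypothesis outweigh the bits it saves on the data.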

Paper Structure

This paper contains 42 sections, 16 equations, 1 figure, and 1 algorithm.

Figures (1)

  • Figure 1: Balanced MML and C-MDL program accuracy differences when changing the proportion of positive examples for training datasets of size 20. Blanks indicate proportions not possible with 20 examples. Error bars represent standard error.

Theorems & Definitions (6)

  • Definition 1: ILP input
  • Definition 2: Probabilistic hypothesis
  • Definition 3: Cost function
  • Definition 4: Optimal hypothesis
  • Definition 5: C-MDL cost function
  • Definition 6: Entailed examples