Teaching Small Language Models to Learn Logic through Meta-Learning

Leonardo Bertolazzi, Manuel Vargas Guzmán, Raffaella Bernardi, Maciej Malicki, Jakub Szymanik

TL;DR

The paper tackles the challenge of out-of-distribution logical generalization in large language models by applying few-shot meta-learning to syllogistic reasoning, a well-defined logical fragment. It constructs synthetic knowledge bases and frames syllogistic inference as premise selection: choosing the minimal set of premises that entails a query. Small autoregressive models are trained on this task to extract abstract inference patterns across tasks. Empirical results show that 1.5B–7B Qwen models fine-tuned with meta-learning achieve strong generalization, often outperforming GPT-4o and o3-mini on the syllogistic task, particularly in low-data regimes. The findings suggest that meta-learning can enhance deductive reasoning in small LMs and provide a pathway toward more robust, abstract reasoning capabilities in practical AI systems.

Abstract

Large language models (LLMs) are increasingly evaluated on reasoning tasks, yet their logical abilities remain contested. To address this, we study LLMs' reasoning in a well-defined fragment of logic: syllogistic reasoning. We cast the problem as premise selection and construct controlled datasets to isolate logical competence. Beyond evaluation, an open challenge is enabling LLMs to acquire abstract inference patterns that generalize to novel structures. We propose to apply few-shot meta-learning to this domain, thereby encouraging models to extract rules across tasks rather than memorize patterns within tasks. Although meta-learning has been little explored in the context of logic learnability, our experiments show that it is effective: small models (1.5B-7B) fine-tuned with meta-learning demonstrate strong gains in generalization, with especially pronounced benefits in low-data regimes. These meta-learned models outperform GPT-4o and o3-mini on our syllogistic reasoning task.
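
To make the premise selection framing concrete, here is a minimal Python sketch of what a gold answer looks like. It is not the authors' data-generation code. The tuple encoding `("A", x, y)` for "All x are y" and `("E", x, y)` for "No x are y", the helper names `a_chain` and `select_premises`, and the toy knowledge base (including the intermediate term `flowers` and the distractor about dogs) are all assumptions made for illustration. The sketch covers only chains of universal affirmatives and a single universal negative.

```python
from collections import deque

# Hypothetical encoding (not from the paper):
#   ("A", x, y) means "All x are y"; ("E", x, y) means "No x are y".

def a_chain(kb, start, goal):
    """Shortest chain of A-premises from start to goal (BFS); None if there is none."""
    if start == goal:
        return []
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for kind, x, y in kb:
            if kind == "A" and x == node and y not in parents:
                parents[y] = (node, (kind, x, y))
                if y == goal:
                    # Walk back to the start to recover the chain in order.
                    chain, cur = [], y
                    while parents[cur] is not None:
                        cur, premise = parents[cur]
                        chain.append(premise)
                    return chain[::-1]
                queue.append(y)
    return None

def select_premises(kb, query):
    """Smallest premise set found for an A or E query; None if not derivable here."""
    kind, a, c = query
    if kind == "A":
        return a_chain(kb, a, c)
    best = None
    for premise in kb:
        if premise[0] != "E":
            continue
        _, b, d = premise
        # "No b are d" is symmetric, so try both orientations.
        for left, right in ((b, d), (d, b)):
            up_a, up_c = a_chain(kb, a, left), a_chain(kb, c, right)
            if up_a is None or up_c is None:
                continue
            candidate = up_a + up_c + [premise]
            if best is None or len(candidate) < len(best):
                best = candidate
    return best

# Toy knowledge base with one distractor premise (illustrative words only).
KB = [
    ("A", "cats", "felines"),
    ("A", "felines", "animals"),
    ("A", "tulips", "flowers"),
    ("A", "flowers", "plants"),
    ("E", "animals", "plants"),
    ("A", "dogs", "canines"),
]

print(select_premises(KB, ("E", "cats", "tulips")))
```

For the query "No cats are tulips", the sketch returns the two affirmative chains plus the single negative premise and leaves out the distractor, which is the kind of minimal subset the models are asked to generate.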

Paper Structure

This paper contains 55 sections, 25 figures, 8 tables.

Figures (25)

  • Figure 1: Overview of a meta-learning (ML) episode. Given a set of premises (the knowledge base, $\mathcal{KB}$), a set of task demonstrations (or Study Examples), and a Query Hypothesis $x^\mathrm{query}$ that is entailed by $\mathcal{KB}$, models must generate the minimal subset of premises, the Query Premises $y^\mathrm{query}$, from which $x^\mathrm{query}$ can be derived. During each ML episode, by being trained on the Study Examples, models learn to extract the abstract logical patterns. The examples show how we frame syllogistic inferences as a premise selection task. The dataset is built with pseudowords; here we use variables instead for space reasons.
  • Figure 2: Example inference. Edges labeled “All-are” denote universal affirmatives (e.g., All cats are felines). The solid red edge is a universal negative (No animals are plants). From these “atomic facts” we infer No cats are tulips (dashed red edge). Formally, this is expressed as $\{Aa - b, \; Ac - d, \; Ebd\} \vDash Eac$, a type 6 inference (Smiley, 1973). Here we use words to better explain the inference; the synthetic dataset that models see consists of pseudowords.
  • Figure 3: Length generalization. We evaluate models on two types of length generalization: models trained on more complex (i.e., longer) inferences are tested on simpler (i.e., shorter) ones (Top) and vice versa (Bottom). The examples illustrate type 2 inferences.
  • Figure 4: Zero-shot system prompt. The zero-shot system prompt used with the closed models GPT-4o and o3-mini. The query hypothesis is subsequently provided as the first user interaction. We then extract the set of premises returned by the model using regular expressions.
  • Figure 5: Few-shot system prompt. The few-shot system prompt used with the closed models GPT-4o and o3-mini. The set of study examples, serving as few-shot examples, and the query hypothesis are provided together as the first user interaction. We then extract the set of premises returned by the model using regular expressions.
  • ...and 20 more figures
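
The captions for Figures 4 and 5 mention extracting the returned premises with regular expressions. The sketch below shows one way such an extraction could look. It is not the authors' parser: the pattern, the function name `extract_premises`, the pseudowords, and the assumed surface form ("All x are y" / "No x are y", modeled on the wording in Figure 2) are illustrative assumptions.

```python
import re

# Assumed surface form (not confirmed by the source):
#   "All <term> are <term>" or "No <term> are <term>".
PREMISE_RE = re.compile(r"\b(All|No)\s+(\w+)\s+are\s+(\w+)\b")

def extract_premises(response):
    """Return the distinct premises found in a model response, in order of appearance."""
    seen = []
    for quantifier, subject, predicate in PREMISE_RE.findall(response):
        premise = (quantifier, subject, predicate)
        if premise not in seen:  # drop repeated mentions of the same premise
            seen.append(premise)
    return seen

# Hypothetical model output using made-up pseudowords.
response = (
    "The minimal premises are: All wug are blick, No blick are dax. "
    "Again: All wug are blick."
)
print(extract_premises(response))
```

Deduplicating while preserving order keeps the result comparable to a gold premise set even when the model restates a premise in its explanation.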