
Analogical Structure, Minimal Contextual Cues and Contrastive Distractors: Input Design for Sample-Efficient Linguistic Rule Induction

Chunyang Jiang, Paola Merlo

TL;DR

The work investigates whether cognitive-inspired input design can enable sample-efficient linguistic rule induction without large-scale modeling. By organizing data into analogical paradigms with minimal contextual cues and systematically designed distractors, lightweight BERT+CNN models (approximately $0.5$M parameters) reach $F1=0.95$ using only $100$ training examples, outperforming zero-shot large language models such as GPT-o3 ($F1=0.87$). Ablation and generalization analyses show that analogical structure and contrastive distractors drive these gains; the approach also performs robustly on a second phenomenon (bake-class verbs) and generalizes across types. The findings suggest that cognitive-inspired data structuring is a distinct optimization dimension that can complement architectural scaling, potentially enabling practical, data-efficient linguistic rule learning across domains, although cross-linguistic validation remains future work.
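
To make the input design concrete, here is a minimal Python sketch of one structured completion item for the roll-class causative/inchoative alternation, built on the Man:Dice :: Explorer:Mat analogy from Figure 1. The item format (`CompletionItem`), the helper names, the sentence templates and the three distractors are all hypothetical. Each distractor breaks one part of the Agent/Theme mapping, but they are not the paper's distractor taxonomy, and the minimal contextual cues are omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class CompletionItem:
    """One structured completion item (hypothetical format, not the paper's)."""
    context: list[str]     # the analogical pattern A : B :: C : ?
    candidates: list[str]  # correct completion plus contrastive distractors
    answer: int            # index of the correct completion in `candidates`


def causative(agent: str, verb: str, theme: str) -> str:
    # Causative frame: Agent as subject, Theme as object ("The man rolled the dice.")
    return f"The {agent} {verb} the {theme}."


def inchoative(verb: str, theme: str) -> str:
    # Inchoative frame: Theme becomes the subject and the Agent is dropped ("The dice rolled.")
    return f"The {theme} {verb}."


def build_item(source: tuple[str, str], target: tuple[str, str],
               verb: str, rng: random.Random) -> CompletionItem:
    src_agent, src_theme = source
    tgt_agent, tgt_theme = target
    # A : B :: C : ?  where A -> B applies the alternation to the source pair.
    context = [
        causative(src_agent, verb, src_theme),
        inchoative(verb, src_theme),
        causative(tgt_agent, verb, tgt_theme),
    ]
    correct = inchoative(verb, tgt_theme)
    # Hypothetical contrastive distractors, each breaking one part of the mapping.
    distractors = [
        f"The {tgt_agent} {verb}.",                  # Agent, not Theme, promoted to subject
        f"The {tgt_theme} {verb} the {tgt_agent}.",  # roles swapped, frame still transitive
        causative(tgt_agent, verb, tgt_theme),       # no alternation: premise copied
    ]
    candidates = [correct, *distractors]
    rng.shuffle(candidates)  # so the answer position is not a cue
    return CompletionItem(context, candidates, candidates.index(correct))


if __name__ == "__main__":
    item = build_item(("man", "dice"), ("explorer", "mat"), "rolled", random.Random(0))
    print(" | ".join(item.context))
    for i, cand in enumerate(item.candidates):
        print("*" if i == item.answer else " ", cand)
```

Running the script prints the three context sentences and marks "The mat rolled." as the correct completion among the four shuffled candidates.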

Abstract

Large language models achieve strong performance by training on vast datasets. Can analogical paradigm organization enable lightweight models to match this performance with minimal data? We develop a computational approach that implements three cognitive-inspired principles: analogical structure, contrastive learning, and minimal contextual cues. We test this approach with structured completion tasks, in which models must identify the correct completion of an analogical pattern from among contrastive alternatives. Lightweight models (BERT+CNN, $0.5$M parameters) trained on only one hundred structured examples of English causative/inchoative alternations achieve $F1=0.95$, outperforming zero-shot GPT-o3 ($F1=0.87$). Ablation studies confirm that analogical organization and contrastive structure improve performance: models trained on structured data consistently surpass randomly shuffled baselines across architectures. Cross-phenomenon validation on unspecified object alternations replicates these efficiency gains, confirming the robustness of the approach. Our results show that analogical paradigm organization enables competitive linguistic rule learning with orders of magnitude less data than conventional approaches require.
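
Results above are reported as F1. How the paper converts candidate choices into F1 is not stated here; the sketch below assumes one common setup in which every candidate completion is a binary decision (1 = correct completion, 0 = distractor), the model's per-candidate score is thresholded, and F1 is computed on the positive class. `threshold` and `binary_f1` are illustrative names, and the scores are made-up numbers.

```python
def threshold(scores: list[float], cutoff: float = 0.5) -> list[int]:
    # Turn per-candidate scores into binary predictions (1 = judged correct).
    return [1 if s >= cutoff else 0 for s in scores]


def binary_f1(gold: list[int], pred: list[int]) -> float:
    """F1 on the positive class: harmonic mean of precision and recall."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # also avoids dividing by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two items with four candidates each, flattened; gold answers sit at positions 2 and 4.
gold = [0, 0, 1, 0, 1, 0, 0, 0]
scores = [0.1, 0.2, 0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
print(binary_f1(gold, threshold(scores)))  # 0.5
```

Here one answer is found, one distractor is accepted and one answer is missed, so precision and recall are both 0.5. If instead exactly one candidate per item is forced to be positive (an argmax choice), every error counts once as a false positive and once as a false negative, so precision, recall and F1 all collapse to plain accuracy.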

Paper Structure

This paper contains 69 sections, 3 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Analogical paradigm organization for sample-efficient linguistic rule learning. (A) Causative/inchoative alternation pattern showing the systematic Agent$\leftrightarrow$Theme mapping in English roll-class verbs, with cross-lexical consistency across multiple verb instances. (B) Three cognitive-inspired organizational principles: analogical structure enables cross-paradigm pattern recognition (Man:Dice :: Explorer:Mat), contrastive learning provides discriminative boundaries through systematic constraint violations, and minimal contextual cues offer semantic scaffolding without explicit labeling. (C) Implementation through structured completion tasks, in which models must integrate all three principles to identify the correct answer d among systematically designed alternatives, each testing a specific aspect of analogical reasoning (distractor taxonomy details in Table \ref{tab:data-roll-error-def}).
  • Figure 2: Zero-shot prompt.
  • Figure 3: F1 performance as a function of training size. (A) Comparison of model architectures with BERT embeddings on the Base structure organization. (B) Impact of data organization using the best architecture.
  • Figure 4: Isolated contributions of organizational components to best-model performance. (A) Impact of analogical organization, comparing Base against NoAnalogy and Shuffled. (B) Impact of implicit soft annotations, comparing Base with NoSoftCue and Transposed variants.
  • Figure 5: Reasoning models (top row: deepseek-R1, gpt-o3, gpt-o3-mini, qwq-32B) vs. standard models (bottom row: llama-3.3-70B-Instruct, llama-3.2-3B-Instruct, qwen3-32B, deepseek-V3) across data structures. The red dotted line shows the small-model baseline (F1=0.98). Reasoning models consistently outperform standard models but, in zero-shot settings, struggle to match the performance of the lightweight model trained on structured data.
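
Figure 4 compares the Base organization with NoAnalogy and Shuffled controls. Their construction is not described in this section, so the following sketch assumes one plausible reading of Shuffled: the context sentences of all items are pooled and redealt at random, which preserves the vocabulary, the item sizes and the training-set size while destroying the A : B :: C : ? alignment inside each item. `shuffle_contexts` and the example sentences are hypothetical, not the paper's code or data.

```python
import random
from collections import Counter


def shuffle_contexts(contexts: list[list[str]], seed: int = 0) -> list[list[str]]:
    """Hypothetical 'Shuffled' control: same sentences, same item sizes, broken analogies."""
    rng = random.Random(seed)
    pool = [sentence for ctx in contexts for sentence in ctx]
    rng.shuffle(pool)  # global shuffle across items, not just within one item
    shuffled, start = [], 0
    for ctx in contexts:
        # Redeal the pooled sentences into chunks matching the original item sizes.
        shuffled.append(pool[start:start + len(ctx)])
        start += len(ctx)
    return shuffled


base = [
    ["The man rolled the dice.", "The dice rolled.", "The explorer rolled the mat."],
    ["The girl bounced the ball.", "The ball bounced.", "The boy bounced the coin."],
]
control = shuffle_contexts(base, seed=1)
# Same multiset of sentences and the same item sizes as the Base data.
assert Counter(s for c in control for s in c) == Counter(s for c in base for s in c)
assert [len(c) for c in control] == [len(c) for c in base]
```

Under this reading, the candidate sets and answers are left untouched, so a drop in F1 relative to Base would reflect the lost analogical alignment rather than a smaller or different vocabulary.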