
GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning

Woochang Sim, Hyunseok Ryu, Kyungmin Choi, Sungwon Han, Sundong Kim

TL;DR

GIFARC addresses the gap in abstract reasoning by introducing a large-scale, analogy-grounded ARC-style dataset synthesized from GIFs, in which each task comes with an explicit ground-truth analogy and an executable ARC transformation. The authors employ a three-stage pipeline (visual abstraction from GIFs, task sketch formulation, and executable ARC-task generation via LLMs and VLMs with retrieval augmentation) to produce 10,000 tasks and over 100k input–output pairs. Empirical results show high generation fidelity and show that analogy cues steer LLM reasoning toward human-like strategies, improving alignment with ground-truth analogies on ARC-style challenges. The work lays groundwork for analogy-informed AI reasoning, with potential impact on education, scientific discovery, and decision support. The authors note two directions for future work: the current dependence on a single GIF per task, and expansion to videos and more diverse analogies.
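The three-stage pipeline is a simple chain in which each stage consumes only the previous stage's output. The following is a minimal sketch of that chaining under several assumptions:

- Each model is wrapped as a plain function from prompt text to completion text. In the paper, a VLM (GPT o1) handles step 1 and an LLM (GPT o3-mini) handles steps 2 and 3.
- The names `run_pipeline`, `PipelineOutput` and `Model` are hypothetical.
- The prompt wording is hypothetical and is not the authors' prompts.
- Retrieval augmentation is omitted.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: a "model" is any function mapping a prompt string to a completion string.
Model = Callable[[str], str]


@dataclass
class PipelineOutput:
    abstraction: str  # step 1: text description A(g) of the GIF's visual transformation
    sketch: str       # step 2: text sketch of an ARC-style task
    task_code: str    # step 3: generated task E, analogy alpha, and solution phi (as raw text)


def run_pipeline(gif_path: str, vlm: Model, llm: Model) -> PipelineOutput:
    """Chain the three stages; each stage sees only the previous stage's output."""
    # Step 1: visual abstraction of the GIF (the paper emits JSONL; treated as opaque text here).
    abstraction = vlm(f"Describe the visual transformation in {gif_path} as JSON.")
    # Step 2: turn the abstraction into a task sketch.
    sketch = llm(f"Write an ARC-style task sketch from this description:\n{abstraction}")
    # Step 3: turn the sketch into an executable task, its analogy, and a solution program.
    task_code = llm(f"Generate an ARC task, its analogy, and a Python solution for:\n{sketch}")
    return PipelineOutput(abstraction, sketch, task_code)
```

Passing the models in as arguments keeps the chaining logic independent of any particular API client. Stub functions can stand in for the real models during testing.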

Abstract

The Abstraction and Reasoning Corpus (ARC) poses a stringent test of general AI capabilities, requiring solvers to infer abstract patterns from only a handful of examples. Despite substantial progress in deep learning, state-of-the-art models still achieve accuracy rates of only 40–55% on the 2024 ARC Competition, indicating a significant gap between their performance and human-level reasoning. In this work, we seek to bridge that gap by introducing an analogy-inspired ARC dataset, GIFARC. Leveraging large language models (LLMs) and vision-language models (VLMs), we synthesize new ARC-style tasks from a variety of GIF images that contain analogies. Each new task is paired with a ground-truth analogy, providing an explicit mapping between visual transformations and everyday concepts. By embedding robust, human-intuitive analogies into ARC-style tasks, GIFARC guides AI agents to evaluate a task analogically before engaging in brute-force pattern search. This reduces problem complexity and yields more concise, human-understandable solutions. We empirically show that guiding LLMs toward an analogical approach with GIFARC shifts their task-solving strategies to align with the analogical approach humans take.
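Each GIFARC task bundles three things: input–output examples $\mathcal{E}$, a ground-truth analogy $\alpha$, and an executable solution program $\phi$. The following is a minimal sketch of such a record. It includes a check that $\phi$ reproduces every output from its input, which is one plausible way to test whether a generated task is internally consistent; it is not necessarily the paper's fidelity metric. Several parts are hypothetical:

- The class name and field names.
- The toy "falling objects" transformation and its gravity analogy.
- The grid convention: lists of color indices 0–9, with 0 as background.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Grid = List[List[int]]  # assumed ARC convention: rows of color indices 0-9, 0 = background


@dataclass
class AnalogyTask:
    """Hypothetical record for one GIFARC-style task: examples E, analogy alpha, solution phi."""
    pairs: List[Tuple[Grid, Grid]]    # input-output examples (E)
    analogy: str                      # ground-truth analogy (alpha)
    solution: Callable[[Grid], Grid]  # executable transformation (phi)

    def is_consistent(self) -> bool:
        """True if the solution program reproduces every output from its input."""
        return all(self.solution(inp) == out for inp, out in self.pairs)


def fall_down(grid: Grid) -> Grid:
    """Toy transformation: non-background cells drop to the bottom of their column, keeping order."""
    height, width = len(grid), len(grid[0])
    result = [[0] * width for _ in range(height)]
    for col in range(width):
        cells = [grid[row][col] for row in range(height) if grid[row][col] != 0]
        # Place the lowest original cell at the bottom row, the next one above it, and so on.
        for offset, color in enumerate(reversed(cells)):
            result[height - 1 - offset][col] = color
    return result


example = AnalogyTask(
    pairs=[([[3, 0], [0, 0], [0, 5]], [[0, 0], [0, 0], [3, 5]])],
    analogy="colored blocks fall like apples dropping from a tree",  # hypothetical analogy
    solution=fall_down,
)
```

Calling `example.is_consistent()` returns `True` here. A task whose solution fails to reproduce one of its outputs would be flagged `False`.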


Paper Structure

This paper contains 59 sections, 1 equation, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Illustration of how an agent's approach to an ARC-style task differs when it is guided with or without an analogical approach. In (a), the agent attempts to solve the ARC-style task by brute force and risks producing logically complex solutions. In contrast, when the agent has been guided to think analogically with the GIFARC dataset, as depicted in (b), it finds a more concise and human-intuitive solution.
  • Figure 2: Illustration of the GIFARC data synthesis pipeline, which transforms a single GIF into a corresponding ARC-style task and supplementary data. In step 1, a vision-language model (GPT o1) digests a GIF file and outputs a detailed text expression $\mathcal{A}(g)$ of the visual transformation implied in the GIF, in JSONL format. In step 2, a large language model (GPT o3-mini) reads the step 1 JSONL output and writes a text sketch of an ARC-style task. In step 3, a large language model (GPT o3-mini) reads the step 2 sketch and generates an ARC-style task $\mathcal{E}$, an implied analogy $\alpha$, and the Python solution program $\phi$. The example prompts used in each step are reported in the Appendix.
  • Figure 2: Code Complexity of GIFARC Tasks
  • Figure 3: Histogram of task-type occurrences in GIFARC.
  • Figure 4: Experiment 1 case study: LLMs that learn in-context from richer analogy context produce more analogical interpretations of the ARC-AGI-2 task. Contexts 1, 2, and 3 stand for, respectively, the full description sample, the description sample without analogy, and the description sample without analogy and without solution.
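The Figure 4 ablation compares three in-context variants:

- Context 1: the full description sample.
- Context 2: the sample without its analogy.
- Context 3: the sample without its analogy and without its solution.

The following is a minimal sketch of how such prompt contexts could be assembled by dropping fields. The field names, `DROPPED`, and `build_context` are hypothetical, and the real sample schema and prompt layout may differ.

```python
from typing import Dict, List, Set

# Hypothetical fields of a GIFARC description sample; the real schema may differ.
SAMPLE_FIELDS = ("task", "description", "analogy", "solution")

# Fields dropped in each context, following the Figure 4 caption.
DROPPED: Dict[int, Set[str]] = {
    1: set(),                    # Context 1: full description sample
    2: {"analogy"},              # Context 2: without analogy
    3: {"analogy", "solution"},  # Context 3: without analogy and without solution
}


def build_context(samples: List[Dict[str, str]], context_id: int) -> str:
    """Concatenate in-context samples, omitting the fields ablated for this context."""
    dropped = DROPPED[context_id]
    blocks = []
    for sample in samples:
        lines = [f"{name}: {sample[name]}" for name in SAMPLE_FIELDS
                 if name in sample and name not in dropped]
        blocks.append("\n".join(lines))
    # Separate samples with a blank line.
    return "\n\n".join(blocks)
```

Holding the samples fixed and varying only `context_id` isolates the effect of the analogy and solution fields on the model's interpretation.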