AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding

Zihan Huang, Tao Wu, Wang Lin, Shengyu Zhang, Jingyuan Chen, Fei Wu

TL;DR

This work tackles the lack of large-scale geometric datasets for multimodal language models by introducing AutoGeo, a pipeline that constructs geometry images and corresponding descriptions through an augmented geometry clause system, a rule-based clause selector, and a sample generator. AutoGeo enables AutoGeo-100k, a 100k-sample dataset with varied difficulty levels and guaranteed data integrity, generated in about 7.5 hours. Fine-tuning multiple multimodal models on AutoGeo-100k yields notable improvements in geometric captioning and reasoning tasks, validating the dataset's quality and utility for advancing geometry understanding in AI. The approach offers a scalable foundation for research and educational tools requiring robust geometric reasoning in multimodal settings.
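As a rough sanity check on the reported generation time, the average throughput can be worked out from the two figures above. This takes "100k samples" and "about 7.5 hours" at face value; the summary does not state the hardware or degree of parallelism, so the result is only an average wall-clock rate.

```latex
\[
\frac{100{,}000\ \text{samples}}{7.5\ \text{h} \times 3600\ \text{s/h}}
  = \frac{100{,}000}{27{,}000}\ \text{samples/s}
  \approx 3.7\ \text{samples/s},
\qquad
\frac{27{,}000\ \text{s}}{100{,}000\ \text{samples}} = 0.27\ \text{s per sample}.
\]
```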

Abstract

With the rapid advancement of large language models, there has been growing interest in their mathematical reasoning capabilities. However, existing research has primarily focused on text-based algebra problems, neglecting geometry because high-quality geometric datasets are lacking. To address this gap, this paper introduces AutoGeo, a novel approach for automatically generating mathematical geometric images to meet the demand for large-scale and diverse geometric datasets. AutoGeo facilitates the creation of AutoGeo-100k, an extensive repository comprising 100k high-quality geometry image-text pairs. By leveraging precisely defined geometric clauses, AutoGeo-100k covers a wide variety of geometric content, including lines, polygons, circles, and complex spatial relationships. Furthermore, this paper demonstrates the efficacy of AutoGeo-100k in enhancing the performance of multimodal large language models through fine-tuning. Experimental results indicate significant improvements in the models' ability to handle geometric images, as evidenced by higher accuracy in tasks such as geometric captioning and mathematical reasoning. This research not only fills a critical gap in the availability of geometric datasets but also paves the way for more sophisticated AI-driven tools in education and research. Project page: https://autogeo-official.github.io/.

Paper Structure

This paper contains 16 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Examples in AutoGeo-100k.
  • Figure 2: We explore three approaches for generating geometric images. First, we use diffusion models, which struggle to achieve geometric precision because they lack a systematic, logical generation process. Second, we prompt GPT-4 to generate Python code that draws the image with Matplotlib; however, the generated code frequently contains syntax and logical errors. Consequently, we propose AutoGeo, an automatic geometry sample generation pipeline equipped with a comprehensive geometric clause system.
  • Figure 3: Demonstration of the AutoGeo pipeline. The augmented geometry clause system includes 77 clauses with key attributes. The system is enhanced by adding 26 clauses with numerical annotations and by categorizing each clause into one of three difficulty levels. The rule-based selector then automatically chooses mutually compatible clauses according to predefined rules to meet the complexity limits. Finally, the sample generator converts the selected clauses into dataset samples.
  • Figure 4: Demonstrations of geometric clauses. Each geometric clause is a formalized description of a geometric definition, including (a) basic geometric objects, (b) properties of geometric objects, (c) geometric transforms and (d) geometric objects with numerical annotations. Each clause has several crucial attributes and an assigned difficulty level, which facilitates complexity control during image generation.
  • Figure 5: Dataset statistics. The bar chart shows the frequency of each clause with different complexity levels. The line chart displays the average length of textual annotations corresponding to each clause. Half of the clause annotations are omitted in the figure for better visualization.
  • ...and 2 more figures
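
The three stages described in the Figure 3 caption (a clause system with difficulty levels, a rule-based selector that respects compatibility and a complexity limit, and a sample generator) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than AutoGeo's actual code: the `Clause` fields, the five toy clauses, the "requires/provides" compatibility rule, and the use of summed difficulty as the complexity budget are all assumptions made for illustration, and the sketch produces only a caption, not an image.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical clause record; the real attribute set is not given here."""
    name: str
    difficulty: int                      # 1 (easy) to 3 (hard)
    template: str                        # text fragment used for the caption
    requires: frozenset = frozenset()    # object kinds that must already exist
    provides: frozenset = frozenset()    # object kinds this clause introduces


# Toy clause system (assumed examples, not the paper's 77 clauses).
CLAUSES = [
    Clause("triangle", 1, "Triangle ABC is drawn.",
           provides=frozenset({"triangle", "segment"})),
    Clause("circle", 1, "A circle with center O is drawn.",
           provides=frozenset({"circle"})),
    Clause("midpoint", 2, "M is the midpoint of a side.",
           requires=frozenset({"segment"}), provides=frozenset({"point"})),
    Clause("incircle", 3, "The incircle of the triangle is drawn.",
           requires=frozenset({"triangle"}), provides=frozenset({"circle"})),
    Clause("tangent", 3, "A tangent line touches the circle.",
           requires=frozenset({"circle"}), provides=frozenset({"line"})),
]


def select_clauses(clauses, max_complexity, rng):
    """Pick clauses one at a time, keeping only candidates whose prerequisites
    are already satisfied and whose difficulty fits the remaining budget."""
    selected, available, budget = [], set(), max_complexity
    while True:
        candidates = [
            c for c in clauses
            if c not in selected
            and c.requires <= available
            and c.difficulty <= budget
        ]
        if not candidates:
            return selected
        choice = rng.choice(candidates)
        selected.append(choice)
        available |= choice.provides
        budget -= choice.difficulty


def make_sample(selected):
    """Turn the selected clauses into a text-only sample record."""
    return {
        "clauses": [c.name for c in selected],
        "caption": " ".join(c.template for c in selected),
        "complexity": sum(c.difficulty for c in selected),
    }


if __name__ == "__main__":
    sample = make_sample(select_clauses(CLAUSES, 5, random.Random(0)))
    print(sample["clauses"], sample["complexity"])
    print(sample["caption"])
```

Because each clause is admitted only when its prerequisites already exist, every selection is constructible in order; a real generator would additionally render the figure from the chosen clauses.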