A Library for Automatic Natural Language Generation of Spanish Texts

Silvia García-Méndez, Milagros Fernández-Gavilanes, Enrique Costa-Montenegro, Jonathan Juncal-Martínez, F. Javier González-Castaño

TL;DR

This work introduces a fully automatic Spanish NLG library that generates complete sentences from a minimal set of input words by combining a lexical resource (aLexiS), a rule-based definite-clause grammar (DCG) and a statistical language model. The system follows a three-stage pipeline (Text Planner, Sentence Planner, Realizer) to produce coherent, morphosyntactically correct output, and its automatic lexicon extension and modular architecture ease adaptation to other languages. In an evaluation on a handcrafted corpus, the method reaches 77.64% automatic generation success and outperforms an automatically adapted Spanish SimpleNLG baseline by about 35 percentage points; the results are backed by detailed error analyses and inter-annotator reliability measures. Potential applications include augmentative communication and automated report generation, and the framework is designed to be extended to other languages and domains.

Abstract

In this article we present a novel system for natural language generation (NLG) of Spanish sentences from a minimal set of meaningful words (such as nouns, verbs and adjectives) which, unlike other state-of-the-art solutions, performs the NLG task in a fully automatic way, exploiting both knowledge-based and statistical approaches. Relying on its linguistic knowledge of vocabulary and grammar, the system is able to generate complete, coherent and correctly spelled sentences from the main word sets presented by the user. The system, which was designed to be integrable, portable and efficient, is easy to adapt to other languages and can feasibly be integrated in a wide range of digital devices. During its development we also created a supplementary lexicon for Spanish, aLexiS, with wide coverage and high precision, as well as syntactic trees from a freely available definite-clause grammar. The resulting NLG library has been evaluated both automatically and manually (through annotation). The system can potentially be used in different application domains such as augmentative communication and automatic generation of administrative reports or news.
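
To make the three-stage flow named in the TL;DR (Text Planner, Sentence Planner, Realizer) more concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation: the lexicon, the verb table, the `Plan` class and every function name are hypothetical, and the sketch uses neither the DCG grammar nor the statistical language model that the actual library relies on. It only illustrates how a few content words could pass through the three stages to become a complete sentence.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy lexicon: lemma -> (part of speech, grammatical gender).
# A stand-in for a resource such as aLexiS; entries are illustrative only.
LEXICON = {
    "niño": ("noun", "m"),
    "manzana": ("noun", "f"),
    "comer": ("verb", None),
}

# Hand-written third-person singular present forms for the toy verbs.
VERB_3SG = {"comer": "come"}

# Definite articles by gender (singular only in this sketch).
DEFINITE_ARTICLE = {"m": "el", "f": "la"}


@dataclass
class Plan:
    subject: str
    verb: str
    obj: Optional[str]


def text_planner(words: List[str]) -> List[str]:
    """Content selection: keep only the words found in the lexicon."""
    return [w for w in words if w in LEXICON]


def sentence_planner(words: List[str]) -> Plan:
    """Assign roles: first noun is the subject, second noun (if any) the object."""
    nouns = [w for w in words if LEXICON[w][0] == "noun"]
    verbs = [w for w in words if LEXICON[w][0] == "verb"]
    if not nouns or not verbs:
        raise ValueError("need at least one noun and one verb")
    return Plan(nouns[0], verbs[0], nouns[1] if len(nouns) > 1 else None)


def realizer(plan: Plan) -> str:
    """Add articles, inflect the verb, then capitalise and punctuate."""
    def noun_phrase(noun: str) -> str:
        return f"{DEFINITE_ARTICLE[LEXICON[noun][1]]} {noun}"

    parts = [noun_phrase(plan.subject), VERB_3SG[plan.verb]]
    if plan.obj is not None:
        parts.append(noun_phrase(plan.obj))
    sentence = " ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."


def generate(words: List[str]) -> str:
    """Run the three toy stages in sequence."""
    return realizer(sentence_planner(text_planner(words)))


if __name__ == "__main__":
    print(generate(["niño", "comer", "manzana"]))
```

Keeping each stage as a separate function mirrors the modularity the abstract emphasises: in a sketch like this, adapting to another language would mean swapping the lexicon and the realization rules while leaving the stage boundaries unchanged.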

Paper Structure

This paper contains 28 sections, 6 equations, 5 figures, 15 tables, 3 algorithms.

Figures (5)

  • Figure 1: Example of the Spanish lemma aposento 'bedroom' in both resources.
  • Figure 2: Example of the aLexiS entry for the Spanish lemma aposento 'bedroom'.
  • Figure 3: Syntax tree example from the grammar.
  • Figure 4: Our three-stage NLG architecture.
  • Figure 7: Annotation example.