
Solving morphological analogies: from retrieval to generation

Esteban Marquer, Miguel Couceiro

TL;DR

This work addresses morphological analogies by framing them as analogical proportions (APs) and separating analogy detection from analogy solving. It introduces two morphological embedding models (a character-level CNN and an autoencoder, AE) and two neural modules, ANNc for analogy classification and ANNr for retrieval/generation, trained with data augmentation guided by the axioms of APs. In an extensive multilingual evaluation on Siganalogies (derived from Sigmorphon and JBATS data), ANNr paired with AE outperforms symbolic baselines and the other approaches in almost all cases, while ANNc used as a retrieval method is competitive with or better than 3CosMul. The study demonstrates strong cross-language performance and robustness to initialization, and gives practical guidelines for applying DL to APs, highlighting the value of morphology-focused embeddings and axiom-informed training. Overall, the framework advances the automatic handling of morphological analogies across diverse languages and offers actionable insights for future research and applications in NLP and linguistics.
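The axiom-guided augmentation can be sketched as follows, assuming the two standard axioms of analogical proportions: symmetry (A:B::C:D implies C:D::A:B) and exchange of the means (A:B::C:D implies A:C::B:D). Closing a proportion under these two axioms yields 8 equivalent forms when the four words are distinct. The negative forms B:A::C:D, C:B::A:D and A:A::C:D, as well as the helper names `valid_forms` and `invalid_forms`, are assumptions made for illustration and not necessarily the paper's exact scheme.

```python
from typing import Callable, List, Set, Tuple

Quad = Tuple[str, str, str, str]


def symmetry(q: Quad) -> Quad:
    """A:B::C:D -> C:D::A:B"""
    a, b, c, d = q
    return (c, d, a, b)


def exchange_of_means(q: Quad) -> Quad:
    """A:B::C:D -> A:C::B:D"""
    a, b, c, d = q
    return (a, c, b, d)


AXIOMS: List[Callable[[Quad], Quad]] = [symmetry, exchange_of_means]


def valid_forms(q: Quad) -> List[Quad]:
    """Closure of q under the axioms (8 forms when the four words are distinct)."""
    seen: Set[Quad] = {q}
    frontier = [q]
    while frontier:
        current = frontier.pop()
        for axiom in AXIOMS:
            nxt = axiom(current)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return sorted(seen)


def invalid_forms(q: Quad) -> List[Quad]:
    """Hypothetical negatives: B:A::C:D, C:B::A:D and A:A::C:D built from every
    valid form, discarding any quadruple that is itself a valid form."""
    valid = set(valid_forms(q))
    negatives: Set[Quad] = set()
    for a, b, c, d in valid:
        for neg in ((b, a, c, d), (c, b, a, d), (a, a, c, d)):
            if neg not in valid:
                negatives.add(neg)
    return sorted(negatives)


quad = ("walk", "walked", "jump", "jumped")
print(len(valid_forms(quad)))  # 8
```

Filtering negatives against the valid set matters for degenerate proportions (e.g. A = B), where a permutation that is normally invalid can coincide with a valid form.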

Abstract

Analogical inference is a remarkable capability of human reasoning and has been used to solve hard reasoning tasks. Analogy-based reasoning (AR) has gained increasing interest from the artificial intelligence community and has shown competitive results in multiple machine learning tasks such as classification, decision making and recommendation. We propose a deep learning (DL) framework to tackle two key tasks in AR: analogy detection and analogy solving. The framework is thoroughly tested on the Siganalogies dataset of morphological analogical proportions (APs) between words, and is shown to outperform symbolic approaches in many languages. Previous work has explored the behavior of the Analogy Neural Network for classification (ANNc) on analogy detection and of the Analogy Neural Network for retrieval (ANNr) on analogy solving by retrieval, as well as the potential of an autoencoder (AE) for analogy solving by generating the solution word. In this article, we summarize these findings and extend them by combining ANNr with the AE embedding model and by evaluating ANNc as a retrieval method. The combination of ANNr and AE outperforms the other approaches in almost all cases, and ANNc used as a retrieval method achieves performance competitive with or better than 3CosMul. We conclude with general guidelines on using our framework to tackle APs with DL.
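The two kinds of retrieval mentioned above can be illustrated on a toy vocabulary. This sketch assumes 3CosMul as defined by Levy and Goldberg (2014), with cosines shifted to [0, 1] and ε = 0.001, and a plain cosine nearest-neighbour search for retrieving the word closest to a predicted embedding of D. The offset B - A + C only stands in for the output of a trained ANNr; the function names and toy embeddings are hypothetical.

```python
import numpy as np


def cosines(vocab: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `vocab` and the vector `v`."""
    return vocab @ v / (np.linalg.norm(vocab, axis=1) * np.linalg.norm(v) + 1e-12)


def three_cos_mul(vocab, a, b, c, exclude=(), eps=1e-3):
    """Index of the D maximising cos(D,B) * cos(D,C) / (cos(D,A) + eps) for A:B::C:?"""
    def shifted(v):
        # Shift cosines from [-1, 1] to [0, 1] so the product and ratio stay non-negative.
        return (cosines(vocab, v) + 1.0) / 2.0

    scores = shifted(b) * shifted(c) / (shifted(a) + eps)
    for i in exclude:
        scores[i] = -np.inf  # never return one of the query words
    return int(np.argmax(scores))


def nearest_neighbour(vocab, predicted, exclude=()):
    """Index of the vocabulary word whose embedding is closest (cosine) to `predicted`."""
    scores = cosines(vocab, predicted)
    for i in exclude:
        scores[i] = -np.inf
    return int(np.argmax(scores))


# Toy embeddings (hypothetical); dimensions ~ [stem "walk", stem "jump", past tense].
words = ["walk", "walked", "jump", "jumped", "run"]
vocab = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [1, 1, 0]], dtype=float)
a, b, c = vocab[0], vocab[1], vocab[2]  # walk : walked :: jump : ?
query = (0, 1, 2)

print(words[three_cos_mul(vocab, a, b, c, exclude=query)])  # jumped
predicted_d = b - a + c  # stand-in for the embedding a trained ANNr would output
print(words[nearest_neighbour(vocab, predicted_d, exclude=query)])  # jumped
```

Both functions exclude the query words A, B and C from the candidates, which is the usual convention when solving analogies by retrieval.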

Paper Structure (60 sections, 9 equations, 5 figures, 10 tables)

Figures (5)

  • Figure 1: Overview of the framework: morphological embedding models, data augmentation, analogy classification (ANNc) and analogy solving (ANNr) models.
  • Figure 2: Morphological word embedding model based on character-level CNNs. The special characters <BOW> and <EOW> allow the CNN filters to identify characters at the beginning and end of the word.
  • Figure 3: Morphological word embedding model based on character-level CNNs. The special characters <BOW> and <EOW> mark the beginning and end of the word, and are used in the generation process.
  • Figure 4: Analogy Neural Network for classification (ANNc). The embeddings of the four input elements $A,B,C,D$ are shown vertically, each in a different color.
  • Figure 5: Analogy Neural Network for retrieval/generation (ANNr). The embeddings of the three input elements $A,B,C$ are shown horizontally, each in a different color; note that $A$ appears twice.
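The role of the <BOW> and <EOW> markers in the character-level CNN embedding model can be illustrated with a minimal encoder. This is a hypothetical sketch, not the authors' architecture: the class name `CharCNNEmbedder`, the random (untrained) weights, the filter widths 2 and 3, and the ReLU with max-over-positions pooling are all assumptions. Because the markers are part of the symbol sequence, a width-3 window such as `e`, `d`, `<EOW>` lets a filter respond specifically to a word-final suffix.

```python
import numpy as np

BOW, EOW = "<BOW>", "<EOW>"


class CharCNNEmbedder:
    """Hypothetical, untrained character-level CNN word encoder (illustration only)."""

    def __init__(self, alphabet, char_dim=8, widths=(2, 3), filters_per_width=4, seed=0):
        rng = np.random.default_rng(seed)
        # Boundary markers get their own symbol ids, like ordinary characters.
        symbols = [BOW, EOW] + sorted(set(alphabet))
        self.index = {s: i for i, s in enumerate(symbols)}
        self.char_emb = rng.normal(size=(len(symbols), char_dim))
        # One bank of filters per width; each filter spans `w` consecutive symbols.
        self.filters = [rng.normal(size=(filters_per_width, w, char_dim)) for w in widths]

    def embed(self, word: str) -> np.ndarray:
        seq = [BOW] + list(word) + [EOW]
        x = self.char_emb[[self.index[s] for s in seq]]  # (len(word) + 2, char_dim)
        features = []
        for bank in self.filters:
            w = bank.shape[1]
            windows = np.stack([x[i:i + w] for i in range(len(x) - w + 1)])  # (T, w, char_dim)
            conv = np.einsum("twd,nwd->tn", windows, bank)  # filter responses per position
            features.append(np.maximum(conv, 0.0).max(axis=0))  # ReLU, then max over positions
        return np.concatenate(features)


embedder = CharCNNEmbedder("abcdefghijklmnopqrstuvwxyz")
print(embedder.embed("walked").shape)  # (8,)
```

The output size is `len(widths) * filters_per_width`, independent of word length, because each filter keeps only its strongest response; the empty string is not handled by this sketch.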