A Benchmark Corpus and Neural Approach for Sanskrit Derivative Nouns Analysis

Arun Kumar Singh; Sushant Dave; Prathosh A. P.; Brejesh Lall; Shresth Mehta

TL;DR

This paper addresses the lack of a standardized benchmark for Sanskrit derivative noun analysis by introducing Pratyaya-Kosh, a corpus of Kridanta (primary derivatives) and Taddhitanta (secondary derivatives). It proposes a neural sequence-to-sequence approach to learn both formation and splitting of derivative nouns from root+pratyaya inputs, and evaluates against existing tools (JNU Sanskrit Kridanta Analyzer and UoH Morphological Generator). The results show the proposed method achieving 84.58% Kridanta formation, 80.09% Taddhitanta formation, 88.79% Kridanta split, and 41.26% Taddhitanta split on the test sets, indicating strong improvement, especially for formation and Kridanta splitting, while requiring no external lexicon. The Pratyaya-Kosh corpus is built from multiple sources and is intended to be publicly available to spur further advances in Sanskrit morphological analysis and its NLP applications.

Abstract

This paper presents the first benchmark corpus of Sanskrit pratyayas (suffixes) and of the inflectional words (padas) they form, along with neural-network-based approaches to the formation and splitting of those words. The scope of the current work covers primary and secondary derivative nouns. Pratyayas are an important dimension of the morphological analysis of Sanskrit texts. Sanskrit computational linguistics tools exist for processing and analyzing Sanskrit texts; unfortunately, no prior work has standardized and validated these tools specifically for derivative noun analysis. In this work, we prepare a Sanskrit suffix benchmark corpus called Pratyaya-Kosh to evaluate the performance of such tools. We also present our own neural approach to derivative noun analysis and evaluate the most prominent Sanskrit morphological analysis tools on the same benchmark. The benchmark will be made freely available to researchers worldwide, and we hope it will motivate improvements in morphological analysis of the Sanskrit language.
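
To make the two tasks concrete, the following is a minimal sketch of how formation and splitting could be framed as character-level sequence-to-sequence pairs and scored by exact match. It is an illustration under stated assumptions, not the authors' pipeline: the `+` separator, the function names, the IAST romanization, and the use of exact-match accuracy as the metric are all hypothetical, and the two example words are textbook derivations (kṛ + tavyat → kartavya, upagu + aṇ → aupagava), not entries taken from Pratyaya-Kosh.

```python
from typing import List, Tuple

SEP = "+"  # hypothetical separator between root and pratyaya


def formation_pair(root: str, pratyaya: str, derived: str) -> Tuple[str, str]:
    """Formation task: source is 'root+pratyaya', target is the derived noun."""
    return (f"{root}{SEP}{pratyaya}", derived)


def split_pair(root: str, pratyaya: str, derived: str) -> Tuple[str, str]:
    """Split task: source is the derived noun, target is 'root+pratyaya'."""
    return (derived, f"{root}{SEP}{pratyaya}")


def to_chars(text: str) -> List[str]:
    """Character-level tokenization, one token per Unicode code point."""
    return list(text)


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Percentage of predictions that equal their reference string exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Illustrative entries only (assumed, not drawn from the corpus).
kridanta = ("kṛ", "tavyat", "kartavya")      # primary derivative
taddhitanta = ("upagu", "aṇ", "aupagava")    # secondary derivative

formation_examples = [formation_pair(*kridanta), formation_pair(*taddhitanta)]
split_examples = [split_pair(*kridanta), split_pair(*taddhitanta)]
```

Under this framing, the same entries serve both tasks with source and target swapped, which is one plausible way a single corpus can support both formation and splitting experiments without an external lexicon.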
