Evaluating DisCoCirc in Translation Tasks & its Limitations: A Comparative Study Between Bengali & English

Nazmoon Falgunee Moon

TL;DR

The paper evaluates the DisCoCirc framework for translation between English and Bengali and reassesses its claimed ability to reduce linguistic bureaucracy. It extends DisCoCirc with a Bengali-specific hybrid grammar, text diagrams, and text circuits, and analyzes how the Bengali–English correspondence is affected by differences in word order, case marking, pronouns, and idioms. The study finds that DisCoCirc captures substantial structure but runs into notable limitations with gendered adjectives, accusative markers, pronouns (including honorifics), and idioms, all of which hinder direct translation under the current formalism. It suggests update-rule mechanisms and future extensions to better handle non-isomorphic mappings, causal conjunctions, and tense variation, and it identifies constraints and directions for grammar-based, circuit-like representations in cross-language translation.

Abstract

In [4], the authors present the DisCoCirc (Distributed Compositional Circuits) formalism for the English language, a grammar-based framework derived from production rules that uses circuit-like representations to give the language a precise category-theoretic structure. In this paper, we extend this approach to develop a similar framework for Bengali and apply it to translation tasks between English and Bengali. A central focus of our work is reassessing the effectiveness of DisCoCirc in reducing language bureaucracy. Contrary to the result suggested in [5], our findings indicate that although DisCoCirc works well for a large part of the language, it still faces limitations due to structural differences between the two languages. We discuss possible methods for handling these shortcomings and show that, in practice, DisCoCirc still struggles even with relatively simple sentences. This divergence from prior claims not only highlights the framework's constraints in translation but also suggests scope for future improvement. Apart from our primary focus on English-Bengali translation, we also take a short detour to examine English conjunctions, following [1], and show a connection between conjunctions and Boolean logic.

Paper Structure

This paper contains 17 sections, 3 theorems, 2 equations, 21 figures, 3 tables.

Key Result

Lemma 3.1

Let $T_\mathcal{E}$ and $T_\mathcal{B}$ be two texts generated by the production rules of the English language and Bengali language, respectively. Let $S_\mathcal{E}$ be the set of all terminal symbols of $T_\mathcal{E}$ and $S_\mathcal{B}$ be the set of all terminal symbols of $T_\mathcal{B}$. If t

Figures (21)

  • Figure 1: Tree diagram for a simple sentence (English)
  • Figure 2: Tree diagram for a simple sentence (Bengali)
  • Figure 3: Examples of pronominal links using personal pronouns
  • Figure 4: Example 1 of connection via relative pronouns
  • Figure 5: Example 2 of connection via relative pronouns
  • ...and 16 more figures

Theorems & Definitions (4)

  • Lemma 3.1
  • Lemma 4.1
  • Lemma 5.1
  • proof