A Bionic Natural Language Parser Equivalent to a Pushdown Automaton

Zhenghao Wei, Kehua Lin, Jianlin Feng

TL;DR

This work addresses parsing context-free languages within a biologically plausible neural framework by extending Assembly Calculus with Recurrent Circuits (RC) and Stack Circuits (SC). By leveraging the Chomsky-Schützenberger theorem, the BNLP can parse regular languages via RC and Dyck languages via SC, enabling CFL parsing in conjunction with a Parser Automaton (PA). The authors prove that PA can simulate both finite automata and pushdown automata, establishing PA ⊇ FA and PA ⊇ PDA, thereby equating BNLP's descriptive power with that of a PDA. This provides a rigorous brain-inspired pathway to CFL parsing in NLP, linking neural circuit motifs to formal language capabilities with potential implications for biologically grounded language processing.

Abstract

Assembly Calculus (AC), proposed by Papadimitriou et al., aims to reproduce advanced cognitive functions by simulating neural activity. Several applications based on AC have been developed, including a natural language parser proposed by Mitropolsky et al. However, this parser cannot handle Kleene closures, which prevents it from parsing all regular languages and renders it weaker than Finite Automata (FA). In this paper, we propose a new bionic natural language parser (BNLP) that is based on AC and integrates two new biologically rational structures, the Recurrent Circuit and the Stack Circuit, which are inspired by RNNs and the short-term memory mechanism. In contrast to the original parser, the BNLP can fully handle all regular languages and Dyck languages. Therefore, leveraging the Chomsky-Schützenberger theorem, a BNLP that can parse all Context-Free Languages can be constructed. We also formally prove that for any PDA, a Parser Automaton corresponding to BNLP can always be formed, ensuring that BNLP has a descriptive power equal to that of a PDA and addressing the deficiencies of the original parser.
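
As a rough illustration of why handling regular languages and Dyck languages is enough, recall that the Chomsky-Schützenberger theorem states that every context-free language $L$ can be written as $L = h(D \cap R)$, where $D$ is a Dyck language, $R$ is a regular language, and $h$ is a homomorphism. The sketch below instantiates this for the toy language $\{a^n b^n : n \ge 0\}$ using an ordinary software stack and a regular expression. It is not the paper's neural construction; the names `in_dyck`, `in_regular`, `apply_h`, and `generate` are hypothetical and chosen only for this example.

```python
import re
from itertools import product

# Toy instance of L = h(D ∩ R); all names here are illustrative assumptions.
BRACKETS = {"(": ")"}                 # one bracket pair: the Dyck language D_1
REGULAR = re.compile(r"\(*\)*")       # regular language R: openers, then closers
HOMOMORPHISM = {"(": "a", ")": "b"}   # h maps each bracket to a terminal symbol


def in_dyck(word):
    """Stack-based membership test for the Dyck language over BRACKETS."""
    stack = []
    for symbol in word:
        if symbol in BRACKETS:
            stack.append(BRACKETS[symbol])      # push the expected closer
        elif not stack or stack.pop() != symbol:
            return False                        # unmatched or wrong closer
    return not stack                            # every opener must be closed


def in_regular(word):
    """Membership test for R, done here with a regular expression."""
    return REGULAR.fullmatch(word) is not None


def apply_h(word):
    """Apply the homomorphism h symbol by symbol."""
    return "".join(HOMOMORPHISM[s] for s in word)


def generate(max_len):
    """Enumerate h(D ∩ R) over all bracket words of at most max_len symbols."""
    result = set()
    for n in range(max_len + 1):
        for symbols in product("()", repeat=n):
            word = "".join(symbols)
            if in_dyck(word) and in_regular(word):
                result.add(apply_h(word))
    return result


if __name__ == "__main__":
    print(sorted(generate(6), key=len))
```

In this sketch the regular filter rejects balanced words such as `()()`, so only words of the form `(^n )^n` survive the intersection, and `h` maps them to `a^n b^n`. In the paper's setting, the Recurrent Circuit and the Stack Circuit play the roles that the regular expression and the Python list play here.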

Paper Structure

This paper contains 16 sections, 33 equations, 6 figures, and 1 algorithm.

Figures (6)

  • Figure 1: When OP accepts multiple consecutive adjectives, there will be confusion among the neurons in the assemblies of two adjacent adjectives.
  • Figure 2: An example of how OP operates when the input sentence is "cat chase mice". Green areas and fibers are disinhibited and red ones are inhibited; $\oslash$ means the area/fiber will be inhibited by the post-rules, and $\bigcirc$ means it will be disinhibited.
  • Figure 3: A sketch of RC. To keep the figure clear, fibers connecting Lex and other areas are omitted. Fibers inside RC are all directed, and fibers connecting RC areas with other areas are undirected.
  • Figure 4: An example of inputting four consecutive adjectives while parsing a string that conforms to "$(Adj)^*\ N$", where $Adj$ accepts any adjective. In the figure, $Adj^1$ and $Adj^2$ compose a minimum RC. Through "tossing about" projection, the relationship of a series of adjectives modifying a noun can be stored in RC.
  • Figure 5: A sketch of SC when the input string is $"((()"$.
  • ...and 1 more figure