TopoLedgerBERT: Topological Learning of Ledger Description Embeddings using Siamese BERT-Networks

Sander Noels, Sébastien Viaene, Tijl De Bie

TL;DR

TopoLedgerBERT is a sentence embedding method devised specifically for ledger account mapping. It integrates hierarchical information from the charts of accounts into the sentence embedding process, aiming to capture both the semantic similarity and the hierarchical structure of the ledger accounts.

Abstract

This paper addresses a long-standing problem in the field of accounting: mapping company-specific ledger accounts to a standardized chart of accounts. We propose a novel solution, TopoLedgerBERT, a unique sentence embedding method devised specifically for ledger account mapping. This model integrates hierarchical information from the charts of accounts into the sentence embedding process, aiming to accurately capture both the semantic similarity and the hierarchical structure of the ledger accounts. In addition, we introduce a data augmentation strategy that enriches the training data and, as a result, increases the performance of our proposed model. Compared to benchmark methods, TopoLedgerBERT demonstrates superior performance in terms of accuracy and mean reciprocal rank.
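
The two evaluation metrics named above, accuracy and mean reciprocal rank (MRR), can be illustrated with a minimal sketch. It assumes that a model returns, for each company-specific ledger account, a ranked list of candidate standard accounts, and that accuracy means the top-ranked candidate is the correct one. The function names and the account labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def reciprocal_rank(ranked: Sequence[str], target: str) -> float:
    """Return 1/rank of `target` in `ranked` (1-based), or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == target:
            return 1.0 / position
    return 0.0


def evaluate(rankings: Sequence[Sequence[str]], targets: Sequence[str]) -> Dict[str, float]:
    """Compute top-1 accuracy and mean reciprocal rank over all queries."""
    if len(rankings) != len(targets) or not targets:
        raise ValueError("rankings and targets must be non-empty and of equal length")
    n = len(targets)
    # A query counts as correct only if the first candidate is the target (assumption).
    hits = sum(1 for ranked, target in zip(rankings, targets) if ranked and ranked[0] == target)
    rr_total = sum(reciprocal_rank(ranked, target) for ranked, target in zip(rankings, targets))
    return {"accuracy": hits / n, "mrr": rr_total / n}


# Hypothetical ranked predictions for three ledger accounts.
example_rankings = [
    ["cash", "inventory", "receivables"],   # correct label ranked 1st
    ["inventory", "cash", "receivables"],   # correct label ranked 2nd
    ["cash", "receivables", "inventory"],   # correct label ranked 3rd
]
example_targets = ["cash", "cash", "inventory"]
metrics = evaluate(example_rankings, example_targets)
print(metrics)
```

In this toy case one of three queries is ranked first, so accuracy is 1/3, while MRR averages 1, 1/2, and 1/3 and therefore also rewards near misses.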

Paper Structure

This paper contains 22 sections, 6 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Left: Assets subsection of the balance sheet. Right: A vertex-labeled tree representation of the assets subsection of the balance sheet.
  • Figure 2: Diagram of the Sentence-BERT architecture for computing ledger account description similarity scores for the ledger account mapping problem.
  • Figure 3: Example construction of an augmented dataset, $D_{aug}$, by TopoLedgerBERT for ledger account mapping.
  • Figure 4: Misprediction Distance (MD) difference distribution between the TopoLedgerBERT@20 model and the fine-tuned SBERT model.
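
The vertex-labeled tree described in the Figure 1 caption can be sketched as a simple parent map. The account labels below are hypothetical placeholders, not the ones shown in the figure, and the hop-count distance is only one plausible way to compare two accounts in such a tree. It is not claimed to be the measure the paper uses.

```python
from typing import Dict, List, Optional

# Hypothetical assets subtree: each account label maps to its parent label.
PARENT: Dict[str, Optional[str]] = {
    "Assets": None,
    "Current assets": "Assets",
    "Non-current assets": "Assets",
    "Cash": "Current assets",
    "Inventory": "Current assets",
    "Property, plant and equipment": "Non-current assets",
}


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the labels from `node` up to the root, inclusive."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def tree_distance(a: str, b: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of edges on the path between `a` and `b` via their lowest common ancestor."""
    steps_from_a = {label: steps for steps, label in enumerate(path_to_root(a, parent))}
    for steps_from_b, label in enumerate(path_to_root(b, parent)):
        if label in steps_from_a:
            return steps_from_a[label] + steps_from_b
    raise ValueError("nodes do not share a root")


print(tree_distance("Cash", "Inventory", PARENT))
print(tree_distance("Cash", "Property, plant and equipment", PARENT))
```

Sibling accounts such as the two current-asset leaves are two edges apart, while leaves in different branches are connected only through the root.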