Discovery of Rare Causal Knowledge from Financial Statement Summaries

Hiroki Sakaji, Jason Bennett, Risa Murono, Kiyoshi Izumi, Hiroyuki Sakai

TL;DR

This work tackles the problem of identifying rare causal knowledge in Japanese financial statement summaries to aid investors. It presents a three‑stage pipeline: first, a supervised SVM extracts sentences containing cause–effect cues using syntactic features and an extended language ontology; second, patterns are used to extract concrete cause and effect expressions; third, a rare knowledge scorer using company keywords and co‑occurrence signals isolates the rarest causal knowledge. Evaluation on over 100k PDFs from thousands of companies shows that the method achieves an average precision of about 0.80 and a MAP of 0.20, outperforming a baseline, and reveals non‑obvious insights such as how a hot summer might boost demand for certain products. The approach holds practical value for investors by surfacing actionable, non‑general causal links and paves the way for constructing causal knowledge chains to uncover investment opportunities, while future work aims to improve parsing robustness and broaden the causal network.

Abstract

What would happen if temperatures were subdued, resulting in a cool summer? One can easily imagine that sales of air conditioners, ice cream, or beer would be suppressed as a result. Less obvious is that agricultural shipments might be delayed, or that sales of soundproofing material might decrease. The ability to extract such causal knowledge is important, but it is also important to distinguish between cause-effect pairs that are known and those that are likely to be unknown, or rare. Therefore, in this paper, we propose a method for extracting rare causal knowledge from Japanese financial statement summaries produced by companies. Our method consists of three steps. First, it extracts sentences that include causal knowledge from the summaries using a machine learning method based on an extended language ontology. Second, it obtains causal knowledge from the extracted sentences using syntactic patterns. Finally, it extracts the rarest causal knowledge from the knowledge it has obtained.

Paper Structure (17 sections, 5 equations, 9 figures, 7 tables)

Figures (9)

  • Figure 1: An example of a semantic feature
  • Figure 2: List of syntactic patterns
  • Figure 3: An example of pattern A
  • Figure 4: An example of pattern C
  • Figure 5: Extraction of rare cause–effect expressions
  • ...and 4 more figures