StrokeNet: Unveiling How to Learn Fine-Grained Interactions in Online Handwritten Stroke Classification

Yiheng Huang, Shuang She, Zewei Wei, Jianmin Lin, Ming Yang, Wenyin Liu

TL;DR

StrokeNet introduces a point-cloud-inspired framework for online handwritten stroke classification that represents each stroke as a dynamic sequence of reference-point pairs. It integrates Inline Sequence Attention (ISA), Cross-Ellipse Query (CEQ), hierarchical set learning, and an Auxiliary Branch to capture fine-grained intra- and inter-stroke interactions and semantic transitions. The approach yields state-of-the-art results on four public datasets (CASIA-onDo, IAMonDo, FC, FA), with notable gains on complex layouts and asymmetric stroke categories, and ablations validate the contribution of each component. By balancing detailed stroke representation with computational efficiency, StrokeNet offers a robust, scalable solution for precise online handwriting understanding and suggests avenues for extending to higher-level content tasks.

Abstract

Stroke classification remains challenging due to variations in writing style, ambiguous content, and dynamic writing positions. The core challenge is modeling the semantic relationships between strokes. Our observations indicate that stroke interactions are typically localized, which makes such fine-grained relationships difficult for existing deep learning methods to capture. Viewing strokes at the point level can address this issue, but it introduces redundancy. Selecting reference points and using their sequential order to represent each stroke in a fine-grained manner resolves this problem. This insight inspired StrokeNet, a novel network architecture that encodes strokes as reference pair representations (points + feature vectors), where reference points enable spatial queries and features mediate interaction modeling. Specifically, we dynamically select reference points for each stroke and sequence them, employing an Inline Sequence Attention (ISA) module to construct contextual features. To capture spatial feature interactions, we devise a Cross-Ellipse Query (CEQ) mechanism that clusters reference points and extracts features across varying spatial scales. Finally, a joint optimization framework simultaneously predicts stroke categories via reference-point regression and models adjacent-stroke semantic transitions through an Auxiliary Branch (Aux-Branch). Experimental results show that our method achieves state-of-the-art performance on multiple public online handwritten datasets. Notably, on the CASIA-onDo dataset, accuracy improves from 93.81% to 95.54%, demonstrating the effectiveness and robustness of our approach.
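To make the idea of reference points and elliptical spatial queries more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify how reference points are selected or how the ellipse in CEQ is parameterized, so this sketch assumes arc-length resampling (longer strokes receive more points) and an axis-aligned ellipse with semi-axes `a` and `b`. The function names `select_reference_points` and `ellipse_query` and the parameter `spacing` are hypothetical.

```python
import math


def select_reference_points(stroke, spacing):
    """Resample a stroke at roughly equal arc-length steps.

    ASSUMPTION: the paper's selection rule is not given in the abstract;
    arc-length resampling is used here purely for illustration.
    `stroke` is a list of (x, y) tuples in writing order.
    """
    if not stroke:
        return []
    # Cumulative arc length at each input point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0.0:
        # Single point or a dot-like stroke: one reference point.
        return [stroke[0]]
    # The number of reference points grows with stroke length.
    n = max(2, int(round(total / spacing)) + 1)
    refs, seg = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(stroke) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0.0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = stroke[seg], stroke[seg + 1]
        refs.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return refs


def ellipse_query(center, points, a, b):
    """Return indices of points inside an axis-aligned ellipse.

    ASSUMPTION: an axis-aligned ellipse with semi-axes a (horizontal)
    and b (vertical); the actual CEQ formulation may differ.
    """
    cx, cy = center
    return [
        i for i, (x, y) in enumerate(points)
        if ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0
    ]


refs = select_reference_points([(0.0, 0.0), (10.0, 0.0)], spacing=5.0)
neighbors = ellipse_query((0.0, 0.0), refs + [(0.0, 3.0)], a=6.0, b=2.0)
```

In this sketch, a wide, flat ellipse reaches further along the horizontal writing direction than across lines, which is one plausible motivation for an elliptical rather than circular neighborhood; varying `a` and `b` would give queries at different spatial scales.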

Paper Structure

This paper contains 17 sections, 11 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Comparison of stroke and point levels to demonstrate the locality of stroke interactions and the redundancy of points within strokes.
  • Figure 2: Overview of StrokeNet.
  • Figure 3: Structures of the Inline Sequence Attention (ISA) and Cross-Ellipse Query (CEQ).
  • Figure 4: Using the auxiliary branch to assist the main branch.
  • Figure 5: The recognition results of different numbers of reference points.