LSP-YOLO: A Lightweight Single-Stage Network for Sitting Posture Recognition on Embedded Devices

Nanjun Li, Ziyue Hao, Quanqiang Wang, Xuanyin Wang

TL;DR

This work tackles real-time sitting posture recognition on embedded edge devices by replacing traditional two-stage pipelines with a single-stage, end-to-end network, LSP-YOLO. The recognition head fuses keypoint estimation and posture classification through pointwise convolution, and the Light-C3k2 module combines partial convolution (PConv) and SimAM attention to reduce computation. A 5,000-image dataset covering six postures supports training and testing. The smallest model, LSP-YOLO-n, achieves 94.2% accuracy at 251 FPS on a PC with a 1.9 MB model size, and deployment on the SV830C + GC030A platform demonstrates real-time inference under constrained computational resources. Overall, the method is a compact, fast, deployable alternative to two-stage approaches for posture monitoring and HCI applications.

Abstract

With the rise in sedentary behavior, health problems caused by poor sitting posture have drawn increasing attention. Most existing methods, whether using invasive sensors or computer vision, rely on two-stage pipelines, which result in high intrusiveness, intensive computation, and poor real-time performance on embedded edge devices. Inspired by YOLOv11-Pose, a lightweight single-stage network for sitting posture recognition on embedded edge devices, termed LSP-YOLO, was proposed. By integrating partial convolution (PConv) and the Similarity-Aware Activation Module (SimAM), a lightweight module, Light-C3k2, was designed to reduce computational cost while maintaining feature extraction capability. In the recognition head, keypoints were directly mapped to posture classes through pointwise convolution, and intermediate supervision was employed to enable efficient fusion of pose estimation and classification. Furthermore, a dataset containing 5,000 images across six posture categories was constructed for model training and testing. The smallest trained model, LSP-YOLO-n, achieved 94.2% accuracy and 251 FPS on a personal computer (PC) with a model size of only 1.9 MB. Meanwhile, real-time and high-accuracy inference under constrained computational resources was demonstrated on the SV830C + GC030A platform. The proposed approach is characterized by high efficiency, lightweight design, and deployability, making it suitable for smart classrooms, rehabilitation, and human-computer interaction applications.
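
To make the two ingredients of Light-C3k2 more concrete, the following is a minimal, dependency-free sketch of their general ideas as published in the original SimAM and PConv (FasterNet) work. It is not the authors' implementation. The function names `simam` and `pconv_flops_ratio`, the regularizer `lam = 1e-4`, and the partial ratio of 1/4 are assumptions for illustration; the abstract does not state which settings LSP-YOLO uses.

```python
import math


def simam(channel, lam=1e-4):
    """SimAM-style parameter-free attention on one channel.

    `channel` is an H x W list of floats with at least two elements.
    Each value is scaled by sigmoid(d / (4 * (var + lam)) + 0.5), where d is
    its squared deviation from the channel mean, so values that stand out
    from the rest of the channel receive a larger weight.
    `lam` is an assumed regularizer, not a value taken from the paper.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of other positions in the channel
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    var = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        new_row = []
        for v, d in zip(row, dev_row):
            inv_energy = d / (4.0 * (var + lam)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            new_row.append(v * weight)
        out.append(new_row)
    return out


def pconv_flops_ratio(channels, partial_ratio=0.25):
    """FLOPs of a partial convolution relative to a regular convolution.

    PConv convolves only the first c_p = channels * partial_ratio channels
    and passes the rest through unchanged. With equal input and output
    widths, the cost scales with c_p**2 instead of channels**2.
    `partial_ratio = 0.25` is an assumed setting.
    """
    c_p = int(channels * partial_ratio)
    return (c_p * c_p) / (channels * channels)


if __name__ == "__main__":
    feature = [[0.0, 0.0], [0.0, 4.0]]
    print(simam(feature))         # the outlier 4.0 keeps most of its magnitude
    print(pconv_flops_ratio(64))  # fraction of regular-conv FLOPs
```

Under these assumptions, a 64-channel layer with a 1/4 partial ratio convolves only 16 channels, so the convolution itself costs 1/16 of a regular one, while the SimAM weighting adds no learnable parameters. This is consistent with the abstract's claim that Light-C3k2 reduces computation while preserving feature extraction capability.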

Paper Structure

This paper contains 20 sections, 14 equations, 18 figures, and 5 tables.

Figures (18)

  • Figure 1: Comparison between the conventional multi-stage pipeline and the proposed single-stage approach. (a) Conventional two-stage method; (b) Proposed single-stage method.
  • Figure 2: The network structure of YOLOv11-Pose
  • Figure 3: The overall structure of LSP-YOLO
  • Figure 4: Point convolution–based sitting posture classification
  • Figure 5: Comparison between YOLOv11-Pose Head and the proposed LSP-Head. (a) YOLOv11-Pose Head; (b) Proposed LSP-Head.
  • ...and 13 more figures