CyanKitten: AI-Driven Markerless Motion Capture for Improved Elderly Well-Being
Mengyao Guo, Yu Nie, Jinda Han, Zongxing Li, Ze Gao
TL;DR
The paper addresses loneliness among aging populations and proposes CyanKitten, an AI-driven virtual companion that uses markerless motion capture with a dual-camera stereo setup to reconstruct up to 208 3D joints per frame via CNN/RNN models. A cat avatar responds to three core gestures (greeting, petting, heart-making) through motion-mirroring and complementary actions, enabled by real-time retargeting with Cyan_SLAM and multimodal fusion of visual and audio cues. The authors outline a comprehensive technical pipeline, an ethical framework, and a detailed plan for user studies in community centers to evaluate engagement and emotional well-being, with planned improvements such as transformer-based fusion and adaptive learning. Overall, CyanKitten aims to provide an engaging, accessible digital companion to alleviate isolation and improve quality of life for elderly users through empathetic, non-verbal interaction and robust pose recognition. The work contributes a cohesive elder-focused AI companion architecture and a roadmap for real-world validation in elderly care settings.
Abstract
This paper introduces CyanKitten, an interactive virtual companion system tailored for elderly users, integrating advanced posture recognition, behavior recognition, and multimodal interaction capabilities. The system utilizes a three-tier architecture to process and interpret user movements and gestures, leveraging a dual-camera setup and a convolutional neural network trained specifically on elderly movement patterns. The behavior recognition module identifies and responds to three key interactive gestures: greeting waves, petting motions, and heart-making gestures. A multimodal integration layer combines visual and audio inputs to facilitate natural and intuitive interactions. This paper outlines the technical implementation of each component, addressing challenges such as elderly-specific movement characteristics, real-time processing demands, and environmental adaptability. The result is an engaging and accessible virtual interaction experience designed to enhance the quality of life for elderly users.
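To make the three-tier flow concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes tier 1 has already produced named 3D joints per frame, replaces the learned behavior recognizer with simple hand-written rules, and reduces the audio input to a single boolean. The joint names, thresholds, coordinate convention (y points up, units in metres), and avatar action names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z); y points up (assumed convention)


@dataclass
class Pose:
    """Tier 1 output: named 3D joints for one frame (hypothetical joint names)."""
    joints: Dict[str, Joint]


def _direction_changes(values: Sequence[float], min_step: float = 0.02) -> int:
    """Count reversals in frame-to-frame motion, ignoring jitter below min_step."""
    changes, last_sign = 0, 0
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if abs(delta) < min_step:
            continue
        sign = 1 if delta > 0 else -1
        if last_sign and sign != last_sign:
            changes += 1
        last_sign = sign
    return changes


def classify_gesture(frames: Sequence[Pose]) -> Optional[str]:
    """Tier 2: rule-based placeholder for the behavior recognition module."""
    if not frames:
        return None
    last = frames[-1].joints
    lw, rw, head = last["left_wrist"], last["right_wrist"], last["head"]
    # Heart-making: both wrists raised above the head and close together.
    if lw[1] > head[1] and rw[1] > head[1] and abs(lw[0] - rw[0]) < 0.15:
        return "heart"
    xs = [f.joints["right_wrist"][0] for f in frames]
    ys = [f.joints["right_wrist"][1] for f in frames]
    shoulder_y = last["right_shoulder"][1]
    # Greeting wave: raised wrist swinging side to side.
    if rw[1] > shoulder_y and _direction_changes(xs) >= 2:
        return "greeting"
    # Petting: lowered wrist moving up and down.
    if rw[1] <= shoulder_y and _direction_changes(ys) >= 2:
        return "petting"
    return None


# Hypothetical avatar actions, one per recognized gesture.
RESPONSES = {"greeting": "wave_paw", "petting": "purr", "heart": "heart_eyes"}


def fuse(gesture: Optional[str], speech_detected: bool) -> str:
    """Tier 3: combine the visual and audio cues into one avatar action."""
    if gesture is not None:
        return RESPONSES[gesture]
    return "look_at_user" if speech_detected else "idle"
```

In this sketch a recognized gesture always takes priority, and the audio cue only decides what the avatar does when no gesture is detected. That is one simple fusion policy among many; a learned fusion model could weight the two inputs instead.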
