KeySense: LLM-Powered Hands-Down, Ten-Finger Typing on Commodity Touchscreens

Tony Li, Yan Ma, Zhuojun Li, Chun Yu, IV Ramakrishnan, Xiaojun Bi

TL;DR

KeySense isolates intentional taps from resting-finger noise using cognitive-motor timing patterns, and then uses a fine-tuned LLM decoder to convert the resulting noisy letter sequence into the intended word.

Abstract

Existing touchscreen software keyboards prevent users from resting their hands, forcing slow and fatiguing index-finger tapping ("chicken typing") instead of familiar hands-down ten-finger typing. We present KeySense, a purely software solution that preserves physical keyboard motor skills. KeySense isolates intentional taps from resting-finger noise using cognitive-motor timing patterns, and then uses a fine-tuned LLM decoder to convert the resulting noisy letter sequence into the intended word. In controlled component tests, the decoder substantially outperforms two statistical baselines (top-1 accuracy 84.8% vs 75.7% and 79.3%). A 12-participant study shows clear ergonomic and performance benefits: compared with the conventional hover-style keyboard, users rated KeySense as markedly less physically demanding (NASA-TLX median 1.5 vs 4.0), and after brief practice typed significantly faster (WPM 28.3 vs 26.2, p < 0.01). These results indicate that KeySense enables accurate, efficient, and comfortable ten-finger text entry on commodity touchscreens without any extra hardware.

Paper Structure (36 sections, 16 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: End-to-end overview. Top (pre-decoding): raw touch events are aggregated into touch threads, grouped into time clusters, and resolved by a reach-sensitive travel score to yield one representative per cluster. Each representative is mapped to a key, producing a letter sequence. Bottom (LLM decoder): we parameterize a synthetic corpus from the collected typing logs, fine-tune FLAN-T5-small on these pairs, and at inference decode the letter sequence into a word.
  • Figure 2: Touch distributions from the ResType dataset [li2023restype]. Intentional taps concentrate over key centers (left), whereas unintentional contacts follow a resting-hand arc (right).
  • Figure 3: Inter-onset intervals by user (violins) with a 100 ms reference line. The rightmost violin aggregates all users. Labels show counts and percentages of gaps $\leq 100$ ms. Means (green) and medians (red) are overlaid.
  • Figure 4: Cluster-level selection for a representative word instance ("eligible"). (a) Timeline with cluster assignments; (b) spatial distribution of cluster start positions, with the nearest keys faintly shown. Our in-cluster rule yields "ekigible".
  • Figure 5: Word collection on an iPad Pro. (a) A target word appears at the top; instrumentation gates precise logging. During typing, the interface shows only a password-like progress bar (gray dots) instead of intermediate letters, encouraging natural input without mid-word corrections. (b) A participant types with all ten fingers resting on the surface.
  • ...and 5 more figures
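
The pre-decoding stage described in the Figure 1 caption (touches grouped into time clusters, then one representative per cluster mapped to a key) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the `Touch` record, the function names, and the toy key layout are hypothetical, and the 100 ms gap is borrowed from the reference line in Figure 3, not a confirmed clustering threshold. The reach-sensitive travel score that picks the representative is not specified in this excerpt, so it is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Touch:
    """Hypothetical touch record: onset time and start position."""
    onset_ms: float
    x: float
    y: float


def cluster_by_onset(touches: List[Touch], gap_ms: float = 100.0) -> List[List[Touch]]:
    """Group touches whose consecutive onsets are at most gap_ms apart.

    gap_ms = 100 is an assumed value taken from the Figure 3 reference line.
    """
    clusters: List[List[Touch]] = []
    for touch in sorted(touches, key=lambda t: t.onset_ms):
        if clusters and touch.onset_ms - clusters[-1][-1].onset_ms <= gap_ms:
            clusters[-1].append(touch)  # close in time: same cluster
        else:
            clusters.append([touch])    # large gap: start a new cluster
    return clusters


def nearest_key(touch: Touch, key_centers: Dict[str, Tuple[float, float]]) -> str:
    """Map a touch to the key whose center is closest (squared Euclidean distance)."""
    return min(
        key_centers,
        key=lambda k: (key_centers[k][0] - touch.x) ** 2
        + (key_centers[k][1] - touch.y) ** 2,
    )


# Toy data: five touches forming three bursts in time.
demo_touches = [
    Touch(0.0, 0.1, 0.0),
    Touch(40.0, 1.9, 0.1),
    Touch(300.0, 1.1, 0.0),
    Touch(350.0, 0.0, 0.2),
    Touch(700.0, 2.0, 0.0),
]
# Toy layout with three keys on one row (not a real keyboard geometry).
demo_keys = {"a": (0.0, 0.0), "s": (1.0, 0.0), "d": (2.0, 0.0)}

demo_clusters = cluster_by_onset(demo_touches)
```

In this sketch each cluster would still need a rule for choosing its representative touch before `nearest_key` is applied; per the Figure 1 caption, the paper resolves that step with its travel score.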