Feasibility of Real-Time Automated Vocal Fold Motion Tracking for In-Office Laryngoscopy

  • Aki Koivu
  • Obinna I. Nwosu
  • Mitsuki Ota
  • Kristina Simonyan
  • Matthew R. Naunheim

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Objectives: Major advancements have been made in applying artificial intelligence and computer vision to the analysis of videolaryngoscopy data. However, existing models are limited to post hoc analysis and are aimed at research settings. In this work, we assess the feasibility of a real-time solution for automated vocal fold tracking during in-office laryngoscopy.

Methods: We trained a keypoint detection model, optimized for real-time deployment, that tracks 39 individual keypoints of the larynx. Training used 1254 frames extracted from 57 laryngoscopy videos. Images were annotated with keypoints along the true and false vocal folds and the supraglottic structures. After training, the model was evaluated on a validation dataset of 140 images and a second, independent test dataset of 50 images. Performance was assessed by comparing manual keypoint annotations with predicted annotations to compute mean keypoint accuracy, and by calculating temporal consistency from a calibration video clip. The live feed of a flexible laryngoscope, mounted to an endoscopic tower, was passed via a video capture card into a laptop running the motion tracking model in real time.

Results: The model demonstrated robust detection accuracy, with a mean keypoint accuracy of 85% on the validation dataset and 75% on the test dataset, and promising temporal consistency, with an average keypoint movement of 5.85 pixels. During testing, the model ran at 30 frames per second with minimal appreciable latency while processing the laryngoscope's live feed, so the laryngologist's work was not affected.

Conclusion: The model accurately tracks laryngeal structures with strong detection, localization, and consistency, is compatible with affordable hardware, and enables real-time metric development.

Level of Evidence: 4.
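The abstract names two evaluation metrics but does not reproduce their formulas. The sketch below shows one plausible way such metrics could be computed. It is not the authors' definition. It assumes that mean keypoint accuracy is the fraction of predicted keypoints falling within a pixel threshold of the manual annotation, and that temporal consistency is the mean frame-to-frame displacement of each keypoint over a clip. The function names, the threshold, and the toy coordinates are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Frame = List[Point]  # one (x, y) pair per keypoint


def mean_keypoint_accuracy(predicted: List[Frame],
                           annotated: List[Frame],
                           threshold_px: float) -> float:
    """Fraction of predicted keypoints within threshold_px of the manual label.

    ASSUMPTION: a threshold-based accuracy; the paper's formula may differ.
    """
    hits = 0
    total = 0
    for pred_frame, true_frame in zip(predicted, annotated):
        for (px, py), (tx, ty) in zip(pred_frame, true_frame):
            total += 1
            if math.hypot(px - tx, py - ty) <= threshold_px:
                hits += 1
    return hits / total if total else 0.0


def temporal_consistency(frames: List[Frame]) -> float:
    """Mean per-keypoint displacement (pixels) between consecutive frames.

    ASSUMPTION: lower values mean steadier tracking on a calibration clip.
    """
    moves = []
    for prev, curr in zip(frames, frames[1:]):
        for (x0, y0), (x1, y1) in zip(prev, curr):
            moves.append(math.hypot(x1 - x0, y1 - y0))
    return sum(moves) / len(moves) if moves else 0.0


# Toy data: two frames with two keypoints each (hypothetical values).
predicted = [[(10.0, 10.0), (20.0, 20.0)], [(13.0, 14.0), (50.0, 50.0)]]
annotated = [[(10.0, 12.0), (20.0, 20.0)], [(10.0, 10.0), (30.0, 30.0)]]
accuracy = mean_keypoint_accuracy(predicted, annotated, threshold_px=5.0)

# Toy clip: three frames with two keypoints each (hypothetical values).
clip = [[(0.0, 0.0), (10.0, 10.0)],
        [(3.0, 4.0), (10.0, 10.0)],
        [(3.0, 4.0), (16.0, 18.0)]]
consistency = temporal_consistency(clip)

print(f"accuracy={accuracy:.2f}, consistency={consistency:.2f} px")
```

In the toy data, three of the four predicted keypoints lie within 5 pixels of their labels, and the four frame-to-frame displacements in the clip are 5, 0, 0, and 10 pixels. A real evaluation would use the model's 39 keypoints per frame and a threshold chosen relative to image resolution.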

Original language: English
Pages (from-to): 596-604
Number of pages: 9
Journal: Laryngoscope
Volume: 136
Issue number: 2
State: Published - Feb 2026
Externally published: Yes

Keywords

  • artificial intelligence
  • videolaryngoscopy
  • vocal fold tracking
