dc.contributor.author: Nguyen, Hiep
dc.date.accessioned: 2024-11-05T16:37:23Z
dc.date.available: 2024-11-05T16:37:23Z
dc.date.issued: 2024-05-19
dc.identifier.uri: https://repository.tcu.edu/handle/116099117/66801
dc.description.abstract: Speech impairment ranks among the world's most prevalent disabilities, affecting over 430 million adults [1]. Despite its widespread impact, many existing video-conferencing applications lack a comprehensive end-to-end solution to this challenge. In response, we present a holistic approach to translating American Sign Language (ASL) into subtitles in real time by leveraging advances in Google Mediapipe, Transformer models, and web technologies. In March 2024, Google released the largest dataset in this problem domain, more than 180 GB of ASL gesture sequences represented as numeric Mediapipe landmark values. Our methodology begins with the implementation and training of a Transformer model on the preprocessed Google dataset, followed by the establishment of a back-end server that encapsulates the trained model for application integration. This server handles video-input preprocessing and real-time inference, communicating with client services as a Representational State Transfer (REST) endpoint. To demonstrate the practicality of our approach, we developed a video-conferencing application using the AgoraRTC Software Development Kit (SDK), which communicates with our back-end server to transcribe user gestures to text and display the characters on the receiving end. Through this end-to-end system, we enable video calls enhanced by real-time transcription of fingerspelled gestures with low latency and high accuracy, effectively bridging the communication gap for individuals with speech disabilities. With a growing imperative for AI engineered for human well-being, our project seeks to promote the integration of AI into applications that enhance human wellness, thereby broadening awareness and adoption of such efforts.
dc.subject: American Sign Language
dc.subject: Transformers
dc.subject: Video conferencing
dc.subject: Real-time Text Transcription
dc.subject: Accessibility
dc.subject: AI for Human Well-being
dc.title: FROM GESTURES TO WORDS: AMERICAN SIGN LANGUAGE END-TO-END DEEP LEARNING INTEGRATION WITH TRANSFORMERS AND MEDIAPIPE
etd.degree.department: Computer Science
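
Illustrative sketch (not from the thesis itself): the abstract describes extracting Mediapipe landmarks from each video frame and feeding the resulting sequence to a Transformer that emits characters. The Python sketch below shows that pipeline shape under stated assumptions; the 21-landmark hand input, model sizes, character vocabulary, and CTC-style blank token are illustrative choices, not the authors' actual configuration.

```python
# Minimal sketch of landmark extraction + Transformer, assuming PyTorch and
# the legacy MediaPipe Hands solution. Hyperparameters are hypothetical.
import cv2
import mediapipe as mp
import torch
import torch.nn as nn

VOCAB = list("abcdefghijklmnopqrstuvwxyz ")  # hypothetical character set


class FingerspellingTransformer(nn.Module):
    """Encodes a sequence of per-frame hand landmarks into character logits."""

    def __init__(self, n_landmarks=21, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # Each MediaPipe hand frame yields 21 (x, y, z) landmarks -> 63 features.
        self.proj = nn.Linear(n_landmarks * 3, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, len(VOCAB) + 1)  # +1 for a CTC blank

    def forward(self, x):                 # x: (batch, frames, 63)
        h = self.encoder(self.proj(x))
        return self.head(h)               # (batch, frames, |vocab|+1)


def landmarks_from_frame(hands, frame_bgr):
    """Returns a flat [x0, y0, z0, x1, ...] vector, or None if no hand found."""
    result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    return [v for p in lm for v in (p.x, p.y, p.z)]


if __name__ == "__main__":
    hands = mp.solutions.hands.Hands(max_num_hands=1)
    cap = cv2.VideoCapture(0)             # webcam as a stand-in for call video
    seq = []
    while len(seq) < 64:                  # collect a short window of frames
        ok, frame = cap.read()
        if not ok:
            break
        feats = landmarks_from_frame(hands, frame)
        if feats is not None:
            seq.append(feats)
    cap.release()
    if seq:
        logits = FingerspellingTransformer()(torch.tensor([seq]))
        print(logits.shape)               # (1, frames, |vocab|+1)
```

The abstract also describes wrapping the trained model behind a REST endpoint that clients (such as the AgoraRTC front end) call for real-time inference. Below is a minimal Flask sketch of that step, reusing `FingerspellingTransformer` and `VOCAB` from the block above; the route name, JSON payload shape, and greedy CTC-style decoding are assumptions, not the authors' actual API.

```python
# Hypothetical REST inference endpoint; not the thesis's actual server code.
from flask import Flask, request, jsonify
import torch

app = Flask(__name__)
model = FingerspellingTransformer().eval()  # defined in the sketch above


@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Expects {"frames": [[x0, y0, z0, ...], ...]} of per-frame landmarks.
    frames = torch.tensor(request.get_json()["frames"]).unsqueeze(0)
    with torch.no_grad():
        ids = model(frames).argmax(-1).squeeze(0).tolist()
    blank = len(VOCAB)                    # CTC-style blank index
    # Greedy CTC collapse: drop repeated symbols, then drop blanks.
    text, prev = [], None
    for i in ids:
        if i != prev and i != blank:
            text.append(VOCAB[i])
        prev = i
    return jsonify({"text": "".join(text)})


if __name__ == "__main__":
    app.run(port=5000)
```

A client posts landmark sequences to this endpoint and renders the returned text as subtitles on the receiving side of the call, matching the end-to-end flow the abstract outlines.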