dc.description.abstract | Speech impairment ranks among the world's most prevalent disabilities, affecting over 430 million adults [1]. Despite its widespread impact, many existing video-conferencing applications offer no comprehensive end-to-end solution to this challenge. In response, we present a holistic approach to translating American Sign Language (ASL) into subtitles in real time by leveraging advances in Google MediaPipe, Transformer models, and web technologies. In March 2024, Google released the largest dataset in this problem domain, over 180 GB of ASL gesture sequences represented as MediaPipe landmark values. Our methodology begins with implementing and training a Transformer model on a preprocessed version of the Google dataset, followed by building a back-end server that encapsulates the trained model for application integration. This server handles video-input preprocessing and real-time inference, exposing a Representational State Transfer (REST) endpoint to client services. To demonstrate the practicality of our approach, we developed a video-conferencing application using the AgoraRTC Software Development Kit (SDK), which communicates with our back-end server to transcribe user gestures into text and display the resulting characters on the receiving end. Through this end-to-end system, we enable video calls enhanced by real-time transcription of fingerspelled gestures with low latency and high accuracy, effectively bridging the communication gap for individuals with speech disabilities.
With a growing imperative for AI engineered for human well-being, our project seeks to advance the integration of AI into applications that enhance human wellness, thereby encouraging broader awareness and adoption of this endeavor. | |