Gesture Recognition-Based Sign Language Translation System
DOI:
https://doi.org/10.70088/4zcjsw50
Keywords:
gesture recognition, sign language translation, deep learning, natural language processing, YOLOv9 model
Abstract
To address communication barriers between deaf-mute individuals and non-sign language users, a gesture-based sign language translation system was developed for the real-time translation of sign language into text or speech. The system utilizes the YOLOv9 model and transfer learning techniques, integrating deep learning and natural language processing (NLP) to achieve gesture recognition and translation. The system design encompasses data preprocessing, feature extraction, model training and optimization, and real-time translation processing modules, adopting an end-to-end architecture to optimize user experience. Experimental results demonstrate that the proposed system exhibits superior performance in sign language recognition accuracy, response speed, and translation quality.
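The abstract names a real-time translation processing module but does not say how per-frame gesture detections become a sequence of signs. As an illustration only, the following sketch shows one common way such a step could work: a sliding-window majority vote that suppresses single-frame noise before a sign is emitted. The function name `smooth_labels`, its parameters, and the `(label, confidence)` input format are assumptions made for this example; they are not taken from the paper and do not reflect any YOLOv9 API.

```python
from collections import Counter, deque
from typing import Iterable, List, Optional, Tuple

# Hypothetical per-frame detector output: (label, confidence),
# or None when no hand gesture was detected in the frame.
Detection = Optional[Tuple[str, float]]


def smooth_labels(
    detections: Iterable[Detection],
    window: int = 5,
    min_conf: float = 0.5,
    min_votes: int = 3,
) -> List[str]:
    """Collapse noisy per-frame labels into a sequence of stable signs."""
    recent: deque = deque(maxlen=window)  # labels of the last `window` frames
    output: List[str] = []
    current: Optional[str] = None  # sign currently considered "held"

    for det in detections:
        # Treat low-confidence detections the same as "nothing detected".
        label = det[0] if det is not None and det[1] >= min_conf else None
        recent.append(label)

        # Majority vote over the labels seen in the current window.
        votes = Counter(l for l in recent if l is not None)
        winner = None
        if votes:
            top, count = votes.most_common(1)[0]
            if count >= min_votes:
                winner = top

        # Emit a sign once, when it first becomes the stable winner.
        if winner is not None and winner != current:
            output.append(winner)

        # Forget the held sign only after a full window with no detections,
        # so the same sign can be emitted again after a pause.
        if winner is not None or all(l is None for l in recent):
            current = winner

    return output
```

Under these assumptions, a sign is emitted only after it wins at least `min_votes` of the last `window` frames, and repeating a sign requires a pause of `window` empty frames in between. The resulting label sequence could then be passed to whatever NLP stage produces the final text or speech.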
License
Copyright (c) 2026 Mengsen Yao, Chen Zhou, Zhixiong Liu, Anchi Zhang (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.






