Gesture Recognition-Based Sign Language Translation System

Authors

  • Mengsen Yao, School of Computer and Information Engineering, Bengbu University, Bengbu, Anhui, 233000, China
  • Chen Zhou, School of Computer and Information Engineering, Bengbu University, Bengbu, Anhui, 233000, China
  • Zhixiong Liu, School of Computer and Information Engineering, Bengbu University, Bengbu, Anhui, 233000, China
  • Anchi Zhang, School of Computer and Information Engineering, Bengbu University, Bengbu, Anhui, 233000, China

DOI:

https://doi.org/10.70088/4zcjsw50

Keywords:

gesture recognition, sign language translation, deep learning, natural language processing, YOLOv9 model

Abstract

To address communication barriers between deaf-mute individuals and non-sign language users, a gesture-based sign language translation system was developed for the real-time translation of sign language into text or speech. The system is built on the YOLOv9 model with transfer learning, combining deep learning and natural language processing (NLP) to perform gesture recognition and translation. The design comprises modules for data preprocessing, feature extraction, model training and optimization, and real-time translation, organized in an end-to-end architecture to optimize the user experience. Experimental results demonstrate that the proposed system performs well in sign language recognition accuracy, response speed, and translation quality.
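The end-to-end pipeline described in the abstract (preprocess → detect gestures → map detections to glosses → assemble output text) can be sketched as follows. This is a minimal illustration only: the gesture classes, gloss vocabulary, and the stubbed `detect` function are all hypothetical stand-ins, and in the actual system a trained YOLOv9 model would produce the per-frame detections.

```python
# Illustrative sketch of the pipeline stages; not the paper's implementation.
# A real system would replace detect() with inference from a trained YOLOv9 model.

GLOSS_VOCAB = {0: "HELLO", 1: "THANK-YOU", 2: "HELP"}  # hypothetical gesture classes

def preprocess(frame):
    """Normalize pixel values to [0, 1] (placeholder for resizing/augmentation)."""
    return [p / 255.0 for p in frame]

def detect(frame):
    """Stubbed detector returning (class_id, confidence) pairs.
    A toy brightness rule stands in for the network's prediction."""
    mean = sum(frame) / len(frame)
    return [(0, 0.9)] if mean > 0.5 else [(2, 0.8)]

def translate(frames, conf_threshold=0.5):
    """Run each frame through the pipeline and assemble a gloss sequence,
    collapsing consecutive duplicate detections (a common post-processing step)."""
    glosses = []
    for frame in frames:
        for class_id, conf in detect(preprocess(frame)):
            if conf >= conf_threshold:
                gloss = GLOSS_VOCAB[class_id]
                if not glosses or glosses[-1] != gloss:
                    glosses.append(gloss)
    return " ".join(glosses)

print(translate([[200] * 4, [210] * 4, [10] * 4]))  # → HELLO HELP
```

The duplicate-collapsing step reflects a design choice typical of real-time recognizers: since the same gesture is detected across many consecutive frames, consecutive identical glosses are merged before the text (or speech) output is produced.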

References

H. Bhavsar and J. Trivedi, "Performance comparison of SVM, CNN, HMM and neuro-fuzzy approach for Indian sign language recognition," Indian J Comput Sci Eng, vol. 12, no. 4, pp. 1093-1101, 2021.

K. Myagila, D. G. Nyambo, and M. A. Dida, "Efficient spatio-temporal modeling for sign language recognition using CNN and RNN architectures," Frontiers in Artificial Intelligence, vol. 8, p. 1630743, 2025. doi: 10.3389/frai.2025.1630743

A. O. Tur and H. Y. Keles, "Evaluation of hidden Markov models using deep CNN features in isolated sign recognition," Multimedia Tools and Applications, vol. 80, no. 13, pp. 19137-19155, 2021. doi: 10.1007/s11042-021-10593-w

A. B. Aziz, N. Basnin, M. Farshid, M. Akhter, T. Mahmud, K. Andersson, and M. S. Kaiser, "YOLO-v4 based detection of varied hand gestures in heterogeneous settings," In International Conference on Applied Intelligence and Informatics, October, 2023, pp. 325-338. doi: 10.1007/978-3-031-68639-9_21

P. Yu, L. Zhang, B. Fu, and Y. Chen, "Efficient Sign Language Translation with a Curriculum-based Non-autoregressive Decoder," In IJCAI, August, 2023, pp. 5260-5268. doi: 10.24963/ijcai.2023/584

W. Jia and C. Li, "SLR-YOLO: An improved YOLOv8 network for real-time sign language recognition," Journal of Intelligent & Fuzzy Systems, vol. 46, no. 1, pp. 1663-1680, 2024. doi: 10.3233/jifs-235132

F. Zhou and T. Van de Cruys, "Non-autoregressive modeling for sign-gloss to texts translation," In Proceedings of Machine Translation Summit XX: Volume 1, June, 2025, pp. 220-230.

Y. Min, A. Hao, X. Chai, and X. Chen, "Visual alignment constraint for continuous sign language recognition," In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 11542-11551.

J. M. Blair, "Architectures for Real-Time Automatic Sign Language Recognition on Resource-Constrained Device," 2018.

R. San-Segundo, J. M. Montero, R. Cordoba, V. Sama, F. Fernández, L. F. D'Haro, and A. García, "Design, development and field evaluation of a Spanish into sign language translation system," Pattern Analysis and Applications, vol. 15, no. 2, pp. 203-224, 2012.

S. Woo, J. Park, J. Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," In Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3-19.

F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, 2015. doi: 10.1109/tpami.2015.2389824

T. Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.

L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2017.

S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759-8768. doi: 10.1109/cvpr.2018.00913

Published

13 January 2026

Section

Article

How to Cite

Yao, M., Zhou, C., Liu, Z., & Zhang, A. (2026). Gesture Recognition-Based Sign Language Translation System. Artificial Intelligence and Digital Technology, 3(1), 10-20. https://doi.org/10.70088/4zcjsw50