ACCENTS Journals


Leveraging ensemble methods with pretrained CNNs for image-based sign language classification

Yasir Altaf¹, Abdul Wahid¹ and Mudasir Manzoor Kirmani²

Maulana Azad National Urdu University, Gachibowli, Hyderabad, 500032, Telangana, India¹
Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir, Ganderbal, Srinagar, 190025, Jammu and Kashmir, India²

Corresponding Author: Yasir Altaf

Received: 19-April-2025; Revised: 23-May-2026; Accepted: 25-May-2026

Abstract

Sign language serves as a vital means of communication for individuals who are deaf and hard of hearing (DHH), providing an effective visual alternative when spoken interaction is impractical or impossible. Among the various modalities of sign language, hand gestures constitute the most expressive and frequently used component for conveying meaning. However, accurately recognizing and classifying these gestures remains a significant challenge in the development of reliable assistive communication technologies. This study proposes an ensemble convolutional neural network (ECNN) framework that integrates three high-performing pre-trained convolutional neural network (CNN) architectures: densely connected convolutional network 201 (DenseNet201), residual network 152 (ResNet152), and visual geometry group 19 (VGG19), through a logistic regression (LR)–based meta-learner. The ensemble leverages the complementary feature representations of its constituent models using adaptive feature distillation and selection (AFDS) to improve recognition accuracy and generalization across multiple sign language datasets. Experimental evaluations demonstrate that the proposed ECNN achieves superior classification performance, attaining accuracies of 99.86% and 98.65% on the American Sign Language (ASL) and Indian Sign Language (ISL) datasets, respectively. Furthermore, the model generalizes effectively to cross-domain benchmarks, achieving accuracies of 98.0% on the National University of Singapore (NUS) Hand Posture dataset and 96.0% on the Arabic Sign Language (ArSL) dataset, thereby outperforming existing state-of-the-art approaches. These results validate the robustness, scalability, and cross-lingual adaptability of the proposed ensemble model, highlighting its potential as a reliable foundation for real-time sign language recognition (SLR) and assistive communication systems.
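The stacking idea summarized in the abstract, in which several base classifiers feed a logistic regression meta-learner, can be illustrated with a minimal sketch. This is not the authors' implementation. The three CNNs are replaced by a hypothetical `fake_base_probs` function that emits synthetic softmax vectors, the AFDS step is omitted entirely, and every name, accuracy level, and hyperparameter below is an assumption chosen for illustration only.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fake_base_probs(label, n_classes, acc, rng):
    """Hypothetical stand-in for one CNN's softmax output (not a real network).

    With probability `acc` the vector peaks on the true class, otherwise
    on a uniformly drawn class.
    """
    guess = label if rng.random() < acc else rng.randrange(n_classes)
    logits = [rng.gauss(0.0, 0.3) + (2.0 if c == guess else 0.0)
              for c in range(n_classes)]
    return softmax(logits)


def meta_features(labels, n_classes, accs, rng):
    """One row per sample: the base models' probability vectors, concatenated."""
    return [[p for acc in accs for p in fake_base_probs(y, n_classes, acc, rng)]
            for y in labels]


def scores(W, b, x):
    """Linear class scores W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_meta_lr(X, y, n_classes, epochs=100, lr=0.5):
    """Multinomial logistic regression fitted by batch gradient descent."""
    d, n = len(X[0]), len(X)
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, t in zip(X, y):
            p = softmax(scores(W, b, x))
            for c in range(n_classes):
                err = p[c] - (1.0 if c == t else 0.0)  # cross-entropy gradient
                gb[c] += err
                for j, v in enumerate(x):
                    gW[c][j] += err * v
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(d):
                W[c][j] -= lr * gW[c][j] / n
    return W, b


def predict(W, b, x):
    """Index of the highest-scoring class."""
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


def run_demo(seed=0, n_classes=4, n_train=300, n_test=200):
    """Train the meta-learner on synthetic base outputs; return test accuracy."""
    rng = random.Random(seed)
    accs = (0.80, 0.75, 0.70)  # assumed base-model reliabilities, illustrative only
    y_train = [rng.randrange(n_classes) for _ in range(n_train)]
    y_test = [rng.randrange(n_classes) for _ in range(n_test)]
    X_train = meta_features(y_train, n_classes, accs, rng)
    X_test = meta_features(y_test, n_classes, accs, rng)
    W, b = train_meta_lr(X_train, y_train, n_classes)
    hits = sum(predict(W, b, x) == t for x, t in zip(X_test, y_test))
    return hits / n_test


if __name__ == "__main__":
    print(f"stacked meta-learner accuracy: {run_demo():.3f}")
```

In a real pipeline the rows of `X_train` would presumably come from base-model predictions on held-out data, so that the meta-learner does not simply learn to trust outputs the base models have already memorized.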

Keywords

Sign language recognition (SLR), Ensemble convolutional neural network, Adaptive feature distillation and selection (AFDS), Hand gesture classification, Assistive communication systems.

Cite this article

Altaf Y, Wahid A, Kirmani MM. Leveraging ensemble methods with pretrained CNNs for image-based sign language classification. International Journal of Advanced Technology and Engineering Exploration. 2026;13(138):701-730. DOI: 10.19101/IJATEE.2025.121220510
