International Journal of Advanced Technology and Engineering Exploration ISSN (Print): 2394-5443    ISSN (Online): 2394-7454 Volume-13 Issue-138 May-2026
Leveraging ensemble methods with pretrained CNNs for image-based sign language classification

Yasir Altaf¹, Abdul Wahid¹ and Mudasir Manzoor Kirmani²

¹ Maulana Azad National Urdu University, Gachibowli, Hyderabad, 500032, Telangana, India
² Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir, Ganderbal, Srinagar, 190025, Jammu and Kashmir, India
Corresponding Author : Yasir Altaf

Received : 19-April-2025; Revised : 23-May-2026; Accepted : 25-May-2026

Abstract

Sign language serves as a vital means of communication for individuals who are deaf and hard of hearing (DHH), providing an effective visual alternative when spoken interaction is impractical or impossible. Among the various modalities of sign language, hand gestures constitute the most expressive and frequently used component for conveying meaning. However, accurately recognizing and classifying these gestures remains a significant challenge in the development of reliable assistive communication technologies. This study proposes an ensemble convolutional neural network (ECNN) framework that integrates three high-performing pre-trained convolutional neural network (CNN) architectures: densely connected convolutional network 201 (DenseNet201), residual network 152 (ResNet152), and visual geometry group 19 (VGG19), through a logistic regression (LR)–based meta-learner. The ensemble leverages the complementary feature representations of its constituent models using adaptive feature distillation and selection (AFDS) to improve recognition accuracy and generalization across multiple sign language datasets. Experimental evaluations demonstrate that the proposed ECNN achieves superior classification performance, attaining accuracies of 99.86% and 98.65% on the American Sign Language (ASL) and Indian Sign Language (ISL) datasets, respectively. Furthermore, the model generalizes effectively to cross-domain benchmarks, achieving accuracies of 98.0% on the National University of Singapore (NUS) Hand Posture dataset and 96.0% on the Arabic Sign Language (ArSL) dataset, thereby outperforming existing state-of-the-art approaches. These results validate the robustness, scalability, and cross-lingual adaptability of the proposed ensemble model, highlighting its potential as a reliable foundation for real-time sign language recognition (SLR) and assistive communication systems.
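
The stacking idea summarized in the abstract, in which a logistic regression meta-learner combines the outputs of several base classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify which base-model outputs feed the meta-learner or how AFDS is applied, so the sketch assumes the meta-learner receives concatenated class-probability vectors, omits AFDS entirely, and replaces the three CNNs with a hypothetical helper (`fake_base_probs`) that generates synthetic softmax outputs of differing reliability. All names, sizes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fake_base_probs(labels, n_classes, noise, rng):
    """Hypothetical stand-in for one pre-trained CNN's softmax output."""
    logits = rng.normal(0.0, noise, size=(labels.size, n_classes))
    logits[np.arange(labels.size), labels] += 2.0  # signal on the true class
    return softmax(logits)

def fit_meta_learner(X, y, n_classes, lr=0.5, epochs=300):
    """Multinomial logistic regression trained by batch gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)
        grad = (P - Y) / X.shape[0]  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)

rng = np.random.default_rng(0)
n_classes, n_train, n_test = 5, 600, 300  # assumed toy sizes
y_train = rng.integers(0, n_classes, n_train)
y_test = rng.integers(0, n_classes, n_test)

# Three synthetic "base models" with different noise levels (assumption).
noise_levels = (1.0, 1.5, 2.0)
X_train = np.hstack([fake_base_probs(y_train, n_classes, s, rng) for s in noise_levels])
X_test = np.hstack([fake_base_probs(y_test, n_classes, s, rng) for s in noise_levels])

# The meta-learner sees the concatenated probability vectors of all base models.
W, b = fit_meta_learner(X_train, y_train, n_classes)
ensemble_acc = float(np.mean(predict(X_test, W, b) == y_test))
print(f"stacked ensemble accuracy on synthetic data: {ensemble_acc:.3f}")
```

In a real pipeline the meta-learner would typically be fitted on held-out predictions of the base networks rather than on their training-set outputs, so that it does not simply learn to trust overfitted base models.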

Keywords

Sign language recognition (SLR), Ensemble convolutional neural network, Adaptive feature distillation and selection (AFDS), Hand gesture classification, Assistive communication systems.

Cite this article

Altaf Y, Wahid A, Kirmani MM. Leveraging ensemble methods with pretrained CNNs for image-based sign language classification. International Journal of Advanced Technology and Engineering Exploration. 2026;13(138):701-730. DOI : 10.19101/IJATEE.2025.121220510
