International Journal of Advanced Technology and Engineering Exploration ISSN (Print): 2394-5443    ISSN (Online): 2394-7454 Volume-12 Issue-131 October-2025
Monocular camera-based 3D point cloud reconstruction and traffic sign detection using vision transformers and YOLOv8

Luis Alberto Chavarría Zamora 1 and Pablo Soto-Quiros 2

Escuela de Ingeniería en Computadores, Instituto Tecnológico de Costa Rica, Cartago 30101, Costa Rica 1
Escuela de Matemática, Instituto Tecnológico de Costa Rica, Cartago 30101, Costa Rica 2
Corresponding Author: Luis Alberto Chavarría Zamora

Received: 23-Jan-2025; Revised: 22-Oct-2025; Accepted: 25-Oct-2025

Abstract

This study presents a novel method for extracting point clouds of traffic signs using a single monocular camera sensor. Traditional light detection and ranging (LiDAR) techniques, although highly accurate, are expensive, require integration with cameras for segmentation tasks, and increase overall system complexity. The proposed approach is significant as it enables the generation of accurately segmented point clouds without relying on a LiDAR sensor, which was not available to the research group. The solution is flexible, allowing substitution with equivalent algorithms for monocular depth estimation, image segmentation, camera calibration, and global positioning system (GPS) association. Furthermore, the integration of machine learning techniques is proposed for traffic sign classification.
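The pipeline the abstract describes—monocular depth estimation, image segmentation, and camera calibration combined to produce a segmented point cloud—rests on standard pinhole back-projection. The sketch below illustrates that final step under stated assumptions: it is not the authors' implementation, and the function name and interface are hypothetical. It takes a depth map (e.g. from a vision-transformer depth estimator), a binary mask (e.g. from YOLOv8 segmentation), and calibrated intrinsics, and returns only the 3D points belonging to the detected traffic sign.

```python
import numpy as np

def depth_to_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project a depth map into a 3D point cloud via pinhole intrinsics,
    keeping only the pixels inside the segmentation mask.

    depth : (H, W) array of metric depth values (Z, in metres)
    mask  : (H, W) binary array marking the segmented object
    fx, fy, cx, cy : camera intrinsics (focal lengths and principal point)
    Returns an (N, 3) array of [X, Y, Z] points for the N masked pixels.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx           # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy           # Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1)   # shape (H, W, 3)
    return points[mask.astype(bool)]        # shape (N, 3)
```

In the proposed setup, the resulting point cloud could then be tagged with a GPS fix and passed to a classifier; any equivalent depth estimator or segmenter can be substituted, which is the flexibility the abstract emphasizes.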

Keywords

Monocular vision, Point cloud extraction, Traffic sign detection, Depth estimation, Image segmentation, Machine learning.

Cite this article

Zamora LA, Quiros PS. Monocular camera-based 3D point cloud reconstruction and traffic sign detection using vision transformers and YOLOv8. International Journal of Advanced Technology and Engineering Exploration. 2025; 12(131):1-13.

