International Journal of Advanced Technology and Engineering Exploration ISSN (Print): 2394-5443    ISSN (Online): 2394-7454 Volume-12 Issue-131 October-2025
Monocular camera-based 3D point cloud reconstruction and traffic sign detection using vision transformers and YOLOv8

Luis Alberto Chavarría Zamora 1 and Pablo Soto-Quiros 2

Escuela de Ingeniería en Computadores, Instituto Tecnológico de Costa Rica, Cartago 30101, Costa Rica 1
Escuela de Matemática, Instituto Tecnológico de Costa Rica, Cartago 30101, Costa Rica 2
Corresponding Author: Luis Alberto Chavarría Zamora

Received: 23-Jan-2025; Revised: 22-Oct-2025; Accepted: 25-Oct-2025

Abstract

This study presents a novel method for extracting point clouds of traffic signs using a single monocular camera sensor. Traditional light detection and ranging (LiDAR) techniques, although highly accurate, are expensive, require integration with cameras for segmentation tasks, and increase overall system complexity. The proposed approach is significant as it enables the generation of accurately segmented point clouds without relying on a LiDAR sensor, which was not available to the research group. The solution is flexible, allowing substitution with equivalent algorithms for monocular depth estimation, image segmentation, camera calibration, and global positioning system (GPS) association. Furthermore, the integration of machine learning techniques is proposed for traffic sign classification.
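The pipeline the abstract describes—monocular depth estimation, image segmentation, and camera calibration combined to produce a segmented point cloud—rests on standard pinhole back-projection. The sketch below illustrates that final step under stated assumptions: it is not the authors' implementation, and the function name and interface are hypothetical. It takes a depth map (e.g. from a vision-transformer depth estimator), a binary mask (e.g. from YOLOv8 segmentation), and calibrated intrinsics, and returns only the 3D points belonging to the detected traffic sign.

```python
import numpy as np

def depth_to_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project a depth map into a 3D point cloud via pinhole intrinsics,
    keeping only the pixels inside the segmentation mask.

    depth : (H, W) array of metric depth values (Z, in metres)
    mask  : (H, W) binary array marking the segmented object
    fx, fy, cx, cy : camera intrinsics (focal lengths and principal point)
    Returns an (N, 3) array of [X, Y, Z] points for the N masked pixels.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx           # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy           # Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1)   # shape (H, W, 3)
    return points[mask.astype(bool)]        # shape (N, 3)
```

In the proposed setup, the resulting point cloud could then be tagged with a GPS fix and passed to a classifier; any equivalent depth estimator or segmenter can be substituted, which is the flexibility the abstract emphasizes.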

Keywords

Monocular vision, Point cloud extraction, Traffic sign detection, Depth estimation, Image segmentation, Machine learning.

Cite this article

Zamora LA, Quiros PS. Monocular camera-based 3D point cloud reconstruction and traffic sign detection using vision transformers and YOLOv8. International Journal of Advanced Technology and Engineering Exploration. 2025; 12(131):1-13.

