Advancing dot plot accessibility: a synthetic dataset and a YOLO-based approach for enhanced data extraction
Kashmi Sultana
Corresponding Author : Kashmi Sultana
Received : 27-November-2024; Revised : 22-July-2025; Accepted : 24-July-2025
Abstract
Dot plots are widely used for data visualization across various domains due to their simplicity. However, extracting data from dot plot images remains a challenge, primarily because of the lack of dedicated methodologies and comprehensive datasets. Existing chart analysis approaches often overlook dot plots, and resources such as the Benetech – Making Graphs Accessible dataset fail to adequately capture the diversity of dot plot designs. To address this gap, a synthetic dataset has been developed, incorporating a wide range of variations in dot size, shape, color, background, and grid configurations. Additionally, a novel pipeline is proposed for extracting data from dot plot images. This pipeline leverages the context-aware chart extraction and data encoding (CACHED) model to detect key chart components, including plot areas, axis titles, legends, labels, and ticks. For dot detection, a model based on you only look once (YOLO) has been developed and trained on the synthetic dataset. To evaluate performance, 124 real-world dot plot images were manually annotated with dot positions and plot areas. A comparative analysis was conducted by training EfficientUNet and YOLOv5 models on both the Benetech dataset and the proposed synthetic dataset, followed by testing on the annotated real-world images. On the synthetic hold-out set, the YOLOv5 model achieved 99.1% precision, 98.5% recall, and an mAP@0.5 of 99.5%. When evaluated on real-world dot plot images, it maintained high performance, achieving 95.99% precision and a 90.19% F1-score at a normalized distance threshold of 0.02, which indicates strong accuracy and generalization capability.
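As an illustration of how precision and F1-score at a normalized distance threshold might be computed for dot detection, the following is a minimal sketch and not the paper's evaluation code. The abstract does not state how the distance is normalized or how predictions are matched to annotations, so the sketch assumes normalization by the plot-area diagonal and a greedy one-to-one matching; the function names `match_dots` and `precision_recall_f1` are hypothetical.

```python
import math


def match_dots(pred, truth, plot_w, plot_h, threshold=0.02):
    """Greedily match predicted dot centers to annotated dot centers.

    Assumption: distances are divided by the plot-area diagonal before
    being compared with the threshold. Returns (tp, fp, fn).
    """
    diag = math.hypot(plot_w, plot_h)  # assumed normalizer

    # Collect every (prediction, annotation) pair that is close enough.
    pairs = []
    for i, (px, py) in enumerate(pred):
        for j, (tx, ty) in enumerate(truth):
            d = math.hypot(px - tx, py - ty) / diag
            if d <= threshold:
                pairs.append((d, i, j))

    # Closest pairs first; each prediction and annotation is used at most once.
    pairs.sort()
    used_pred, used_truth = set(), set()
    for _, i, j in pairs:
        if i not in used_pred and j not in used_truth:
            used_pred.add(i)
            used_truth.add(j)

    tp = len(used_pred)
    return tp, len(pred) - tp, len(truth) - tp


def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1, returning 0.0 for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up coordinates in pixels within a 100 x 100 plot area.
pred = [(10, 10), (50, 50), (90, 20)]
truth = [(11, 10), (50, 52), (30, 80)]
tp, fp, fn = match_dots(pred, truth, plot_w=100, plot_h=100)
print(precision_recall_f1(tp, fp, fn))
```

Under these assumptions a prediction counts as correct only if it lies within 2% of the plot diagonal of an annotation that no closer prediction has already claimed, so duplicate detections of the same dot lower precision.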
Keywords
Dot plot extraction, Synthetic datasets, Chart component detection, Data visualization, Object detection models, YOLO and EfficientUNet.
Cite this article
Sultana K. Advancing dot plot accessibility: a synthetic dataset and a YOLO-based approach for enhanced data extraction. International Journal of Advanced Technology and Engineering Exploration. 2025;12(128):1035-1055. DOI : 10.19101/IJATEE.2024.111102095
