
International Journal of Advanced Technology and Engineering Exploration (IJATEE)

ISSN (Print):2394-5443    ISSN (Online):2394-7454
Volume-9 Issue-86 January-2022
Paper Title : Feature-driven label generation for congestion detection in smart cities under big data
Author Name : Aamish Izhar, Ajay Rastogi, Syed Shafat Ali, S. M. K. Quadri and S. A. M. Rizvi
Abstract :

Rapid urbanization and the emergence of smart cities have made traffic congestion a major issue for smart city planners. Traffic congestion prediction is therefore needed to reduce congestion effectively and enhance road capacity. Various studies have tried to solve the problem of traffic congestion. However, it is difficult to judge the effectiveness of such studies given the absence of properly labeled datasets. Additionally, current studies use datasets with relatively few data instances, which do not reflect the big data nature of traffic data. Motivated by these challenges, in this paper we study traffic congestion with respect to effective label generation from a big data perspective. Specifically, we provide two sound and intuitive techniques for label generation that help in the correct annotation of unlabeled data. One technique is based on the number of vehicles on the road, and the other combines average speed with the number of vehicles. For this purpose, we consider the publicly available CityPulse traffic dataset with 13.5 million data instances. Using our techniques, we generate “congested” and “not-congested” labels indicating whether there is congestion on the road. To tackle the class imbalance problem, besides using random undersampling and oversampling, we also introduce a mixture of the two to offset any bias inherent in either individual sampling technique. To test the effectiveness of our label generation approaches, we make extensive use of various machine learning techniques, and for performance evaluation we use the standard classification evaluation metrics. Finally, we compare our techniques with a previous work that considered only average speed for label generation.
Our results demonstrate the effectiveness of the proposed approaches against the compared method. For example, with random undersampling, the F1-score of every classifier under the proposed techniques is close to 1, whereas under the compared method the F1-score is as low as 0.70 for the multinomial naïve Bayes (MNB) classifier and 0.88 for the support vector machine (SVM). Similarly, with oversampling, our approaches achieve an F1-score close to 1 across all classifiers, whereas the compared method falls as low as 0.70 for MNB. The same trend is seen with the mixture of both sampling techniques.
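
The pipeline summarized in the abstract (feature-driven labels, a mixture of undersampling and oversampling, and F1-based evaluation) can be illustrated with a minimal Python sketch. The abstract does not give the labeling rules or thresholds, so everything specific here is an assumption made for illustration: the threshold constants, the "many vehicles and low speed" combination rule, the midpoint target size in `mixed_resample`, and the toy records. It is not the authors' implementation.

```python
import random
from collections import Counter

# Hypothetical thresholds: the abstract does not state the actual values.
COUNT_THRESHOLD = 20     # vehicles per observation (assumed)
SPEED_THRESHOLD = 30.0   # average speed (assumed units and value)

def label_by_count(vehicle_count):
    """Technique 1 (sketch): label from the number of vehicles alone."""
    return "congested" if vehicle_count >= COUNT_THRESHOLD else "not-congested"

def label_by_count_and_speed(vehicle_count, avg_speed):
    """Technique 2 (sketch): assumed rule of many vehicles AND low speed."""
    crowded = vehicle_count >= COUNT_THRESHOLD
    slow = avg_speed <= SPEED_THRESHOLD
    return "congested" if crowded and slow else "not-congested"

def mixed_resample(rows, labels, seed=0):
    """Undersample the majority class and oversample the minority class
    to a shared size. The midpoint target is an assumption."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    sizes = [len(members) for members in by_class.values()]
    target = (min(sizes) + max(sizes)) // 2
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target:
            # Undersample: draw without replacement.
            chosen = rng.sample(members, target)
        else:
            # Oversample: keep all rows, then draw extras with replacement.
            chosen = members + rng.choices(members, k=target - len(members))
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels

def f1_score(y_true, y_pred, positive="congested"):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy records of (vehicle_count, avg_speed); invented for illustration only.
records = [(5, 60.0), (8, 55.0), (25, 12.0), (3, 70.0),
           (30, 50.0), (12, 40.0), (22, 25.0), (6, 65.0)]

labels_count = [label_by_count(c) for c, _ in records]
labels_both = [label_by_count_and_speed(c, s) for c, s in records]
print(Counter(labels_count))  # 5 not-congested, 3 congested
print(Counter(labels_both))   # 6 not-congested, 2 congested

balanced_rows, balanced_labels = mixed_resample(records, labels_count)
print(Counter(balanced_labels))  # 4 of each class
```

On the toy records, the two rules disagree on the observation with 30 vehicles at an average speed of 50.0: the count-only rule marks it congested, while the combined rule does not. For real experiments, established library implementations of the sampling and metric steps would normally replace these hand-written helpers.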

Keywords : Smart cities, Big data, Label generation, Classification, Traffic congestion.
Cite this article : Izhar A, Rastogi A, Ali SS, Quadri SM, Rizvi SA. Feature-driven label generation for congestion detection in smart cities under big data. International Journal of Advanced Technology and Engineering Exploration. 2022; 9(86):94-110. DOI:10.19101/IJATEE.2021.874739.