Comparative Analysis of Machine Learning Techniques for Predicting Air Quality in Smart Cities

TLDR

Air pollution in smart cities is a major environmental challenge, and real‑time IoT‑based monitoring has transformed air‑quality prediction, yet comparative studies of machine‑learning techniques and their processing times across datasets are lacking. The study aims to comparatively analyze machine‑learning techniques for air‑quality prediction, focusing on processing time across multiple datasets. The authors conducted experiments on multiple datasets using Apache Spark, evaluating regression models with MAE and RMSE and measuring processing time for both standalone learning and hyperparameter‑tuned fitting. The comparative study identified the best regression model for accurate air‑quality prediction, balancing data size and processing time.

Abstract

Air pollution is a major environmental challenge in smart city environments. Real-time monitoring of pollution data enables local authorities to analyze the current traffic situation of the city and make decisions accordingly. The deployment of Internet of Things-based sensors has considerably changed the dynamics of predicting air quality. Existing research has used different machine learning tools for pollution prediction; however, a comparative analysis of these techniques is needed to better understand their processing time across multiple datasets. In this paper, we perform pollution prediction using four advanced regression techniques and present a comparative study to determine the best model for accurately predicting air quality with respect to data size and processing time. We conducted experiments using Apache Spark and performed pollution estimation on multiple datasets. The Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used as evaluation criteria for comparing these regression models. Furthermore, the processing time of each technique, both for standalone learning and for fitting with hyperparameter tuning on Apache Spark, is measured to find the best-fit model in terms of processing time and lowest error rate.
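As a rough illustration of the evaluation criteria named above, the following is a minimal sketch in plain Python of how MAE, RMSE, and wall-clock processing time could be computed for one model's predictions. It is not the authors' Spark pipeline: the helper names (`mae`, `rmse`, `timed`) and the sample readings are hypothetical and chosen only for illustration.

```python
import math
import time


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |observed - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Made-up pollutant readings and predictions (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

mae_value, mae_seconds = timed(mae, observed, predicted)
rmse_value, rmse_seconds = timed(rmse, observed, predicted)
print(f"MAE={mae_value:.3f} RMSE={rmse_value:.3f}")
```

In a comparison like the one described, the same timing wrapper would presumably be placed around each model's fitting step, once without and once with hyperparameter tuning, so that error and processing time can be reported side by side.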
