A Machine Learning Approach for Air Quality Prediction: Model Regularization and Optimization

TLDR

Machine learning can efficiently train models on large datasets, yet prior air-quality studies are mostly restricted to several years of data and train standard regression models to predict hourly pollutant levels. This study forecasts hourly concentrations of ozone, PM₂.₅, and SO₂ from the previous days' meteorological data using refined machine-learning models. The authors formulate the 24-hour prediction as a multi-task learning problem, propose a regularization that keeps the prediction models of consecutive hours close to each other, and compare it with Frobenius-norm, nuclear-norm, and ℓ₂,₁-norm regularization. Experiments show that the proposed parameter-reducing formulations and consecutive-hour regularizations outperform standard regression models and the existing regularization schemes.

Abstract

In this paper, we tackle air quality forecasting by using machine learning approaches to predict the hourly concentration of air pollutants (e.g., ozone, particulate matter (PM₂.₅), and sulfur dioxide). Machine learning, as one of the most popular techniques, is able to efficiently train a model on big data by using large-scale optimization algorithms. Although some existing works apply machine learning to air quality prediction, most prior studies are restricted to several years of data and simply train standard regression models (linear or nonlinear) to predict the hourly air pollution concentration. In this work, we propose refined models that predict the hourly air pollution concentration from the meteorological data of previous days by formulating the prediction over 24 h as a multi-task learning (MTL) problem. This enables us to select a good model with different regularization techniques. We propose a useful regularization that enforces the prediction models of consecutive hours to be close to each other, and we compare it with several typical regularizations for MTL, including standard Frobenius norm regularization, nuclear norm regularization, and ℓ₂,₁-norm regularization. Our experiments show that the proposed parameter-reducing formulations and consecutive-hour-related regularizations achieve better performance than existing standard regression models and existing regularizations.
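To make the regularizers named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a weight matrix W stored as a list of rows, with one row per input feature and one column per hour of the day; that layout, the function names, and the use of a squared difference for the consecutive-hour term are all illustrative assumptions, since the abstract does not give the exact formulas. The nuclear norm is left out because it requires a singular value decomposition (NumPy offers it as numpy.linalg.norm(W, 'nuc')).

```python
import math

def frobenius_sq(W):
    """Squared Frobenius norm: the sum of all squared entries of W."""
    return sum(w * w for row in W for w in row)

def l21_norm(W):
    """l2,1 norm: sum over rows (features) of each row's l2 norm across hours.

    Penalizing it tends to zero out whole rows, i.e. to drop a feature
    for every hourly task at once.
    """
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def consecutive_hour_penalty(W):
    """Sum of squared differences between the weights of adjacent hours.

    Assumed form: sum over t of ||w_{t+1} - w_t||^2, where w_t is column t.
    A small value means the hourly models change smoothly over the day.
    """
    return sum(
        (row[t + 1] - row[t]) ** 2
        for row in W
        for t in range(len(row) - 1)
    )

# Toy example (hypothetical values): 2 features, 3 hours instead of 24.
W = [
    [3.0, 0.0, 4.0],
    [0.0, 0.0, 0.0],
]

print(frobenius_sq(W))              # 25.0
print(l21_norm(W))                  # 5.0
print(consecutive_hour_penalty(W))  # 25.0
```

In a multi-task objective, one of these terms would be added to the summed prediction loss of the 24 hourly models, scaled by a regularization weight chosen on validation data.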
