COMPARATIVE ASSESSMENT OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR NO2 AND SO2 CONCENTRATION PREDICTION /
OLAOLUWA SHADRACK ARAROMI; SUPERVISOR: ASSOC. PROF. DR. SEDEF ÇAKIR
xi, 88 sheets; 31 cm. Includes CD.
Thesis (MSc) - Cyprus International University, Institute of Graduate Studies and Research, Environmental Sciences Department
Includes bibliography (sheets 69-82)
ABSTRACT
Air pollution is a significant environmental concern because of its negative impacts on human health and the environment. Accurate prediction of NO2 and SO2 levels is important for air quality management, as it allows authorities to take preventive measures against potentially hazardous conditions. In this study, machine learning techniques were used to predict hourly NO2 concentrations and daily SO2 concentrations from a range of input variables: wind speed, temperature, pressure, relative humidity, hour, day, month, and year. Data from 2012 to 2015 were used, and 15 different models were constructed from various combinations of input variables for both pollutants. The algorithms used were extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), random forest, and support vector regression, and these machine learning models were compared with multiple linear regression. Seasonal predictions were also made with these models, and the results showed that the combination of input variables and the model used significantly influenced prediction accuracy. The LightGBM and XGBoost models had the lowest root mean squared error (RMSE) and mean squared error (MSE) values, with LightGBM giving the best prediction, an R-value of 0.778, for model B, which included 7 input variables. The results also showed that including the pressure variable in model A (8 input variables) reduced the prediction accuracy of the XGBoost model. The random forest model provided the most accurate predictions overall, particularly for the fall season. These findings demonstrate the importance of both the number and type of input variables in predicting NO2 levels, and the potential of machine learning to support air quality management.
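The abstract ranks models by RMSE, MSE, and an R-value. The following is a minimal sketch of how these metrics can be computed from observed and predicted concentrations in plain Python. It assumes that the R-value is the Pearson correlation coefficient between the observed and predicted series; the abstract does not say whether it means this or the coefficient of determination. The function names are hypothetical illustrations, not code from the thesis.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE, in the pollutant's units."""
    return math.sqrt(mse(y_true, y_pred))


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values.

    Assumption: this is what the abstract's "R-value" refers to.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need at least two paired values")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Covariance term and the two standard-deviation terms (unnormalised;
    # the 1/n factors cancel in the ratio).
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    if ss_t == 0 or ss_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_t * ss_p)
```

For example, hypothetical observed hourly NO2 values [20, 35, 50, 40] and predictions [22, 30, 55, 38] give an MSE of 14.5, an RMSE of about 3.81, and an R of about 0.95. Because RMSE is the square root of MSE, ranking models by either metric gives the same order; RMSE is simply easier to read because it is in the same units as the concentrations.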
The results also show that XGBoost gave the most accurate predictions for the seasons considered, except summer, where the random forest algorithm gave better accuracy for SO2 prediction.
Key words: Machine learning, NO2, Prediction, Random Forest, Root Mean Square, Seasonal, SO2