Araromi, Olaoluwa Shadrack

COMPARATIVE ASSESSMENT OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR NO2 AND SO2 CONCENTRATION PREDICTION / OLAOLUWA SHADRACK ARAROMI; SUPERVISOR: ASSOC. PROF. DR. SEDEF ÇAKIR - xi, 88 sheets; 31 cm. Includes CD

Thesis (MSc) - Cyprus International University. Institute of Graduate Studies and Research. Environmental Sciences Department

Includes bibliography (sheets 69-82)

ABSTRACT
Air pollution is a significant environmental concern due to its negative impacts on
human health and the environment. Accurate prediction of NO2 and SO2 levels is
important for air quality management efforts, as it allows authorities to take
preventative measures to mitigate potentially hazardous conditions. In this study,
machine learning techniques were used to predict hourly NO2 concentrations and
daily SO2 concentrations from a range of input variables including
wind speed, temperature, pressure, relative humidity, hour, day, year, and month. Data
from 2012 to 2015 were used, and 15 different models were constructed using
various combinations of input variables for both pollutants considered. The algorithms
utilized were extreme gradient boosting (XGBoost), light gradient boosting machine
(Light GBM), random forest, and support vector regression. The machine learning models
were compared with multiple linear regression. Seasonal predictions were also made using these models, and the
results showed that the combination of input variables and model used significantly
influenced the accuracy of the predictions. The models with the lowest root mean
squared error (RMSE) and mean squared error (MSE) values were the Light GBM and
XGBoost models, with the Light GBM algorithm showing the best prediction with an
R-value of 0.778 for model B, which included 7 input variables. The results also
showed that the inclusion of the pressure variable in model A (8 input variables)
reduced the prediction accuracy for the XGBoost model. The random forest model
provided the most accurate predictions overall, particularly for the fall season. These
findings demonstrate the importance of both the number and type of input variables in
predicting NO2 levels and the potential for machine learning to support air quality
management efforts. The results also show that, for SO2 prediction, XGBoost gave the
most accurate predictions for the seasons considered, except for summer, where the
random forest algorithm was more accurate.
Key words: Machine learning, NO2, Prediction, Random Forest, Root Mean Square,
Seasonal, SO2

Subjects--Topical Terms:
Machine learning--Dissertations, Academic
Nitrogen dioxide--Dissertations, Academic
Sulfur dioxide--Dissertations, Academic