Forecasting the occurrence of ozone episode days can be regarded as a classification problem on an imbalanced dataset. Since standard artificial neural network (ANN) methods cannot make accurate predictions for such a problem, two cost-sensitive ANN methods, cost-penalty and moving threshold, were used in this study. The models classify each day as episode or non-episode according to the standard for the daily maximum 8 h O3 concentration. Ozone measurements from six monitoring stations in Taiwan were used for model training and performance evaluation. Two different input datasets, regional and single-site, were generated from raw air quality and meteorological observations. In the numerical experiments, the predictions based on the regional dataset were much better than those obtained from the single-site dataset. The two cost-sensitive ANN methods were evaluated using receiver operating characteristic (ROC) curves, and the results obtained by the two approaches were found to be similar. If the misclassification costs are known, the cost-sensitive methods can minimise the total cost. If the misclassification costs are unknown, the cost-sensitive ANN can still obtain a better forecast than the standard ANN method when an appropriate cost ratio is used. For clean areas where episode days are very rare, the forecasts are poor for all methods.
Science of the Total Environment 407, pp. 2124-2135
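
The moving-threshold idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the network outputs a calibrated episode probability for each day and uses the textbook minimum-expected-cost threshold, cost_fp / (cost_fp + cost_fn). The function names, the cost values, and the sample probabilities and labels are all hypothetical and chosen only for illustration.

```python
def moving_threshold(cost_fp, cost_fn):
    """Probability threshold that minimises expected cost.

    Assumes calibrated probabilities and zero cost for correct
    classifications (textbook result, not taken from the paper).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify_days(episode_probs, cost_fp=1.0, cost_fn=1.0):
    """Label a day as an episode when its probability reaches the threshold."""
    threshold = moving_threshold(cost_fp, cost_fn)
    return [p >= threshold for p in episode_probs]


def total_cost(predictions, labels, cost_fp=1.0, cost_fn=1.0):
    """Sum the misclassification costs over all days."""
    cost = 0.0
    for predicted, actual in zip(predictions, labels):
        if predicted and not actual:
            cost += cost_fp  # false alarm
        elif actual and not predicted:
            cost += cost_fn  # missed episode
    return cost


# Hypothetical model outputs and observed labels (True = episode day).
probs = [0.05, 0.10, 0.30, 0.60, 0.15]
labels = [False, False, True, True, False]

# Equal costs reproduce the standard 0.5 threshold.
standard = classify_days(probs, cost_fp=1.0, cost_fn=1.0)
# A missed episode is assumed to cost five times as much as a false alarm.
sensitive = classify_days(probs, cost_fp=1.0, cost_fn=5.0)

print(total_cost(standard, labels, cost_fp=1.0, cost_fn=5.0))
print(total_cost(sensitive, labels, cost_fp=1.0, cost_fn=5.0))
```

With the assumed 5:1 cost ratio the threshold drops from 0.5 to about 0.17, so the day with probability 0.30 is flagged as an episode instead of being missed. The cost-penalty method would instead weight the errors during training; that variant is not shown here.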