Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2020, Vol. 56 ›› Issue (5): 931-938. DOI: 10.13209/j.0479-8023.2020.070

Improving Air Quality Forecast Accuracy in Urumqi-Changji-Shihezi Region Using an Ensemble Deep Learning Approach

ZHANG Bin1, LÜ Baolei2, WANG Xinlu3, ZHANG Wenxian3,†, HU Yongtao4   

  1. Xinjiang Bingtuan Environmental Protection Sciences Research Institute, Urumqi 830002
  2. Huayun Sounding Meteorological Technology Company, Ltd., Beijing 102299
  3. Hangzhou AiMa Technologies, Hangzhou 311121
  4. School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332
  • Received: 2019-09-11 Revised: 2019-10-12 Online: 2020-09-20 Published: 2020-09-20
  • Contact: ZHANG Wenxian, E-mail: pkuzhangwx(at)gmail.com

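Correcting Numerical Air Quality Forecasts with an Ensemble Deep Learning Method: A Case Study of the Urumqi-Changji-Shihezi Urban Agglomeration in Xinjiang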

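ZHANG Bin1, LÜ Baolei2, WANG Xinlu3, ZHANG Wenxian3,†, HU Yongtao4

  1. Xinjiang Bingtuan Environmental Protection Sciences Research Institute, Urumqi 830002
  2. Huayun Sounding Meteorological Technology Company, Ltd., Beijing 102299
  3. Hangzhou AiMa Technologies, Hangzhou 311121
  4. School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332
  • Corresponding author: ZHANG Wenxian, E-mail: pkuzhangwx(at)gmail.com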
  • Supported by:
    AiMa Technologies Independent R&D Project (研字2016-003) and Huayun Group Science and Technology Project (HYKJXM-201803)

Abstract:

A post-correction framework is applied to raw PM2.5 forecasts from the numerical air quality model CMAQ for the Urumqi-Changji-Shihezi region of the Xinjiang Autonomous Region, with the aim of improving forecast accuracy. An ensemble deep learning method corrects the errors in the original CMAQ forecasts. The method integrates four machine learning models: a deep neural network, a random forest, a gradient boosting model and a generalized linear model. Each model takes the original meteorological forecasts, air quality forecasts and land use types as input data. Evaluated against independent data from 2018, the “bias-corrected” forecasts are significantly more accurate: the R² values of the 5-day site forecasts are 0.41–0.60, an improvement of 60%–160% over the original forecasts, while the RMSE values are reduced by ~40%. In cross-validation, the R² values of the post-corrected results increase by 50%–80% and the RMSE values are reduced by ~30%. The post-correction method is computationally efficient and can be deployed operationally for reliable daily forecasting.
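
The correction scheme summarized in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the scikit-learn estimators stand in for the four member models, the data are synthetic, the feature set (`cmaq_pm25`, `wind`, `temp`, `land_use`) is hypothetical, and the members are combined with an unweighted mean because the abstract does not state how the four outputs are merged. It is not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical, not from the paper): a raw CMAQ PM2.5
# forecast, two forecast meteorological variables, and a land-use class code.
n = 600
cmaq_pm25 = rng.gamma(shape=2.0, scale=30.0, size=n)
wind = rng.uniform(0.5, 8.0, size=n)
temp = rng.normal(5.0, 12.0, size=n)
land_use = rng.integers(0, 4, size=n)
X = np.column_stack([cmaq_pm25, wind, temp, land_use])

# Synthetic "observations": the raw forecast is made to overpredict, with a
# wind-dependent bias, so that there is something for the models to correct.
obs = 0.8 * cmaq_pm25 / (1.0 + 0.2 * wind) + 5.0 * land_use + rng.normal(0.0, 5.0, n)
obs = np.clip(obs, 0.0, None)

# Hold out the last quarter as an independent evaluation set.
X_train, X_test = X[:450], X[450:]
y_train, y_test = obs[:450], obs[450:]

# Four member models, mirroring the model families named in the abstract.
members = [
    make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
    ),
    RandomForestRegressor(n_estimators=100, random_state=0),
    GradientBoostingRegressor(random_state=0),
    LinearRegression(),  # a generalized linear model with Gaussian noise, identity link
]

# Fit each member to predict observed PM2.5 from the forecast-time inputs.
member_preds = []
for model in members:
    model.fit(X_train, y_train)
    member_preds.append(model.predict(X_test))

# Assumed combination rule: unweighted mean of the member predictions.
corrected = np.mean(member_preds, axis=0)


def rmse(pred, truth):
    """Root mean square error between two 1-D arrays."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


rmse_raw = rmse(X_test[:, 0], y_test)
rmse_corrected = rmse(corrected, y_test)
print(f"RMSE raw: {rmse_raw:.1f}, RMSE corrected: {rmse_corrected:.1f}")
```

Because each member only needs the forecast-time inputs at prediction time, a scheme of this shape adds little cost on top of the numerical model run, which is consistent with the abstract's remark about operational deployment.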

Key words: objective correction, multi-source data, machine learning, ensemble forecast

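Abstract (Chinese version):

This study develops a post-correction algorithm based on forecasts from the numerical air quality model CMAQ (Community Multiscale Air Quality model). An ensemble deep learning method is used to correct the errors in CMAQ's raw PM2.5 (fine particulate matter) forecasts, with the aim of improving forecast accuracy. The method combines four machine learning models: a deep neural network, a random forest, a gradient boosting model and a generalized linear model. Each model uses multi-source data, including the original meteorological forecasts, air quality forecasts and land use types, as auxiliary variables to correct the forecast PM2.5 concentrations, and the outputs of the four models are then combined into an ensemble result. The method is applied to CMAQ forecasts for the Urumqi-Changji-Shihezi urban agglomeration in Xinjiang. Evaluated against independent samples from 2018, the corrected forecasts are significantly more accurate: the coefficient of determination R² of the 5-day site forecasts is 0.41–0.60, 60%–160% higher than that of the original forecasts, and the root mean square error (RMSE) is about 40% lower. In cross-validation, the R² of the site forecasts likewise increases by 50%–80% and the RMSE decreases by about 30%. The correction method is computationally efficient and can be deployed on an operational forecasting platform to run reliably.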
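
The accuracy figures quoted in the abstract are relative changes in R² and RMSE between raw and corrected forecasts. The sketch below shows how such figures can be computed. The sample values are invented for illustration, and R² is taken here as the squared Pearson correlation between forecast and observation, which is an assumption: the abstract does not say which definition of the coefficient of determination was used.

```python
import math


def rmse(pred, obs):
    """Root mean square error between forecasts and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def r_squared(pred, obs):
    """Squared Pearson correlation (assumed definition of R² for this sketch)."""
    mean_p = sum(pred) / len(pred)
    mean_o = sum(obs) / len(obs)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)


def relative_change(new, old):
    """Percentage change of `new` relative to `old` (positive means increase)."""
    return (new - old) / old * 100.0


# Invented PM2.5 values for illustration only (not data from the paper).
obs = [10.0, 20.0, 30.0, 40.0, 50.0]
raw = [30.0, 20.0, 50.0, 30.0, 70.0]        # hypothetical raw forecasts
corrected = [12.0, 18.0, 33.0, 37.0, 52.0]  # hypothetical corrected forecasts

r2_gain = relative_change(r_squared(corrected, obs), r_squared(raw, obs))
rmse_drop = -relative_change(rmse(corrected, obs), rmse(raw, obs))
print(f"R2 improved by {r2_gain:.0f}%, RMSE reduced by {rmse_drop:.0f}%")
```

Under this convention, a statement such as "R² improved by 60%" means the corrected R² is 1.6 times the raw R², not that R² rose by 0.6 in absolute terms.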

Key words (Chinese version): objective correction, multi-source data, machine learning, ensemble forecast