Acta Scientiarum Naturalium Universitatis Pekinensis, 2021, Vol. 57, Issue (5): 938-950. DOI: 10.13209/j.0479-8023.2021.070


Forecasting Ozone and PM2.5 Pollution Potentials Using Machine Learning Algorithms: A Case Study in Chengdu

WANG Xinlu¹, HUANG Ran¹,†, ZHANG Wenxian¹, LÜ Baolei², DU Yunsong³, ZHANG Wei³, LI Bolan³, HU Yongtao⁴

  1. Hangzhou AiMa Technologies, Hangzhou 311121
  2. Huayun Sounding Meteorological Technology Company, Ltd., Beijing 102299
  3. Sichuan Bio-Environmental Monitoring Center, Chengdu 610091
  4. School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332
  • Received: 2020-07-31 Revised: 2020-09-15 Online: 2021-09-20 Published: 2021-09-20
  • Contact (corresponding author, †): HUANG Ran, E-mail: ranhuang2019(at)163.com
  • Funding: Supported by the National Key R&D Program of China (2018YFC0214004), the Sichuan Provincial Environmental Protection Science and Technology Program (2019HB03), and the Sichuan Provincial Major Science and Technology Special Project (2018SZDZX0023)


Abstract:

Pollution-potential forecast models were developed for summertime (April–August) ozone and wintertime (November–February) PM2.5 in Chengdu using multiple linear regression (MLR), a back-propagation (BP) neural network (NN), and random forest (RF) algorithms. For each model, the key predictors were selected from a range of candidate variables that may affect the spatiotemporal distribution of the pollutants. The models were trained on 2016–2018 data and evaluated first with a withheld-data test and then against an independent 2019 dataset. The results show that the MLR, NN, and RF models all forecast O3 and PM2.5 pollution potentials in Chengdu well at short lead times (1–3 days), and that their performance remains stable at medium- and long-range lead times (7–15 days). Of the three, the MLR model performed relatively best for O3 and the RF model performed relatively best for PM2.5.

Key words: multiple linear regression, BP neural network, random forest, medium- and long-term air pollution potential forecast
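To make the evaluation design concrete (season-restricted samples, training on 2016–2018, testing on the independent 2019 year), here is a minimal standard-library Python sketch of the MLR branch only. Everything beyond the seasons and years is an assumption and not taken from the paper: the hypothetical `Sample` record, the hypothetical helpers `split_by_year`, `fit_mlr`, and `rmse_and_r`, the choice of RMSE and Pearson r as skill scores, and the calendar-year split. The data are synthetic and noise-free, so the fit simply recovers the generating coefficients; it says nothing about the paper's predictors or results.

```python
# Hypothetical sketch, not the authors' code: season filter, year-based
# train/test split, OLS multiple linear regression, and two skill scores.
from dataclasses import dataclass
from math import sqrt

# Seasons as stated in the abstract: ozone Apr-Aug, PM2.5 Nov-Feb (wraps the year end).
O3_MONTHS = {4, 5, 6, 7, 8}
PM25_MONTHS = {11, 12, 1, 2}


@dataclass
class Sample:
    year: int
    month: int
    features: list  # hypothetical predictor values (e.g. temperature, wind speed)
    target: float   # observed pollutant concentration for that day


def split_by_year(samples, months, train_years, test_years):
    """Keep in-season samples, then split them by calendar year (an assumption)."""
    in_season = [s for s in samples if s.month in months]
    train = [s for s in in_season if s.year in train_years]
    test = [s for s in in_season if s.year in test_years]
    return train, test


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(samples):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    rows = [[1.0] + list(s.features) for s in samples]  # leading 1.0 = intercept
    y = [s.target for s in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, features):
    return beta[0] + sum(b * f for b, f in zip(beta[1:], features))


def rmse_and_r(obs, pred):
    """Root-mean-square error and Pearson correlation of observed vs. predicted."""
    n = len(obs)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = sqrt(sum((o - mo) ** 2 for o in obs))
    sp = sqrt(sum((p - mp) ** 2 for p in pred))
    return rmse, cov / (so * sp)


# Synthetic, noise-free data purely to exercise the code (not the paper's data).
samples = []
for year in range(2016, 2020):
    for month in range(1, 13):
        for day in range(1, 4):
            temp = float(3 * month + day)          # stand-in predictor 1
            wind = float((year - 2015) * day % 5)  # stand-in predictor 2
            samples.append(Sample(year, month, [temp, wind], 10 + 2 * temp - 0.5 * wind))

train, test = split_by_year(samples, O3_MONTHS, {2016, 2017, 2018}, {2019})
beta = fit_mlr(train)
pred = [predict(beta, s.features) for s in test]
rmse, r = rmse_and_r([s.target for s in test], pred)
print(f"beta={[round(b, 3) for b in beta]}, test RMSE={rmse:.3g}, r={r:.3f}")
```

One design choice worth flagging: a calendar-year split puts January–February 2019 (the tail of the 2018/19 winter) in the PM2.5 test set, whereas grouping by winter season would keep each winter intact; the abstract does not say which convention the authors used. The NN and RF models could be evaluated with the same split and scores by swapping out `fit_mlr` and `predict`.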