Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2019, Vol. 55 ›› Issue (2): 335-341. DOI: 10.13209/j.0479-8023.2019.001


Clustering of Lake Variables Based on Pattern Recognition Method

REN Tingyu, LIANG Zhongyao, CHEN Huili, LIU Yong   

  1. College of Environmental Science and Engineering, Key Laboratory of Water and Sediment Sciences (Ministry of Education), Peking University, Beijing 100871
  • Received: 2018-03-16; Revised: 2018-07-21; Online: 2019-03-20; Published: 2019-03-20
  • Contact: LIU Yong, E-mail: yongliu(at)pku.edu.cn

  • Supported by:
    National Natural Science Foundation of China (51779002)

Abstract:

A self-organizing feature map (SOFM) and a random forest (RF) were integrated to recognize water quality patterns in nine water quality indicators monitored at 63 lakes in China over 11 years (5110 records). The SOFM was first built to cluster the lakes and identify their pollution conditions. The RF was then used to assess how well the water quality variables explain the clustering result and to determine the most important indicators. The SOFM results show that the lakes fall into three types according to pollution level. The RF results show that the permanganate index and chlorophyll a concentration are sufficient to determine a lake's pollution condition at a classification accuracy of 80%. The integrated method can extract, from large and complex datasets, the water quality indicators that reflect pollution conditions. In practice, it can be used to assess pollution conditions quickly and to guide the selection of monitoring indicators.
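The two-stage workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic (300 samples, nine unnamed indicators, with only the first two made informative by construction), the SOFM is a hand-written one-dimensional map with three nodes, and the function name `train_som`, the grid shape, and all hyperparameters are assumptions. The random forest step uses scikit-learn's `RandomForestClassifier`, and refitting on the two top-ranked indicators is one plausible way to check how far a reduced indicator set reproduces the clusters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's data): 300 samples x 9 indicators.
# By construction, only columns 0 and 1 differ between three hidden groups.
groups = np.repeat(np.arange(3), 100)
X = rng.normal(size=(groups.size, 9))
X[:, 0] += 4.0 * groups
X[:, 1] += 4.0 * groups
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each indicator


def train_som(data, n_units=3, n_iter=2000, lr0=0.5, sigma0=1.0, rng=rng):
    """Train a 1-D self-organizing map; return weights of shape (n_units, n_features)."""
    weights = data[rng.choice(len(data), size=n_units, replace=False)].copy()
    positions = np.arange(n_units)  # node coordinates on the 1-D grid
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3    # neighborhood width shrinks
        x = data[rng.integers(len(data))]       # one random training sample
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
        h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)  # pull BMU and neighbors toward x
    return weights


# Stage 1: cluster samples by assigning each one to its best-matching node.
weights = train_som(X)
dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
labels = dists.argmin(axis=1)

# Stage 2: fit a random forest to the cluster labels and rank the indicators.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, labels)
importances = rf.feature_importances_
ranking = np.argsort(importances)[::-1]

# Refit on the two top-ranked indicators to see how well they alone
# reproduce the cluster labels (out-of-bag accuracy).
top2 = ranking[:2]
rf_small = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf_small.fit(X[:, top2], labels)

print("OOB accuracy, all indicators:", rf.oob_score_)
print("Top-2 indicators:", top2, "OOB accuracy:", rf_small.oob_score_)
```

With real monitoring data, the columns of `X` would be the measured indicators, and the number of map nodes would be chosen from the data rather than fixed at three in advance.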

Key words: pattern recognition, water pollution, self-organizing feature map, random forest
