Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2023, Vol. 59 ›› Issue (3): 456-466. DOI: 10.13209/j.0479-8023.2023.030

A BERT-Based Text Classification Model for Livelihood Issues: The Case of Government Hotline Data from Zhejiang Province

KONG Xiangfu1,2,†, DONG Bo1, XU Ke2,3, TAO Yongliang1

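  1. Research Center for AI Social Governance, Zhejiang Lab, Hangzhou 311121
  2. School of Urban Planning and Design, Peking University Shenzhen Graduate School, Shenzhen 518055
  3. Institute of Urban and Rural Development, Zhejiang Development and Planning Institute, Hangzhou 310030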
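  • Received: 2022-05-17 Revised: 2022-06-23 Published: 2023-05-20 Online: 2023-05-20
  • Corresponding author: KONG Xiangfu, E-mail: 1601111702(at)pku.edu.cn
  • Funding:
    Supported by the Key Project of the Zhejiang Provincial Soft Science Research Program (2021C25021)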

Text Classification Model for Livelihood Issues Based on BERT: A Study Based on Hotline Complaint Data of Zhejiang Province

KONG Xiangfu1,2,†, DONG Bo1, XU Ke2,3, TAO Yongliang1   

  1. Research Center for AI Social Governance, Zhejiang Lab, Hangzhou 311121
  2. School of Urban Planning and Design, Peking University Shenzhen Graduate School, Shenzhen 518055
  3. Institute of Urban and Rural Development, Zhejiang Development and Planning Institute, Hangzhou 310030
  • Received: 2022-05-17 Revised: 2022-06-23 Online: 2023-05-20 Published: 2023-05-20
  • Contact: KONG Xiangfu, E-mail: 1601111702(at)pku.edu.cn

Abstract:

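Based on 12345 government hotline data from Zhejiang Province for 2017 to 2021, this study builds a fine-grained, three-level classification system for livelihood issues from the residents' perspective and uses the pre-trained BERT model to build a text classification model that converts residents' complaint texts into livelihood-issue labels. The results show that adding 30% artificially generated complaint samples to the hotline data raises classification accuracy by about 10 percentage points, to a maximum of 84.59%. An analysis of the share of each type of livelihood issue in Zhejiang Province shows that the shares of complaints about environmental protection, irregular business operations, and municipal services are declining, whereas the shares of complaints about public services, traffic problems, home-purchase problems, and emerging consumption models are rising. The findings can help the government better understand public sentiment and opinion and strengthen data-driven social governance.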
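The trend analysis described here compares, year by year, the share of complaints that fall under each label. A minimal sketch of that calculation follows, assuming each classified complaint is reduced to a `(year, label)` pair. The function name `yearly_shares` and the toy records are hypothetical and are not taken from the paper's data.

```python
from collections import Counter, defaultdict

def yearly_shares(records):
    """Return {year: {label: share}} for an iterable of (year, label) pairs.

    The share is the fraction of that year's complaints carrying the label.
    """
    counts = defaultdict(Counter)
    for year, label in records:
        counts[year][label] += 1
    return {
        year: {label: n / sum(c.values()) for label, n in c.items()}
        for year, c in sorted(counts.items())
    }

# Toy records invented for illustration only (not the hotline data).
records = [
    (2017, "environmental protection"), (2017, "environmental protection"),
    (2017, "public services"), (2017, "traffic"),
    (2021, "environmental protection"), (2021, "public services"),
    (2021, "public services"), (2021, "traffic"),
]
shares = yearly_shares(records)
for year, by_label in shares.items():
    print(year, by_label)
```

Comparing shares rather than raw counts keeps the trend readable even when the total call volume changes from year to year.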

关键词: 民生问题文本分类, BERT, 政务热线数据, 数据治理

Abstract:

Using 12345 hotline complaint data from Zhejiang Province for 2017 to 2021, a fine-grained, three-level classification system for livelihood issues was constructed from the perspective of social cognition. A text classification model built on the pre-trained BERT model was developed to convert complaint texts into livelihood-issue labels. Validation showed that adding 30% artificial complaint texts to the training set raised accuracy by roughly 10 percentage points, to a maximum of 84.59%. Moreover, the proportions of complaints about environmental protection, irregular business operations, and municipal services showed downward trends, while the proportions of complaints about public services, traffic issues, house purchase issues, and emerging consumption patterns showed upward trends. This study demonstrates the value of combining deep learning with 12345 hotline complaint data for improving data-driven social governance capabilities.
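The abstract does not say how the 30% artificial texts were mixed in. A minimal sketch of one possible reading follows, in which artificial examples make up about 30% of the final training set. The function `mix_training_set`, its parameters, and the toy examples are hypothetical and are not the authors' code.

```python
import random

def mix_training_set(real, synthetic, synthetic_share=0.30, seed=0):
    """Return a shuffled list in which about `synthetic_share` of the
    examples are artificial. Each example is a (text, label) pair.

    Assumption: the share is measured against the final mixed set,
    so n_syn / (n_real + n_syn) = synthetic_share.
    """
    if not 0 <= synthetic_share < 1:
        raise ValueError("synthetic_share must be in [0, 1)")
    rng = random.Random(seed)
    n_syn = round(len(real) * synthetic_share / (1 - synthetic_share))
    if n_syn > len(synthetic):
        raise ValueError("not enough synthetic examples for the requested share")
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed

# Toy examples invented for illustration only.
real = [(f"real complaint {i}", "traffic") for i in range(70)]
synthetic = [(f"artificial complaint {i}", "traffic") for i in range(50)]
train = mix_training_set(real, synthetic)
n_artificial = sum(text.startswith("artificial") for text, _ in train)
print(len(train), n_artificial)
```

If the 30% is instead meant relative to the number of real samples, the count would be `round(len(real) * synthetic_share)`; the rest of the sketch stays the same.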

Key words: livelihood issue text classification, BERT, hotline complaint data, data-driven governance