Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2018, Vol. 54 ›› Issue (3): 466-474. DOI: 10.13209/j.0479-8023.2017.167


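Automatic Recognition of the Usages of the Auxiliary Word “的” (de)

LIU Qiuhui1, ZHANG Kunli1,†, XU Hongfei1, YU Shiwen2, ZAN Hongying1

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001
  2. Key Laboratory of Computational Linguistics (Ministry of Education), Peking University, Beijing 100871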
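  • Received: 2017-01-21 Revised: 2017-11-26 Online: 2018-05-20 Published: 2018-05-20
  • Corresponding author: ZHANG Kunli, E-mail: ieklzhang(at)zzu.edu.cn
  • Funding:
    Supported by the National Basic Research Program of China (2014CB340504), the National Natural Science Foundation of China (61402419, 60970083), the National Social Science Fund of China (14BYY096), the Open Project of the Key Laboratory of Computational Linguistics (Ministry of Education), the Basic Research Project of the Science and Technology Department of Henan Province (142300410231, 142300410308), the Key Scientific and Technological Research Project of the Education Department of Henan Province (15A520098), and the Science and Technology Research Project of the Science and Technology Department of Henan Province (172102210478)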

Research on Automatic Recognition of Auxiliary “DE”

LIU Qiuhui1, ZHANG Kunli1,†, XU Hongfei1, YU Shiwen2, ZAN Hongying1   

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001
  2. Key Laboratory of Computational Linguistics (MOE), Peking University, Beijing 100871
  • Received: 2017-01-21 Revised: 2017-11-26 Online: 2018-05-20 Published: 2018-05-20
  • Contact: ZHANG Kunli, E-mail: ieklzhang(at)zzu.edu.cn

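Abstract:

Building on the “trinity” knowledge base of function word usages, three approaches are applied to automatically recognize the usages of the auxiliary word “的” (de): a rule-based method, a CRF model, and a gated recurrent unit (GRU) neural network model. Their accuracies are 34.4%, 77.5%, and 81.3%, respectively. After an analysis of the usages of “的”, some usages are merged, and the CRF and neural network models are applied to coarse-grained usage recognition, reaching accuracies of 81.8% and 84.5%, respectively, a clear improvement. The recognition results are expected to be applicable to other natural language processing tasks.

Keywords: “的” (de), gated recurrent unit, rule, CRF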

Abstract:

Based on the triune Chinese function word usage knowledge base (CFKB), a rule-based method, a CRF (conditional random field) model, and a GRU (gated recurrent unit) neural network model are used to automatically recognize the usages of the auxiliary “DE”, with accuracies of 34.4%, 77.5%, and 81.3%, respectively. To improve accuracy, some usages of the auxiliary “DE” are merged into coarse-grained usages. On these coarse-grained usages, the CRF model reaches an accuracy of 81.8% and the neural network model reaches 84.5%. The recognition results for the auxiliary “DE” are expected to improve the performance of other NLP tasks.

Key words: “DE”, GRU (gated recurrent unit), rule, CRF (conditional random field)

CLC number: