Acta Scientiarum Naturalium Universitatis Pekinensis, 2019, Vol. 55, Issue (1): 8-14. DOI: 10.13209/j.0479-8023.2018.056

中文嵌套命名实体关系抽取研究

许浩亮, 李雁群, 何云琪, 钱龙华   

  1. 苏州大学计算机科学与技术学院, 苏州 215006
  • 收稿日期:2018-04-15 修回日期:2018-08-06 出版日期:2019-01-20 发布日期:2019-01-20
  • 通讯作者: 钱龙华, E-mail: qianlonghua(at)suda.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (2017YFB1002101)

Research on Chinese Nested Named Entity Relation Extraction

XU Haoliang, LI Yanqun, HE Yunqi, QIAN Longhua   

  1. School of Computer Science & Technology, Soochow University, Suzhou 215006
  • Received:2018-04-15 Revised:2018-08-06 Online:2019-01-20 Published:2019-01-20
  • Contact: QIAN Longhua, E-mail: qianlonghua(at)suda.edu.cn

摘要:

为了解决嵌套命名实体关系抽取研究缺乏相关语料库这一问题, 在现有中文命名实体语料库的基础上, 将人工标注与机器学习相结合来抽取其语义关系。人工标注一个中文嵌套命名实体关系语料库, 然后分别采用支持向量机和卷积神经网络等方法, 进行中文嵌套实体关系抽取实验。实验结果表明, 在人工标注实体的中文嵌套命名实体语料上, 嵌套实体关系抽取的性能非常好, F1指数达到95%以上, 而在自动识别实体上的抽取性能尚不理想。

关键词: 嵌套实体关系抽取, 信息抽取, 支持向量机, 卷积神经网络

Abstract:

Research on relation extraction between nested named entities lacks a corresponding benchmark corpus. To address this problem, manual annotation and machine learning are combined to extract the semantic relations between nested entities, building on an existing Chinese named entity corpus. The authors manually annotate a Chinese nested named entity relation corpus and then conduct nested entity relation extraction experiments using support vector machine (SVM) and convolutional neural network (CNN) models, respectively. The experimental results show that nested entity relation extraction performs very well on the corpus with manually annotated entities, achieving an F1 score above 95%, whereas performance on automatically recognized entities is not yet satisfactory.

Key words: nested entity relation extraction, information extraction, support vector machines, convolutional neural network