Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2021, Vol. 57 ›› Issue (1): 53-60. DOI: 10.13209/j.0479-8023.2020.083

Joint Extraction of Entities and Relations Based on Hierarchical Sequence Labeling

TIAN Jialai1, LÜ Xueqiang1, YOU Xindong1,†, XIAO Gang2, HAN Junmei2

  1. Beijing Information Science and Technology University, Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing 100101
  2. National Key Laboratory for Complex Systems Simulation, Institute of Systems Engineering, Academy of Military Sciences, Beijing 100101

  • Received: 2020-06-11 Revised: 2020-08-14 Online: 2021-01-20 Published: 2021-01-20
  • Contact: YOU Xindong, E-mail: youxindong(at)bistu.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (61671070), the Key Project of the State Language Commission (ZDI135-53), the National Defense Science and Technology Key Laboratory Fund (6142006190301), the Beijing Information Science and Technology University Research Level Improvement Project for Promoting the Connotative Development of Universities (2019KYNH226), and the Beijing Information Science and Technology University "Qinxin Talents" Cultivation Program (QXTCPB201908)

Abstract:

To improve the joint extraction of entities and relations, this paper proposes an end-to-end joint extraction model (HSL). HSL adopts a new tagging scheme that recasts joint entity and relation extraction as a sequence labeling problem, and applies sequence labeling hierarchically to handle overlapping triples. Experiments show that HSL handles overlapping triples effectively: its F1 score reaches 80.84% on a military corpus dataset and 86.4% on the public WebNLG dataset, both exceeding current mainstream triple extraction models.

Key words: joint extraction of entities and relations, triple overlap, sequence labeling, knowledge graph, HSL
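
The abstract does not spell out the tagging scheme, so the following is only an illustrative sketch of how layered sequence labeling can represent overlapping triples, not the authors' HSL scheme. It assumes a first layer that tags subject spans with BIO tags and a second layer that holds one tag row per subject, marking each object span with its relation. The tag names (`B-SUBJ`, `B-birthPlace`, and so on), the helper functions, and the example sentence are all hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]           # token span: start inclusive, end exclusive
Triple = Tuple[Span, str, Span]  # (subject span, relation, object span)


def encode(n_tokens: int, triples: List[Triple]):
    """Layer 1 tags subjects; layer 2 holds one object/relation row per subject."""
    subjects = sorted({subj for subj, _, _ in triples})
    layer1 = ["O"] * n_tokens
    for start, end in subjects:
        layer1[start] = "B-SUBJ"
        for i in range(start + 1, end):
            layer1[i] = "I-SUBJ"
    layer2: Dict[Span, List[str]] = {}
    for subj in subjects:
        row = ["O"] * n_tokens
        for s, rel, (o_start, o_end) in triples:
            if s != subj:
                continue
            row[o_start] = "B-" + rel
            for i in range(o_start + 1, o_end):
                row[i] = "I-" + rel
        layer2[subj] = row
    return layer1, layer2


def spans(tags: List[str]):
    """Collect (start, end, label) spans from one BIO tag row."""
    out, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last span
        if start is not None and tag != "I-" + label:
            out.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return out


def decode(layer1: List[str], layer2: Dict[Span, List[str]]) -> List[Triple]:
    """Rebuild triples by pairing each subject with the spans in its own row."""
    triples = []
    for s_start, s_end, _ in spans(layer1):
        row = layer2.get((s_start, s_end), [])
        for o_start, o_end, rel in spans(row):
            triples.append(((s_start, s_end), rel, (o_start, o_end)))
    return sorted(triples)


# Hypothetical sentence in which "Wheeler" is the object of one triple
# and the subject of another, so the two triples overlap on that token.
tokens = ["Alan", "Bean", "was", "born", "in", "Wheeler", ",", "Texas"]
triples = [
    ((0, 2), "birthPlace", (5, 6)),  # (Alan Bean, birthPlace, Wheeler)
    ((5, 6), "isPartOf", (7, 8)),    # (Wheeler, isPartOf, Texas)
]
layer1, layer2 = encode(len(tokens), triples)
recovered = decode(layer1, layer2)
```

Because each subject gets its own second-layer row, a token can be a subject in layer 1 and an object in another subject's row without the tags colliding. This simplified sketch still cannot encode two different relations between the same subject and the same object span, nor nested subject spans; a full scheme would need extra rows or tags for those cases.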