北京大学学报自然科学版 (Acta Scientiarum Naturalium Universitatis Pekinensis) ›› 2022, Vol. 58 ›› Issue (3): 391-404. DOI: 10.13209/j.0479-8023.2021.118

面向武器装备领域的复杂实体识别

游新冬1, 葛昊杰1, 韩君妹2, 李育贤1, 吕学强1,†   

  1. 北京信息科技大学网络文化与数字传播北京市重点实验室, 北京 100101
  2. 军事科学院系统工程研究院复杂系统仿真总体重点实验室, 北京 100101
  • 收稿日期:2021-09-01 修回日期:2021-11-01 出版日期:2022-05-20 发布日期:2022-05-20
  • 通讯作者: 吕学强, E-mail: lxq(at)bistu.edu.cn
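  • Supported by:
    the Beijing Natural Science Foundation (4212020), the National Natural Science Foundation of China (62171043), the National Defense Science and Technology Key Laboratory Fund (6412006200404), the "Qinxin Talent" Cultivation Program of Beijing Information Science and Technology University (QXTCP B201908), and the Scientific Research Program of the Beijing Municipal Education Commission (KM202111232001)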

Recognition of Complex Entities in Weapons and Equipment Field

YOU Xindong1, GE Haojie1, HAN Junmei2, LI Yuxian1, LÜ Xueqiang1,†

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101
  2. National Key Laboratory for Complex Systems Simulation, Institute of Systems Engineering, Beijing 100101

  • Received: 2021-09-01 Revised: 2021-11-01 Online: 2022-05-20 Published: 2022-05-20
  • Contact: LÜ Xueqiang, E-mail: lxq(at)bistu.edu.cn

摘要:

针对武器装备领域复杂实体的特点, 提出一种融合多特征后挂载武器装备领域知识的复杂命名实体识别方法。首先, 使用BERT 模型对武器装备领域数据进行预训练, 得到数据向量, 使用Word2Vec模型学习郑码、五笔、拼音和笔画的上下位特征, 获取特征向量。然后, 将数据向量与特征向量融合, 利用Bi-LSTM模型进行编码, 使用CRF解码得到标签序列。最后, 基于武器装备领域知识, 对标签序列进行复杂实体的触发检测, 完成复杂命名实体识别。使用环球军事网数据作为语料进行实验, 分析不同的特征组合、不同神经网络模型下的识别效果, 并提出适用于评价复杂命名实体识别结果的计算方法。实验结果表明, 提出的挂载领域知识且融合多特征的武器装备复杂命名实体识别方法的F1值达到95.37%, 优于现有方法。

关键词: 武器装备, 复杂命名实体识别, 郑码, 领域规则, BERT, 评价方法

Abstract:

To address the characteristics of complex entities in the weapons and equipment field, we propose a complex named entity recognition method that fuses multiple features and then applies weapons and equipment domain knowledge. First, the BERT model is pre-trained on weapons and equipment data to obtain data vectors, and the Word2Vec model learns the context features of Zhengma, Wubi, Pinyin, and stroke encodings to obtain feature vectors. Next, the data vectors and feature vectors are fused, encoded with a Bi-LSTM model, and decoded with a CRF to obtain the tag sequence. Finally, based on weapons and equipment domain knowledge, trigger detection of complex entities is performed on the tag sequence to complete complex named entity recognition. The experiments use data collected from the Global Military Network as the corpus and analyze recognition performance under different feature combinations and different neural network models. A calculation method suited to evaluating complex named entity recognition results is also proposed. The experimental results show that the proposed method, which fuses multiple features and applies domain knowledge, reaches an F1 value of 95.37% and outperforms existing methods.

Key words: weapons and equipment, complex named entity recognition, Zhengma, domain rules, BERT, evaluation method
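
The fuse-then-decode steps named in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions: the abstract does not say how the data and feature vectors are fused, so simple concatenation is assumed; the Bi-LSTM encoder is omitted and replaced by hand-written emission scores; and the names `fuse` and `viterbi`, the B/I/O tag set, and every score below are hypothetical, not the authors' code or data. The decoding routine is the standard Viterbi algorithm for a linear-chain CRF.

```python
from typing import Dict, List, Sequence, Tuple


def fuse(data_vec: Sequence[float],
         feature_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Fuse one character's data vector with its feature vectors.

    Assumption: fusion is plain concatenation.
    """
    fused = list(data_vec)
    for vec in feature_vecs:
        fused.extend(vec)
    return fused


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag] is the encoder score for `tag` at position t.
    transitions[(prev, cur)] is the score for moving from prev to cur;
    pairs that are not listed score 0.0.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag]: best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (best[-1][prev]
                           + transitions.get((prev, cur), 0.0)
                           + emissions[t][cur])
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-character input.
TAGS = ["B", "I", "O"]
TRANSITIONS = {("O", "I"): -10.0}  # an entity cannot continue after O
EMISSIONS = [
    {"B": 1.5, "I": 0.0, "O": 2.0},
    {"B": 0.0, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.0, "O": 2.0},
]
print(viterbi(EMISSIONS, TRANSITIONS, TAGS))
```

In this toy input, picking the top-scoring tag at each position independently would give O, I, O, which is not a valid BIO sequence. The transition penalty from O to I makes the decoder prefer B, I, O instead, which is the kind of sequence-level consistency a CRF layer contributes on top of the encoder.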