Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition), 2020, Vol. 56, Issue (1): 45-52. DOI: 10.13209/j.0479-8023.2019.092

A Neural Paraphrase Identification Model Based on Syntactic Structure

LIU Mingtong, ZHANG Yujie, XU Jin’an, CHEN Yufeng

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
  • Received: 2019-05-22 Revised: 2019-09-20 Online: 2020-01-20 Published: 2020-01-20
  • Contact: ZHANG Yujie, E-mail: yjzhang(at)bjtu.edu.cn
  • Funding: National Natural Science Foundation of China (61876198, 61976015, 61370130, 61473294), Fundamental Research Funds for the Central Universities (2018YJS025), Beijing Natural Science Foundation (4172047), and International Science and Technology Cooperation Program of the Ministry of Science and Technology (K11F100010)

Abstract:

Paraphrase identification requires understanding the semantics of natural language. Most previous methods treat sentences as sequential structures and use sequential neural networks for semantic composition. As a result, they do not consider how syntactic structure affects semantic computation. In this paper, we propose a neural paraphrase identification model based on syntactic structure. The model uses a tree-structured neural network for semantic composition, which extends semantic representations from the word level to the phrase level. We further propose a syntactic tree alignment mechanism based on phrase-level semantic representations, which extracts features with a cross-sentence attention mechanism. Finally, we design a self-attention mechanism that enhances the semantic representations and captures global context information over the syntactic structure. On the public English paraphrase identification dataset Quora, the model improves paraphrase identification performance and reaches 89.3% accuracy. These results show that three components are effective in improving paraphrase identification: the proposed syntax-based semantic composition method, the phrase-level cross-sentence attention mechanism and the self-attention mechanism.

Key words: paraphrase identification, syntactic structure, tree-structured neural network, attention mechanism
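The abstract names three ideas but gives no equations: word-to-phrase composition over a parse tree, cross-sentence attention between word and phrase nodes, and self-attention. The toy sketch below shows how these pieces could fit together, under loudly stated assumptions. A plain recursive composition h = tanh(W[h_l; h_r] + b) stands in for the paper's tree-structured network, which may differ (for example, it may be a Tree-LSTM). The binarized trees are written by hand instead of coming from a parser. Attention is unscaled dot-product attention with no learned projections. Self-attention is applied before cross-sentence alignment, an ordering the abstract does not specify. All names (`encode_tree`, `self_attend`, `align_features`, `DIM`, the example trees) are hypothetical, and the weights are random and untrained.

```python
import math
import random

DIM = 4                  # hypothetical toy embedding size
rng = random.Random(0)   # fixed seed so the sketch is reproducible

# Hypothetical untrained parameters: one composition matrix shared by all phrase nodes.
W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * DIM)] for _ in range(DIM)]
b = [0.0] * DIM
EMB = {}                 # word -> vector, created lazily with random values

def embed(word):
    if word not in EMB:
        EMB[word] = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return EMB[word]

def compose(left, right):
    """Recursive composition h = tanh(W [left; right] + b) (a stand-in, not the paper's cell)."""
    x = left + right     # list concatenation == vector concatenation
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

def encode_tree(tree, nodes):
    """Post-order walk of a binarized tree (str = word, pair = phrase).
    Appends every word-level and phrase-level vector to `nodes`."""
    if isinstance(tree, str):
        h = embed(tree)
    else:
        left, right = tree
        h = compose(encode_tree(left, nodes), encode_tree(right, nodes))
    nodes.append(h)
    return h

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def softmax(scores):
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    """Unscaled dot-product attention; returns (weighted sum of keys, weights)."""
    weights = softmax([dot(query, k) for k in keys])
    context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(DIM)]
    return context, weights

def self_attend(nodes):
    """Residual self-attention: each node adds a summary of its own tree's nodes."""
    return [[x + y for x, y in zip(h, attend(h, nodes)[0])] for h in nodes]

def align_features(nodes_a, nodes_b):
    """Cross-sentence attention: soft-align every node of A to the nodes of B,
    compare each pair by difference and product, then mean-pool over A's nodes."""
    feats = []
    for h in nodes_a:
        aligned, _ = attend(h, nodes_b)
        feats.append([x - y for x, y in zip(h, aligned)] +
                     [x * y for x, y in zip(h, aligned)])
    return [sum(f[i] for f in feats) / len(feats) for i in range(2 * DIM)]

# Toy, hand-binarized trees (a real system would take them from a parser).
tree_a = (("how", "can"), ("I", ("learn", "python")))
tree_b = (("best", "way"), ("to", ("learn", "python")))

nodes_a, nodes_b = [], []
encode_tree(tree_a, nodes_a)
encode_tree(tree_b, nodes_b)
nodes_a, nodes_b = self_attend(nodes_a), self_attend(nodes_b)

# Symmetric feature vector that a (hypothetical) classifier would consume.
features = align_features(nodes_a, nodes_b) + align_features(nodes_b, nodes_a)
print(len(nodes_a), len(features))  # prints: 9 16
```

The sketch keeps every internal node, so a binarized tree over n words yields 2n-1 word- and phrase-level vectors. This is what allows the cross-sentence attention to align phrases as well as individual words. In a complete model, the pooled features would go to a trained classifier. Training and the classifier are omitted here.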