Acta Scientiarum Naturalium Universitatis Pekinensis, 2021, Vol. 57, Issue (1): 1-6. DOI: 10.13209/j.0479-8023.2020.084


Abstractive Text Summarization Based on Semantic Alignment Network

WU Shixin, HUANG Degen, LI Jiuyi   

  1. Dalian University of Technology, Dalian 116023
  • Received: 2020-05-15 Revised: 2020-08-12 Online: 2021-01-20 Published: 2021-01-20
  • Contact: HUANG Degen, E-mail: huangdg(at)dlut.edu.cn

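Research on Abstractive Text Summarization Based on Semantic Alignment

WU Shixin, HUANG Degen, LI Jiuyi   

  1. School of Computer Science, Dalian University of Technology, Dalian 116023
  • Corresponding author: HUANG Degen, E-mail: huangdg(at)dlut.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (U1936109, 61672127)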

Abstract:

To address the insufficient use of the summary's overall semantic information during decoding in current abstractive summarization models, this paper proposes a neural abstractive summarization model based on semantic alignment. The model builds on the Sequence-to-Sequence model with attention, the Pointer mechanism and the Coverage mechanism. A semantic alignment network is added between the encoder and the decoder to align the semantic information of the source text with that of the summary. The resulting semantic information is concatenated with the context vector during decoding, so that when predicting each word the decoder uses not only the partial semantics of the words already decoded but also the overall semantics of the summary sequence. Experiments on the Chinese news corpus LCSTS show that the proposed model effectively improves the quality of abstractive summarization.

Key words: abstractive summarization, Sequence-to-Sequence model, semantic alignment network

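Abstract (translated from the Chinese):

To address the insufficient use of the summary's overall semantic information during decoding in current abstractive text summarization models, a neural text summarization method based on semantic alignment is proposed. The method builds on the Sequence-to-Sequence model with attention, the Pointer mechanism and the Coverage mechanism, and adds a semantic alignment network between the encoder and the decoder to align semantic information from the text to the summary. The overall semantic information of the summary obtained in this way is concatenated with the decoder's context vector for word prediction, so that when predicting the current word the decoder uses not only the partial semantics of the already predicted word sequence but also the overall semantics of the summary to be predicted. Experiments on the Chinese news corpus LCSTS show that the model effectively improves summary quality; at character granularity, adding the semantic alignment mechanism raises the Rouge_L score by 5.4 percentage points.

Key words (translated from the Chinese): abstractive text summarization, Sequence-to-Sequence model, semantic alignment network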
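
The decoding step described in the abstract, where a vector carrying the overall semantics of the summary is concatenated with the context vector before the next word is predicted, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the vector sizes, the single linear output layer, and the choice to also include the decoder state in the concatenation are all assumptions made for illustration, and the Pointer and Coverage mechanisms are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, vec):
    """Apply y = W x + b, with W given as a list of rows."""
    return [
        sum(w * x for w, x in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def predict_distribution(context_vec, decoder_state, summary_semantics,
                         weights, bias):
    """Return a vocabulary distribution for one decoding step.

    context_vec:        attention context vector (partial, local semantics)
    decoder_state:      current decoder hidden state (assumed to be included)
    summary_semantics:  hypothetical output of the semantic alignment network,
                        standing in for the overall semantics of the summary
    """
    # Concatenation is the step named in the abstract; the list `+`
    # operator joins the three vectors end to end.
    features = context_vec + decoder_state + summary_semantics
    return softmax(linear(weights, bias, features))


# Toy usage with a 3-word vocabulary and made-up numbers.
toy_weights = [
    [0.2, -0.1, 0.0, 0.3, 0.1, 0.5, -0.4],
    [0.0, 0.4, -0.2, 0.1, 0.0, -0.3, 0.2],
    [-0.1, 0.0, 0.3, -0.2, 0.2, 0.1, 0.6],
]
toy_bias = [0.0, 0.0, 0.0]
toy_probs = predict_distribution(
    [0.1, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7], toy_weights, toy_bias
)
```

Because the summary-level vector is part of the input to the output layer, changing it changes the predicted distribution even when the context vector and decoder state stay fixed, which is the effect the abstract attributes to the semantic alignment network.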