Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2021, Vol. 57 ›› Issue (1): 38-44. DOI: 10.13209/j.0479-8023.2020.089

An Antinoise Response Generation Model for Open-Domain Dialogue Systems

ZHU Qinpei, MIAO Qingliang   

  1. AI Speech Co., Ltd., Suzhou 215000
  • Received:2020-06-04 Revised:2020-08-10 Online:2021-01-20 Published:2021-01-20
  • Contact: ZHU Qinpei, E-mail: ross.zhu(at)aispeech.com

Abstract:

To reduce the interference of noise in input utterances on response generation models, this paper proposes an antinoise model based on the encoder-decoder framework. First, simulated noise characters are randomly added to the input sequences of the training set. Then, noise character recognition is trained at the encoder output layer, which improves the model's ability to extract noise features. Finally, a pre-trained language model is fused at the encoder output layer to broaden the model's coverage of noise. To verify the antinoise effect, an antinoise test set is constructed, the first Chinese test set with real noise for single-turn open-domain chitchat systems. Experiments on this test set show that the proposed model outperforms the baseline models in both automatic and human evaluation.

Key words: natural language generation, pre-trained language models, BERT, Transformer model
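
The first step described in the abstract, randomly adding simulated noise characters to the training inputs, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `add_simulated_noise`, the insertion-only strategy, the `noise_rate` value, and the example `noise_vocab` are all assumptions made for illustration. The per-character labels it returns (1 for an inserted noise character, 0 for an original one) are the kind of target that a noise character recognition objective at the encoder output could be trained on.

```python
import random
from typing import List, Optional, Sequence, Tuple


def add_simulated_noise(
    chars: Sequence[str],
    noise_vocab: Sequence[str],
    noise_rate: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[int]]:
    """Randomly insert noise characters into a character sequence.

    Hypothetical helper: the paper does not specify how noise is sampled.
    Returns the noisy sequence and a parallel list of labels,
    where 1 marks an inserted noise character and 0 an original one.
    """
    if rng is None:
        rng = random.Random()
    noisy: List[str] = []
    labels: List[int] = []
    for ch in chars:
        # With probability noise_rate, insert one noise character
        # in front of the current original character.
        if rng.random() < noise_rate:
            noisy.append(rng.choice(noise_vocab))
            labels.append(1)
        noisy.append(ch)
        labels.append(0)
    return noisy, labels


if __name__ == "__main__":
    # Assumed noise vocabulary (filler characters), for illustration only.
    vocab = ["啊", "嗯", "呃"]
    noisy, labels = add_simulated_noise(
        list("今天天气怎么样"), vocab, noise_rate=0.3, rng=random.Random(0)
    )
    print("".join(noisy))
    print(labels)
```

Filtering the noisy sequence by label 0 recovers the original input, so the clean utterance and its reply can still be paired for response generation training.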