Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2021, Vol. 57 ›› Issue (1): 83-90. DOI: 10.13209/j.0479-8023.2020.079


无监督的句法可控复述模型用于对抗样本生成

杨二光1, 刘明童1, 张玉洁1,†, 孟遥2, 胡长建2, 徐金安1, 陈钰枫1   

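  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044 2. Lenovo Research, AI Laboratory, Beijing 100085
  • Received: 2020-06-09 Revised: 2020-08-15 Online: 2021-01-20 Published: 2021-01-20
  • Corresponding author: ZHANG Yujie, E-mail: yjzhang(at)bjtu.edu.cn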
  • Funding:
    Supported by the National Natural Science Foundation of China (61876198, 61976015, 61976016)

Unsupervised Syntactically Controllable Paraphrase Network for Adversarial Example Generation

YANG Erguang1, LIU Mingtong1, ZHANG Yujie1,†, MENG Yao2, HU Changjian2, XU Jin’an1, CHEN Yufeng1

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044 2. Lenovo Research, AI Laboratory, Beijing 100085
  • Received: 2020-06-09 Revised: 2020-08-15 Online: 2021-01-20 Published: 2021-01-20
  • Contact: ZHANG Yujie, E-mail: yjzhang(at)bjtu.edu.cn

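Abstract (translated from the Chinese):

When syntactically controllable paraphrase generation models are used to generate adversarial examples, their performance is limited by the domain and scale of the parallel paraphrase corpus. To address this problem, we propose an unsupervised syntactically controllable paraphrase generation model that requires only monolingual data for training, and use it to generate adversarial examples. The model is learned as a variational autoencoder: it first maps a sentence and a syntactic tree to a semantic variable and a syntactic variable, respectively, and then reconstructs the original sentence from these two variables. During reconstruction, the model learns to generate syntactically varied paraphrases without using any parallel corpus. Experiments on unsupervised paraphrase generation and adversarial example generation show that the proposed method achieves the best performance on unsupervised paraphrase generation and can generate effective adversarial examples, which can be used to improve the robustness and generalization of neural natural language processing (NLP) models.

Keywords: unsupervised learning, syntactically controllable paraphrase generation model, adversarial example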

Abstract:

Prior work on generating adversarial examples with syntactically controlled paraphrase networks requires large-scale parallel paraphrase corpora for training, so model performance is severely limited by the domain and scale of the available parallel corpus. To address this problem, this paper proposes an unsupervised syntactically controlled paraphrase model for adversarial example generation that needs only monolingual data for training. Specifically, the model is trained as a variational autoencoder that maps a sentence and a syntactic parse tree into a semantic variable and a syntactic variable, respectively, and then reconstructs the input sentence from the two variables. Through this reconstruction, the model learns to generate syntactically varied paraphrases without using any parallel data. Experimental results on unsupervised paraphrase generation and adversarial example generation show that the proposed model achieves state-of-the-art results on unsupervised paraphrase generation and generates effective adversarial examples, which can be used to improve the robustness and generalization of neural natural language processing (NLP) models.

Key words: unsupervised learning, syntactically controllable paraphrase network, adversarial example
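
As a rough illustration of the variational autoencoder training described in the abstract, the sketch below computes a negative evidence lower bound with two latent variables, one semantic and one syntactic. It assumes diagonal Gaussian posteriors, standard normal priors, and a reconstruction loss supplied by some decoder; none of these details are given on this page, and the function names (`gaussian_kl`, `reparameterize`, `negative_elbo`) and the `kl_weight` parameter are hypothetical.

```python
import math
import random


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector.

    Assumption: diagonal Gaussian posterior and standard normal prior.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def negative_elbo(recon_nll, sem_mu, sem_logvar, syn_mu, syn_logvar, kl_weight=1.0):
    """Reconstruction loss plus the KL terms of both latent variables.

    recon_nll: negative log-likelihood of the input sentence under a decoder
               conditioned on the sampled semantic and syntactic variables
               (the decoder itself is omitted from this sketch).
    kl_weight: hypothetical scaling factor for the KL terms.
    """
    kl_semantic = gaussian_kl(sem_mu, sem_logvar)    # posterior from the sentence encoder
    kl_syntactic = gaussian_kl(syn_mu, syn_logvar)   # posterior from the parse-tree encoder
    return recon_nll + kl_weight * (kl_semantic + kl_syntactic)


# Toy usage with made-up numbers; real values would come from trained encoders.
rng = random.Random(0)
z_semantic = reparameterize([1.0], [0.0], rng)
z_syntactic = reparameterize([0.0], [0.0], rng)
loss = negative_elbo(2.0, [1.0], [0.0], [0.0], [0.0])
print(loss)  # 2.5
```

At generation time, a model of this kind would presumably pair the semantic variable of the input sentence with the syntactic variable of a different target parse tree, which is how a syntactically varied paraphrase could be obtained without parallel data.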