Acta Scientiarum Naturalium Universitatis Pekinensis, 2022, Vol. 58, Issue 1: 29-36. DOI: 10.13209/j.0479-8023.2021.109


Neural Machine Translation Based on the XLM-R Cross-lingual Pre-trained Language Model

WANG Qian, LI Maoxi, WU Shuixiu, WANG Mingwen

  1. School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China
  • Received: 2021-06-12  Revised: 2021-08-09  Online: 2022-01-20  Published: 2022-01-20
  • Contact: LI Maoxi, E-mail: mosesli(at)jxnu.edu.cn
  • Funding: Supported by the National Natural Science Foundation of China (61662031)

Abstract:

The authors explore applying the XLM-R cross-lingual pre-trained language model on the source-language side, on the target-language side, and on both sides of neural machine translation to improve translation quality, and propose three network models that integrate the pre-trained XLM-R multilingual word representations into the Transformer encoder, into the Transformer decoder, and into both simultaneously. Experimental results on the WMT English-German, IWSLT English-Portuguese, and English-Vietnamese translation benchmarks show that, for translation tasks with abundant bilingual parallel data, integrating XLM-R into the Transformer encoder encodes the source sentences effectively and improves translation quality; for translation tasks with scarce bilingual parallel data, integrating XLM-R not only encodes the source sentences well but also supplements source-side and target-side knowledge simultaneously, further improving translation quality.
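For concreteness, the sketch below illustrates the encoder-side variant in the spirit of the abstract; it is not the authors' released code. The checkpoint name ("xlm-roberta-base"), dimensions, and target vocabulary size are illustrative assumptions, and it combines the Hugging Face transformers library with PyTorch's built-in Transformer decoder.

```python
# Minimal sketch, assuming PyTorch and Hugging Face transformers are installed.
import torch
import torch.nn as nn
from transformers import XLMRobertaModel


class XLMREncoderNMT(nn.Module):
    """Encoder-side variant: pre-trained XLM-R encodes the source sentence,
    and a randomly initialized Transformer decoder generates the target.
    (Target-side positional encodings are omitted for brevity.)"""

    def __init__(self, tgt_vocab_size, d_model=768, nhead=8, num_layers=6):
        super().__init__()
        # d_model must match XLM-R's hidden size (768 for xlm-roberta-base).
        self.encoder = XLMRobertaModel.from_pretrained("xlm-roberta-base")
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.out_proj = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, src_attention_mask, tgt_ids):
        # Source representations from XLM-R, fine-tuned jointly with the decoder.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_attention_mask).last_hidden_state
        # Causal mask: each target position attends only to earlier positions.
        t = tgt_ids.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf"),
                                       device=tgt_ids.device), diagonal=1)
        h = self.decoder(self.tgt_embed(tgt_ids), memory, tgt_mask=causal,
                         # XLM-R masks use 1 for real tokens; the decoder
                         # expects True at padding positions, hence the flip.
                         memory_key_padding_mask=(src_attention_mask == 0))
        return self.out_proj(h)  # logits over the target vocabulary
```

The decoder-side and dual-side variants described in the abstract would analogously inject XLM-R representations on the target side or on both sides; training XLM-R's parameters jointly with the rest of the network corresponds to the fine-tuning setting listed in the keywords.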

Key words: cross-lingual pre-trained language model, neural machine translation, Transformer network model, XLM-R model, fine-tuning