Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2016, Vol. 52 ›› Issue (1): 113-119. DOI: 10.13209/j.0479-8023.2016.007


Research on Example-Based Phrase Pairs in Statistical Machine Translation

LI Qiang1, LI Mu2, ZHANG Dongdong2, ZHU Jingbo1

  1. NLP Lab, Northeastern University, Shenyang 110819
  2. Microsoft Research Asia, Beijing 100080
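  • Received: 2015-05-28 Online: 2016-01-20 Published: 2016-01-20
  • Corresponding author: LI Qiang, E-mail: liqiangneu(at)gmail.com
  • Supported by:
    National Natural Science Foundation of China (61272376, 61300097), the Northeastern University Fundamental Research Funds Graduate Research Innovation Project (N140406003), and the China Scholarship Council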

Research on Example-Based Phrase Pairs in Statistical Machine Translation

LI Qiang1, LI Mu2, ZHANG Dongdong2, ZHU Jingbo1

  1. NLP Lab, Northeastern University, Shenyang 110819
  2. Microsoft Research Asia, Beijing 100080
  • Received: 2015-05-28 Online: 2016-01-20 Published: 2016-01-20
  • Contact: LI Qiang, E-mail: liqiangneu(at)gmail.com

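Abstract:

Data sparsity and the limited size of bilingual corpora leave many high-quality phrase pairs ungenerated. To address this in a phrase-based statistical machine translation system, the phrase pairs extracted by the traditional phrase extraction algorithm are decomposed, substituted, and recombined to generate example-based phrase pairs that the traditional method cannot extract. On Chinese-English news and Chinese-English spoken-language translation tasks, the method clearly improves translation quality over the baseline system on multiple test sets, with BLEU gains of about 1% on some test sets.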

Key words: statistical machine translation, phrase-based, example-based, phrase pair

Abstract:

Due to data sparsity and the limited size of bilingual data, many high-quality phrase pairs cannot be generated. The authors propose example-based phrase pairs, which are produced by decomposing, substituting, and generating from the typical phrase pairs extracted by the standard phrase extraction method in phrase-based statistical machine translation. On Chinese-to-English newswire and spoken-language translation tasks, the experimental results show significant improvements from the proposed method, with a gain of about 1% BLEU on some test sets.

Key words: statistical machine translation, phrase-based, example-based, phrase pair
