Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2017, Vol. 53 ›› Issue (2): 305-313. DOI: 10.13209/j.0479-8023.2017.036

• Original Article •

Integrating Voice Features into Japanese-English Hierarchical Phrase Based Model

Nan WANG, Jin’an XU, Fang MING, Yufeng CHEN, Yujie ZHANG

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
  • Received: 2016-07-29 Revised: 2016-10-07 Online: 2016-11-28 Published: 2017-03-20
  • Contact: Jin’an XU


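  • Supported by: the National Natural Science Foundation of China (61370130, 61473294), the Fundamental Research Funds for the Central Universities (2014RC040), and the International Science and Technology Cooperation Program of the Ministry of Science and Technology (K11F100010)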


Voice is typically expressed through different syntactic structures in different languages, which lowers machine translation quality. To address this problem, an approach is proposed that integrates voice features into hierarchical phrase-based (HPB) models. First, the corpus is classified on the Japanese side into three categories: passive voice, potential voice, and others. Second, the passive and potential sentences are further divided into groups according to the characteristics of their English counterparts, and these groups are used to build maximum entropy models for rules. Finally, bilingual voice features are integrated into the log-linear model to improve translation results and the accuracy of rule selection when translating passive and potential sentences. Large-scale experiments on a Japanese-to-English translation task show that the proposed method both alleviates the long-distance reordering problem and improves translation quality on the passive and potential voice test sets.

Key words: passive voice, potential voice, statistical machine translation, maximum entropy models
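
As an illustration of the final step described in the abstract, the following is a minimal sketch of how a voice feature produced by a maximum entropy model might enter a log-linear rule score. It is not the authors' implementation: the feature names, class labels, weights, and helper functions (`maxent_prob`, `loglinear_score`, `score_rule`) are all hypothetical and chosen only to show the mechanics.

```python
import math

def maxent_prob(weights, features, label, labels):
    """P(label | features) under a maximum entropy (softmax) model."""
    def score(y):
        # Sum the weights of all (feature, label) pairs that fire.
        return sum(weights.get((f, y), 0.0) for f in features)
    z = sum(math.exp(score(y)) for y in labels)
    return math.exp(score(label)) / z

def loglinear_score(feature_values, lambdas):
    """Log-linear model score: weighted sum of feature values."""
    return sum(lambdas[name] * value for name, value in feature_values.items())

# Hypothetical toy parameters (assumptions, not values from the paper).
LABELS = ["en_passive", "en_active"]
ME_WEIGHTS = {
    ("ja_passive", "en_passive"): 1.5,
    ("ja_passive", "en_active"): 0.2,
}
LAMBDAS = {"translation_model": 1.0, "language_model": 0.5, "voice": 0.8}

def score_rule(base_features, source_features, target_label):
    """Add the log of the maxent voice probability as one extra feature."""
    feats = dict(base_features)
    feats["voice"] = math.log(
        maxent_prob(ME_WEIGHTS, source_features, target_label, LABELS)
    )
    return loglinear_score(feats, LAMBDAS)

# Two candidate rules with identical baseline features but different
# English-side voice classes for a Japanese passive source.
base = {"translation_model": -1.2, "language_model": -2.0}
passive_score = score_rule(base, ["ja_passive"], "en_passive")
active_score = score_rule(base, ["ja_passive"], "en_active")
```

With these toy weights, the candidate whose English side matches the class favoured by the maximum entropy model receives the higher overall score, which is the sense in which the extra feature can guide rule selection.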


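To address the problem that differences in the syntactic structure of passive and potential voice across languages affect machine translation quality, a maximum entropy translation model that incorporates voice features is proposed. Sentences are first divided on the Japanese side into passive voice, potential voice, and other voices; passive and potential sentences are then further classified on the English side, and bilingual features are extracted to train maximum entropy rule classification models. The voice features are integrated into the log-linear model to improve the translation model and to raise the accuracy of the decoder's rule selection when translating passive and potential voice. Experimental results show that the method effectively improves syntactic reordering and lexical translation in Japanese-English statistical machine translation and raises the translation quality of passive and potential sentences.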

