Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2017, Vol. 53 ›› Issue (2): 295-304. DOI: 10.13209/j.0479-8023.2017.035

• Original Article •

A Tree-to-String EBMT Method by Integrating Joint Model of Chinese Segmentation and Dependency Parsing

Dandan WANG, Jin’an XU, Yufeng CHEN, Yujie ZHANG, Xiaohui YANG

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
  • Received: 2016-07-22 Revised: 2016-09-30 Online: 2016-11-28 Published: 2017-03-20
  • Contact: Jin’an XU

融合词法句法分析联合模型的树到串EBMT方法

王丹丹, 徐金安, 陈钰枫, 张玉洁, 杨晓晖

  1. 北京交通大学计算机与信息技术学院, 北京 100044
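  • Corresponding author: 徐金安
  • Funding:
    Supported by the National Natural Science Foundation of China (61370130, 61473294), the Fundamental Research Funds for the Central Universities (2014RC040), and the International Science and Technology Cooperation Program of the Ministry of Science and Technology (K11F100010)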

Abstract:

To address the complexity and high cost of system construction in traditional example-based machine translation (EBMT) methods, the authors propose a Chinese-English tree-to-string EBMT method. Compared with traditional methods, the proposed approach requires parsing only the source language. Word segmentation, POS tagging, and dependency parsing are modeled jointly to reduce error propagation and feature-extraction failures across the different levels of analysis. Moreover, the authors extract and generalize bilingual word and phrase alignments from examples and templates by using the dependency structure of the source language. Experimental results show that the proposed method performs significantly better than the baseline systems.

Key words: example-based machine translation, dependency tree-to-string model, joint model, generalization template

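Abstract (translated from the Chinese):

To address the high complexity and cost of system construction in traditional example-based machine translation (EBMT) methods, this paper proposes a Chinese-English EBMT method based on a dependency tree-to-string model. Compared with traditional methods, the approach requires syntactic analysis only on the source-language side, which greatly reduces the complexity of building the system and effectively lowers its cost. To improve translation accuracy, a joint model of Chinese word segmentation, POS tagging, and dependency parsing is introduced; it reduces error propagation among the basic source-side tasks in Chinese-English EBMT and improves the accuracy of cross-level feature extraction. On this basis, rules for the dependency tree-to-string model are extracted and generalized, drawing on the features of the dependency structure and the characteristics of the Chinese-English corpus. Experimental results show that, relative to the baseline systems, the method improves the quality of the extracted example pairs, the generalized rules, and the translations, and thereby improves system performance.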

关键词: 基于实例的机器翻译, 依存树到串模型, 联合模型, 泛化模板

CLC Number: