A Tree-to-String EBMT Method by Integrating Joint Model of Chinese Segmentation and Dependency Parsing

doi:10.13209/j.0479-8023.2017.035

Abstract

Traditional example-based machine translation (EBMT) systems are complex and costly to build. To address this, the authors propose a Chinese-English tree-to-string EBMT method. Unlike traditional methods, the proposed approach requires syntactic parsing only on the source-language side. Word segmentation, POS tagging, and dependency parsing are performed by a single joint model. This reduces error propagation between the stages and avoids failures in extracting features across levels. The authors also use the source-language dependency structure to extract bilingual word and phrase alignments from examples and to generalize them into templates. Experimental results show that the proposed method significantly outperforms the baseline systems.

Key words: example-based machine translation, dependency tree-to-string model, joint model, generalization template

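Chinese abstract (translated):

Traditional example-based machine translation (EBMT) methods are complex and costly to build. To address this, this paper proposes a Chinese-English EBMT method based on a dependency tree-to-string model. Compared with traditional methods, it requires syntactic analysis only on the source-language side. This greatly reduces the complexity of building the system and effectively lowers cost.

To improve translation accuracy, the method introduces a joint model of Chinese word segmentation, POS tagging, and dependency parsing. The joint model reduces error propagation among the basic source-side tasks in Chinese-English EBMT and improves the accuracy of features extracted across levels. Building on this, rules are extracted from the dependency tree-to-string model and generalized, drawing on the properties of the dependency structure and of the Chinese-English corpus.

Experimental results show that, relative to the baseline systems, the method:
- improves the quality of extracted example pairs;
- improves the generalization rules and translation quality;
- raises overall system performance.

Keywords: example-based machine translation, dependency tree-to-string model, joint model, generalization template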
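The abstract says that rules are extracted from examples and generalized, but it does not give the procedure. The sketch below is for illustration only. It assumes that generalization replaces one aligned source fragment and its English counterpart with a shared variable, producing a template that can be refilled with other fragments. The function names `generalize` and `instantiate`, the variable symbol `X`, the inclusive-span convention, and the toy sentence pair are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive [start, end] token indices (assumed convention)


def generalize(src: List[str], tgt: List[str], src_span: Span, tgt_span: Span,
               var: str = "X") -> Tuple[List[str], List[str]]:
    """Turn an example pair into a template by replacing one aligned
    source fragment and its target fragment with the same variable."""
    s0, s1 = src_span
    t0, t1 = tgt_span
    src_tpl = src[:s0] + [var] + src[s1 + 1:]
    tgt_tpl = tgt[:t0] + [var] + tgt[t1 + 1:]
    return src_tpl, tgt_tpl


def instantiate(tpl: List[str], fragment: List[str], var: str = "X") -> List[str]:
    """Replace every occurrence of the variable with a translated fragment."""
    out: List[str] = []
    for tok in tpl:
        out.extend(fragment if tok == var else [tok])
    return out


# Hypothetical toy example pair: "建筑 市场 扩大" / "the construction market expands"
src = ["建筑", "市场", "扩大"]
tgt = ["the", "construction", "market", "expands"]
src_tpl, tgt_tpl = generalize(src, tgt, (0, 1), (1, 2))
print(src_tpl, tgt_tpl)  # ['X', '扩大'] ['the', 'X', 'expands']
print(" ".join(instantiate(tgt_tpl, ["export", "volume"])))  # the export volume expands
```

Substituting a different aligned fragment for `X` yields a new translation. A real system would also have to check that the fragment fits the dependency context of the slot, which this sketch ignores.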

CLC Number: TP391

Dandan WANG, Jin’an XU, Yufeng CHEN, Yujie ZHANG, Xiaohui YANG. A Tree-to-String EBMT Method by Integrating Joint Model of Chinese Segmentation and Dependency Parsing[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2017, 53(2): 295-304.

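Chinese-language citation: WANG Dandan, XU Jin'an, CHEN Yufeng, ZHANG Yujie, YANG Xiaohui. A Tree-to-String EBMT Method Integrating a Joint Model of Lexical and Syntactic Analysis[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2017, 53(2): 295-304.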


URL: https://xbna.pku.edu.cn/EN/10.13209/j.0479-8023.2017.035

https://xbna.pku.edu.cn/EN/Y2017/V53/I2/295

Figures/Tables 10

Chinese dependency subtree	English word string	Position of the English string [p, q]
建筑(0) NN — 市场(1) NN	construction (0) market (1)	[0, 1]
出口(3) NN — 数量(4) NN	export (3) volume (4)	[3, 4]
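The table above pairs Chinese dependency subtrees with English word strings and the span [p, q] that each string occupies. The page does not say how that span is computed. One common approach is assumed here; it is not confirmed as the authors' procedure:

1. Collect the nodes of the subtree.
2. Take the minimum and maximum English positions aligned to those nodes as [p, q].
3. Reject the pair if any English word inside [p, q] is aligned to a source word outside the subtree. This is the consistency check used in alignment-template and phrase-based extraction.

The helper names `subtree` and `target_span`, the head-array tree encoding, the example tree, and the one-to-one alignment are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) word-alignment links


def subtree(heads: List[int], root: int) -> Set[int]:
    """Return root plus all its descendants; heads[i] is the head of token i, -1 for the sentence root."""
    children: Dict[int, List[int]] = {}
    for i, h in enumerate(heads):
        children.setdefault(h, []).append(i)
    nodes: Set[int] = set()
    stack = [root]
    while stack:
        n = stack.pop()
        nodes.add(n)
        stack.extend(children.get(n, []))
    return nodes


def target_span(nodes: Set[int], alignment: Alignment) -> Optional[Tuple[int, int]]:
    """Return the English span [p, q] aligned to a set of source nodes, or None if
    the nodes are unaligned or some target word inside [p, q] is aligned to a
    source word outside the set (an inconsistent pair)."""
    linked = [j for i, j in alignment if i in nodes]
    if not linked:
        return None
    p, q = min(linked), max(linked)
    if any(p <= j <= q and i not in nodes for i, j in alignment):
        return None
    return p, q


# Hypothetical head array for a 5-token sentence: token 0 depends on 1, tokens 1-3 on 4.
heads = [1, 4, 4, 4, -1]
print(sorted(subtree(heads, 1)))  # [0, 1]

# Assumed one-to-one monotone alignment, matching the position indices in the table
# (token 2 does not appear there).
diag: Alignment = {(k, k) for k in range(5)}
print(target_span({0, 1}, diag))  # (0, 1): 建筑-市场 -> "construction market"
print(target_span({3, 4}, diag))  # (3, 4): 出口-数量 -> "export volume"
```

Returning `None` for inconsistent pairs, rather than widening the span, keeps the extracted example pairs clean. This matches the abstract's stated goal of improving extraction quality, though the paper may use a different criterion.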

System	BLEU5	NIST
KyotoEBMT	24.31	5.6563
Stan-Tree-to-string	23.96	5.5873
Tree-to-string (SMT)	24.07	5.6497
Proposed method	24.41	5.6580

Item	Text
Source sentence	建筑对外开放呈现新格局
Reference translation	The opening of construction industry to the outside present a new structure.
KyotoEBMT	The opening up outside of construction industry to present a new structure.
Stan-Tree-to-string	The opening of construction industry to the outside show a new pattern.
Tree-to-string (SMT)	Opening of construction industry to the outside show a new pattern.
Proposed method	The opening to the outside of construction industry present a new structure.