Acta Scientiarum Naturalium Universitatis Pekinensis, 2017, Vol. 53, Issue (2): 295-304. DOI: 10.13209/j.0479-8023.2017.035
Received:
2016-07-22
Revised:
2016-09-30
Published:
2017-03-20
Online:
2017-03-20
Corresponding author:
Jin’an XU
Dandan WANG, Jin’an XU†, Yufeng CHEN, Yujie ZHANG, Xiaohui YANG
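Abstract:
Traditional example-based machine translation (EBMT) systems are complex and costly to build. To address this, we propose a Chinese-English EBMT method based on dependency tree-to-string correspondence. Unlike traditional methods, it requires syntactic analysis only on the source-language side, which greatly reduces the complexity and cost of building the system. To improve translation accuracy, we introduce a joint model of Chinese word segmentation, part-of-speech tagging and dependency parsing, which reduces error propagation among these basic source-side tasks in Chinese-English EBMT and improves the accuracy of the features extracted across levels. On this basis, taking into account the properties of dependency structures and the characteristics of the Chinese-English corpus, we extract rules for the dependency tree-to-string model and generalize them. Experimental results show that, compared with the baseline systems, the method improves the quality of extracted example pairs, yields better generalized rules and translations, and improves overall system performance.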
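Rule generalization in tree-to-string systems is commonly done by replacing an aligned sub-part of a rule with a variable on both the source and target sides. The abstract does not spell out the paper's exact procedure, so the following is only a minimal sketch under stated assumptions: the source side is flattened to a word sequence in sentence order (a real dependency-to-string rule keeps the tree structure), the two sub-spans being replaced are assumed to be a consistently aligned pair, and the names `Rule`, `generalize` and the variable symbol `X1` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A flattened tree-to-string rule (hypothetical representation)."""
    source: list  # source-side tokens or variables, in sentence order
    target: list  # target-side tokens or variables


def generalize(src_words, tgt_words, src_slot, tgt_slot, var="X1"):
    """Replace one source sub-span and its aligned target sub-span with a variable.

    src_slot and tgt_slot are inclusive (start, end) index pairs. They are
    assumed to be an aligned pair, e.g. a child subtree and the target span
    it projects to; this hypothetical helper does not check that assumption.
    """
    s0, s1 = src_slot
    t0, t1 = tgt_slot
    source = src_words[:s0] + [var] + src_words[s1 + 1:]
    target = tgt_words[:t0] + [var] + tgt_words[t1 + 1:]
    return Rule(source, target)


# Generalize the modifier 出口 / "export" out of the pair 出口 数量 / "export volume".
rule = generalize(["出口", "数量"], ["export", "volume"], (0, 0), (0, 0))
print(rule)  # Rule(source=['X1', '数量'], target=['X1', 'volume'])
```

With the modifier abstracted, the same rule can be reused for other phrases ending in 数量, with the translation of whatever fills the slot substituted for `X1` on the target side.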
Dandan WANG, Jin’an XU, Yufeng CHEN, Yujie ZHANG, Xiaohui YANG. A Tree-to-String EBMT Method by Integrating Joint Model of Chinese Segmentation and Dependency Parsing[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2017, 53(2): 295-304.
Chinese dependency subtree | English word string | Position of English word string [p, q] |
---|---|---|
建筑(0) NN — 市场(1) NN | construction (0) market (1) | [ |
出口(3) NN — 数量(4) NN | export (3) volume (4) | [ |
Table 1 Projection of dependency subtrees to strings
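Table 1 pairs each Chinese dependency subtree with the English word span [p, q] it corresponds to. A common way to obtain such spans from a word alignment is to take the minimum and maximum target positions aligned to the subtree's words, and to reject the span if any target word inside it is aligned to a source word outside the subtree. The sketch below implements that standard consistency check; it assumes 0-based positions (like the indices shown in the table) and a given set of alignment links, and `project_span` is a hypothetical name, not necessarily how the paper builds the table.

```python
def project_span(subtree, alignment):
    """Project a source subtree onto a target span [p, q] (hypothetical helper).

    subtree:   set of 0-based source word positions covered by the subtree.
    alignment: iterable of (source_pos, target_pos) word-alignment links.
    Returns (p, q), or None when the subtree is unaligned or the span is
    inconsistent (some target word in [p, q] is aligned outside the subtree).
    """
    links = list(alignment)
    targets = [t for s, t in links if s in subtree]
    if not targets:
        return None
    p, q = min(targets), max(targets)
    for s, t in links:
        if p <= t <= q and s not in subtree:
            return None  # the span would swallow a word that belongs elsewhere
    return (p, q)


# Hypothetical one-to-one alignment: 建筑(0)-construction(0), 市场(1)-market(1).
print(project_span({0, 1}, {(0, 0), (1, 1)}))  # (0, 1)
```

Unaligned target words inside the span are tolerated, which is the usual convention in phrase and tree-to-string extraction; only links that cross the subtree boundary invalidate a span.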
System | BLEU5 | NIST |
---|---|---|
KyotoEBMT | 24.31 | 5.6563 |
Stan-Tree-to-string | 23.96 | 5.5873 |
Tree-to-string (SMT) | 24.07 | 5.6497 |
Proposed method | 24.41 | 5.6580 |
Table 2 Comparison of machine translation systems
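Table 2 reports BLEU and NIST scores without stating how they were computed (tokenization, number of references, or exactly what the "5" in BLEU5 denotes). For orientation, the sketch below computes standard corpus-level BLEU: clipped n-gram precisions combined by a geometric mean and multiplied by a brevity penalty. It assumes one tokenized reference per sentence and no smoothing, and leaves the maximum n-gram order as the parameter `max_n` rather than asserting which order the paper used; the function names are illustrative. NIST, which weights n-grams by their information content, is not covered.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list (helper name is illustrative)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one tokenized reference per sentence, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero without smoothing
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


# Toy example: the hypothesis is a prefix of the reference, so every n-gram
# matches and the score reduces to the brevity penalty exp(1 - 10/7).
hyp = "urban construction is a new hot spot".split()
ref = "urban construction is a new hot spot of foreign investment".split()
print(round(corpus_bleu([hyp], [ref]), 4))  # 0.6514
```

The toy example shows why short outputs are penalized even when every word is correct, which is relevant to comparing systems whose translations differ in length, as in Tables 3 and 4.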
Item | Text |
---|---|
Source | 建筑对外开放呈现新格局 |
Reference | The opening of construction industry to the outside present a new structure. |
KyotoEBMT | The opening up outside of construction industry to present a new structure. |
Stan-Tree-to-string | The opening of construction industry to the outside show a new pattern. |
Tree-to-string (SMT) | Opening of construction industry to the outside show a new pattern. |
Proposed method | The opening to the outside of construction industry present a new structure. |
Table 3 Comparison of translations from different systems
Item | Text |
---|---|
Source | 城建是外商投资新热点。 |
Reference | Urban construction is a new hot spot of foreign business to invest. |
KyotoEBMT | Urban construction is a new hot spot of foreign business investment. |
Stan-Tree-to-string | Urban construction is a new hot spot of foreign investment. |
Tree-to-string (SMT) | Urban construction is new hotspot of foreign investment. |
Proposed method | Urban construction is foreign business investment hotspot. |
Table 4 An unsatisfactory translation by the proposed method