Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2017, Vol. 53 ›› Issue (2): 230-238. DOI: 10.13209/j.0479-8023.2017.030
• Original Article •
Lilin ZHANG, Maoxi LI, Wenyan XIAO, Jianyi WAN, Mingwen WANG†
Received: 2016-07-23
Revised: 2016-09-23
Online: 2017-03-20
Published: 2017-03-20
Contact: Mingwen WANG
Lilin ZHANG, Maoxi LI, Wenyan XIAO, Jianyi WAN, Mingwen WANG. Improve Automatic Evaluation of Machine Translation Using Specific-Domain Paraphrase[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2017, 53(2): 230-238.
URL: https://xbna.pku.edu.cn/EN/10.13209/j.0479-8023.2017.030
| Corpus | en-cs | en-de | en-fr | en-hi | en-ru | cs-en | de-en | fr-en | hi-en | ru-en |
|---|---|---|---|---|---|---|---|---|---|---|
| W14-corpus | 647 | 1920 | 2007 | 1084 | 878 | 2218 | 2218 | 2218 | 2218 | 2218 |
Table 1 Statistics of the WMT’14 Corpus
| Corpus | en-cs | en-de | en-fr | en-fi | en-ru | cs-en | de-en | fr-en | fi-en | ru-en |
|---|---|---|---|---|---|---|---|---|---|---|
| W15-corpus | 1000 | 1920 | 2007 | 1926 | 1074 | 2218 | 2218 | 2218 | 2218 | 2218 |
Table 2 Statistics of the WMT’15 Corpus
| Evaluation level | Method | de-en | cs-en | fr-en | fi-en | ru-en | Average |
|---|---|---|---|---|---|---|---|
| System level | TER | 0.775 | 0.989 | 0.952 | 0.629 | 0.809 | 0.831 |
| | TER-Markov | 0.775 | 0.989 | 0.952 | 0.629 | 0.809 | 0.831 |
| | TER-SD-Markov | 0.784 | 0.989 | 0.955 | 0.629 | 0.802 | 0.832 |
| | METEOR | 0.885 | 0.952 | 0.971 | 0.515 | 0.789 | 0.822 |
| | METEOR-Markov | 0.913 | 0.955 | 0.971 | 0.488 | 0.804 | 0.826 |
| | METEOR-SD-Markov | 0.926 | 0.951 | 0.975 | 0.488 | 0.804 | 0.829 |
| Sentence level | TER | 0.270 | 0.218 | 0.384 | 0.326 | 0.270 | 0.294 |
| | TER-Markov | 0.270 | 0.218 | 0.383 | 0.326 | 0.270 | 0.294 |
| | TER-SD-Markov | 0.295 | 0.233 | 0.392 | 0.342 | 0.281 | 0.308 |
| | METEOR | 0.302 | 0.253 | 0.397 | 0.378 | 0.297 | 0.325 |
| | METEOR-Markov | 0.325 | 0.272 | 0.399 | 0.400 | 0.313 | 0.342 |
| | METEOR-SD-Markov | 0.333 | 0.285 | 0.406 | 0.417 | 0.330 | 0.354 |
Table 3 Correlation with human judgments of the METEOR and TER metrics, using specific-domain or general-domain paraphrases, on the into-English translation evaluation of the WMT’14 Metrics task
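Tables 3-6 report the two correlation levels used in the WMT Metrics tasks: a system-level correlation between corpus-level metric scores and the official human ranking of systems (Pearson's r), and a sentence-level correlation between segment-level metric scores and human preferences (a Kendall's tau variant). The sketch below is a minimal illustration of how such correlations can be computed with off-the-shelf tools; the arrays are hypothetical placeholders rather than values from the paper, and the exact tau variant used by WMT differs slightly from the textbook definition.

```python
# Minimal sketch (not the paper's code): correlating automatic metric scores
# with human judgments, in the spirit of the WMT Metrics task evaluation.
# All numbers below are made-up placeholders.
from scipy.stats import pearsonr, kendalltau

# System level: one corpus-level metric score per MT system, paired with the
# human ranking score of the same system.
metric_by_system = [0.31, 0.28, 0.35, 0.25, 0.30]
human_by_system = [0.12, 0.05, 0.20, -0.08, 0.10]

# Sentence level: one metric score per translated segment, paired with a
# human preference score for that segment.
metric_by_segment = [0.42, 0.17, 0.55, 0.36, 0.61, 0.28]
human_by_segment = [3, 1, 4, 2, 5, 1]

r, _ = pearsonr(metric_by_system, human_by_system)
tau, _ = kendalltau(metric_by_segment, human_by_segment)

print(f"system-level Pearson r:     {r:.3f}")
print(f"sentence-level Kendall tau: {tau:.3f}")
```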
| Evaluation level | Method | en-de | en-cs | en-fr | en-hi | en-ru | Average |
|---|---|---|---|---|---|---|---|
| System level | TER | 0.322 | 0.979 | 0.955 | 0.828 | 0.934 | 0.803 |
| | TER-Markov | 0.322 | 0.979 | 0.955 | 0.828 | 0.934 | 0.803 |
| | TER-SD-Markov | 0.337 | 0.976 | 0.954 | 0.828 | 0.934 | 0.806 |
| | METEOR | 0.240 | 0.979 | 0.939 | 0.924 | 0.932 | 0.803 |
| | METEOR-Markov | 0.226 | 0.979 | 0.939 | 0.924 | 0.913 | 0.796 |
| | METEOR-SD-Markov | 0.263 | 0.976 | 0.940 | 0.924 | 0.923 | 0.805 |
| Sentence level | TER | 0.210 | 0.292 | 0.261 | 0.183 | 0.392 | 0.268 |
| | TER-Markov | 0.210 | 0.292 | 0.261 | 0.183 | 0.392 | 0.268 |
| | TER-SD-Markov | 0.217 | 0.292 | 0.270 | 0.183 | 0.392 | 0.271 |
| | METEOR | 0.212 | 0.310 | 0.274 | 0.303 | 0.407 | 0.301 |
| | METEOR-Markov | 0.222 | 0.310 | 0.275 | 0.303 | 0.422 | 0.306 |
| | METEOR-SD-Markov | 0.238 | 0.318 | 0.277 | 0.303 | 0.427 | 0.313* |
Table 4 Correlation with human judgments of the METEOR and TER metrics, using specific-domain or general-domain paraphrases, on the out-of-English translation evaluation of the WMT’14 Metrics task
| Evaluation level | Method | de-en | cs-en | fr-en | fi-en | ru-en | Average |
|---|---|---|---|---|---|---|---|
| System level | TER | 0.890 | 0.914 | 0.980 | 0.878 | 0.910 | 0.914 |
| | TER-Markov | 0.888 | 0.926 | 0.977 | 0.885 | 0.912 | 0.918 |
| | TER-SD-Markov | 0.907 | 0.914 | 0.977 | 0.865 | 0.932 | 0.919 |
| | METEOR | 0.726 | 0.973 | 0.979 | 0.929 | 0.959 | 0.953 |
| | METEOR-Markov | 0.950 | 0.974 | 0.978 | 0.929 | 0.965 | 0.959 |
| | METEOR-SD-Markov | 0.959 | 0.974 | 0.979 | 0.939 | 0.963 | 0.963 |
| Sentence level | TER | 0.362 | 0.391 | 0.359 | 0.278 | 0.330 | 0.348 |
| | TER-Markov | 0.358 | 0.394 | 0.357 | 0.297 | 0.333 | 0.348 |
| | TER-SD-Markov | 0.375 | 0.391 | 0.352 | 0.315 | 0.340 | 0.355 |
| | METEOR | 0.389 | 0.406 | 0.375 | 0.385 | 0.358 | 0.378 |
| | METEOR-Markov | 0.421 | 0.429 | 0.386 | 0.393 | 0.367 | 0.400 |
| | METEOR-SD-Markov | 0.431 | 0.434 | 0.376 | 0.404 | 0.383 | 0.406 |
Table 5 Correlation with human judgments of the METEOR and TER metrics, using specific-domain or general-domain paraphrases, on the into-English translation evaluation of the WMT’15 Metrics task
| Evaluation level | Method | en-de | en-cs | en-fr | en-fi | en-ru | Average |
|---|---|---|---|---|---|---|---|
| System level | TER | 0.557 | 0.918 | 0.946 | 0.617 | 0.890 | 0.786 |
| | TER-Markov | 0.557 | 0.916 | 0.946 | 0.616 | 0.890 | 0.785 |
| | TER-SD-Markov | 0.584 | 0.909 | 0.944 | 0.617 | 0.890 | 0.789 |
| | METEOR | 0.680 | 0.957 | 0.951 | 0.713 | 0.864 | 0.833 |
| | METEOR-Markov | 0.705 | 0.954 | 0.949 | 0.712 | 0.845 | 0.833 |
| | METEOR-SD-Markov | 0.735 | 0.938 | 0.955 | 0.714 | 0.851 | 0.839 |
| Sentence level | TER | 0.289 | 0.358 | 0.326 | 0.215 | 0.357 | 0.309 |
| | TER-Markov | 0.289 | 0.358 | 0.326 | 0.216 | 0.357 | 0.309 |
| | TER-SD-Markov | 0.301 | 0.354 | 0.330 | 0.215 | 0.357 | 0.311 |
| | METEOR | 0.319 | 0.389 | 0.335 | 0.251 | 0.373 | 0.333 |
| | METEOR-Markov | 0.332 | 0.389 | 0.339 | 0.251 | 0.381 | 0.338 |
| | METEOR-SD-Markov | 0.342 | 0.385 | 0.341 | 0.251 | 0.381 | 0.340 |
Table 6 Correlation with human judgments of the METEOR and TER metrics, using specific-domain or general-domain paraphrases, on the out-of-English translation evaluation of the WMT’15 Metrics task
| Method | Precision/% | Recall/% | F1 |
|---|---|---|---|
| METEOR-Markov | 55 | 65 | 0.63 |
| METEOR-SD-Markov | 63 | 82 | 0.80 |
Table 7 Precision, recall, and F1-measure of paraphrase matching on the top 300 translations of the Illinois.4083 translation system
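A note on the F1 column in Table 7: it is not the balanced harmonic mean of the listed precision and recall (that would give roughly 0.60 and 0.71). The reported values are instead consistent with a recall-weighted harmonic mean of the kind METEOR uses (Fmean); this is an inference from the numbers, not something stated on this page. A worked check for the METEOR-SD-Markov row (P = 0.63, R = 0.82) is shown below; the first row (P = 0.55, R = 0.65) gives about 0.64 under the same weighting, close to the reported 0.63, plausibly due to rounding of the percentages.

```latex
% Worked check; the recall-weighted F-mean weighting is an assumption, not stated on this page.
\[
F_1 \;=\; \frac{2PR}{P+R} \;=\; \frac{2\times 0.63\times 0.82}{0.63+0.82} \;\approx\; 0.71,
\qquad
F_{\mathrm{mean}} \;=\; \frac{10\,PR}{R+9P} \;=\; \frac{10\times 0.63\times 0.82}{0.82+9\times 0.63} \;\approx\; 0.80 .
\]
```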
[1] Papineni K, Roukos S, Ward T, et al. BLEU: a method for automatic evaluation of machine translation // Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, 2002: 311-318
[2] Doddington G. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics // Proceedings of the Second International Conference on Human Language Technology Research (HLT’02). San Diego, 2002: 138-145
[3] Banerjee S, Lavie A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments // Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor, 2005: 65-72
[4] Snover M, Dorr B, Schwartz R, et al. A study of translation edit rate with targeted human annotation // Proceedings of Association for Machine Translation in the Americas. Cambridge, 2006: 223-231
[5] Li Maoxi, Jiang Aiwen, Wang Mingwen. Automatic evaluation of machine translation based on the ListMLE learning-to-rank approach. Journal of Chinese Information Processing, 2013, 27(4): 22-29 (in Chinese)
[6] Li M, Wang M, Li H, et al. Modeling monolingual character alignment for automatic evaluation of Chinese translation. ACM Transactions on Asian and Low-Resource Language Information Processing, 2016, 15(3): 1-16
[7] Denkowski M, Lavie A. Meteor Universal: language specific translation evaluation for any target language // Proceedings of the Ninth Workshop on Statistical Machine Translation (WMT). Baltimore, 2014: 376-380
[8] Snover M, Madnani N, Dorr B, et al. TER-Plus: paraphrase, semantic, and alignment enhancements to translation edit rate. Machine Translation, 2009, 23(2): 117-127
[9] Weng Zhen, Li Maoxi, Wang Mingwen. Improving automatic evaluation of machine translation with paraphrases extracted by Markov networks. Journal of Chinese Information Processing, 2015, 29(5): 136-142 (in Chinese)
[10] Moore R C, Lewis W. Intelligent selection of language model training data // Proceedings of the ACL 2010 Conference. Uppsala, 2010: 220-224
[11] Axelrod A, He X, Gao J. Domain adaptation via pseudo in-domain data selection // Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Edinburgh, 2011: 355-362
[12] Zhao Shiqi, Liu Ting, Li Sheng. Research on paraphrasing technology. Journal of Software, 2009, 20(8): 2124-2137 (in Chinese)
[13] Li Li, Liu Zhiyuan, Sun Maosong. Automatic extraction of phrasal paraphrases from Chinese-English parallel patent corpora. Journal of Chinese Information Processing, 2013, 27(6): 151-157 (in Chinese)
[14] Hu Jinming, Shi Xiaodong, Su Jinsong, et al. A survey of statistical machine translation incorporating paraphrasing techniques. CAAI Transactions on Intelligent Systems, 2013, 8(3): 199-207 (in Chinese)
[15] Su Chen, Zhang Yujie, Guo Zhen, et al. Improved statistical machine translation with source language paraphrase. Acta Scientiarum Naturalium Universitatis Pekinensis, 2015, 51(2): 342-348 (in Chinese)
[16] Barzilay R, McKeown K R. Extracting paraphrases from a parallel corpus // Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. Toulouse, 2001: 50-57
[17] Bannard C, Callison-Burch C. Paraphrasing with bilingual parallel corpora // Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics. Ann Arbor, 2005: 597-604
[18] Shinyama Y, Sekine S, Sudo K. Automatic paraphrase acquisition from news articles // Proceedings of the Second International Conference on Human Language Technology Research. 2002: 313-318
[19] Barzilay R, Lee L. Learning to paraphrase: an unsupervised approach using multiple-sequence alignment // Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. 2003: 16-23
[20] Pavlick E, Ganitkevitch J, Chan T P, et al. Domain-specific paraphrase extraction // Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing, 2015: 57-62
[21] Hong Huan, Wang Mingwen, Wan Jianyi, et al. A multi-layer Markov network information retrieval model based on an iterative method. Journal of Chinese Information Processing, 2013, 27(5): 122-128 (in Chinese)
[22] Bojar O, Buck C, Federmann C, et al. Findings of the 2014 Workshop on Statistical Machine Translation // Proceedings of the Ninth Workshop on Statistical Machine Translation. Baltimore, 2014: 12-58
[23] Bojar O, Chatterjee R, Federmann C, et al. Findings of the 2015 Workshop on Statistical Machine Translation // Proceedings of the Tenth Workshop on Statistical Machine Translation. Lisbon, 2015: 1-46
[24] Zhang L, Weng Z, Xiao W, et al. Extract domain-specific paraphrase from monolingual corpus for automatic evaluation of machine translation // Proceedings of the First Conference on Machine Translation. Berlin, 2016: 511-517