Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2016, Vol. 52 ›› Issue (1): 148-154. DOI: 10.13209/j.0479-8023.2016.006


Chinese-Slavic Mongolian Named Entity Translation Based on Word Alignment

YANG Ping1,2, HOU Hongxu1, JIANG Yupeng1, SHEN Zhipeng1, DU Jian1   

  1. College of Computer Science, Inner Mongolia University, Hohhot 010021
  2. Department of Computing, Linfen Vocational and Technical College, Linfen 041000
  • Received: 2015-06-07 Online: 2016-01-20 Published: 2016-01-20
  • Contact: HOU Hongxu, E-mail: cshhx(at)imu.edu.cn

  • Supported by:
    National Natural Science Foundation of China (61362028)

Abstract:

Chinese-to-Slavic Mongolian named entity translation is important for cross-lingual Chinese and Slavic Mongolian information processing. However, applying machine translation methods directly does not give satisfactory results. To address this problem, a novel approach is proposed for automatically extracting Chinese-Slavic Mongolian named entity pairs from a Chinese-Slavic Mongolian parallel corpus. Only the Chinese named entities need to be identified. All candidate named entity pairs are then extracted with a sliding window method based on the HMM word alignment results. Finally, the candidate named entity translation units are filtered by a maximum entropy model that integrates five features, and the Slavic Mongolian named entity with the highest confidence is chosen as the translation of each Chinese named entity. Experimental results show that this approach outperforms the HMM-based method and yields Chinese-Slavic Mongolian named entity pairs with relatively high precision, even when the word alignment is only partially correct.

Key words: named entity, recognition, translation, bilingual word alignment
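
The candidate extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `candidate_spans`, the `slack` parameter, and the representation of the alignment as a set of (source index, target index) links are all assumptions made for illustration. The idea is to take the target positions linked to a Chinese named entity by the word aligner, then slide the window boundaries a few positions in each direction so that the correct target span can still appear among the candidates when some links are missing or wrong. The maximum entropy filter and its five features are not shown, since the abstract does not specify them.

```python
from typing import List, Set, Tuple

# A word alignment as a set of (source index, target index) links.
# This representation is an assumption, not taken from the paper.
Alignment = Set[Tuple[int, int]]


def candidate_spans(
    ne_span: Tuple[int, int],
    alignment: Alignment,
    tgt_len: int,
    slack: int = 1,
) -> List[Tuple[int, int]]:
    """Return candidate target spans (start, end), end exclusive.

    ne_span is the Chinese named entity as (start, end), end exclusive.
    slack (hypothetical parameter) is how far each window boundary may
    move away from the span implied by the alignment links.
    """
    start, end = ne_span
    # Target positions linked to any token inside the source entity.
    linked = sorted({t for s, t in alignment if start <= s < end})
    if not linked:
        return []  # no link at all: nothing to anchor the window on

    lo, hi = linked[0], linked[-1] + 1  # span implied by the alignment
    candidates = set()
    # Slide the left and right boundaries independently within the slack.
    for left in range(max(0, lo - slack), min(lo + slack, hi - 1) + 1):
        first_right = max(hi - slack, left + 1)
        last_right = min(tgt_len, hi + slack)
        for right in range(first_right, last_right + 1):
            candidates.add((left, right))
    return sorted(candidates)


if __name__ == "__main__":
    # Toy sentence pair: 4 source tokens, 5 target tokens.
    links = {(0, 0), (1, 2), (2, 4), (3, 3)}
    # The named entity covers source tokens 0 and 1.
    print(candidate_spans((0, 2), links, tgt_len=5))
```

Each candidate span would then be paired with the Chinese named entity and scored by the filtering model, keeping the highest-scoring pair.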


CLC Number: