Acta Scientiarum Naturalium Universitatis Pekinensis


Tibetan Number Identification and Translation

SUN Meng (1, 2), HUA Quecairang (3), LIU Kai (1), Lü Yajuan (1), LIU Qun (1)

1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 2. Graduate University, Chinese Academy of Sciences, Beijing 100049; 3. Tibetan Information Research Center, Qinghai Normal University, Xining 810008.
Received: 2012-06-05; Online: 2013-01-20; Published: 2013-01-20

Abstract: By analyzing the internal word-formation patterns of Tibetan numbers and their external boundary information, the authors propose a definition of basic Tibetan number components. An optimal-path decision model is used to determine the boundaries of these components. Basic numbers are then identified and translated with a finite automaton model, and a template matching algorithm is finally used to process complex numbers. The method achieves an F-score of 98.73% for number identification and translation, and it improves the BLEU score on a Tibetan-Chinese machine translation test set by 2.64%.

Key words: Tibetan, basic number component, automaton, number identification, number translation
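The abstract's pipeline (a best-path decision over component boundaries, then a finite automaton that reads and translates basic numbers) can be illustrated with a minimal Python sketch. Every detail below is an assumption made for illustration and is not taken from the paper. The small Wylie-romanized lexicon is hypothetical and deliberately incomplete: it ignores contracted and irregular numeral forms. The cost model charges one unit per number component plus a hypothetical penalty `OTHER_COST` for any other syllable, so the "best path" is a shortest path over syllable positions. The automaton's states, and the helper names `best_path`, `read_number`, `to_chinese` and `identify_and_translate`, are likewise hypothetical. Template matching for complex numbers is not shown.

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL toy lexicon: Wylie-romanized syllable sequences -> (kind, value).
# For "link" entries, value is the tens value the connector may follow.
# Illustrative only; not the component inventory defined in the paper.
LEXICON: Dict[Tuple[str, ...], Tuple[str, int]] = {
    ("gcig",): ("unit", 1), ("gnyis",): ("unit", 2), ("gsum",): ("unit", 3),
    ("bzhi",): ("unit", 4), ("bcu",): ("tens", 10),
    ("nyi", "shu"): ("tens", 20), ("sum", "cu"): ("tens", 30),
    ("rtsa",): ("link", 20), ("so",): ("link", 30),
}
MAX_LEN = max(len(key) for key in LEXICON)
OTHER_COST = 10  # assumed penalty for a syllable that is not a number component

Component = Tuple[Tuple[str, ...], str, int]


def best_path(syllables: List[str]) -> List[Component]:
    """Lowest-cost segmentation (shortest path over syllable positions)."""
    n = len(syllables)
    cost = [float("inf")] * (n + 1)
    back: List[Optional[Tuple[int, Tuple[str, int]]]] = [None] * (n + 1)
    cost[0] = 0.0
    for i in range(n):
        # Edge type 1: a lexicon component covering syllables[i:j], cost 1.
        for j in range(i + 1, min(n, i + MAX_LEN) + 1):
            entry = LEXICON.get(tuple(syllables[i:j]))
            if entry is not None and cost[i] + 1 < cost[j]:
                cost[j], back[j] = cost[i] + 1, (i, entry)
        # Edge type 2: syllables[i] as a non-number token, cost OTHER_COST.
        if cost[i] + OTHER_COST < cost[i + 1]:
            cost[i + 1], back[i + 1] = cost[i] + OTHER_COST, (i, ("other", 0))
    path: List[Component] = []
    j = n
    while j > 0:  # follow back-pointers from the end of the sentence
        i, (kind, value) = back[j]
        path.append((tuple(syllables[i:j]), kind, value))
        j = i
    return path[::-1]


def read_number(parts: List[Tuple[str, int]]) -> Optional[int]:
    """Toy finite automaton: UNIT | bcu [UNIT] | TENS [LINK UNIT]."""
    state, value = "start", 0
    for kind, v in parts:
        if state == "start" and kind in ("unit", "tens"):
            state, value = ("done" if kind == "unit" else "tens"), v
        elif state == "tens" and kind == "unit" and value == 10:
            state, value = "done", value + v  # e.g. bcu gsum -> 13
        elif state == "tens" and kind == "link" and v == value:
            state = "link"  # connector must match its tens word
        elif state == "link" and kind == "unit":
            state, value = "done", value + v
        else:
            return None  # undefined transition: reject the span
    return value if state in ("tens", "done") else None


CN_DIGITS = "零一二三四五六七八九"


def to_chinese(n: int) -> str:
    """Chinese numeral for 1..99 (enough for this toy lexicon)."""
    tens, units = divmod(n, 10)
    head = (("" if tens == 1 else CN_DIGITS[tens]) + "十") if tens else ""
    return head + (CN_DIGITS[units] if units else "")


def identify_and_translate(text: str) -> List[Tuple[str, int, str]]:
    """Group consecutive number components into spans, then read and translate each."""
    results: List[Tuple[str, int, str]] = []
    span: List[Component] = []
    sentinel: Component = ((), "other", 0)  # flushes a span at the end of the text
    for comp in best_path(text.split()) + [sentinel]:
        if comp[1] != "other":
            span.append(comp)
            continue
        if span:
            value = read_number([(kind, v) for _, kind, v in span])
            if value is not None:
                surface = " ".join(s for syls, _, _ in span for s in syls)
                results.append((surface, value, to_chinese(value)))
            span = []
    return results


print(identify_and_translate("deb nyi shu rtsa gsum dang sum cu so gcig"))
# [('nyi shu rtsa gsum', 23, '二十三'), ('sum cu so gcig', 31, '三十一')]
```

In this sketch, segmentation and value reading are kept separate on purpose. The shortest-path step only decides where component boundaries fall. The automaton then rejects spans whose component sequence is not well formed, such as a connector that does not match its tens word, so a span like "nyi shu so gcig" produces no output.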
