Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2016, Vol. 52 ›› Issue (1): 75-80. DOI: 10.13209/j.0479-8023.2016.001


Integrating of Grapheme-Based and Phoneme-Based Transliteration Unit Alignment Method

LIU Bojia, XU Jin’an, CHEN Yufeng, ZHANG Yujie

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
  • Received: 2015-06-18 Online: 2016-01-20 Published: 2016-01-20
  • Contact: XU Jin’an, E-mail: jaxu(at)bjtu.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (61370130, 61473294), the Fundamental Research Funds for the Central Universities (2014RC040), and the International Science and Technology Cooperation Program of China (2014DFA11350)

Abstract:

Transliteration methods that rely solely on phonemes or solely on graphemes produce excessive errors. Taking Chinese-English transliteration as the main object of study and combining statistical and rule-based approaches, this paper proposes a transliteration unit alignment method that integrates the phoneme-based and grapheme-based methods. Four experiments are designed to compare the proposed method with traditional methods. The experimental results show that the proposed method effectively improves the accuracy of machine transliteration.

Key words: machine transliteration, alignment, N-gram model, grapheme-based method, phoneme-based method
