Acta Scientiarum Naturalium Universitatis Pekinensis

Research on Recognition Units of Large Vocabulary Speech Recognition System of Uyghur

Nurmemet Yolwas, Wushour Silamu, Reyiman Tursun

  1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046
  • Received: 2013-06-14 Online: 2014-01-20 Published: 2014-01-20

Abstract: Uyghur is an agglutinative language, so words are poorly suited as recognition units for Uyghur large-vocabulary continuous speech recognition (LVCSR) systems. To address the problem of recognition unit selection in Uyghur LVCSR systems, sub-word recognition units better suited to Uyghur are designed, and a method for building combined recognition units from words and sub-words is proposed. Language model and speech recognition performance are evaluated for word, sub-word, and combined recognition units. Experimental results show that the proposed recognition units outperform word units in number of units and language model perplexity, and reduce the word error rate by 22% relative to the word-based system.

Key words: Uyghur, LVCSR, speech recognition, recognition unit

CLC number:
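
The 22% figure in the abstract is a relative reduction in word error rate (WER), not a drop of 22 percentage points. The distinction can be illustrated with a minimal Python sketch. It assumes the standard definition of WER as word-level edit distance divided by the number of reference words. The function names and the baseline and improved WER values are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words.

    Assumes a non-empty reference transcript.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values chosen only to illustrate the arithmetic:
# a baseline WER of 0.50 falling to 0.39 is a 22% relative reduction,
# although the absolute drop is 11 percentage points.
print(round(relative_reduction(0.50, 0.39), 2))  # 0.22
```

For agglutinative languages, hypotheses produced in sub-word units are normally joined back into words before scoring, so that WER stays comparable with a word-based baseline. That joining step is left out of this sketch.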