Acta Scientiarum Naturalium Universitatis Pekinensis


Research on Recognition Units of Large Vocabulary Speech Recognition System of Uyghur

Nurmemet Yolwas, Wushour Silamu, Reyiman Tursun   

  1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046
  • Received: 2013-06-14  Online: 2014-01-20  Published: 2014-01-20




Abstract: Uyghur is an agglutinative language, so words are not optimal recognition units for Uyghur large vocabulary continuous speech recognition (LVCSR) systems. To address the problem of recognition unit selection in Uyghur LVCSR systems, sub-word recognition units better suited to Uyghur are designed, and a method for constructing combined recognition units of words and sub-words is proposed. Language model and speech recognition performance are evaluated for word, sub-word, and combined recognition units. Experimental results show that the proposed recognition units outperform word units in terms of unit inventory size and language model perplexity, and yield a relative word error rate reduction of 22% over the word-based system.

Key words: Uyghur, LVCSR, speech recognition, recognition unit

