Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2018, Vol. 54 ›› Issue (2): 249-254. DOI: 10.13209/j.0479-8023.2017.154


Study on Continuous Speech Recognition Based on Bottleneck Features for Lhasa-Tibetan Dialect

ZHOU Nan, ZHAO Yue, LI Yaoqiang, XU Xiaona, CAIWANG Lamu, WU Licheng

  1. School of Information Engineering, Minzu University of China, Beijing 100081
  • Received: 2017-05-31 Revised: 2017-09-05 Online: 2018-03-20 Published: 2018-03-20
  • Contact: ZHAO Yue, E-mail: zhaoyueso(at)muc.edu.cn
  • Supported by:
    Humanities and Social Sciences Planning Fund of the Ministry of Education of China (15YJAZH120) and the Minzu University of China "Double First-Class" discipline construction project

Abstract:

Bottleneck features extracted from a deep neural network (DNN) capture long-term context dependence and give a compact representation of the speech signal, so they can replace traditional MFCC features in GMM-HMM acoustic modeling. The authors apply bottleneck features, and their concatenation with MFCC features, to continuous speech recognition for Lhasa Tibetan. Experiments on this task show that the concatenated features achieve better recognition performance than both DNN posterior features and bottleneck features used alone.

Key words: Lhasa-Tibetan, continuous speech recognition, Gaussian mixture model-hidden Markov model (GMM-HMM), bottleneck features, deep neural network (DNN)
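
The feature pipeline named in the abstract can be illustrated with a minimal sketch: MFCC frames are spliced with neighbouring frames, passed through a network up to a narrow bottleneck layer, and the bottleneck activations are concatenated with the original MFCC frame. This is not the authors' implementation. The layer sizes, the context width, the sigmoid activation, and all function names are assumptions chosen for illustration, and the weights are random rather than trained. In a real system the network would first be trained as a classifier and the layers after the bottleneck discarded. Plain Python lists are used to keep the sketch dependency-free.

```python
import math
import random


def splice(frames, context):
    """Stack each frame with `context` neighbours on each side (edge frames are repeated)."""
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for k in range(-context, context + 1):
            idx = min(max(t + k, 0), last)  # clamp at utterance boundaries
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, bias):
    """One fully connected layer: sigmoid(W x + b)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def random_layer(rng, n_in, n_out):
    """Untrained stand-in for a trained layer (assumption: small Gaussian weights)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def bottleneck_features(frames, layers, context):
    """Forward spliced frames through the layers up to and including the bottleneck."""
    feats = []
    for x in splice(frames, context):
        for weights, bias in layers:
            x = dense(x, weights, bias)
        feats.append(x)
    return feats


def concatenate(bnf, mfcc):
    """Frame-wise concatenation: bottleneck activations followed by the MFCC frame."""
    return [b + m for b, m in zip(bnf, mfcc)]


# Hypothetical sizes, not taken from the paper.
N_MFCC, CONTEXT, HIDDEN, BOTTLENECK, N_FRAMES = 13, 5, 64, 40, 20

rng = random.Random(0)
# Random numbers standing in for real MFCC frames of one utterance.
mfcc = [[rng.gauss(0.0, 1.0) for _ in range(N_MFCC)] for _ in range(N_FRAMES)]
layers = [
    random_layer(rng, N_MFCC * (2 * CONTEXT + 1), HIDDEN),
    random_layer(rng, HIDDEN, BOTTLENECK),  # the narrow bottleneck layer
]

bnf = bottleneck_features(mfcc, layers, CONTEXT)
combined = concatenate(bnf, mfcc)
print(len(combined), len(combined[0]))  # 20 53
```

Each combined frame has 40 + 13 = 53 dimensions under these assumed sizes. The combined frames would then take the place of plain MFCC frames as the observations for GMM-HMM training.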

CLC number: