Research on Speech Synthesis for Large-Scale Corpora

Acta Scientiarum Naturalium Universitatis Pekinensis

YU Yansuo, ZHU Fengyun, LI Xiangang, LIU Yi, WU Xihong

Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing 100871

Received: 2013-05-09  Online: 2014-09-20  Published: 2014-09-20

Abstract

Abstract: A novel method is proposed for building a text-to-speech system from a roughly labeled corpus containing several hundred hours of speech. First, speech recognition, text alignment, and syntactic parsing are used to filter and label the large-scale corpus automatically. Then, to address the excessive memory and computation time required to train acoustic models on such a corpus, the conventional training procedure is optimized, which significantly speeds up training without loss of acoustic model accuracy. Subjective evaluations show that, compared with a small, precisely labeled corpus, using the large, roughly labeled corpus improves the mean opinion score (MOS) by about 0.5.

Keywords: speech data selection, acoustic model training, HMM-based unit selection and waveform concatenation
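
The automatic filtering step mentioned in the abstract can be illustrated with a minimal sketch. This page does not describe the authors' actual selection criterion, so the following is only one plausible reading: an utterance is kept when the recognizer output agrees closely with the rough transcript, measured by character error rate. The function names and the 0.1 threshold are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(transcript: str, asr_output: str) -> float:
    """Edit distance normalized by the length of the rough transcript."""
    if not transcript:
        return 0.0 if not asr_output else 1.0
    return edit_distance(transcript, asr_output) / len(transcript)


def select_utterances(pairs: List[Tuple[str, str]],
                      max_cer: float = 0.1) -> List[Tuple[str, str]]:
    """Keep (transcript, asr_output) pairs whose disagreement is small.

    max_cer is an assumed threshold, not a value taken from the paper.
    """
    return [(t, a) for t, a in pairs if char_error_rate(t, a) <= max_cer]


if __name__ == "__main__":
    corpus = [
        ("今天天气很好", "今天天气很好"),  # recognizer agrees with transcript
        ("今天天气很好", "明天有雨"),      # transcript likely does not match audio
    ]
    for transcript, asr_output in select_utterances(corpus):
        print(transcript)
```

A stricter threshold discards more data but leaves cleaner labels; with several hundred hours of speech available, a fairly strict setting would probably still leave ample training material.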

CLC number: TP391

YU Yansuo, ZHU Fengyun, LI Xiangang, LIU Yi, WU Xihong. Research on Speech Synthesis for Large-Scale Corpora[J]. Acta Scientiarum Naturalium Universitatis Pekinensis.
