Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2016, Vol. 52 ›› Issue (1): 10-16. DOI: 10.13209/j.0479-8023.2016.009


Research on the Sense Guessing of Chinese Unknown Words Based on “Semantic Knowledge-base of Modern Chinese”

SHANG Fenfen1,2, GU Yanhui1,2, DAI Rubing3, LI Bin3, ZHOU Junsheng1,2, QU Weiguang1,2   

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023
    2. Jiangsu Research Center of Information Security & Privacy Technology, Nanjing 210023
    3. School of Chinese Language and Culture, Nanjing Normal University, Nanjing 210097
  • Received: 2015-06-19 Online: 2016-01-20 Published: 2016-01-20
  • Contact: GU Yanhui, E-mail: gu(at)njnu.edu.cn

基于《现代汉语语义词典》的未登录词语义预测研究

尚芬芬1,2, 顾彦慧1,2, 戴茹冰3, 李斌3, 周俊生1,2, 曲维光1,2   

  1. 南京师范大学计算机科学与技术学院, 南京 210023
    2. 江苏省信息安全保密技术工程研究中心, 南京 210023
    3. 南京师范大学文学院, 南京 210097
  • 通讯作者: 顾彦慧, E-mail: gu(at)njnu.edu.cn
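  • Supported by:

    National Natural Science Foundation of China (61272221, 61472191), National Social Science Foundation of China (11CYY030, 10CYY021), Jiangsu Provincial Social Science Foundation (12YYA002), and Natural Science Foundation of the Jiangsu Higher Education Institutions (14KJB520022)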

Abstract:

To address the problem of sense guessing for Chinese unknown words, dictionaries at different semantic levels were built from the “Semantic Knowledge-base of Modern Chinese”. A sense-guessing model was constructed from each dictionary, and the individual models were then integrated into an ensemble that predicts the senses of unknown words with better performance. The prediction models were applied to predict and annotate the senses of unknown words in the People’s Daily corpus published in 2000, yielding a corpus resource in which unknown words carry sense annotations.
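The abstract does not specify how the per-level models or their integration work, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes (hypothetically) that each sense is a path of semantic classes from coarse to fine, that each level's model simply counts which classes co-occur with a word's final character, and that the ensemble sums the relative votes of all level models. The toy lexicon, the class names, and the functions `train_level_model` and `guess_sense` are all invented for illustration.

```python
from collections import Counter, defaultdict

# Toy lexicon (invented data): word -> semantic class path, coarse to fine.
LEXICON = {
    "教师": ("entity", "person"),
    "医师": ("entity", "person"),
    "律师": ("entity", "person"),
    "书店": ("entity", "place"),
    "商店": ("entity", "place"),
    "饭店": ("entity", "place"),
}


def train_level_model(lexicon, depth):
    """For each word-final character, count the classes seen at this depth."""
    model = defaultdict(Counter)
    for word, path in lexicon.items():
        model[word[-1]][path[:depth]] += 1
    return model


def guess_sense(word, models, candidate_paths):
    """Score each full class path by summing every level model's relative vote."""
    scores = {}
    for path in candidate_paths:
        total = 0.0
        for depth, model in models.items():
            votes = model.get(word[-1], Counter())
            n = sum(votes.values())
            if n:
                total += votes[path[:depth]] / n
        scores[path] = total
    best = max(scores, key=scores.get)
    # Return None when no level model has seen the final character.
    return best if scores[best] > 0 else None


# One model per semantic level (assumption: two levels in this toy hierarchy).
models = {depth: train_level_model(LEXICON, depth) for depth in (1, 2)}
paths = sorted(set(LEXICON.values()))

print(guess_sense("厨师", models, paths))
print(guess_sense("花店", models, paths))
```

In this sketch a coarse-level model can still contribute evidence when the fine-level model is uncertain, which is one plausible motivation for combining dictionaries of different granularity.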

Key words: Chinese unknown words, sense guessing, semantic annotation, ensemble learning

摘要:

基于《现代汉语语义词典》, 首先建立不同语义层次的词典, 根据词典分别构建模型并进行语义预测, 然后将各个模型进行集成, 通过集成模型再对未登录词进行语义预测, 得到较好的预测性能。利用预测模型对2000年《人民日报》语料进行未登录词语义预测和标注, 最终得到带有未登录词语义义项标注的语料资源。

关键词: 汉语未登录词, 语义预测, 语义标注, 集成学习
