Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition)

Recognition and Classification of Relation Words in Compound Sentences Based on the Tsinghua Chinese Treebank

LI Yancui1,2, SUN Jing1, ZHOU Guodong1, FENG Wenhe3

1. Department of Computer Science and Technology, Soochow University, Suzhou 215006; 2. School of Information Engineering, Henan Institute of Science and Technology, Xinxiang 453003; 3. School of Humanities, Henan Institute of Science and Technology, Xinxiang 453003
Received: 2013-06-15; Published: 2014-01-20; Online: 2014-01-20

Abstract: Following the annotation scheme of the Tsinghua Chinese Treebank, the authors use rules to extract the relation words of compound sentences from the treebank and label their categories. They then extract syntactic, lexical, and positional features from automatically parsed syntax trees, both with and without function tags, to recognize and classify the relation words. Experimental results show that relation word recognition reaches an accuracy of 95.7% and relation word classification reaches an F1 score of 77.2%.

Key words: relation words in compound sentences, Tsinghua Chinese Treebank, relation words recognition, relation words classification
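
The feature-extraction step summarized in the abstract (lexical, positional, and syntactic features taken from parse trees with or without function tags) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the Node class, the "PHRASE-TAG" label convention, the labels S, CL-SUB and CL-MAIN, and the feature names are hypothetical and are not the Tsinghua Chinese Treebank tag set or the authors' actual feature templates.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality: avoids recursive comparison through parent links
class Node:
    """Parse-tree node: leaves carry a word and POS tag, internal nodes a phrase label."""
    label: str
    word: Optional[str] = None
    children: list["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child and return self so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return self


def strip_function_tag(label: str) -> str:
    # Assumption (hypothetical convention): a function tag is appended as
    # "PHRASE-TAG"; keep only the part before the first '-'.
    return label.split("-", 1)[0]


def leaves(node: Node) -> list[Node]:
    """Return the word-bearing leaves under node, left to right."""
    if node.word is not None:
        return [node]
    result: list[Node] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def extract_features(root: Node, leaf: Node, use_function_tags: bool) -> dict[str, object]:
    """Illustrative lexical, positional and syntactic features for one candidate word."""
    all_leaves = leaves(root)
    idx = next(i for i, other in enumerate(all_leaves) if other is leaf)
    parent = leaf.parent
    grandparent = parent.parent if parent is not None else None

    def norm(label: str) -> str:
        # Keep or drop function tags, mirroring the two tree variants in the abstract.
        return label if use_function_tags else strip_function_tag(label)

    return {
        # lexical features
        "word": leaf.word,
        "pos": leaf.label,
        # positional features
        "sent_initial": idx == 0,
        "rel_position": round(idx / max(len(all_leaves) - 1, 1), 2),
        # syntactic features
        "parent": norm(parent.label) if parent is not None else None,
        "grandparent": norm(grandparent.label) if grandparent is not None else None,
        "parent_first_child": parent is not None and parent.children[0] is leaf,
    }


# Hypothetical tree for 虽然下雨，但是他来了 ("although it was raining, he came").
# The phrase labels S, CL-SUB and CL-MAIN are made up for this example.
suiran = Node("c", "虽然")
danshi = Node("c", "但是")
clause1 = Node("CL-SUB").add(suiran).add(Node("v", "下雨"))
clause2 = Node("CL-MAIN").add(danshi).add(Node("r", "他")).add(Node("v", "来")).add(Node("u", "了"))
root = Node("S").add(clause1).add(Node("w", "，")).add(clause2)

for candidate in (suiran, danshi):
    print(extract_features(root, candidate, use_function_tags=True))
    print(extract_features(root, candidate, use_function_tags=False))
```

The resulting feature dictionaries could then be vectorized and passed to any standard classifier, first for the binary relation-word decision and then for the category decision; the abstract does not say which learner the authors use, so that choice is left open here.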
