北京大学学报(自然科学版)

汉语隐式篇章关系识别

孙静1,李艳翠1,2,周国栋1,冯文贺3   

  1. 苏州大学计算机科学与技术学院, 苏州 215006; 2. 河南科技学院信息工程学院, 新乡 453003; 3. 河南科技学院人文学院, 新乡 453003
  • 收稿日期:2013-06-22 出版日期:2014-01-20 发布日期:2014-01-20

Research of Chinese Implicit Discourse Relation Recognition

SUN Jing1, LI Yancui1,2, ZHOU Guodong1, FENG Wenhe3   

  1. Department of Computer Science and Technology, Soochow University, Suzhou 215006; 2. School of Information Engineering, Henan Institute of Science and Technology, Xinxiang 453003; 3. School of Humanities, Henan Institute of Science and Technology, Xinxiang 453003
  • Received: 2013-06-22; Online: 2014-01-20; Published: 2014-01-20

摘要: 采用一个自建的汉语篇章结构语料库(隐式关系占80%)进行隐式关系识别。语料中将篇章关系分成3个层次, 第一层包含因果、并列、转折、解说四大类。在此语料上, 利用上下文特征、词汇特征、依存树特征, 采用最大熵的分类方法对四大类关系进行识别。实验结果显示, 总正确率为62.15%, 其中并列类识别效果最好, F1值达到75.26%。

关键词: 篇章结构分析, 篇章关系, 隐式关系识别, 汉语篇章语料库

Abstract: The authors use a self-built Chinese Discourse Treebank, in which 80% of the relations are implicit, for implicit discourse relation recognition. The corpus organizes discourse relations into three layers; the first layer contains four classes: causality, coordination, transition and explanation. On this corpus, a maximum entropy classifier with contextual, lexical and dependency tree features is used to recognize the four classes. Experimental results show an overall accuracy of 62.15%; the coordination class is recognized best, with an F1 score of 75.26%.

Key words: discourse parsing, discourse relation, implicit relation recognition, Chinese Discourse Treebank
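
The classification step summarized in the abstract can be illustrated with a minimal sketch of a maximum entropy (multinomial logistic regression) classifier over the four first-layer classes. This is not the authors' implementation: the feature strings, the toy training pairs, and the hyperparameters (`epochs`, `lr`, `l2`) are all hypothetical and stand in for the contextual, lexical and dependency tree features, which the abstract does not specify in detail.

```python
import math
from collections import defaultdict

# The four first-layer relation classes named in the abstract.
LABELS = ["causality", "coordination", "transition", "explanation"]


def softmax(scores):
    """Turn per-label scores into a probability distribution."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


class MaxEnt:
    """Maximum entropy classifier over binary (present/absent) features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # one weight per (label, feature) pair

    def predict_proba(self, feats):
        scores = {y: sum(self.w[(y, f)] for f in feats) for y in self.labels}
        return softmax(scores)

    def predict(self, feats):
        probs = self.predict_proba(feats)
        return max(self.labels, key=lambda y: probs[y])

    def fit(self, data, epochs=200, lr=0.5, l2=1e-3):
        # Stochastic gradient ascent on the L2-regularized log-likelihood.
        for _ in range(epochs):
            for feats, gold in data:
                probs = self.predict_proba(feats)
                for y in self.labels:
                    # Gradient for an active feature: observed minus expected.
                    grad = (1.0 if y == gold else 0.0) - probs[y]
                    for f in feats:
                        self.w[(y, f)] += lr * (grad - l2 * self.w[(y, f)])
        return self


def accuracy(model, data):
    """Fraction of examples whose predicted label matches the gold label."""
    hits = sum(1 for feats, gold in data if model.predict(feats) == gold)
    return hits / len(data)


# Hypothetical toy examples: each pairs invented feature strings
# (context / lexical / dependency) with one of the four classes.
TOY = [
    (["ctx=prev_causality", "lex=rain|cancel", "dep=root_vv"], "causality"),
    (["ctx=prev_coordination", "lex=grow|grow", "dep=same_root_pos"], "coordination"),
    (["ctx=first_in_para", "lex=try|fail", "dep=neg_in_arg2"], "transition"),
    (["ctx=last_in_para", "lex=plan|include", "dep=root_nn"], "explanation"),
]

model = MaxEnt(LABELS).fit(TOY)
print(accuracy(model, TOY))
```

Because every toy example carries its own distinct features, the model separates the training pairs easily; real implicit relation data is far less separable, which is consistent with the 62.15% overall accuracy reported in the abstract.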
