Acta Scientiarum Naturalium Universitatis Pekinensis

面向中文专利SAO结构抽取的文本特征比较研究

饶齐,王裴岩,张桂平   

  1. 沈阳航空航天大学知识工程研究中心, 沈阳 110136;
  • 收稿日期:2014-07-27 出版日期:2015-03-20 发布日期:2015-03-20

Text Feature Analysis on SAO Structure Extraction from Chinese Patent Literatures

RAO Qi, WANG Peiyan, ZHANG Guiping   

  1. Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang 110136;
  • Received:2014-07-27 Online:2015-03-20 Published:2015-03-20

摘要: 针对中文专利文本中SAO结构实体关系抽取问题, 使用支持向量机的机器学习方法进行关系抽取实验, 分别对基本词法信息、实体间距离信息、最短路径闭包树句法信息以及词向量信息等特征的有效性进行验证分析。实验结果表明, 基本的词法信息能够明显提高关系抽取性能, 而句法信息没有显著提高关系抽取效果。此外, 也验证了词向量在SAO结构关系抽取中的可行性。

关键词: SAO结构, 关系抽取, 特征有效性, 词向量

Abstract: To address the extraction of SAO (Subject-Action-Object) entity relations from Chinese patent texts, a series of relation extraction experiments was conducted using support vector machines. The experiments analyze the effectiveness of several feature types used in related work: basic lexical information, the distance between entities, syntactic information from the shortest path enclosed tree, and word embeddings. The results show that basic lexical features clearly improve relation extraction performance, whereas syntactic features bring no significant improvement. The feasibility of word embeddings for SAO-based relation extraction is also verified.

Key words: SAO structure, relation extraction, feature effectiveness, word embeddings

CLC Number: