北京大学学报(自然科学版) (Acta Scientiarum Naturalium Universitatis Pekinensis)

一个中文实体链接语料库的建设

舒佳根1,2,惠浩添1,2,钱龙华1,2,朱巧明1,2   

  1. 苏州大学自然语言处理实验室, 苏州 215006; 2. 苏州大学计算机科学与技术学院, 苏州 215006
  • 收稿日期:2014-06-29 出版日期:2015-03-20 发布日期:2015-03-20

Construction of a Chinese Entity Linking Corpus

SHU Jiagen1,2, HUI Haotian1,2, QIAN Longhua1,2, ZHU Qiaoming1,2   

  1. Natural Language Processing Lab, Soochow University, Suzhou 215006; 2. School of Computer Science and Technology, Soochow University, Suzhou 215006
  • Received:2014-06-29 Online:2015-03-20 Published:2015-03-20

摘要: 鉴于现有中文实体链接基准语料库的缺乏, 在ACE2005中文语料库和中文维基百科的基础上, 通过自动构造和人工标注的方法, 构建一个中文实体链接语料库及其相关的中文知识库。与传统的英文实体链接语料库不同, 构造的中文实体链接语料库是基于实体而非单个实体指称(Mention)。中文实体链接语料库的构建, 将为中文实体链接研究提供一个可用的基准平台。

关键词: 中文, 实体链接, 语料库

Abstract: Because no benchmark corpus exists for Chinese entity linking, a Chinese entity linking corpus and its associated Chinese knowledge base were built from the ACE2005 Chinese corpus and Chinese Wikipedia, combining automatic construction with manual annotation. Unlike traditional English entity linking corpora, this corpus is organized by entity rather than by individual entity mention. The resulting corpus provides a usable benchmark platform for research on Chinese entity linking.

Key words: Chinese, entity linking, corpus
