Acta Scientiarum Naturalium Universitatis Pekinensis


An Approach to Auto-detection, Segmentation and Tagging of Bibliographic Metadata

GAO Liangcai, TANG Zhi, TAO Xin, FANG Jing   

  1. Institute of Computer Science and Technology, Peking University, Beijing 100871;
  • Received: 2009-09-13 Online: 2010-11-20 Published: 2010-11-20

一种自动发现、分割与标注引文元数据的方法

高良才,汤帜,陶欣,房婧   

  1. 北京大学计算机科学技术研究所, 北京 100871;

Abstract: After reviewing existing methods for citation data extraction, the authors propose a new approach based on a common typesetting practice of bibliographies: citation data within the same document follow a consistent style. The paper describes citation data detection and segmentation, tasks that have received little attention in previous research. Furthermore, the authors exploit the style consistency of bibliographies to improve citation metadata tagging. Experimental results show that the proposed method performs well in citation data detection, segmentation, and tagging.

Key words: bibliographic metadata, style consistency, metadata extraction, digital library

摘要: 在总结现有的引文元数据抽取方法的基础上, 针对引文的排版惯例:引文在文档内部风格一致, 提出了一种新的引文元数据抽取方法。重点描述了以往研究中很少涉及的引文元数据的自动发现和分割, 探讨了风格一致性在引文元数据标注中的应用。实验结果表明此方法在引文元数据发现、分割和标注方面均取得了较好的效果。

关键词: 引文元数据, 风格一致性, 元数据抽取, 数字图书馆
