Acta Scientiarum Naturalium Universitatis Pekinensis


Discovering Abnormal Data in RDF Knowledge Base

HE Binbin, ZOU Lei, ZHAO Dongyan   

  1. Institute of Computer Science and Technology, Peking University, Beijing 100080;
  Received: 2014-06-30; Online: 2015-03-20; Published: 2015-03-20

Abnormal Data Discovery in Semantic Knowledge Base Construction


Abstract: To effectively improve the data quality of RDF knowledge bases, a solution is proposed for discovering abnormal data and repairing erroneous data in RDF graphs. First, the authors define a new notion, the graph-based conditional functional dependency (GCFD), which represents both the attribute-value dependencies and the semantic-structure dependencies of RDF data in a uniform manner. Then, an efficient discovery framework and several novel pruning rules are proposed for mining GCFDs, and a workflow for automatically repairing erroneous data is given. Extensive experiments on several real-life RDF repositories confirm the superiority of the proposed solution.
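The abstract does not give the formal definition of a GCFD. As a rough point of reference only, here is a sketch that checks a plain conditional functional dependency (CFD), one of the notions listed in the keywords, over RDF triples flattened into a per-subject view. The function names `entity_view` and `cfd_violations`, the `ex:` sample data, and the assumption that every predicate is single-valued are all hypothetical. The paper's GCFDs additionally capture graph (semantic-structure) patterns, which this sketch models only as constants in the condition, such as an `rdf:type` value.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# An RDF triple as plain strings: (subject, predicate, object).
Triple = Tuple[str, str, str]


def entity_view(triples: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Flatten triples into {subject: {predicate: object}}.

    Simplifying assumption (hypothetical): every predicate is single-valued
    per subject; if a subject repeats a predicate, the last object wins.
    """
    view: Dict[str, Dict[str, str]] = defaultdict(dict)
    for s, p, o in triples:
        view[s][p] = o
    return view


def cfd_violations(
    triples: List[Triple],
    condition: Dict[str, str],
    lhs: List[str],
    rhs: str,
) -> Dict[Tuple[str, ...], Dict[str, Set[str]]]:
    """Check a simple CFD  (condition): lhs -> rhs  over an RDF entity view.

    Among subjects matching every constant in `condition`, subjects that share
    the same values on all `lhs` predicates must share a single `rhs` value.
    Returns each conflicting lhs key mapped to {rhs value: subjects}.
    """
    groups: Dict[Tuple[str, ...], Dict[str, Set[str]]] = defaultdict(
        lambda: defaultdict(set)
    )
    for subject, props in entity_view(triples).items():
        # Only subjects that satisfy the condition pattern are in scope.
        if any(props.get(p) != v for p, v in condition.items()):
            continue
        # Skip subjects missing any predicate the dependency mentions.
        if rhs not in props or any(p not in props for p in lhs):
            continue
        key = tuple(props[p] for p in lhs)
        groups[key][props[rhs]].add(subject)
    # A group violates the dependency if it maps to more than one rhs value.
    return {k: dict(v) for k, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper's datasets.
    triples = [
        ("ex:c1", "rdf:type", "ex:City"),
        ("ex:c1", "ex:areaCode", "010"),
        ("ex:c1", "ex:province", "Beijing"),
        ("ex:c2", "rdf:type", "ex:City"),
        ("ex:c2", "ex:areaCode", "010"),
        ("ex:c2", "ex:province", "Hebei"),  # suspicious value
        ("ex:c3", "rdf:type", "ex:City"),
        ("ex:c3", "ex:areaCode", "021"),
        ("ex:c3", "ex:province", "Shanghai"),
        # Not a City, so the condition excludes it from the check.
        ("ex:o1", "rdf:type", "ex:Company"),
        ("ex:o1", "ex:areaCode", "010"),
        ("ex:o1", "ex:province", "Tianjin"),
    ]
    print(cfd_violations(triples, {"rdf:type": "ex:City"}, ["ex:areaCode"], "ex:province"))
    # prints {('010',): {'Beijing': {'ex:c1'}, 'Hebei': {'ex:c2'}}}
```

The reported group flags `ex:c1` and `ex:c2` as conflicting under the condition, while `ex:o1` is ignored because it does not match `rdf:type ex:City`. A repair step could then propose new values for the conflicting subjects (for example, by favouring the majority value in larger groups), but that policy is an assumption here and not the repair workflow described in the paper.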

Key words: RDF data quality, graph-based conditional functional dependencies (GCFD), conditional functional dependency, functional dependency

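Abstract: To improve the data quality of RDF knowledge bases, a method is proposed for detecting anomalies in RDF graph data and repairing them automatically. First, graph-based conditional functional dependencies (GCFDs) are originally defined, which represent dependencies on attribute values and on semantic structure in a unified way. Then, an effective algorithmic framework and optimization strategies are proposed to mine GCFDs from RDF data, and a workflow for automatically repairing abnormal data is given. Finally, extensive experiments on real datasets confirm the feasibility and superiority of the solution.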

