Acta Scientiarum Naturalium Universitatis Pekinensis

ArithRegion: An Index Structure for Compressed XML

包小源1,2,唐世渭2,吴泠2,杨冬青2,宋再生2,王腾蛟2   

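  1. 1 College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300074; E-mail: zongyj@pku.org.cn; 2 Department of Computer Science, Peking University, Beijing 100871
  • Received: 2005-01-05 Published: 2006-01-20 Online: 2006-01-20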

ArithRegion - An Index Structure on Compressed XML Data

BAO Xiaoyuan1, 2, TANG Shiwei2, WU Ling2, YANG Dongqing2, SONG Zaisheng2, WANG Tengjiao2   

  • Received:2005-01-05 Online:2006-01-20 Published:2006-01-20

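Abstract: XML is increasingly widely used in data exchange, but the tags it introduces inflate its size considerably, which consumes substantial transmission and storage resources. Compression markedly reduces the size of XML data, yet little current research addresses how to process queries efficiently and directly on the compressed data. Taking reverse arithmetic compression as the base compression algorithm, this paper proposes ArithRegion, an index structure for compressed XML files in an XML database. With this index structure, queries of the form //element1/element2/…/elementm can be processed efficiently.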

Keywords: XML, index, B+ tree, arithmetic compression

Abstract: Although XML is widely used as a data exchange standard over the Internet and intranets, the tags added to every semantic content unit inflate its size, which makes transmitting and storing XML data expensive in terms of resources. Compression makes the data much smaller, but evaluating queries efficiently and directly on the compressed data remains an open problem. The authors propose ArithRegion, an XML index structure that uses a B+ tree as its backbone and is built on data produced by reverse arithmetic compression. Queries of the form //element1/element2/…/elementm can be evaluated efficiently using ArithRegion.

Key words: XML, index, B+ tree, arithmetic compression

CLC number: