A HIT-Based Semantic Search Approach in Unstructured P2P Systems

XU Quanqing, DAI Yafei, CUI Bin   

  1. State Key Lab for Advanced Optical Communication Systems and Networks, Peking University, Beijing 100871;
  • Received: 2009-01-05 Online: 2010-01-20 Published: 2010-01-20

一种无结构 P2P 系统中基于层次兴趣树的语义检索机制


  1. 北京大学区域光纤通信网与新型光通信系统国家重点实验室, 北京100871;

Abstract: An effective semantic search approach based on a hierarchical interest tree (HIT) is proposed for unstructured P2P systems. The documents owned by a peer are classified into categories to build a HIT, which is sent to the peer's super peer. In addition, for each category, the inverted document index (IDI) of the top n terms, ranked by their chi-square (χ²) statistic values, is sent to the super peer. When a regular peer issues a query together with a category semantic similarity threshold Simth, the query messages are forwarded by an effective query routing algorithm and the results are returned by searching the HIT. Because each peer can set its own Simth, the approach is flexible and can provide a more personalized service. Experiments show that the HIT-based semantic search approach is more accurate and efficient than previous methods.

Key words: P2P, semantic search, hierarchical interest tree, query routing, semantic similarity
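
The abstract states that the top n terms of each category are selected by their χ² statistic values but does not give the computation. The following is a minimal sketch of one common way to do this, assuming the standard 2x2 term/category contingency form of the χ² statistic used in text categorization. The function names `chi_square` and `top_n_terms`, the input layout, and the tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - c * b) ** 2 / denom


def top_n_terms(docs, n):
    """Return {category: [top-n terms by chi-square]}.

    docs is a list of (category, iterable_of_terms) pairs (assumed layout).
    Only terms that occur in a category are scored for that category.
    Ties are broken alphabetically so the output is deterministic.
    """
    cat_count = Counter(cat for cat, _ in docs)
    term_count = Counter()
    term_cat_count = defaultdict(Counter)
    for cat, terms in docs:
        for t in set(terms):
            term_count[t] += 1
            term_cat_count[cat][t] += 1

    total = len(docs)
    result = {}
    for cat, n_cat in cat_count.items():
        scores = {}
        for t, a in term_cat_count[cat].items():
            b = term_count[t] - a
            c = n_cat - a
            d = total - n_cat - b
            scores[t] = chi_square(a, b, c, d)
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[cat] = ranked[:n]
    return result
```

In this sketch, a peer would call `top_n_terms` on its locally classified documents and build the inverted document index only for the returned terms before sending it to its super peer. How the index is encoded and transmitted is not specified in the abstract and is left out here.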

摘要: 提出了一种无结构P2P系统中有效的语义检索方法: 基于层次兴趣树(HIT)的语义检索。每个节点中所有的文档根据分类目录被分类成层次兴趣树, 并发送至该节点所属的超级节点。同时, 每个类中前n个关键词的倒排文档索引, 也会依据它们的χ²统计值被发送至超级节点。当节点发送一个查询并给出类别语义相似性阈值Simth时, 查询消息通过一个有效的查询路由算法被转发, 结果则通过搜索HIT返回。不同的节点可以给出各自不同的Simth, 其灵活性可以为每个节点提供更好的个性化服务。实验表明在无结构的P2P系统中, 基于HIT的语义检索方法比以前的方法具有更好的准确性和有效性。

关键词: P2P, 语义检索, 层次兴趣树, 查询路由, 语义相似性

