北京大学学报(自然科学版)


利用URL-Key进行查询分类

李雪伟1,吕学强1,董志安1,刘克会2,3   

  1. 北京信息科技大学网络文化与数字传播北京市重点实验室, 北京100101; 2. 北京理工大学管理与经济学院, 北京100081; 3. 北京城市系统工程研究中心, 北京100035;
  • 收稿日期:2014-07-27 出版日期:2015-03-20 发布日期:2015-03-20

Query Classification by Using URL-Key

LI Xuewei1, LÜ Xueqiang1, DONG Zhian1, LIU Kehui2,3

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101; 2. School of Management and Economics, Beijing Institute of Technology, Beijing 100081; 3. Beijing Research Center of Urban Systems Engineering, Beijing 100035;
  • Received:2014-07-27 Online:2015-03-20 Published:2015-03-20

摘要: 针对查询分类问题, 借助互联网中人工组织的分类网站领域URL, 利用URL-key在各个类别中使用的频度, 提出基于方差的领域URL-key识别方法, 利用机器翻译、拼音翻译和搜索结果反馈等技术对URL-key进行过滤, 构建领域URL-key。然后结合伪相关反馈技术, 选取URL-key为特征, 构建URL-key向量, 利用SVM对查询串进行分类。实验结果表明, 该方法不仅F值比对比方法提高7%, 而且资源的使用也远远小于对比方法, 提高了系统的时效性。

关键词: 查询分类, URL, URL-key, 伪相关反馈

Abstract: To address the query classification problem, a variance-based method is proposed for identifying domain URL-keys. It draws on domain URLs that have been manually organized by classification (directory) websites and on the frequency with which each URL-key is used in each category. The candidate URL-keys are then filtered using machine translation, pinyin translation, and search-result feedback to build the set of domain URL-keys. Finally, combined with pseudo relevance feedback, URL-keys are selected as features to construct URL-key vectors, and a multi-class SVM classifier assigns each query to a category. Experimental results show that the F-value of the proposed method is 7% higher than that of the comparison method and that it uses far fewer resources, which improves the timeliness of the system.

Key words: query classification, URL, URL-key, pseudo relevance feedback
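
The variance-based identification step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the tokenization rule in `url_keys`, the stop list, the use of relative frequency per category, the population variance, the threshold value, and the sample URLs are all hypothetical and are not taken from the paper. The idea shown is only that a URL-key used evenly across categories has low variance, while a key concentrated in one category has high variance and is a candidate domain URL-key.

```python
import re
from collections import Counter, defaultdict
from statistics import pvariance

# Hypothetical stop list of tokens that carry no domain information.
STOP_TOKENS = {"www", "com", "cn", "net", "org", "html", "htm", "index"}


def url_keys(url):
    """Split a URL into lowercase alphabetic tokens (candidate URL-keys)."""
    url = re.sub(r"^[a-z]+://", "", url.lower())  # drop the scheme
    tokens = re.split(r"[^a-z]+", url)
    return [t for t in tokens if len(t) > 1 and t not in STOP_TOKENS]


def key_variances(urls_by_category):
    """Map each URL-key to the variance of its relative frequency across categories."""
    rel_freq = defaultdict(dict)
    for category, urls in urls_by_category.items():
        counts = Counter(k for u in urls for k in url_keys(u))
        total = sum(counts.values()) or 1
        for key, n in counts.items():
            rel_freq[key][category] = n / total
    categories = list(urls_by_category)
    return {
        key: pvariance([freqs.get(c, 0.0) for c in categories])
        for key, freqs in rel_freq.items()
    }


def domain_keys(urls_by_category, threshold):
    """Keep the keys whose cross-category variance reaches the (assumed) threshold."""
    variances = key_variances(urls_by_category)
    return {k for k, v in variances.items() if v >= threshold}


# Made-up sample data: two categories with two directory-style URLs each.
sample = {
    "sports": [
        "http://sports.example.com/nba/news",
        "http://www.example.org/sports/football",
    ],
    "finance": [
        "http://finance.example.com/stock/news",
        "http://www.example.net/finance/fund",
    ],
}

selected = domain_keys(sample, threshold=0.01)
```

In this toy data, `example` and `news` occur equally often in both categories, so their variance is zero and they are discarded, whereas `sports` and `finance` are concentrated in a single category. The later stages named in the abstract (filtering by machine translation, pinyin translation, and search-result feedback, then building URL-key vectors for an SVM) are not covered by this sketch.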
