Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2026, Vol. 62 ›› Issue (1): 57-68. DOI: 10.13209/j.0479-8023.2025.091

Accelerating Distributed Retrieval Systems Based on a Programmable Data Plane

张鹏豪   

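  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030000
  • Received: 2024-11-18 Revised: 2025-04-21 Online: 2026-01-20 Published: 2026-01-20
  • Corresponding author: ZHANG Penghao, E-mail: zhangpenghao(at)sxu.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (62302281) and the Fundamental Research Program of Shanxi Province (202303021212016)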

Accelerating Distributed Search Systems Based on Programmable Data Plane Technology

ZHANG Penghao   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030000
  • Received: 2024-11-18 Revised: 2025-04-21 Online: 2026-01-20 Published: 2026-01-20
  • Contact: ZHANG Penghao, E-mail: zhangpenghao(at)sxu.edu.cn

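Abstract (translated from the Chinese):

To improve the network performance of distributed application systems, this paper proposes NetDSH, a distributed search system accelerated by a programmable data plane. The system optimizes the storage and data-processing capabilities of the programmable data plane and, through a custom protocol, a Top-K insertion method, and a T-update strategy, efficiently and accurately eliminates potentially low-quality candidate answers, thereby improving network transmission performance. NetDSH is evaluated on a purpose-built testbed with four types of datasets (SIF1M, SIF1B, SPACE1B, and Random). The results show that, compared with TLSH and NetSHa, conventional distributed search systems based on locality-sensitive hashing, NetDSH reduces the number of transmitted packets to one third of the original and improves system retrieval performance by a factor of 3.2.

Key words: programmable data plane, distributed system, approximate nearest neighbor search, locality-sensitive hashing algorithm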

Abstract:

To enhance the network performance of distributed application systems, we propose NetDSH, an acceleration framework for distributed search based on a programmable data plane. NetDSH optimizes the storage and data processing capabilities of the programmable data plane and uses a custom protocol, a Top-K insertion method, and a T-update strategy to efficiently and accurately filter out low-quality candidate answers, thereby improving network transmission efficiency. We evaluate NetDSH on a testbed using four benchmark datasets: SIF1M, SIF1B, SPACE1B, and Random. Experimental results demonstrate that, compared with the conventional distributed search systems TLSH and NetSHa, NetDSH reduces the number of transmitted packets to one third of the baseline count while improving system retrieval performance by a factor of 3.2.

Key words: programmable data plane, distributed system, approximate nearest neighbour search, locality-sensitive hashing
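
The abstract names a Top-K insertion method and a T-update strategy but does not describe how they work. The sketch below illustrates only the general idea they suggest: hold the K best candidate distances seen so far, treat the K-th best as a threshold T, and drop any later candidate that cannot enter the top K. It is host-side Python, not switch code, and the names `TopKFilter`, `offer`, and `filter_candidates` are hypothetical. The heap is an assumption made for readability; a real programmable data plane would more plausibly use fixed-size register arrays.

```python
import heapq


class TopKFilter:
    """Hypothetical in-network filter: tracks the K smallest distances
    seen so far and exposes the current K-th smallest as threshold T."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # max-heap emulated by storing negated distances
        self.threshold = float("inf")  # T: nothing is dropped until K entries exist

    def offer(self, distance):
        """Return True if the candidate is forwarded, False if it is dropped."""
        if distance >= self.threshold:
            return False  # cannot improve the current top K
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -distance)  # Top-K insertion
        else:
            heapq.heapreplace(self._heap, -distance)  # evict the current worst
        if len(self._heap) == self.k:
            self.threshold = -self._heap[0]  # T update: new K-th best distance
        return True


def filter_candidates(distances, k):
    """Return the candidates that would still be forwarded, in arrival order."""
    top_k = TopKFilter(k)
    return [d for d in distances if top_k.offer(d)]


if __name__ == "__main__":
    arrivals = [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0]
    print(filter_candidates(arrivals, k=2))
```

In this toy run three of the seven candidates are dropped before reaching the receiver. Any reduction in forwarded packets depends on arrival order and on K, so the example says nothing about the figures reported in the abstract.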