Acta Scientiarum Naturalium Universitatis Pekinensis, 2020, Vol. 56, Issue (4): 607-613. DOI: 10.13209/j.0479-8023.2020.049


Research on Expert Disambiguation of Same Name Based on Multi-feature Fusion

ZENG Jianrong1, ZHANG Yangsen1,2,†, WANG Siyuan1, HUANG Gaijuan1,2, CUI Jia3, MA Huan3   

  1. Intelligent Information Processing Laboratory of Beijing Information and Technology University, Beijing 100101
  2. Beijing Laboratory of National Economic Security Early-warning Engineering, Beijing 100044
  3. National Computer Network and Information Security Management Center, Beijing 100029
  • Received: 2019-07-17; Revised: 2019-11-26; Online: 2020-07-20; Published: 2020-07-20
  • Contact: ZHANG Yangsen, E-mail: zhangyangsen(at)163.com

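  • Supported by:
    National Natural Science Foundation of China (61772081) and the Graduate Science and Technology Innovation Project for Promoting the Connotative Development of Universities (5121911044)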

Abstract:

To resolve the ambiguity among experts who share a name, which arises when building an expert database, this paper proposes a disambiguation method based on multi-feature fusion. Paper records for each expert are collected from data sources such as CNKI, and five key fields (title, abstract, keywords, affiliation and collaborators) are extracted and used as attribute features to build a feature representation model. A similarity function between same-name experts is then defined over these features. Using the resulting similarities, same-name disambiguation is recast as a clustering problem and solved with the affinity propagation clustering algorithm. Experiments on the collected expert papers show that the method reaches an accuracy of up to 92%, indicating good disambiguation performance.

Key words: multi-feature fusion, homonymy disambiguation, expert database, clustering algorithm, data collection
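
The pipeline summarized in the abstract (fuse per-feature similarities, then cluster with affinity propagation) can be illustrated with a minimal sketch. It is not the authors' implementation: the equal feature weights, the use of Jaccard overlap for every feature, the median preference, and the names `FEATURE_WEIGHTS`, `fused_similarity`, `affinity_propagation` and `disambiguate` are all assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import median

# Hypothetical equal weights; the abstract does not say how features are weighted.
FEATURE_WEIGHTS = {
    "title": 0.2, "abstract": 0.2, "keywords": 0.2,
    "affiliation": 0.2, "coauthors": 0.2,
}


def jaccard(a, b):
    """Jaccard overlap of two token collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fused_similarity(p, q, weights=FEATURE_WEIGHTS):
    """Weighted sum of per-feature Jaccard similarities between two papers.
    Jaccard is an assumed stand-in for the paper's per-feature measures."""
    return sum(w * jaccard(p[f], q[f]) for f, w in weights.items())


def affinity_propagation(S, preference, damping=0.5, iterations=200):
    """Plain affinity propagation on a square similarity matrix (n >= 2).
    Returns, for every point, the index of the exemplar it is assigned to."""
    n = len(S)
    # The diagonal holds the preference: how willing a point is to be an exemplar.
    S = [[preference if i == k else S[i][k] for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            for k in range(n):
                best_other = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k) for i' not in {i, k})
        # a(k, k) = sum of positive r(i', k) for i' != k
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    evidence = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if evidence[k] > 0]
    if not exemplars:  # fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: evidence[k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def disambiguate(papers):
    """Cluster same-name papers; papers sharing a label are attributed to one expert."""
    n = len(papers)
    S = [[fused_similarity(papers[i], papers[k]) for k in range(n)] for i in range(n)]
    off_diagonal = [S[i][k] for i in range(n) for k in range(n) if i != k]
    # Median preference is a common default, assumed here rather than taken from the paper.
    return affinity_propagation(S, preference=median(off_diagonal))
```

Each input paper is assumed to be a dictionary mapping the five feature names to token collections. Affinity propagation fits this setting because it does not need the number of clusters in advance, and the number of distinct experts behind one name is usually unknown. Perfectly symmetric inputs can produce ties between candidate exemplars, so a practical implementation would typically add a small amount of noise to the similarities.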
