Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2016, Vol. 52 ›› Issue (1): 1-9. DOI: 10.13209/j.0479-8023.2016.020

Entity Recognition Research in Online Medical Texts

SU Ya, LIU Jie, HUANG Yalou

  1. College of Computer and Control Engineering (Software Institute), Nankai University, Tianjin 300071
  • Received: 2015-06-06 Online: 2016-01-20 Published: 2016-01-20
  • Contact: LIU Jie, E-mail: nkjieliu(at)gmail.com
  • Supported by:

    Tianjin Science and Technology Support Project (13ZCZDGX01098), the Natural Science Foundation of Tianjin (14JCQNJC00600), and the Open Project of the Civil Aviation Information Technology Research Base of China (CAAC-ITRB-201303)

Abstract:

This work targets online medical text: it designs recognition features that reflect the characteristics of the medical domain and runs entity recognition experiments on a self-built dataset. The data cover five common diseases: gastritis, lung cancer, asthma, hypertension, and diabetes. A conditional random field (CRF), one of the more advanced machine learning models of recent years, is used for training and testing. Five types of target entities are extracted: diseases, symptoms, drugs, treatment methods, and examinations. The proposed features are validated by adding them to the model one at a time, yielding an overall precision of 81.26% and a recall of 60.18%. The recognition features are then analyzed further.

Key words: named entity recognition, data mining, conditional random field, medical information
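
Scores like the two reported in the abstract are commonly computed at the entity level as precision and recall. The following is a minimal Python sketch of that computation, assuming BIO-style tags and exact-match scoring of entity spans. Neither assumption is stated in the abstract, and the tag names (such as `B-DISEASE`) and the helper names `extract_spans` and `precision_recall` are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence (assumed tag scheme)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        # The current entity (if any) ends here.
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            # "O", or a stray "I-" with no matching "B-", opens nothing.
            start, label = None, None
    return spans


def precision_recall(gold: List[List[str]],
                     pred: List[List[str]]) -> Tuple[float, float]:
    """Exact-match, entity-level precision and recall over all sentences."""
    true_pos = n_pred = n_gold = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = extract_spans(gold_tags)
        pred_spans = extract_spans(pred_tags)
        true_pos += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


# Toy example (invented data): the predictor misses the symptom entity.
gold = [["B-DISEASE", "I-DISEASE", "O", "B-SYMPTOM", "O", "B-DRUG"]]
pred = [["B-DISEASE", "I-DISEASE", "O", "O", "O", "B-DRUG"]]
precision, recall = precision_recall(gold, pred)
```

In this toy case every predicted entity is correct but one gold entity is missed, so precision exceeds recall. That is the same direction as the gap between the two reported scores, although the example says nothing about how the paper's own numbers were obtained.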

CLC Number: