Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2020, Vol. 56 ›› Issue (1): 1-8. DOI: 10.13209/j.0479-8023.2019.095

Automated ICD Coding Based on Word Embedding with Entry Embedding and Attention Mechanism

ZHANG Hongke1, FU Zhenxin1, REN Qianping2, XU Hui2, ZHAO Dongyan1, YAN Rui1,†   

  1. Wangxuan Institute of Computer Technology, Peking University, Beijing 100871
  2. Gennlife (Beijing) Technology Ltd, Beijing 100080
  • Received:2019-05-22 Revised:2019-09-25 Online:2020-01-20 Published:2020-01-20
  • Contact: YAN Rui, E-mail: ruiyan(at)pku.edu.cn

基于融合条目词嵌入和注意力机制的自动 ICD 编码

张虹科1, 付振新1, 任前平2, 徐辉2, 赵东岩1, 严睿1,†   

  1. 北京大学王选计算机研究所, 北京 100871
  2. 生命奇点(北京)科技有限公司, 北京 100080
  • 通讯作者: 严睿, E-mail: ruiyan(at)pku.edu.cn
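  • Funding:
    Supported by the National Key Research and Development Program of China (2017YFC0804001) and the National Natural Science Foundation of China (61672058, 61876196)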

Abstract:

The authors propose a neural model based on word embedding with entry embedding and attention mechanisms. The model makes full use of the several kinds of unstructured text in electronic medical records to perform automated ICD coding of the main diagnosis recorded on the medical record home page. The method first applies word embedding fused with entry embedding to text that contains medical record entries, and enriches the word-level category representation through keyword attention. Word attention is then used to highlight key words and strengthen the text representation. Finally, a fully connected neural network classifier outputs the ICD codes. An ablation study on a Chinese electronic medical record data set shows that word embedding with entry embedding, keyword attention, and word attention are effective. Compared with several baselines, the proposed model achieves the best results on the classification of 81 diseases and can effectively improve the quality of automated ICD coding.
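
The pipeline described in the abstract (entry-fused word embedding, keyword attention, word attention, fully connected classifier) can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give the fusion rule, the attention scoring functions, or any layer sizes, so the sum-based fusion, the dot-product keyword attention, the tanh-scored word attention, the class name `IcdCoderSketch`, and all dimensions below are hypothetical choices. Weights are random and untrained, so the predicted class is meaningless; only the data flow is shown.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class IcdCoderSketch:
    """Hypothetical forward pass; every shape and fusion rule is an assumption."""

    def __init__(self, vocab_size, n_entries, n_keywords, n_classes, dim=16, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, size=shape)

        self.word_emb = init(vocab_size, dim)      # one vector per word
        self.entry_emb = init(n_entries, dim)      # one vector per record entry (section)
        self.keyword_emb = init(n_keywords, dim)   # one vector per keyword
        self.w_att = init(2 * dim, 2 * dim)        # word-attention projection
        self.u_att = init(2 * dim)                 # word-attention context vector
        self.w_out = init(2 * dim, n_classes)      # fully connected classifier
        self.b_out = np.zeros(n_classes)

    def forward(self, word_ids, entry_ids):
        # 1. Word embedding fused with entry embedding (assumed: element-wise sum).
        x = self.word_emb[word_ids] + self.entry_emb[entry_ids]      # (T, dim)

        # 2. Keyword attention (assumed: each word attends over keyword vectors
        #    by dot product; the attended vector is concatenated to the word).
        kw_weights = softmax(x @ self.keyword_emb.T, axis=-1)        # (T, n_keywords)
        kw_context = kw_weights @ self.keyword_emb                   # (T, dim)
        h = np.concatenate([x, kw_context], axis=-1)                 # (T, 2 * dim)

        # 3. Word attention: weight each word, then pool into one text vector.
        scores = np.tanh(h @ self.w_att) @ self.u_att                # (T,)
        alpha = softmax(scores)                                      # (T,)
        doc = alpha @ h                                              # (2 * dim,)

        # 4. Fully connected classifier over the ICD classes.
        probs = softmax(doc @ self.w_out + self.b_out)               # (n_classes,)
        return probs, alpha


# Toy input: five word ids, each tagged with the record entry it came from.
model = IcdCoderSketch(vocab_size=100, n_entries=4, n_keywords=10, n_classes=81)
word_ids = np.array([5, 17, 42, 8, 99])
entry_ids = np.array([0, 0, 1, 1, 3])
probs, alpha = model.forward(word_ids, entry_ids)
predicted_class = int(probs.argmax())
```

Here `probs` is a distribution over the 81 disease classes mentioned in the abstract, and `alpha` holds the per-word attention weights, which is the quantity one would inspect to see which words the model treats as important.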

Key words: automated ICD coding, word embedding with entry embedding, keyword attention, word attention, medical record home page, main diagnosis

摘要:

构建一种基于融合条目词嵌入和注意力机制的深度学习模型, 可以充分利用电子病案中的多种非结构化文本数据, 对病案首页的主要诊断进行自动ICD编码。该模型首先对含有病案条目的文本进行融合条目的词嵌入, 并通过关键词注意力来丰富词级别的类别表示; 然后利用词语注意力来突出重点词语的作用, 增强文本表示; 最后通过全连接神经网络分类器进行分类, 输出ICD编码。通过在中文电子病案数据集上的消融实验, 验证了融合条目词嵌入、关键词注意力和词语注意力的有效性; 与多个基准模型相比, 所建模型在对81 种疾病的分类中取得最好的分类效果, 可以有效地提高自动ICD编码的质量。

关键词: 自动ICD 编码, 融合条目词嵌入, 关键词注意力, 词语注意力, 病案首页, 主要诊断