Acta Scientiarum Naturalium Universitatis Pekinensis, 2022, Vol. 58, Issue (1): 1-6. DOI: 10.13209/j.0479-8023.2021.100

Construction and Inference Technique of Large-Scale Chinese Concreteness Lexicon

XIE Zhipeng, BI Ran

  1. School of Computer Science, Fudan University, Shanghai 200433
  • Received: 2021-06-08 Revised: 2021-08-14 Online: 2022-01-20 Published: 2022-01-20
  • Contact: XIE Zhipeng, E-mail: xiezp(at)fudan.edu.cn
  • Funding:
    Supported by the National Key R&D Program of China (2018YFB1005100) and the National Natural Science Foundation of China (62076072)

Abstract:

To address the scarcity of Chinese word-concreteness resources, this paper proposes an automatic method for constructing a Chinese concreteness lexicon. The method makes full use of existing English word-concreteness resources: using an online translation tool and pretrained word embeddings, it trains a multi-layer perceptron (MLP) regression model of Chinese word concreteness and uses it to build a large-scale Chinese concreteness lexicon. To evaluate the lexicon, two basic concreteness inference tasks are designed (word-level and sentence-level concreteness inference), and the corresponding evaluation datasets are constructed by manual annotation. Experimental results show that the constructed lexicon performs both inference tasks effectively.

Key words: word concreteness, concreteness inference, multi-layer perceptron, natural language processing
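The abstract describes the lexicon construction only at a high level. The Python sketch below shows one plausible reading of it under explicit assumptions, none of which come from the paper. English ratings are carried over to their Chinese translations, with a hard-coded dict `translations` standing in for the online translation tool. When several English words share a translation, their ratings are averaged. Chinese pretrained word vectors serve as the input features, and a one-hidden-layer MLP trained by plain SGD regresses the rating. All function names, hyperparameters, and the toy data are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def transfer_labels(en_scores: Dict[str, float],
                    translations: Dict[str, List[str]]) -> Dict[str, float]:
    """Carry English concreteness ratings over to Chinese translations.

    `translations` is a hypothetical stand-in for an online translation tool.
    Ratings of English words that share a Chinese translation are averaged
    (an assumption, not necessarily the paper's rule).
    """
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for en_word, score in en_scores.items():
        for zh_word in translations.get(en_word, []):
            sums[zh_word] = sums.get(zh_word, 0.0) + score
            counts[zh_word] = counts.get(zh_word, 0) + 1
    return {w: sums[w] / counts[w] for w in sums}


class MLPRegressor:
    """One tanh hidden layer and a linear output, trained by SGD on squared error."""

    def __init__(self, n_in: int, n_hidden: int = 8, lr: float = 0.05, seed: int = 0):
        rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _forward(self, x: Vector):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def predict(self, x: Vector) -> float:
        return self._forward(x)[1]

    def fit(self, xs: List[Vector], ys: List[float], epochs: int = 500) -> "MLPRegressor":
        for _ in range(epochs):
            for x, target in zip(xs, ys):
                h, y = self._forward(x)
                err = y - target  # derivative of 0.5 * (y - target)^2 w.r.t. y
                for j, hj in enumerate(h):
                    # Backpropagate through tanh using w2[j] before it is updated.
                    g = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= self.lr * err * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= self.lr * g * xi
                    self.b1[j] -= self.lr * g
                self.b2 -= self.lr * err
        return self


def build_lexicon(zh_embeddings: Dict[str, Vector],
                  en_scores: Dict[str, float],
                  translations: Dict[str, List[str]]) -> Dict[str, float]:
    """Train on labelled Chinese words that have vectors, then score every embedded word."""
    labels = transfer_labels(en_scores, translations)
    train = [w for w in labels if w in zh_embeddings]
    dim = len(next(iter(zh_embeddings.values())))
    model = MLPRegressor(n_in=dim).fit([zh_embeddings[w] for w in train],
                                       [labels[w] for w in train])
    return {w: model.predict(v) for w, v in zh_embeddings.items()}


# Toy usage: 2-d "embeddings" whose first axis loosely tracks concreteness (illustrative only).
en = {"apple": 5.0, "dog": 4.8, "idea": 1.6, "freedom": 1.5}
tr = {"apple": ["苹果"], "dog": ["狗"], "idea": ["想法"], "freedom": ["自由"]}
emb = {"苹果": [1.0, 0.1], "狗": [0.9, 0.2], "想法": [-0.9, 0.1], "自由": [-1.0, 0.0],
       "猫": [0.95, 0.1], "正义": [-0.95, 0.1]}
lexicon = build_lexicon(emb, en, tr)
```

The point of the regression step is coverage. Only words reachable through translation receive a transferred rating, but every word with a pretrained vector (here `猫` and `正义`, which have no English source entry) receives a predicted one. This is how a small English resource could plausibly yield a large Chinese lexicon.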
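The abstract does not define the two inference tasks precisely. The following minimal sketch assumes both are posed as pairwise judgments (which of two words, or which of two sentences, is more concrete) and are scored by accuracy against manually annotated labels. It also assumes a sentence's concreteness is the mean lexicon score of its already-segmented tokens. The function names, the averaging and tie-breaking rules, and the toy scores are illustrative assumptions, not the paper's definitions.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Lexicon = Dict[str, float]


def word_score(lexicon: Lexicon, word: str) -> Optional[float]:
    """Word-level score: a direct lexicon lookup (None if out of vocabulary)."""
    return lexicon.get(word)


def sentence_score(lexicon: Lexicon, tokens: Sequence[str]) -> Optional[float]:
    """Sentence-level score: mean score of the in-lexicon tokens (an assumed rule)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else None


def more_concrete(a: Optional[float], b: Optional[float]) -> Optional[int]:
    """0 if the first item is judged more concrete (ties go to it), 1 if the second, None if undecidable."""
    if a is None or b is None:
        return None
    return 0 if a >= b else 1


def pairwise_accuracy(pairs: Sequence[Tuple[object, object, int]],
                      score: Callable[[object], Optional[float]]) -> float:
    """pairs hold (item_a, item_b, gold) with gold in {0, 1}; undecidable pairs count as errors."""
    correct = sum(1 for a, b, gold in pairs if more_concrete(score(a), score(b)) == gold)
    return correct / len(pairs)


# Hypothetical lexicon scores and annotated pairs, for illustration only.
lex = {"苹果": 4.9, "桌子": 4.8, "吃": 3.9, "我": 3.0, "追求": 2.0, "想法": 1.7, "自由": 1.5}
word_pairs = [("苹果", "自由", 0), ("想法", "桌子", 1), ("自由", "未登录词", 0)]
sent_pairs = [(["我", "吃", "苹果"], ["我", "追求", "自由"], 0)]

print(pairwise_accuracy(word_pairs, lambda w: word_score(lex, w)))      # 0.6666666666666666
print(pairwise_accuracy(sent_pairs, lambda s: sentence_score(lex, s)))  # 1.0
```

Counting out-of-vocabulary pairs as errors (the third word pair above) is a deliberately strict choice. An alternative is to report coverage separately and compute accuracy only over decidable pairs.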