Acta Scientiarum Naturalium Universitatis Pekinensis ›› 2022, Vol. 58 ›› Issue (1): 83-90. DOI: 10.13209/j.0479-8023.2021.101

Multi-task Semantic Matching with Self-supervised Learning

CHEN Yuan1, QIU Xinying1,2,†

  1. School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006
  2. Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies, Guangzhou 510006
  • Received: 2021-06-08 Revised: 2021-08-14 Online: 2022-01-20 Published: 2022-01-20
  • Contact: QIU Xinying, E-mail: xy.qiu(at)foxmail.com
  • Supported by:
    the National Social Science Fund of China (17BGL068) and the Natural Science Foundation of Guangdong Province (2018A030313777)

Abstract:

In semantic matching, the interaction information between the two texts of a pair is critical for predicting the pair's matching score. This paper proposes a multi-task learning framework that adds self-supervised learning to deep semantic matching models. Specifically, a self-supervised model is trained so that the paired sentences regenerate each other through sequence-to-sequence generation. A multi-task learning framework then integrates the representation from this self-supervised generation task, as a form of feature augmentation, with that of the deep matching model to predict the similarity score of the texts. Experiments with 9 deep matching models show that adding the self-supervised module improves every original model to a varying degree, indicating that the proposed framework can effectively improve deep text semantic matching models.

Key words: self-supervised learning, semantic matching, multi-task learning
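
The multi-task setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the architecture or the loss, so the concatenation-based feature augmentation, the linear scoring layer, the additive joint loss, and every name here (`concat_features`, `match_score`, `multitask_loss`, `gen_weight`) are assumptions chosen only to show how a matching objective and a bidirectional generation objective might be combined.

```python
import math


def concat_features(match_repr, gen_repr):
    """Assumed feature augmentation: append the self-supervised
    representation to the matching model's representation."""
    return list(match_repr) + list(gen_repr)


def match_score(features, weights, bias=0.0):
    """Hypothetical scoring head: linear layer plus sigmoid, giving a score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(score, label):
    """Matching loss for a 0/1 label; eps guards against log(0)."""
    eps = 1e-12
    return -(label * math.log(score + eps)
             + (1 - label) * math.log(1 - score + eps))


def generation_nll(token_probs):
    """Mean negative log-likelihood of the reference tokens, given the
    probability the generator assigned to each of them."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def multitask_loss(score, label, probs_a_to_b, probs_b_to_a, gen_weight=0.5):
    """Assumed joint objective: matching loss plus a weighted generation
    loss covering both directions (A regenerates B, B regenerates A)."""
    match_loss = binary_cross_entropy(score, label)
    gen_loss = generation_nll(probs_a_to_b) + generation_nll(probs_b_to_a)
    return match_loss + gen_weight * gen_loss


if __name__ == "__main__":
    # Toy values standing in for encoder outputs and generator probabilities.
    features = concat_features([0.2, -0.1], [0.4])
    score = match_score(features, weights=[1.0, 1.0, 1.0])
    print(multitask_loss(score, 1, [0.9, 0.8], [0.7, 0.6]))
```

In this sketch `gen_weight` controls how strongly the generation task shapes training; in a real model the two losses would share encoder parameters, which is what lets the generation task inform the matching representation.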