Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition)

The Acquisition and Application of the Knowledge for Recognizing the Predicate Head of a Chinese Simple Sentence

SUI Zhifang, YU Shiwen   

  1. Computational Linguistics Institute, Peking University, Beijing, 100871
  • Received: 1997-10-21; Online: 1998-09-20; Published: 1998-09-20

Abstract: In sentence similarity calculation for example-based machine translation (EBMT), identifying the predicate head is essential for grasping the overall structure of a sentence. Using 3000 Chinese simple sentences annotated with their predicate heads as the training set, and taking each candidate word's own syntactic attributes together with its context as that candidate's classification features, this research acquires knowledge for recognizing the predicate head by constructing a statistical decision tree model. The statistical decision tree is then applied to recognize predicate heads automatically, with fairly satisfactory test results.

Key words: natural language processing, corpus, machine translation, knowledge acquisition, predicate head, statistical decision tree
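The abstract describes the method only in outline: every word of a sentence is a candidate, each candidate is classified by its own syntactic attributes plus its context, and a statistical decision tree decides which candidate is the predicate head. The following is a minimal sketch of that idea under stated assumptions, not the authors' model. It substitutes a plain ID3-style tree (entropy-based splits) whose leaves keep class counts, so the tree yields a probability rather than a hard label, and the word with the highest probability is taken as the head. The feature names (`pos`, `prev`, `next`), the PKU-style tag letters, the toy sentences and all function names are hypothetical.

```python
import math
from collections import Counter

ATTRS = ["pos", "prev", "next"]  # hypothetical feature set

def features(tags, i):
    """Candidate's own POS tag plus the tags of its neighbours (its context)."""
    return {
        "pos": tags[i],
        "prev": tags[i - 1] if i > 0 else "<S>",
        "next": tags[i + 1] if i + 1 < len(tags) else "</S>",
    }

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, attrs, min_gain=1e-9):
    """rows: list of (feature_dict, is_head). Internal nodes are plain dicts;
    leaves are Counters of class frequencies (the 'statistical' part)."""
    labels = [y for _, y in rows]
    base = entropy(labels)
    best, best_gain = None, min_gain
    for a in attrs:  # pick the attribute with the largest information gain
        groups = {}
        for x, y in rows:
            groups.setdefault(x[a], []).append(y)
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        if base - remainder > best_gain:
            best, best_gain = a, base - remainder
    if best is None:  # pure node or no useful split left
        return Counter(labels)
    branches = {}
    for x, y in rows:
        branches.setdefault(x[best], []).append((x, y))
    rest = [a for a in attrs if a != best]
    return {"attr": best, "default": Counter(labels),
            "children": {v: build_tree(sub, rest, min_gain) for v, sub in branches.items()}}

def prob_head(tree, x):
    """Estimated probability that the candidate with features x is the head."""
    while not isinstance(tree, Counter):
        child = tree["children"].get(x[tree["attr"]])
        # Unseen feature value: back off to this node's class distribution.
        tree = tree["default"] if child is None else child
    return tree[True] / sum(tree.values())

def recognize_head(tree, tags):
    """Index of the most probable predicate head in a tagged sentence."""
    return max(range(len(tags)), key=lambda i: prob_head(tree, features(tags, i)))

# Toy training set (hypothetical): tag sequences with the annotated head index.
# r pronoun, v verb, n noun, d adverb, p preposition, u particle, a adjective.
TRAIN = [
    (["r", "v", "n"], 1),                 # 我 吃 苹果 'I eat apples'
    (["r", "d", "v"], 2),                 # 他 不 去 'he is not going'
    (["r", "p", "n", "v"], 3),            # 我们 在 学校 学习 'we study at school'
    (["r", "v", "u", "n", "d", "a"], 5),  # 我 买 的 书 很 贵 'the book I bought is expensive'
    (["r", "v", "u", "n", "d", "a"], 5),  # 他 看 的 书 很 多 'he reads many books'
]
rows = [(features(t, i), i == h) for t, h in TRAIN for i in range(len(t))]
tree = build_tree(rows, ATTRS)

print(recognize_head(tree, ["r", "v", "u", "n", "d", "a"]))  # 5 (the adjective)
```

On this toy data the tree first splits on the candidate's own tag and then, for verbs, on the following tag, so a verb followed by the particle 的 (inside a relative clause) is not chosen as the head. This mirrors in miniature why the abstract combines the word's own attributes with its context.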

中图分类号: