To classify “de” structures that involve semantic ellipsis, a hybrid neural network is built. First, the network uses a bidirectional LSTM (long short-term memory) neural network to learn syntactic and semantic information about the “de” structure. Then, it employs either a max-pooling layer or multiple GRU (gated recurrent unit) based attention layers to capture the ellipsis features of the “de” structure, which allow the network to recognize “de” structures containing semantic ellipsis. Experiments on the CTB8.0 corpus show that the proposed approach achieves accurate results efficiently, with an F1 value of 96.67%.
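The two feature-extraction options named above can be illustrated with a minimal sketch. It is not the authors' implementation: the hidden states are made-up numbers standing in for bidirectional LSTM outputs, the `query` vector is hypothetical, and a single dot-product attention step is used as a simplified stand-in for the GRU-based multiple attention layers.

```python
import math


def max_pool_over_time(hidden_states):
    """Element-wise maximum across time steps (one state vector per token)."""
    return [max(column) for column in zip(*hidden_states)]


def attention_pool(hidden_states, query):
    """Softmax-weighted sum of the states, scored by dot product with `query`."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return pooled, weights


# Toy stand-in for BiLSTM outputs: 3 tokens, 4-dimensional states (assumed values).
states = [
    [0.1, -0.2, 0.5, 0.0],
    [0.7, 0.3, -0.1, 0.2],
    [-0.4, 0.9, 0.2, 0.1],
]
query = [1.0, 0.0, 0.0, 0.0]  # hypothetical query vector, not from the paper

max_vec = max_pool_over_time(states)
att_vec, att_weights = attention_pool(states, query)
```

Max-pooling keeps the strongest activation in each dimension regardless of position, whereas attention produces a weighted average in which tokens that score higher against the query contribute more. Either pooled vector could then be passed to a classifier.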
To address the problem of sense guessing for Chinese unknown words, semantic dictionaries at different levels were introduced by applying the “Semantic Knowledge-base of Modern Chinese”. Sense-guessing models were constructed using these dictionaries. The models were then integrated to predict the senses of unknown words, which yielded better performance. Based on each model, semantic prediction and annotation of the unknown words in the People’s Daily published in 2000 were evaluated. Finally, corpus resources with sense annotations of unknown words were obtained.
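The abstract does not state how the individual models were integrated. As one possible reading, the sketch below assumes simple weighted voting over per-model category scores. The model names, sense categories, scores, and weights are all hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def combine_predictions(model_scores, model_weights):
    """Sum weighted category scores across models and return the top category."""
    combined = defaultdict(float)
    for model_name, scores in model_scores.items():
        weight = model_weights.get(model_name, 1.0)  # unlisted models get weight 1
        for category, score in scores.items():
            combined[category] += weight * score
    best = max(combined, key=combined.get)
    return best, dict(combined)


# Hypothetical scores from two models built on dictionaries of different levels.
model_scores = {
    "coarse_level": {"person": 0.6, "place": 0.4},
    "fine_level": {"person": 0.3, "place": 0.7},
}
model_weights = {"coarse_level": 1.0, "fine_level": 2.0}  # assumed weights

best_sense, totals = combine_predictions(model_scores, model_weights)
```

With these assumed numbers the finer-grained model carries more weight, so its preferred category wins even though the coarse model disagrees.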