Title: A Study of a HowNet-Based Semantic Disambiguation Model (一種基於知網的語義排歧模型研究)
Journal: International Journal of Computational Linguistics & Chinese Language Processing
Authors: 楊曉峰 (Yang, Xiaofeng); 李堂秋 (Li, Tangqiu)
Publication date: 2002
Volume/Issue: 7:1
Pages: 47-78
Keywords: word sense disambiguation; HowNet; interlingua; similarity; pattern matching; corpus; semantic restriction rules; semantic environment; sense atom
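This paper presents a semantic disambiguation model for syntactic parsing in machine translation. The model uses HowNet as its main source of semantic knowledge. HowNet is a common-sense knowledge base whose objects of description are the concepts represented by Chinese and English words, and whose basic content is the relations between concepts and between the attributes those concepts possess; it provides rich semantic information for disambiguation. The model combines rule-based and statistics-based methods and is applied to the intermediate structures produced by the parser, disambiguating word senses and structures by preference.

The model first uses a large corpus to obtain the co-occurrence set of each sememe (sense atom). The corpus carries no semantic annotation, so the acquisition is unsupervised. The model then constructs semantic restriction rules for the sememes from transformation templates. Each word sense in HowNet is composed of sememes, so the semantic restriction rules of a sense can be derived from the rules of its constituent sememes.

In the disambiguation stage, we first determine the context-related word set of each content word in the input sentence. Because the semantic relations among content words play an important role both in determining the syntactic structure of the sentence and in selecting the sense of each word, our evaluation of a sentence is built on evaluations of its content words. The current context-related word set of a word is compared with the semantic features described by the restriction rules of each of its senses, and the most appropriate sense is selected according to the similarity. The maximum similarity is also taken as the word's evaluation score. The scores of the content words in an intermediate parse then serve as the basis for evaluating that parse, so that the best result can be selected from multiple intermediate structures. In this way, structural ambiguity is resolved together with word-sense ambiguity.

The proposed model has been implemented in a machine translation system. Tests on example sentences show that it is effective in resolving lexical and structural ambiguity in syntactic parsing and that it outperforms the traditional YES/NO approach. The paper first presents the main idea of the model and briefly introduces HowNet. It then gives methods for extracting sememe co-occurrence information from the corpus and converting it into semantic restriction rules. Next, it describes the disambiguation algorithm in detail, including the construction of context-related word sets and the computation of similarity between sememes and between semantic rules and context word sets. Finally, it reports experimental results for the model.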
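The acquisition step described above (unsupervised sememe co-occurrence counting over an untagged corpus, then restriction rules for each sense assembled from its sememes) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy `LEXICON`, the sentence-level co-occurrence window, the "top-k normalised co-occurring sememes" form of a rule, and summing sememe rules into a sense rule are all hypothetical stand-ins, since the abstract does not specify the transformation templates.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy lexicon (NOT real HowNet data): word -> list of senses,
# each sense being a frozenset of sememe (sense-atom) names.
LEXICON = {
    "bank": [frozenset({"InstitutePlace", "wealth"}), frozenset({"land", "waters"})],
    "money": [frozenset({"wealth"})],
    "deposit": [frozenset({"put", "wealth"})],
    "river": [frozenset({"waters"})],
    "fish": [frozenset({"fish", "waters"})],
}


def sememes_of(word):
    """Union of the sememes of every sense: the corpus is untagged, so no sense is chosen."""
    return set().union(*LEXICON.get(word, []))


def cooccurrence(corpus):
    """Count sememe co-occurrences between pairs of words in the same sentence (assumed window)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = [w for w in sentence if w in LEXICON]
        for w1, w2 in combinations(words, 2):
            for a in sememes_of(w1):
                for b in sememes_of(w2):
                    counts[a][b] += 1
                    if a != b:  # keep the table symmetric without double-counting a == b
                        counts[b][a] += 1
    return counts


def restriction_rule(sememe, counts, top_k=3):
    """Hypothetical rule form: top-k co-occurring sememes, weights normalised to sum to 1."""
    top = counts[sememe].most_common(top_k)
    total = sum(c for _, c in top)
    return {s: c / total for s, c in top} if total else {}


def sense_rule(sense, counts, top_k=3):
    """Derive a sense-level rule by summing the rules of its constituent sememes."""
    rule = Counter()
    for sememe in sense:
        rule.update(restriction_rule(sememe, counts, top_k))
    return dict(rule)


corpus = [["bank", "money", "deposit"], ["river", "fish"], ["bank", "river"]]
counts = cooccurrence(corpus)
print(sense_rule(LEXICON["river"][0], counts))
```

Because the corpus is untagged, every sense of an ambiguous word such as `bank` contributes its sememes to the counts, so the statistics are noisy; this is presumably why the method relies on a large corpus, where genuine collocations can outweigh that noise.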
This paper describes a semantic disambiguation model applied during syntactic parsing in a machine translation system. The model uses HowNet as its main semantic resource. HowNet is a common-sense knowledge base that describes the inter-conceptual and inter-attribute relations of the concepts denoted by Chinese words and their English equivalents, and it provides rich semantic information for disambiguation. The model performs word-sense and structural disambiguation by preference, applied to the results produced by the parser, and it combines rule-based and statistics-based methods. First, we extract the co-occurrence information of each sense atom (sememe) from a large corpus. Because the corpus is untagged, the extraction is unsupervised. We then construct restriction rules from the co-occurrence information according to transfer templates. Since each semantic entry of a word in HowNet is composed of sense atoms, restriction rules can be derived for every entry of every word. During disambiguation, the model constructs a context-related word set for each notional word in the input sentence. The semantic collocation relations between notional words play an important role in syntactic disambiguation, so our evaluation of candidate structures is based on how tightly the notional words in each structure match one another. For each word, we compare its context-related word set in the current structure with all of its restriction rules in the lexicon and find the best match. The entry with the best match is taken as the word's sense, and the degree of similarity, which reflects how well the word fits with the other notional words in the structure, serves as the word's score. Because different candidate parses of a structure give the same word different context-related word sets, the word receives different scores under different parses.
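The per-word step described above (compare a word's context-related word set with each sense's restriction rule, keep the best-matching sense, and reuse the best similarity as the word's score) might look like the following Python sketch. The `RULES` and `SEMEMES` tables are hypothetical toy data, and cosine similarity over sememe-frequency vectors is an assumed measure; the abstract does not give the paper's actual similarity formula.

```python
import math
from collections import Counter

# Hypothetical sense-level restriction rules (sememe -> weight). In the paper these
# are derived from corpus statistics; here they are hand-written toy values.
RULES = {
    "bank": {
        "bank/finance": {"wealth": 0.6, "put": 0.4},
        "bank/river": {"waters": 0.7, "fish": 0.3},
    },
}

# Hypothetical sememes of the words that can appear in a context-related word set.
SEMEMES = {
    "money": {"wealth"},
    "deposit": {"put", "wealth"},
    "river": {"waters"},
    "fish": {"fish", "waters"},
}


def context_profile(context_words):
    """Sememe frequency vector of a word's context-related word set."""
    profile = Counter()
    for w in context_words:
        profile.update(SEMEMES.get(w, ()))
    return profile


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts (assumed measure)."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def choose_sense(word, context_words):
    """Pick the sense whose rule best matches the context; the best similarity is the word's score."""
    profile = context_profile(context_words)
    scored = {sense: cosine(rule, profile) for sense, rule in RULES[word].items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


print(choose_sense("bank", ["money", "deposit"]))
print(choose_sense("bank", ["river", "fish"]))
```

Returning the similarity together with the sense mirrors the abstract's point that the same word can score differently when a different candidate parse gives it a different context-related word set.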
The best overall structure can then be chosen according to the scores of all the notional words in the sentence. In this way, most word-sense and structural ambiguities are resolved at the same time. The proposed model has been implemented in the MTG system. Our experiments show that the model is highly effective for this purpose, and that it is more tolerant than, and much better than, the traditional clear-cut YES/NO method. The paper first presents the general idea of the method and briefly introduces the HowNet dictionary. It then gives methods for extracting co-occurrence information for each sense atom from the corpus and transforming this information into restriction rules. Next, the disambiguation algorithm is described in detail, including the construction of context-related word sets and the calculation of similarity between sense atoms, and between restriction rules and context-related word sets. The experimental results given at the end of the paper show that the method is effective.
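Ranking whole candidate structures by the scores of their notional words, as described above, could be sketched in Python as follows. The rule tables, the two `CANDIDATES` structures for a prepositional-phrase attachment, the Jaccard set similarity, and summing word scores into a parse score are all illustrative assumptions; the abstract states only that the best parse is chosen from the scores of all notional words.

```python
# Hypothetical restriction rules: word -> sense -> sememes that the sense prefers
# to find among its context-related words (toy data, not from HowNet).
RULES = {
    "saw": {"see": {"human", "tool"}},
    "man": {"person": {"look"}},
    "telescope": {"instrument": {"look"}},
}

# Hypothetical sememes contributed by each word when it appears in a context set.
SEMEMES = {"saw": {"look"}, "man": {"human"}, "telescope": {"tool"}}


def jaccard(a, b):
    """Set-overlap similarity, used here as a stand-in for the paper's measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def word_score(word, context_words):
    """Best similarity over the word's senses, given its context-related word set."""
    ctx = set().union(*(SEMEMES.get(w, set()) for w in context_words))
    return max(jaccard(rule, ctx) for rule in RULES[word].values())


def parse_score(parse):
    """Sum of word scores; `parse` maps each notional word to its context-related words."""
    return sum(word_score(w, ctx) for w, ctx in parse.items())


def best_parse(candidates):
    """Name of the candidate structure with the highest total score."""
    return max(candidates, key=lambda name: parse_score(candidates[name]))


# Two hypothetical attachments of "with the telescope" in "saw the man with the telescope".
CANDIDATES = {
    "attach-to-saw": {"saw": ["man", "telescope"], "man": ["saw"], "telescope": ["saw"]},
    "attach-to-man": {"saw": ["man"], "man": ["saw", "telescope"], "telescope": ["man"]},
}

print({name: parse_score(p) for name, p in CANDIDATES.items()})
print(best_parse(CANDIDATES))
```

Summing is one plausible way to combine word scores into a structure score; if candidate structures could contain different numbers of notional words, an average would avoid favouring larger structures.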