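Title: 詞義預測研究:以語料庫驅動的語言學研究方法 (Sense Prediction Study: A Corpus-Driven Linguistic Research Method)
Author: 洪嘉馡 (Jia-Fei Hong)
Institution: National Taiwan University (國立臺灣大學)
Department: Graduate Institute of Linguistics (語言學研究所)
Advisors: 黃居仁 (Chu-Ren Huang); 安可思 (Kathleen Ahrens)
Degree: Doctoral
Publication date: 2010
Keywords: lexical ambiguity; sense prediction; corpus-based approach; character similarity clustering approach; concept similarity clustering approach; experimental evaluation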
In this study, I used corpus-driven distribution as the main method of sense prediction. I concentrated on individual semantic features observed in corpora to predict the senses of words that had not yet been analyzed. The corpora and lexical resources used were the Chinese Gigaword Corpus (中文十億詞語料庫), HowNet (知網), Chinese Wordnet (中文詞網), and Xiandai Hanyu Cidian (現代漢語辭典). Using these resources, I determined the collocation clusters of four target words, chi1 吃 “eat”, wan2 玩 “play”, huan4 換 “change”, and shao1 燒 “burn”, through character similarity and concept similarity.
The four target words are all transitive verbs, and each has more than two senses. Their collocates play an important role in this sense prediction study. The corpus-based computational approach relies on two main strategies: (1) character similarity clustering analysis, in which collocates that share an identical morpheme are placed in the same cluster; and (2) concept similarity clustering analysis, which uses HowNet to measure (a) similarity between sememes and (b) similarity between concepts. I first predicted that different clusters of collocates can represent different senses, and then examined the accuracy rates for the four target words under both clustering analyses. Finally, I evaluated the results against the sense divisions in Chinese Wordnet and Xiandai Hanyu Cidian, showing that an automatic computational procedure can predict the different senses of chi1 “eat”, wan2 “play”, huan4 “change”, and shao1 “burn”.
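The character similarity clustering step can be illustrated with a minimal Python sketch. It is not the thesis's actual program: it assumes that "sharing an identical morpheme" can be approximated by sharing at least one character, and it merges collocates transitively with a union-find structure. The function name `cluster_by_shared_character` and the sample collocates of chi1 吃 are invented for illustration and are not taken from the thesis data.

```python
from collections import defaultdict


def cluster_by_shared_character(collocates):
    """Group collocates that share at least one character.

    Assumption: a shared character stands in for a shared morpheme.
    Words are merged transitively, so A and C end up together when
    A shares a character with B and B shares one with C.
    """
    words = list(dict.fromkeys(collocates))  # drop duplicates, keep order
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the cluster root, compressing the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_word_with = {}  # character -> first word seen containing it
    for word in words:
        for ch in word:
            if ch in first_word_with:
                union(first_word_with[ch], word)
            else:
                first_word_with[ch] = word

    clusters = defaultdict(list)
    for word in words:
        clusters[find(word)].append(word)
    return list(clusters.values())


# Hypothetical collocates of chi1 "eat" (illustrative only).
sample = ["午飯", "晚飯", "早餐", "晚餐", "苦頭", "虧"]
for cluster in cluster_by_shared_character(sample):
    print(cluster)
```

In this toy input the four meal words chain together through 飯, 晚, and 餐, while 苦頭 and 虧 remain separate. Transitive merging on a single character is a crude criterion and can over-merge unrelated words, which is one reason a second, concept-based analysis is useful.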
Following the corpus-based computational analysis, I used an off-line paper-and-pencil task to test participants’ intuitions and to verify that different clusters of collocates can represent different senses. To examine the collocates of the lexically ambiguous target words, I employed a multiple-choice task (Burton et al. 1991). Because the stimuli were drawn from the character similarity clustering analysis, the experimental results also serve to demonstrate the viability of the approach.
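One way such multiple-choice responses could be compared with the cluster-based predictions is sketched below in Python. The abstract does not describe the scoring procedure, so everything here is an assumption: the function names `agreement_rate` and `majority_choice`, the sense labels, and the response data are all hypothetical.

```python
from collections import Counter


def agreement_rate(predicted, responses):
    """Fraction of all responses that match the predicted sense.

    predicted: dict mapping a stimulus item to its cluster-predicted sense.
    responses: dict mapping a stimulus item to the list of senses chosen
               by participants in the multiple-choice task.
    """
    total = 0
    matched = 0
    for item, chosen in responses.items():
        total += len(chosen)
        matched += sum(1 for sense in chosen if sense == predicted[item])
    return matched / total if total else 0.0


def majority_choice(responses):
    """Most frequently chosen sense per item (first seen wins a tie)."""
    return {item: Counter(chosen).most_common(1)[0][0]
            for item, chosen in responses.items() if chosen}


# Hypothetical data: two stimuli, four participants each.
predicted = {"吃飯": "sense_A", "吃虧": "sense_B"}
responses = {
    "吃飯": ["sense_A", "sense_A", "sense_A", "sense_B"],
    "吃虧": ["sense_B", "sense_B", "sense_C", "sense_B"],
}
print(agreement_rate(predicted, responses))
print(majority_choice(responses))
```

A high agreement rate would be consistent with the claim that different clusters correspond to different senses. A per-item majority view helps locate the stimuli on which participants and the clustering disagree.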
Ahrens, Kathleen. 2006. “The Effect of Visual Target Presentation Times on Lexical Ambiguity Resolution.” Language and Linguistics, 7(3): 677–696.
Ahrens, Kathleen, Huang Chu-Ren and Shirley Chuang. 2003. “Sense and Meaning Facets in Verbal Semantics: A MARVS Perspective.” Language and Linguistics, 4(3): 468-484.
Ahrens, Kathleen. 2001. “On-line Sentence Comprehension of Ambiguous Verbs in Mandarin.” Journal of East Asian Linguistics, 10/4: 337–358.
Ahrens, Kathleen. 1998. “Lexical Ambiguity Resolution: Languages, Tasks and Timing.” In Sentence Processing: A Cross-linguistic Perspective. (Ed.) Dieter Hillert. Academic Press, pp.11–31.
Pedersen, Bolette Sandford. 1997. “Lexical ambiguity in machine translation: Using Frame Semantics for expressing regularities in polysemy”. Nicolas Nicolov and Ruslan Mitkov (eds.). Recent Advances in Natural Language Processing II. Tzigov Chark, Bulgaria, pp. 207-220.
Buscaldi, Davide, Paolo Rosso, and Emilio Sanchis. 2007. “A WordNet-Based Indexing Technique for Geographical Information Retrieval”. Peters et al. (Eds.): CLEF 2006, LNCS 4730, pp. 954–957.
Canas, Alberto J., Alejandro Valerio, Juan Lalinde-Pulido, Marco Carvalho, and Marco Arguedas. 2003. “Using WordNet for Word Sense Disambiguation to Support Concept Map Construction.” Paper presented at SPIRE 2003, the 10th International Symposium on String Processing and Information Retrieval, Oct. 2003, Manaus, Brazil, pp. 350-359.
Chao, Gerald and Michael G Dyer. 2002. “Maximum Entropy Models for Word Sense Disambiguation.” Proceedings of the 19th International Conference on Computational Linguistics. Taipei, Taiwan, pp. 155–161.
Chen, Hao, Tingting He, Donghong Ji, and Changqin Quan. 2005. “An Unsupervised Approach to Chinese Word Sense Disambiguation Based on Hownet.” Computational Linguistics and Chinese Language Processing. 10:4, pp. 473–482.
Chen, Hsin-Hsi, Guo-Wei Bian, and Wen-Cheng Lin. 1999. “Resolving Translation Ambiguity and Target Polysemy in Cross-Language Information Retrieval.” International Journal of Computational Linguistics and Chinese Language Processing, 4(2), August 1999, pp. 21–38.
Chen, Jen Nan and Jason S. Chang. 1998. “Topical Clustering of MRD Senses.” Computational Linguistics, 24(1): 61–95.


Chen, Jinying, and Martha Palmer. 2009. Improving English Verb Sense Disambiguation Performance with Linguistically Motivated Features and Clear Sense Distinction Boundaries. Language Resources and Evaluation. 43 (2):181–208, Springer Netherlands: SemEval2007, 2009.
Chung, Siaw-Fong and Kathleen Ahrens. 2008. “MARVS Revisited: Incorporating Sense Distribution and Mutual Information into Near-Synonym Analyses.” Language and Linguistics. 9.2:415-434.
Cottrell, Garrison W. 1984. “A model of lexical access of ambiguous words.” Proceedings of the National Conference on Artificial Intelligence, Austin, TX, Aug. 6-10, pp. 61-67. One of twelve papers nominated for the AAAI Publisher's Prize.
Cruse, Alan. 2004. Meaning in Language: An Introduction to Semantics and Pragmatics. (Second edition.) Oxford: Oxford University Press.
Cruse, Alan. 1986. Lexical Semantics. Cambridge: Cambridge University Press.
Dai, Liu-Ling, Bin Liu, Yuning Xia, and Shi-Kun Wu. 2008. “Measuring Semantic Similarity between Words Using HowNet.” International Conference on Computer Science and Information Technology, pp. 601–605.
Dictionary editing team of Institute of Linguistics in Chinese Academy of Social Sciences (Ed.). 2005. Xiandai Hanyu Cidian. (The fifth edition). Beijing: The Commercial Press.
Dong, Zhen-Dong and Qiang Dong. 2000. HowNet Knowledge Database, http://www.keenage.com.
Dong, Zhen-Dong and Qiang Dong. 2006. HowNet and the Computation of Meaning. World Scientific Publishing.
Elston-Guttler, Kerrie E. and Angela D. Friederici. 2006. “Ambiguous words in sentences: Brain indices for native and non-native disambiguation.” Neuroscience Letters, 414: 85–89.
Fellbaum, Christiane. 2000. Autotroponomy. Polysemy: Theoretical and computational approaches, ed. by Yael Ravin, and Claudia Leacock, 52-67. New York: Oxford University Press.
Fellbaum, Christiane (Ed.). 1998. WordNet: An electronic lexical database. Cambridge, MA: MIT Press.
Fillmore, Charles J., and Beryl T.S. Atkins. 2000. Describing polysemy: The case of ‘crawl.’ Polysemy: Theoretical and computational approaches, ed. by Yael Ravin, and Claudia Leacock, 91-110. New York: Oxford University Press.
Fujii, Hideo and Croft, W. Bruce. 1993. A Comparison of Indexing Techniques for Japanese Text Retrieval. Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 237-246.
Ganesh, Ramakrishnan, B. P. Prithviraj, A. Deepa, Pushpak Bhattacharyya, and Soumen Chakrabarti. 2004. Soft Word Sense Disambiguation. Proceedings of the Second Global Wordnet Conference 2004, pp. 291–298.
Geeraerts, Dirk. 1993. Vagueness’s puzzles, polysemy’s vagaries. Cognitive Linguistics, 4.3: 223-272.
Goddard, Cliff. 2000. Polysemy: A Problem of Definition. Polysemy: Theoretical and computational approaches, ed. by Yael Ravin, and Claudia Leacock, 91-110. New York: Oxford University Press.
Gunter, Thomas C., Susanne Wagner, and Angela D. Friederici. 2003. Working Memory and Lexical Ambiguity Resolution as Revealed by ERPs: A Difficult Case for Activation Theories. Journal of Cognitive Neuroscience. 15.5: pp. 643–657.
Huang, Chih Ying. 2009. Lateralization of the sense effect in reading Chinese disyllabic compounds: an event-related potential study. Master's thesis. National Chengchi University, Taipei, Taiwan.
Huang, Chu-Ren, Elanna I. J. Tseng, Dylan B. S. Tsai, and Brian Murphy. 2003. “Cross-lingual Portability of Semantic Relations: Bootstrapping Chinese WordNet with English WordNet Relations.” Language and Linguistics. 4.3: 509–532.


Huang, Chu-Ren, Chao-Ran Chen and Claude C.C. Shen. 2002. The Nature of Categorical Ambiguity and Its Implications for Language Processing: A Corpus-based Study of Mandarin Chinese. In Mineharu Nakayama (Ed.) Sentence Processing in East Asian Languages. Stanford: CSLI Publications.
Huang, Chu-Ren, Kathleen Ahrens, Chang Li-Li, Chen Keh-Jiann, Liu Mei-Chun, and Tsai Mei-Chih. 2000. “The Module-Attribute Representation of Verbal Semantics: From Semantics to Argument Structure.” In Biq (ed.) Special Issue on Chinese Verbal Semantics. Computational Linguistics and Chinese Language Processing. 5.1: 19-46.
Hong, Jia-Fei, Chu-Ren Huang and Kathleen Ahrens. 2008. Event Selection and Coercion of Two Verbs of Ingestion: A MARVS perspective. International Journal of Computer Processing of Oriental Language (IJCPOL). 21.2: 29-40. Singapore.
Hong, Jia-Fei, Kathleen Ahrens and Chu-Ren Huang. 2008. The Polysemy of Da3: An ontology-based study. Presented at the 9th Chinese Lexical Semantics Workshop (CLSW 2008), pp. 51-64. Singapore: National University of Singapore. July, 13-16.



Hong, Jia-Fei, Chu-Ren Huang and Kathleen Ahrens. 2007. The Polysemy of Da3: An ontology-based lexical semantic study. In the Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation (PACLIC 21). November 1-3, Seoul National University. pp. 155-162.
Hong, Jia-Fei and Chu-Ren Huang 2006. “Using Chinese Gigaword Corpus and Chinese Word Sketch in Linguistic Research.” Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation. Wuhan, China, November 1–3.
Ide, Nancy and Jean Véronis. 1998. “Word Sense Disambiguation: The State of the Art.” Computational Linguistics, 1998, 24(1): 1–40.
Jackendoff, Ray. 1983. Semantics and Cognition. Cambridge: MIT Press.
Jiang, Jay J. and David W. Conrath. 1997. “Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy.” In Proceedings of International Conference Research on Computational Linguistics (ROCLING X), Taiwan.
Karov, Yael and Shimon Edelman. 1998. “Similarity-based Word Sense Disambiguation.” Computational Linguistics, 24(1): 41–59.
Ker, Sue-Jin, Chu-Ren Huang, Jia-Fei Hong, Shi-Yin Liu, Hui-Ling Jian, I-Li Su and Shu-Kai Hsieh. 2008. Design and Prototype of a Large-scale and Fully Sense-tagged Corpus. 4938: 186-193. Springer-Verlag Berlin Heidelberg.
Ker, Sue-Jin and Jen-Nan Chen. 2004. Adaptive Word Sense Tagging on Chinese Corpus. PACLIC 18, pp. 267–273. Dec. 8–10, 2004, Waseda University, Tokyo.
Kilgarriff, Adam, Chu-Ren Huang, Pavel Rychly, Simon Smith, and David Tugwell. 2005. Chinese Word Sketches. ASIALEX 2005: Words in Asian Cultural Context. June 1-3. Singapore.
Kipper, Karin, Anna Korhonen, Neville Ryant, Martha Palmer. 2008. A Large-scale Classification of English Verbs. Language Resources and Evaluation Journal. 42 (1): 21-40. Springer Netherlands.
Leacock, Claudia, George A. Miller, and Martin Chodorow. 1998. Using Corpus Statistics and WordNet Relations for Sense Identification. Computational Linguistics, 24(1): 147–65.
Lee, Hyun-Ah and Gil Chang Kim. 2002. Translation Selection through Source Word Sense Disambiguation and Target Word Selection. Proceedings of the 19th International Conference on Computational Linguistics, Taipei, Taiwan.
Lee, Yoong-Keok, Hwee Tou Ng, and Tee-Kiah Chia. 2004. Supervised Word Sense Disambiguation with Support Vector Machines and Multiple Knowledge Sources. Proceedings of SENSEVAL-3: The third International Workshop on the Evaluating Systems for the Semantic Analysis of Text, Barcelona, Spain, pp. 137–140.
Li, Ping, Zhen Jin, and Li Hai Tan. 2004. Neural Representations of Nouns and Verbs in Chinese: An fMRI Study. Neuroimage, 21: 1533–1541.
Li, Ping. 1998. Crosslinguistic Variation and Sentence Processing: The Case of Chinese, in D. Hillert (ed.), Sentence processing: A Cross-linguistic Perspective, Academic Press, San Diego.
Li, Ping and Michael Yip. 1996. Lexical Ambiguity and Context Effects in Spoken Word Recognition: Evidence from Chinese, in G. Cottrell (ed.), Proceedings of the 18th Annual Meeting of the Cognitive Science Society, Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 228-232.
Li, Ping and Michael Yip. 1998. Context Effects and Processing of Spoken Homophones, in C. K. Leong and K. Tamaoka (eds.), Reading and Writing: An Interdisciplinary Journal 10, 223-243.
Li, Yu-Hua, Zuhair A. Bandar, and David McLean. 2003. An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources. IEEE Transactions on Knowledge and Data Engineering, 15: 871–82.
Li, Wanyin, Qin Lu, and Ruifeng Xu. 2005. Similarity Based Chinese Synonym Collocation Extraction. Computational Linguistics and Chinese Language Processing. 10.1: 123–44.

Lien, Chinfa. 2000. A Frame-based Account of Lexical Polysemy in Taiwanese. Language and Linguistics. 1.1:119–138.
Lin, Charles and Kathleen Ahrens. 2000. Calculating the Number of Senses: Implications for Ambiguity Advantage Effect During Lexical Access. In H. Y Tai and Chang Y. L. (eds.) Proceedings of the Seventh International Symposium on Chinese Languages and Linguistics. Chai-yi: National Chung-Cheng University, pp. 141–155.
Lin, Dekang. 1998. Automatic Retrieval and Clustering of Similar Words. The 36th Annual Meeting of the Association for Computational Linguistics, pp. 768–74.
Liu, Qun and Su-Jian Li. 2002. The Word Similarity Calculation on <<HowNet>>. Proceedings of the 3rd Conference on Chinese lexicography, Taipei.
Martinez, David, Eneko Agirre, and Xinglong Wang. 2006. Word Relatives in Context for Word Sense Disambiguation. Proceedings of the 2006 Australasian Language Technology Workshop (ALTW2006), pp. 42–50.
Mason, Robert A. and Marcel Adam Just. 2007. Lexical ambiguity in sentence comprehension. Brain Research, 1146: 115–27.
McCarthy, Diana. 2009. Word Sense Disambiguation: An Overview. Language and Linguistics Compass. 3 (2): 537-558. Blackwell Publishing.

McRoy, Susan. 1992. Using multiple knowledge sources for word sense disambiguation. Computational Linguistics, 18(1): 1-30.
Mei, Jia-Ju, Yi-Ming Zhu, Yun-Qi Gao, and Hong-Xiang Yin. 1984. Tongyici Cilin. Shanghai: Shangwu Press and Shanghai Dictionaries.
Miller, George A., R. Beckwith, Christiane Fellbaum, D. Gross, and K. Miller. 1993. Introduction to WordNet: An On-line Lexical Database. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. Chambéry, France. August 28–September 3.
Moldovan, Dan and Adrian Novischi. 2004. Word sense disambiguation of WordNet glosses. Computer Speech and Language, 18: 301–17.
Navigli, Roberto. 2009. Word Sense Disambiguation: A Survey. ACM Computing Surveys, 41(2), ACM Press, 2009, pp. 1-69.
Niles, Ian and Adam Pease. 2003. Linking Lexicons and Ontologies: Mapping WordNet to the Suggested Upper Merged Ontology. In Proceedings of the IEEE International Conference on Information and Knowledge Engineering. (IKE 2003), Las Vegas, Nevada.
Pedersen, Ted. 2000. A Simple Approach to Building Ensembles of Naïve Bayesian Classifiers for Word Sense Disambiguation. Proceeding of the First Annual Meeting of the North American Chapter for Computational Linguistics, pp. 63–9.
Peng, Jin, Xu Sun, Yunfang Wu, and Shiwen Yu. 2007. Word Clustering for Collocation-Based Word Sense Disambiguation. The Eighth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2007), LNCS 4394: 267–274.
Pitler, Emily, Annie Louis and Ani Nenkova. 2009. Automatic sense prediction for implicit discourse relations in text. Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNCLP of the AFNLP, 683-691. Suntec, Singapore, August 2009.
Pustejovsky, James and Branimir Boguraev. 1996. Lexical Semantics: The Problem of Polysemy. Oxford University Press.
Pustejovsky, James. 1995. The generative lexicon. Cambridge: MIT Press.
Rohsenow, John. 1978. Perfect –le: aspect and relative tense in Mandarin Chinese. Proceedings of symposium on Chinese linguistics, 1977 Linguistic Institute of Linguistic Society of America, ed. by Robert L. Cheng, Ying-che Li, and Ting-chi Tang, 267-291. Taipei: Student.
Pustejovsky, James. 1991. The Syntax of Event Structure. Lexical and Conceptual Semantics: A Cognition Special Issue, ed. by Levin and Pinker, 47-80. Cambridge: Blackwell.

Ravin, Yael, and Claudia Leacock. 2000. Polysemy: An overview. Polysemy: Theoretical and computational approaches, ed. by Yael Ravin and Claudia Leacock, 1-29. New York: Oxford University Press.
Resnik, Philip and David Yarowsky. 2000. Distinguishing Systems and Distinguishing Senses: New Evaluation Methods for Word Sense Disambiguation. Natural Language Engineering 5 (3): 113–33. Cambridge University Press.
Resnik, Philip. 1999. Semantic Similarity in a Taxonomy: an Information-based Measure and Its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research, 11: 95–130.
Small, Steven L., Garrison W. Cottrell and Michael K. Tanenhaus (Eds.). 1988. Lexical Ambiguity Resolution: Perspectives from Psycholinguistics, Neuropsychology and Artificial Intelligence. San Mateo, CA: Morgan Kaufmann.
Stevenson, Mark. 2003. Word sense disambiguation: the case for combinations of knowledge sources. Stanford, California: Center for the Study of Language and Information.



Burton, Steven J., Richard R. Sudweeks, Paul F. Merrill and Bud Wood. 1991. How to Prepare Better Multiple-Choice Test Items: Guidelines for University Faculty. Brigham Young University Testing Service and The Department of Instructional Science.
Tabossi, Patrizia and Francesco Zardon. 1993. Processing Ambiguous Words in Context. Journal of Memory and Language, 32: 359-372.
Van Petten, Cyma and Barbara Luka. 2006. Neural localization of semantic context effects in electromagnetic and hemodynamic studies. Brain and Language, 97: 279-293.
Veronis, Jean and Nancy M. Ide. 1990. Word sense disambiguation with very large neural networks extracted from machine readable dictionaries. Proceedings of the 13th Conference on Computational Linguistics, 389–94. August 20–25, 1990, Helsinki, Finland.
Xue, Nianwen, Jinying Chen, and Martha Palmer. 2006. Aligning Features with Sense Distinction Dimensions. Proceedings of the COLING/ACL Main Conference Poster Sessions, 921–928. Sydney, July 2006.
Yarowsky, David. 2000. Hierarchical Decision Lists for Word Sense Disambiguation. Computers and the Humanities. 34: 179–186.

Yarowsky, David. 1993. One Sense Per Collocation. Proceedings of the Workshop on Human Language Technology. Princeton, New Jersey, pp. 266–271.
Weinreich, Uriel. 1964. Webster’s Third: a critique of its semantics. International Journal of American Linguistics. 30: 405-409.
Wu, Hsiao-Ching. 2003. A case study on the grammaticalization of GUO in Mandarin Chinese—Polysemy of the motion verb with respect to semantic changes. Language and Linguistics. 4:857–885.
Zempleni, Monika-Zita, Remco Renken, John C.J. Hoeks, Johannes M. Hoogduin, and Laurie A. Stowe. 2007. Semantic ambiguity processing in sentence context: Evidence from event-related fMRI. Neuroimage, 34: 1270–79.
Zhang, Yuntao, Ling Gong, and Yongcheng Wang. 2005. Chinese Word Sense Disambiguation Using HowNet. L. Wang, K. Chen, and Y.S. Ong (Eds.): ICNC 2005, LNCS 3610, pp. 925–32.


 
 
 
 