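Title: A Survey of Text Similarity Calculation Methods (文本相似度計算方法研究綜述)
Journal: Data Analysis and Knowledge Discovery (數據分析與知識發現)
Authors: Chen Erjing (陳二靜); Jiang Enbo (姜恩波)
Publication date: 2017
Volume/Issue: 2017(6)
Pages: 1-11
Subject keywords: Text similarity; Semantic similarity; Ontology; Bag of words model; Neural network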
[Objective] This paper analyzes text similarity calculation methods and surveys how the field has developed. [Coverage] We searched CNKI with the query "篇名:文本相似度 OR 篇名:詞匯相似度 OR 篇名:語義相似度" (title: text similarity OR lexical similarity OR semantic similarity) and Web of Science with "TI: 'text similarity' or 'semantic similarity' or 'lexical similarity'", restricted the document type in both, and obtained 69 key articles. [Methods] We systematically reviewed text similarity calculation methods, analyzing the basic ideas and characteristics of the main methods and summarizing future directions. [Results] The review yields a fairly comprehensive classification scheme with four categories: string-based methods, corpus-based methods, world-knowledge-based methods, and other methods. Neural-network-based methods, world-knowledge-based methods, and similarity calculation for cross-domain texts are expected to become the main development trends in the field. [Limitations] The discussion centers on the methods themselves and does not further analyze how they are applied. [Conclusions] The review helps readers grasp the current state and future trends of research on text similarity calculation methods.
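To make the difference between method families concrete, here is a minimal Python sketch contrasting a string-based measure with a bag-of-words measure. The abstract does not name specific formulas, so the two measures (Dice coefficient over character bigrams, cosine over raw term counts) and the function names `dice_similarity` and `cosine_bow` are illustrative assumptions, not the survey's own definitions.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Multiset of overlapping character bigrams, case-insensitive."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_similarity(a: str, b: str) -> float:
    """String-based measure (assumed example): Dice coefficient on bigrams."""
    x, y = char_bigrams(a), char_bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Neither string has a bigram; fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((x & y).values())  # multiset intersection size
    return 2.0 * overlap / total


def cosine_bow(a: str, b: str) -> float:
    """Bag-of-words measure (assumed example): cosine of term-count vectors."""
    x = Counter(a.lower().split())
    y = Counter(b.lower().split())
    dot = sum(x[t] * y[t] for t in x.keys() & y.keys())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # an empty text shares no terms with anything
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    print(dice_similarity("night", "nacht"))
    print(cosine_bow("the cat sat", "the cat ran"))
```

The string-based function only compares surface characters, while the bag-of-words function compares term counts and ignores word order. Neither captures meaning, which is the gap that knowledge-based and neural approaches aim to close.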
 
 
 
 