:::

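Title: An Improved Cosine Text Similarity Calculation Method Based on Semantic Chunk Features
Journal: Data Analysis and Knowledge Discovery (數據分析與知識發現)
Authors: Bai Rujiang (白如江); Leng Fuhai (冷伏海); Liao Junhua (廖君華)
Publication date: 2017
Volume/Issue: 2017(6)
Pages: 56-64
Keywords: Text similarity; Semantic chunks; Vector space model; Ontology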
[Objective] This paper uses semantic chunk features of text to improve the performance of cosine text similarity calculation. [Methods] We retrieved NSF-funded project data in the field of carbon nanotube research and pre-processed it with stemming and part-of-speech (POS) tagging. We then labeled the semantic chunks of the text content with a conditional random field (CRF) model. On that basis, we computed an improved cosine text similarity based on semantic chunk features, compared it with the similarity computed on the unlabeled data, and analyzed the results. [Results] In the experiments, the improved cosine similarity based on semantic chunk features was higher than the cosine similarity of the original text to varying degrees, with the largest increase in the experimental data being 26%. [Limitations] The method depends on the performance of semantic chunk labeling. [Conclusions] The proposed method effectively raises the semantic similarity between texts, reduces the dimensionality of the vector space model, and improves computing efficiency; it also shows good generalization ability and robustness.
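The abstract does not give the paper's actual formula, so the following is only a minimal sketch of the general idea: standard cosine similarity over term-frequency vectors, computed once on raw tokens and once on chunk-level features. The BIO tag format, the `TOPIC` and `METHOD` labels, the choice to drop tokens outside any chunk, and the function names `cosine_similarity` and `chunk_features` are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(features_a, features_b):
    """Cosine similarity between two term-frequency vectors built from feature lists."""
    vec_a, vec_b = Counter(features_a), Counter(features_b)
    # Only features present in both vectors contribute to the dot product.
    dot = sum(vec_a[f] * vec_b[f] for f in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def chunk_features(tagged_tokens):
    """Merge BIO-tagged tokens into one feature per chunk (hypothetical scheme).

    Tokens tagged "O" (outside any chunk) are dropped, which is an assumption
    of this sketch and is what shrinks the vector space here.
    """
    features, current = [], []
    for token, tag in tagged_tokens:
        if tag.startswith("B-"):
            if current:
                features.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            # "O" tag, or an "I-" tag with no open chunk: close any open chunk.
            if current:
                features.append(" ".join(current))
            current = []
    if current:
        features.append(" ".join(current))
    return features


# Toy documents with made-up chunk labels, standing in for CRF output.
doc_a = [("carbon", "B-TOPIC"), ("nanotube", "I-TOPIC"), ("growth", "O"),
         ("by", "O"), ("chemical", "B-METHOD"), ("vapor", "I-METHOD"),
         ("deposition", "I-METHOD")]
doc_b = [("carbon", "B-TOPIC"), ("nanotube", "I-TOPIC"), ("sensors", "O"),
         ("via", "O"), ("chemical", "B-METHOD"), ("etching", "I-METHOD")]

words_a = [token for token, _ in doc_a]
words_b = [token for token, _ in doc_b]
chunks_a, chunks_b = chunk_features(doc_a), chunk_features(doc_b)

word_sim = cosine_similarity(words_a, words_b)
chunk_sim = cosine_similarity(chunks_a, chunks_b)
print(f"word-level:  {word_sim:.3f} over {len(set(words_a) | set(words_b))} dimensions")
print(f"chunk-level: {chunk_sim:.3f} over {len(set(chunks_a) | set(chunks_b))} dimensions")
```

In this toy case the chunk-level vectors have fewer dimensions than the word-level ones, which mirrors the dimensionality reduction the abstract reports. Whether the similarity score rises depends on the data and on the chunking scheme, so the example should not be read as reproducing the paper's results.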