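Applying Semantic Analysis to the Measurement of Citation Relationships between Documents__Taiwan Citation Index - Humanities and Social Sciences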

:::


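Title:	Applying Semantic Analysis to the Measurement of Citation Relationships between Documents (應用語意分析於衡量文獻引用關係之探討)
Author:	蕭宗銘
Author (foreign language):	Tsung-Ming Hsiao
University:	National Taiwan University (國立臺灣大學)
Department:	Graduate Institute of Library and Information Science (圖書資訊學研究所)
Advisor:	陳光華
Degree:	Doctoral
Publication date:	2022
Subject keywords:	Semantic Analysis; Citation Analysis; Direct Citation; Bibliographic Coupling; Co-Citation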
Related counts:	Cited by: journals (0), doctoral dissertations (0), books (0), book chapters (0); excluding self-citations: 0; co-cited: 0; views: 2

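Abstract

This study analyzes three widely used citation relationships, namely direct citation, bibliographic coupling, and co-citation, to understand how the analysis results differ when these relationships are measured with different models. Six models are examined: the classical model used in practice, two semantic models designed in this study, and the frequency, distance, and lexical models proposed in related research. For the semantic models, this study uses open-source natural language processing tools based on Wordnet and BERT, trained on the Awais (2011) dataset, to determine the sentiment polarity of citation sentences and the semantic similarity between them. Sentiment polarity is used to classify direct citation relationships, while the semantic similarity between citation sentences is used to adjust the strength of bibliographic coupling and co-citation relationships. The frequency model adjusts the strength of direct citation, bibliographic coupling, and co-citation according to in-text citation frequency. The lexical model adjusts the strength of bibliographic coupling and co-citation according to the similarity of the vocabulary used in the citation sentences. The distance model adjusts co-citation strength according to the relative positions of the in-text citations.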
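The classical bibliographic coupling and co-citation strengths that the other models adjust can be illustrated with a minimal Python sketch. It only counts shared references and shared citing papers; it is not the thesis's implementation, and the toy `citations` data and both function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of references it cites.
citations = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R2", "R3"},
    "P3": {"R3", "R4"},
}

def bibliographic_coupling(citations):
    """Classical BC: number of references shared by each pair of citing papers."""
    strength = {}
    for a, b in combinations(sorted(citations), 2):
        shared = len(citations[a] & citations[b])
        if shared:
            strength[(a, b)] = shared
    return strength

def co_citation(citations):
    """Classical CC: number of citing papers that cite both references of a pair."""
    strength = Counter()
    for refs in citations.values():
        for pair in combinations(sorted(refs), 2):
            strength[pair] += 1
    return dict(strength)

bc = bibliographic_coupling(citations)
cc = co_citation(citations)
print(bc)
print(cc)
```

A weighted variant in the spirit of the frequency, lexical, or semantic models would replace the unit increments with a weight per shared citation, but the exact weighting schemes are not given in this record.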
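The results of each model are compared in terms of network structure, clustering results, key nodes, and strong relationships to assess how each model performs in citation analysis. The citation networks formed by each model are divided into whole networks and core networks. For both types, the models are compared on number of nodes, number of edges, network density, number of connected components, transitivity, and average clustering coefficient. To compare clustering results, the nodes of each model's core network are clustered with a modularity-based clustering algorithm. After a preliminary review of the number of clusters, the number of singletons, and cluster sizes, the Adjusted Rand Index is used to measure the similarity between clustering results. Textual coherence is then used to quantify the quality of the clustering results, and the high-frequency terms in the titles of each cluster's documents are used to identify each cluster's topic so that the topic analysis results can be compared across models. Finally, at the node and relationship level, the study examines, for each model, the polarity with which source articles are cited and their citation counts in the direct citation network, as well as the strongly related document pairs in the bibliographic coupling and co-citation networks. By analyzing in these ways how the models perform on each type of citation relationship, the study summarizes the strengths and weaknesses of applying semantic analysis techniques to current citation analysis.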
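Three of the network indicators named above (density, number of connected components, and transitivity) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the thesis's code: the graph is treated as undirected and unweighted with no duplicate edges, and the toy `nodes` and `edges` and all function names are hypothetical.

```python
from itertools import combinations

def build_adjacency(nodes, edges):
    """Undirected adjacency sets; assumes no self-loops or duplicate edges."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def density(nodes, edges):
    """Share of possible undirected edges that are present."""
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))

def connected_components(nodes, edges):
    """Number of connected components, found by depth-first search."""
    adj = build_adjacency(nodes, edges)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            cur = stack.pop()
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return count

def transitivity(nodes, edges):
    """Closed connected triples divided by all connected triples."""
    adj = build_adjacency(nodes, edges)
    closed = triples = 0
    for v in nodes:
        # Every pair of neighbours of v forms a triple centred on v.
        for a, b in combinations(sorted(adj[v]), 2):
            triples += 1
            if b in adj[a]:
                closed += 1  # each triangle is counted once per centre node
    return closed / triples if triples else 0.0

# Hypothetical toy network: one triangle, one pendant node, one isolated node.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(density(nodes, edges), connected_components(nodes, edges), transitivity(nodes, edges))
```

Comparing these values across models, separately for the whole and core networks, mirrors the network-level comparison described in the abstract.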
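Based on this research design, fifteen journals in library and information science were selected, and the 10,088 articles published in them serve as the research subjects. At the network level, for direct citation, determining sentiment polarity and removing negative citations has no obvious effect on the structure of either the whole network or the core network. For bibliographic coupling, adjusting relationship strength produces larger differences in the core network structure but no obvious change in the whole network structure. For co-citation, the network indicators show obvious differences in both the whole network and the core network.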
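In terms of the similarity of core-network clustering results, for direct citation only the classical model's result is clearly different, while the frequency, Wordnet, and BERT models yield very similar clusterings. For bibliographic coupling, the models' results differ slightly; apart from the lexical model, whose difference is more noticeable, the differences among the other models are not obvious. For co-citation, the indicators show that most models differ clearly from one another. Textual coherence and topic analysis show that, when applied to co-citation, the semantic models achieve higher textual coherence and are able to uncover new topics in the research field. When applied to direct citation and bibliographic coupling, however, they do not clearly improve textual coherence, and the topic analysis results are also very similar.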
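At the node and relationship level, source documents that have been cited positively are more likely to have higher direct citation counts than those that have not. This tendency becomes more pronounced when several semantic models all judge the source document to have been cited positively, or when the time needed to accumulate citations is taken into account. For bibliographic coupling and co-citation relationships, however, models using semantic analysis were not observed to perform better.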
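Overall, the effect of the semantic analysis models designed here varies with the type of citation relationship and the level of analysis. At the network level, excluding negative citations has very little effect on the network results, which may mean that current semantic analysis models are still inadequate at detecting negative citations, or that the influence of negative citations is less pronounced than earlier scholars expected. For bibliographic coupling and co-citation, the models clearly affect the core network structure. The comparison of clustering results shows that the current semantic analysis models yield a clear improvement only when applied to co-citation: besides better textual coherence, the topic analysis results better reflect changes in the field. No clear improvement is found for direct citation or bibliographic coupling. At the node and relationship level, using semantic analysis models to distinguish the sentiment polarity of citation sentences helps in judging the influence of cited documents, but using semantic similarity to adjust bibliographic coupling and co-citation was not observed to bring further improvement.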


Foreign-language abstract

The present study investigates three kinds of citation relationships, direct citation (DC), bibliographic coupling (BC), and co-citation (CC), to understand the effects of considering semantic meaning when conducting citation analysis. Six models were included. The classical model is the conventional way to implement citation analysis. The frequency model adjusts the strength of DC, BC, and CC by the number of in-text citations. The lexical model revises BC and CC strength based on the lexical similarity of citances. The distance model weights CC strength by the relative locations of in-text citations. The remaining two models, the Wordnet and BERT models, are built on open-source tools and trained on the corpus provided by Awais (2011) to determine the sentiment polarity of citations and to measure the semantic similarity between two citations. Sentiment polarity was used to classify DC, and semantic similarity was used to weight BC and CC.
To evaluate these models, the present study compares their results at three levels: network, cluster, and node/relationship. At the network level, six indicators were used: number of nodes, number of edges, network density, number of connected components, transitivity, and average clustering coefficient. At the cluster level, the clusters produced by a modularity-based clustering algorithm were first examined by number of clusters, number of singletons, and cluster size. The Adjusted Rand Index was then used to measure the similarity between the clustering results. The quality of the clustering results was further evaluated through textual coherence and subject analysis. At the node/relationship level, this research examined the correlation between the sentiment types of a reference's citations and its DC count. It also investigated whether citation strength is higher when the topics of two works are highly similar.
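The Adjusted Rand Index used to compare clustering results can be sketched in plain Python as follows. This is the standard pair-counting formula written out for illustration, not the thesis's code; the function name and the toy label lists are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings of the same items.

    labels_a[i] and labels_b[i] are the cluster labels of item i
    under the two clusterings being compared.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the clusterings trivially agree
    # Pairs of items placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings are all singletons
    return (together_both - expected) / (maximum - expected)

# Hypothetical cluster assignments of four documents under different models.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

A value of 1 means the two clusterings are identical up to relabeling, values near 0 mean agreement no better than chance, and negative values mean less agreement than chance.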
The present study chose the 10,088 articles published in fifteen Library and Information Science (LIS) journals as the research subjects. At the network level, removing negative citations does not noticeably affect the DC networks. For BC, weighting strength by semantic meaning changes the core network noticeably but not the whole network. For CC, both the whole and the core networks differ noticeably.
Comparing the clustering results of the DC core networks indicated that the results of the frequency, Wordnet, and BERT models were highly similar; only the classical model showed a different pattern. For the BC core networks, no noticeable differences existed between the models' results except for the lexical model. The clustering results of the CC core networks diverged evidently across models. Textual coherence and subject analysis indicate that the CC core-network clusterings based on the Wordnet and BERT models have higher textual coherence, and the subjects identified from these two models' clusterings better reflect the development of LIS in this period.
The examination at the node/relationship level revealed that a source article's DC count is likely to be higher if it has been cited positively. This tendency is more evident when multiple semantic models agree that the article has been cited positively or when the time needed to accumulate citations is considered. However, applying semantic models to weight BC and CC did not improve their results.
Overall, the effect of the semantic models proposed in this study varies with the type of citation relationship and the level at which researchers analyze the results. At the network level, removing negative citations has only a slight effect. This suggests that current semantic tools may have difficulty identifying negative citations, or that the effects of negative citations are not as critical as previous studies have argued. For BC and CC, however, applying semantic models does significantly affect the core network structure. The examination at the cluster level indicates that applying semantic models to CC improves textual coherence and better reflects the evolution of the domain, yet no similar effect is found when semantic models are applied to DC and BC. At the node/relationship level, classifying citations by their sentiment polarity helps identify the influence of the cited works, but adjusting BC and CC based on semantic similarity may not improve the results.


:::

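Related journal articles

1.	Using Bibliometric Analysis to Examine the Evolution and Development Trends of Creativity Education Research (運用文獻計量分析創造力教育研究的演進情形與發展趨勢)
2.	Differences between Bibliographic Coupling and Co-citation at the Article Level in Library and Information Science Publications
3.	Exploring Research Fronts from the Perspective of Research Methods (從研究方法角度探討研究前沿)

Related theses

1.	A Study of the Evolution and Development Trends of Creativity Education Research (創造力教育研究演進情形與發展趨勢之研究)
2.	Knowledge Diffusion Paths of Bibliographic Pairs and Co-cited Documents: Main Path Analysis (書目對與共被引文獻之知識擴散路徑：主路徑分析)
3.	Exploring Research Fronts across Different Citation Windows with Bibliographic Coupling and Co-citation: The Case of the OLED Field (以書目耦合及共被引探討不同引用區間之研究前沿：以OLED領域為例)
4.	Exploring Changes in the Interdisciplinarity of Library and Information Science through Direct Citation, Bibliographic Coupling, and Co-authorship (以直接引用、書目耦合及共同作者探討圖書資訊學跨學科之變遷)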
