The present study investigates three kinds of citation relationships, namely direct citation (DC), bibliographic coupling (BC), and co-citation (CC), to understand the effects of considering semantic meaning when conducting citation analysis. Six models were compared. The classical model is the conventional way of implementing citation analysis. The frequency model adjusts the strength of DC, BC, and CC by the number of citations. The lexical model revises BC and CC strength based on the lexical similarity of citances. The distance model weights CC strength by the relative locations of the citations. The remaining two models, the Wordnet and BERT models, are built on open-source tools and trained on the corpus provided by Awais (2011) to determine each citation's sentiment polarity and to measure the semantic similarity between two citations. Sentiment polarity was used to classify DC, and semantic similarity was used to weight BC and CC.
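To make the three relationships concrete, the following minimal sketch derives DC, BC, and CC strengths from a toy citation map. The data, the function names, and the `weighted` variant are illustrative assumptions: the weighted variant takes "number of citations" to mean in-text mention counts and uses the smaller count of each pair, which is one plausible reading of a frequency-style adjustment and not necessarily the weighting used in this study.

```python
from collections import Counter
from itertools import combinations

# Toy data (hypothetical): citing paper -> {cited work: in-text mention count}
mentions = {
    "P1": {"R1": 3, "R2": 1},
    "P2": {"R1": 2, "R2": 2, "R3": 1},
    "P3": {"R2": 1, "R3": 4},
}


def direct_citation(mentions):
    """DC: one directed (citing, cited) edge, valued by its mention count."""
    return {(p, r): n for p, refs in mentions.items() for r, n in refs.items()}


def bibliographic_coupling(mentions, weighted=False):
    """BC: strength between two citing papers that share references.

    Classical: number of shared references.
    Weighted (assumed scheme): sum over shared references of the smaller
    mention count of the two papers.
    """
    strength = {}
    for a, b in combinations(sorted(mentions), 2):
        shared = mentions[a].keys() & mentions[b].keys()
        if not shared:
            continue
        if weighted:
            strength[(a, b)] = sum(min(mentions[a][r], mentions[b][r]) for r in shared)
        else:
            strength[(a, b)] = len(shared)
    return strength


def co_citation(mentions, weighted=False):
    """CC: strength between two cited works that appear in the same paper.

    Classical: number of papers citing both works.
    Weighted (assumed scheme): per paper, the smaller of the two mention counts.
    """
    strength = Counter()
    for refs in mentions.values():
        for x, y in combinations(sorted(refs), 2):
            strength[(x, y)] += min(refs[x], refs[y]) if weighted else 1
    return dict(strength)


if __name__ == "__main__":
    print("DC:", direct_citation(mentions))
    print("BC classical:", bibliographic_coupling(mentions))
    print("BC weighted: ", bibliographic_coupling(mentions, weighted=True))
    print("CC classical:", co_citation(mentions))
    print("CC weighted: ", co_citation(mentions, weighted=True))
```

In this toy example, P1 and P2 share two references, so their classical BC strength is 2, while the assumed weighted variant raises it to 3 because both papers mention R1 repeatedly.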
To evaluate these models, the present study compares their results at three levels: network, cluster, and node/relationship. At the network level, six indicators were used: number of nodes, number of edges, network density, number of connected components, transitivity, and average clustering coefficient. At the cluster level, the clusters produced by a modularity-based clustering algorithm were first examined in terms of number of clusters, number of singletons, and cluster size. The Adjusted Rand Index was then used to measure the similarity between clustering results. The quality of the clustering results was further evaluated through textual coherence and subject analysis. At the node/relationship level, this research examined the correlation between a reference's sentiment types and its DC counts. It also investigated whether citation strength is higher when two works' topics are highly similar.
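The six network-level indicators can be sketched on a small undirected graph as follows. This is an illustration under assumptions, not the study's implementation: the edge list and function names are hypothetical, the graph is treated as unweighted and undirected, and nodes with fewer than two neighbours are given a local clustering coefficient of zero when averaging.

```python
from itertools import combinations

# Toy undirected edge list (hypothetical): one triangle with a pendant node,
# plus one separate pair of nodes.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("E", "F")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours (self-loops are ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_components(adj):
    """Count connected components with an iterative depth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adj[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return count


def network_indicators(edges):
    adj = build_adjacency(edges)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    closed = 0     # neighbour pairs that are themselves linked, summed over nodes
    triplets = 0   # all neighbour pairs, summed over nodes
    local = []     # local clustering coefficient of each node
    for nbrs in adj.values():
        k = len(nbrs)
        possible = k * (k - 1) // 2
        linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        triplets += possible
        closed += linked
        local.append(linked / possible if possible else 0.0)

    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "components": count_components(adj),
        "transitivity": closed / triplets if triplets else 0.0,
        "avg_clustering": sum(local) / n if n else 0.0,
    }


if __name__ == "__main__":
    for name, value in network_indicators(edges).items():
        print(f"{name}: {value}")
```

On the toy graph this gives 6 nodes, 5 edges, a density of 1/3, two connected components, and a transitivity of 0.6. Transitivity and the average clustering coefficient differ because the former pools all triplets, whereas the latter averages per-node ratios.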
The present study used 10,088 articles published in fifteen Library and Information Science (LIS) journals as its research subjects. The network-level examination showed that removing negative citations does not significantly affect the DC citation network. For the BC and CC citation networks, weighting strength by semantic meaning yields different whole networks and, in particular, different core networks.
A comparison of the clustering results for the DC core networks indicated that the results of the frequency, Wordnet, and BERT models were highly similar; only the classical model showed a different pattern. For the BC core networks, no noticeable differences existed between the models' results, except for the lexical model. The clustering results for the CC core networks, by contrast, diverged clearly. The textual coherence and subject analyses indicated that, for the CC core network, the clustering results of the Wordnet and BERT models have higher textual coherence, and the subjects identified from these two models' clusters better reflect the development of LIS in this period.
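The similarity between two models' clustering results can be quantified with the Adjusted Rand Index. The sketch below implements the standard pair-counting formula; the label lists are hypothetical toy data rather than results from this study, and returning 1.0 for degenerate partitions is a convention assumed here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings of the same items.

    labels_a[i] and labels_b[i] are the cluster labels that two models
    assign to item i. A value of 1.0 means identical partitions (up to
    relabelling); values near 0 mean chance-level agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")

    n = len(labels_a)
    # Pairs of items placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items placed together in each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    total_pairs = comb(n, 2)
    expected = together_a * together_b / total_pairs if total_pairs else 0.0
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case (e.g. both clusterings are all singletons or one
        # single cluster); treated as perfect agreement by assumption.
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Hypothetical cluster labels for four articles under three models.
    model_x = [0, 0, 1, 1]
    model_y = [1, 1, 0, 0]   # same partition as model_x, different label names
    model_z = [0, 0, 1, 2]   # splits the second cluster of model_x
    print(adjusted_rand_index(model_x, model_y))
    print(adjusted_rand_index(model_x, model_z))
```

Because the index depends only on which items are grouped together, two models that produce the same partition under different cluster labels score 1.0.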
The examination at the node/relationship level revealed that the DC count is likely to be higher if the source article has been cited positively. This tendency is more evident when multiple semantic models are used or when time effects are considered. However, applying semantic models to weight BC and CC did not improve their results.
Overall, the effect of the semantic models proposed in this study varies with the type of citation relationship and the level at which the results are analyzed. At the network level, removing negative citations has only a slight effect on the DC network. This suggests either that current semantic tools have difficulty identifying negative citations or that the effects of negative citations are not as critical as previous studies have argued. For BC and CC, however, applying semantic models does significantly affect the networks. The examination at the cluster level indicates that applying semantic models to CC improves textual coherence and better reflects the evolution of the domain, yet no similar effect is found when semantic models are applied to DC and BC. At the node/relationship level, classifying citations by their sentiment polarity helps identify the influence of the cited works, whereas adjusting BC and CC based on semantic similarity may not improve the results.