:::

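Title: Research on Entity Mining Methods Based on Open Information Sources (基於開放信息源的實體挖掘方法研究)
Journal: 情報科學 (Information Science)
Authors: 王莉軍; 李旭婕; 劉志輝; 翟云
Publication date: 2019
Volume/issue: 2019(8)
Pages: 139-144
Keywords: Open data source; Knowledge mining; Entity extraction; Word embedding; Conditional random field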
【Purpose/significance】As information resources on the Internet grow richer, open data sources have become an important channel for knowledge acquisition in some fields. Taking traditional Chinese medicine (TCM) as an example, this paper proposes a knowledge mining method based on open data sources, with the aim of providing data for building TCM ontologies and knowledge graphs.【Method/process】Because no domain training corpus is available, a portion of the corpus is collected first, and an initial set of domain entity words is obtained by combining rule templates with word vectors and word classification. These entity words are then labeled back onto the text corpus to produce a training set, and conditional random fields are used to recognize and extract entities.【Result/conclusion】The proposed entity extraction model, which combines rules with SVM-CRF, shows high effectiveness and generality. Among the TCM entities used in this paper, the extraction accuracy for prescription and syndrome-type entities still needs further improvement.
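The back-labeling step described in the abstract (projecting known entity words onto raw text to build a training set for a conditional random field) might look roughly like the sketch below. The abstract does not say how the labeling was done, so everything here is an assumption for illustration: the character-level BIO tag scheme, the greedy longest-match lookup, the function name `back_label`, and the entity type names `FORMULA`, `HERB`, and `SYMPTOM`.

```python
from typing import Dict, List, Tuple


def back_label(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag each character of `text` with a BIO label.

    `lexicon` maps an entity word to its (hypothetical) entity type.
    Matching is greedy and prefers the longest lexicon entry at each position.
    """
    max_len = max((len(word) for word in lexicon), default=0)
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in lexicon:
                match = candidate
                break
        if match is None:
            # No lexicon entry starts here: mark the character as outside.
            tagged.append((text[i], "O"))
            i += 1
        else:
            entity_type = lexicon[match]
            tagged.append((match[0], f"B-{entity_type}"))
            for ch in match[1:]:
                tagged.append((ch, f"I-{entity_type}"))
            i += len(match)
    return tagged


# Illustrative lexicon; the entries and type names are made up for this sketch.
lexicon = {"桂枝湯": "FORMULA", "桂枝": "HERB", "頭痛": "SYMPTOM"}
for char, tag in back_label("桂枝湯治頭痛", lexicon):
    print(char, tag)
```

Each (character, tag) pair corresponds to one row of a sequence-labeling training file of the kind CRF toolkits commonly accept. Longest-match lookup is chosen here so that a longer entry such as 桂枝湯 is not split by a shorter entry such as 桂枝 that it contains.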