:::

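Title: Factors Affecting Automatic Move Recognition with Support Vector Machine Models (影響支持向量機模型語步自動識別效果的因素研究)
Journal: Data Analysis and Knowledge Discovery (數據分析與知識發現)
Authors: 丁良萍 (Ding Liangping); 張智雄 (Zhang Zhixiong); 劉歡 (Liu Huan)
Publication date: 2019
Volume/issue: 2019(11)
Pages: 16-23
Subject keywords: Move recognition; Support vector machine; Structured abstracts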
[Objective] This paper examines how training sample size, the N value of N-grams, stop-word handling, and term-frequency weighting affect automatic move recognition in scientific paper abstracts with a support vector machine (SVM) model. [Methods] From more than 720,000 structured abstracts of scientific papers, we extracted a total of more than 1.1 million labeled moves as experimental data and built an SVM model for move recognition. Following the single-variable principle, we varied one factor at a time (training sample size, N-gram range, stop-word removal, and term-frequency weighting scheme) and compared the effect of each on recognition performance. [Results] The model performed best, at 93.50%, with a training set of 600,000 moves, an N-gram range of [1,2], stop words retained, and TF-IDF weighting. [Limitations] The training and test corpora consisted mainly of structured abstracts collected by the authors, and the results were not compared with those of other studies. [Conclusions] Training sample size and fine-grained feature choices have a significant impact on the performance of traditional machine learning models, so practitioners need to select features carefully for their specific setting.
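The feature settings compared in the abstract (N-gram range, stop-word handling, term weighting) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the stop-word list, the whitespace tokenizer, the toy sentences, and the smoothed IDF formula are all assumptions made for illustration, and the SVM classifier that would consume these features is not shown.

```python
import math
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "to", "we", "is"}


def ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams with n_min <= n <= n_max, joined by spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def featurize(text, n_range=(1, 2), remove_stop_words=False):
    """Lowercase, split on whitespace, optionally drop stop words, build n-grams."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return ngrams(tokens, *n_range)


def tfidf(docs, **kwargs):
    """Weight each n-gram by raw term frequency times a smoothed IDF.

    IDF here is log((1 + N) / (1 + df)) + 1, one common variant; the
    formula used in the study is not given in the abstract.
    """
    bags = [Counter(featurize(d, **kwargs)) for d in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for bag in bags for g in bag)
    n_docs = len(docs)
    return [
        {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1) for g, tf in bag.items()}
        for bag in bags
    ]


# Toy sentences standing in for labeled moves (not data from the study).
moves = [
    "we propose a new method",
    "the results show the method is effective",
]
vectors = tfidf(moves, n_range=(1, 2), remove_stop_words=False)
```

Changing `n_range` or `remove_stop_words` while holding the other arguments fixed mirrors the one-factor-at-a-time comparison the abstract describes; each resulting dictionary would then be converted to a sparse vector for the classifier.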
 
 
 
 
:::