:::

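Title: A Study of Chinese Text Classification Based on CapsNet (基於CapsNet的中文文本分類研究)
Journal: Data Analysis and Knowledge Discovery (數據分析與知識發現)
Authors: 馮國明 (Feng Guoming); 張曉冬 (Zhang Xiaodong); 劉素輝 (Liu Suhui)
Publication date: 2018
Volume/issue: 2018(12)
Pages: 68-76
Subject keywords: Text classification (text categorization); Deep learning; Text representation; CapsNet; TextCNN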
Citation statistics:
  • Times cited: journals (0), doctoral dissertations (0), books (0), book chapters (0)
  • Excluding self-citations: 0
  • Co-citations: 0
  • Views: 2
[Objective] This study addresses the problem of representing long texts and applies CapsNet to Chinese text classification in order to improve classification accuracy. [Methods] We propose an LDA matrix and a word-vector volume (詞向量體) representation for long texts and, combining them with CapsNet, build a CapsNet-based Chinese text classification model. Using the Sogou news corpus and the Fudan University text classification corpus as experimental data, we ran classification experiments that compare the model against baselines such as TextCNN and DNN. [Results] The CapsNet model outperformed the other models on every evaluation metric for Chinese text classification. In five-category classification, accuracy reached 89.6% on short texts and 96.9% on long texts, and the model converged almost two times faster than the CNN model. [Limitations] The model has high computational time complexity, and the size of the experimental corpus was limited. [Conclusions] Compared with existing methods, the proposed text representation method and the CapsNet model achieve better accuracy, convergence speed, and robustness in Chinese text classification.
 
 
 
 