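Title: Text Mining Applications in Social Science Research: Meaning-Based Document Classification and Its Problems (社會科學研究中的文字探勘應用:以文意為基礎的文件分類及其問題)
Journal: 人文及社會科學集刊 (Journal of Social Sciences and Philosophy)
Author: 陳世榮 (Chen, Roger S.)
Published: 2015
Volume/issue: 27(4)
Pages: 683-718
Keywords: text mining; meaning differentiation; document classification; machine learning; co-word network analysis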
Citation statistics:
  • Times cited: journal articles (16), doctoral dissertations (0), books (0), book chapters (0)
  • Excluding self-citations: 16
  • Co-citations: 234
  • Views: 135
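Abstract (translated from the Chinese): With advances in electronic archiving technology, text mining has attracted growing attention. Responding to the need of social science research to differentiate meaning, this article evaluates how well supervised machine learning classifies unstructured, complex texts, and offers analysis and recommendations on the problems observed. Starting from the differences and commonalities between text mining and content analysis in differentiating meaning, the study uses news reports as data and, targeting a specific document stance, follows a standard text mining procedure to evaluate document classification with Support Vector Machine and Naïve Bayes classifiers. The results indicate that text mining performs creditably in interpreting complex meaning, but a co-word network analysis also shows that the editing style of the documents affects classification performance. Researchers are advised to assess repeatedly, early in data processing, the fit among research objectives, data characteristics, and classifier models.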
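The abstract names Naïve Bayes as one of the two supervised classifiers evaluated. As a rough illustration of how such a classifier assigns a document to a category, here is a minimal multinomial Naïve Bayes sketch in Python with additive (Laplace) smoothing. It is not the author's implementation: it assumes documents have already been segmented into tokens (Chinese news text needs a word segmenter first), and the function names, toy tokens, and the labels `pro`/`con` are hypothetical. The Support Vector Machine classifier is not shown.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on pre-tokenized documents.

    docs:   list of token lists (assumed already word-segmented)
    labels: one class label per document
    alpha:  additive (Laplace) smoothing constant
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> token frequencies
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    n_docs = len(docs)
    # log P(class): share of training documents in each class
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    # log P(token | class), smoothed so unseen (class, token) pairs are nonzero
    log_likelihood = {}
    for c in class_counts:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / denom) for tok in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, doc):
    """Return the class with the highest posterior log score for one document."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        # Tokens never seen in training are skipped (one common convention).
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][tok] for tok in doc if tok in vocab
        )
    return max(scores, key=scores.get)


# Hypothetical toy data: two "pro" and two "con" documents.
train_docs = [
    ["policy", "support", "reform"],
    ["support", "pass", "bill"],
    ["oppose", "protest", "policy"],
    ["oppose", "withdraw", "bill"],
]
train_labels = ["pro", "pro", "con", "con"]
model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, ["support", "reform"]))  # → pro
```

Scoring in log space avoids numerical underflow when a long news article contributes hundreds of small probabilities. How unseen tokens are handled is a design choice; ignoring them is common but not the only option.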
English abstract: With the growing development of electronic information storage, text mining has increasingly gained attention from scholars and practitioners across various disciplines. In response to the need for meaning differentiation in social studies, this study evaluates the document classification performance of supervised machine learning classifiers. Setting out from a comparison between traditional content analysis and text mining, the evaluation follows a standard text mining procedure and applies Support Vector Machine and Naïve Bayes classifiers to unstructured, complex social texts extracted from news media. The results show that text mining classifies documents with complex meaning well. However, a further co-word network analysis finds that the editing style of the data may affect the classifiers' performance. It is suggested that, in the early stage of data processing, greater care be given to the fit between research problems, editing styles, and classifiers.
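The abstract also reports a co-word network analysis. The sketch below assumes the common definition of a co-word network: two terms are linked when they occur in the same document, and the edge weight counts how many documents they share. The abstract does not state the paper's exact construction or thresholds, so `min_weight` and the weighted-degree summary are illustrative assumptions, and the function names and toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_word_network(docs, min_weight=1):
    """Build a weighted co-word network from tokenized documents.

    Two terms are linked when they appear in the same document; the edge
    weight is the number of documents in which they co-occur. Edges lighter
    than min_weight are dropped.
    """
    edges = Counter()
    for doc in docs:
        terms = sorted(set(doc))  # count each pair at most once per document
        for a, b in combinations(terms, 2):  # a < b, so each pair has one key
            edges[(a, b)] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}


def weighted_degree(edges):
    """Sum of incident edge weights per term (a simple centrality measure)."""
    degree = Counter()
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return degree


# Hypothetical toy documents.
docs = [
    ["policy", "reform", "support"],
    ["policy", "reform", "protest"],
    ["bill", "protest"],
]
edges = co_word_network(docs)
print(edges[("policy", "reform")])  # → 2
print(co_word_network(docs, min_weight=2))  # → {('policy', 'reform'): 2}
```

Counting each pair once per document (via `set`) keeps a single long article from dominating the weights. Equivalently, the edge weights are the off-diagonal entries of AᵀA for a binary document-term matrix A, that is, the one-mode term projection of the two-mode document-term network.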