Text Mining Applications in Social Science Research: Meaning-Based Document Classification and Its Problems

:::


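Title:	Text Mining Applications in Social Science Research: Meaning-Based Document Classification and Its Problems (社會科學研究中的文字探勘應用：以文意為基礎的文件分類及其問題)
Journal:	人文及社會科學集刊 (Journal of Social Sciences and Philosophy)
Author:	陳世榮 (Chen, Roger S.)
Publication date:	2015
Volume/issue:	27:4
Pages:	683-718
Keywords:	Text mining; Meaning differentiation; Document classification; Machine learning; Co-word network analysis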
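Related counts:	Cited by: journal articles (16), doctoral dissertations (0), books (0), book chapters (0); excluding self-citations: 16; co-citations: 234; views: 135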

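Abstract (translated from the Chinese): As electronic archiving technology has advanced, text mining has drawn increasing attention. Starting from the need for meaning differentiation in social science research, this article evaluates how well supervised machine learning classifies unstructured, complex texts, and offers analysis and recommendations on the problems observed. It first examines the differences and commonalities between text mining and content analysis with respect to meaning differentiation. It then takes news reports as data and, targeting a specific document orientation, follows a standard text mining procedure to evaluate document classification with support vector machine and Naive Bayes classifiers. The results show that text mining performs creditably in reading complex meaning, but a co-word network analysis also finds that the editorial style of documents affects classification performance. Researchers are advised to assess repeatedly, early in data processing, the fit among research purpose, data characteristics, and classifier model.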


Abstract (English): With the growth of electronic information storage, text mining has gained increasing attention from scholars and practitioners across disciplines. In response to the need for meaning differentiation in social studies, this study evaluates the document classification performance of supervised machine learning classifiers. Starting from a comparison of traditional content analysis and text mining, the evaluation follows a standard text mining procedure and applies Support Vector Machine and Naïve Bayes classifiers to unstructured, complex social texts drawn from news media. The results indicate that text mining classifies documents with complex meaning well. However, a further co-word network analysis finds that the editing style of the data may affect classifier performance. The study suggests that, early in data processing, greater care be given to the fit between research problems, editing styles, and classifiers.
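
The two techniques named in the abstract, Naive Bayes document classification and co-word analysis, can be illustrated with a minimal sketch. This is not the author's implementation: the toy corpus, the labels `pro` and `con`, and the function names are hypothetical, the tokens are assumed to be already segmented, and the Support Vector Machine classifier is left out to keep the example within the standard library.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def train_naive_bayes(docs, labels):
    """Collect the counts needed for a multinomial Naive Bayes model."""
    class_docs = Counter(labels)            # number of documents per class
    word_counts = defaultdict(Counter)      # class -> term frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)          # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:                           # skip unseen terms
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def coword_edges(docs):
    """Count the documents in which each unordered term pair co-occurs."""
    edges = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical, pre-segmented toy corpus (not data from the study).
train_docs = [
    ["policy", "support", "reform", "welcome"],
    ["support", "budget", "approve"],
    ["policy", "oppose", "protest", "reform"],
    ["oppose", "budget", "criticize"],
]
train_labels = ["pro", "pro", "con", "con"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(model, ["support", "reform", "approve"])
edges = coword_edges(train_docs)
```

In the co-word counts, terms that appear together across both classes (here `policy` and `reform`) carry little class signal. That is one simple way to see how shared wording or editing style might blur a classifier's decision boundary.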




:::



Taiwan Citation Index - Humanities and Social Sciences (臺灣人文及社會科學引文索引資料庫) system
