:::

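Title: 運用以卡方為基礎的統計方法於色情網頁分類之研究 (A Study of Chi-square-based Statistical Methods for Pornographic Web Page Classification)
Journal: 資訊管理學報 (Journal of Information Management)
Authors: 李龍豪、陸承志
Authors (romanized): Lee, Lung-hao; Luh, Cheng-jye
Publication date: 2007
Volume and issue: 14:2
Pages: 225-246
Keywords: 網路內容分類; 色情黑名單; 不當資訊過濾; 卡方分配; Web content rating; Pornographic black list; Inappropriate web content filtering; Chi-square distribution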
Citation counts:
  • Times cited: journal articles (1), doctoral dissertations (0), books (0), book chapters (0)
  • Excluding self-citations: 1
  • Co-citations: 0
  • Views: 53
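With the spread of the Internet, information propagates very quickly, and the Web is flooded with content of uneven quality. Inappropriate material such as pornographic fiction, images, and abusive text keeps increasing. In the absence of a sound mechanism for managing web content, a user need only enter relevant keywords into a search engine and follow the hyperlinks in the results to reach such sites, so web content management has become an urgent issue. This study addresses the pornographic portion of inappropriate content and proposes collecting a black list by classifying pornographic web pages. For the textual part of a site's content, it derives the porn tendency of each individual word, computes an indicator value through the chi-square distribution, and assigns the page to one of three categories: Porn, Unsure, or Non-Porn. The URLs of pages in the Porn category form the black list, which can serve as the basis for filtering web pornography. A system was implemented for Chinese and English web pages, and experimental results show that the proposed method achieves high precision and a fairly low false positive rate.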
With the rapid growth of Internet usage, inappropriate materials (e.g., pornography, drugs, and violence) have flooded the Web. The open nature of the Web lets users access almost any such material, with various negative effects on users, particularly children. Web content rating and filtering is therefore a worthwhile and pressing issue. This study proposes a chi-square-based statistical method for classifying pornographic materials. Given a web page, its textual content is first split into a list of tokens, each with a porn tendency weight. The proposed method then calculates an indicator value (I-value) for the web page by combining the tokens' porn tendency weights through properties of the chi-square distribution. The resulting I-value is used to classify the web page into one of three categories: Porn, Unsure, or Non-Porn. The web pages in the Porn category are finally collected into a black list. The proposed method can currently classify English and Chinese web pages. Experimental results indicate that it detects pornographic web content with high precision and a very low false positive rate.
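The abstract does not give the formulas, so the following is only a minimal sketch of how per-token weights might be combined into an I-value. It assumes Fisher's inverse chi-square combining as described in Robinson's spam-filtering essays (listed under the references), which may differ from the authors' exact procedure. The function names, the clamping constant `eps`, and the thresholds 0.1 and 0.9 are hypothetical and chosen for illustration.

```python
import math


def chi2_survival(x, dof):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x / 2.0
    term = math.exp(-m)          # i = 0 term of the closed-form series
    total = term
    for i in range(1, dof // 2):
        term *= m / i            # builds e^-m * m^i / i! incrementally
        total += term
    return min(total, 1.0)       # guard against rounding slightly above 1


def indicator_value(weights, eps=1e-6):
    """Combine per-token porn tendency weights (each in [0, 1]) into one I-value.

    Assumption: Fisher-style combining, not necessarily the paper's formula.
    """
    if not weights:
        return 0.5               # no evidence either way
    ws = [min(max(w, eps), 1.0 - eps) for w in weights]  # keep log() finite
    dof = 2 * len(ws)
    porn = 1.0 - chi2_survival(-2.0 * sum(math.log(1.0 - w) for w in ws), dof)
    clean = 1.0 - chi2_survival(-2.0 * sum(math.log(w) for w in ws), dof)
    return (1.0 + porn - clean) / 2.0


def classify(i_value, low=0.1, high=0.9):
    """Map an I-value to a category; the thresholds are hypothetical."""
    if i_value >= high:
        return "Porn"
    if i_value <= low:
        return "Non-Porn"
    return "Unsure"
```

Under these assumptions, `classify(indicator_value([0.99] * 5))` returns `"Porn"`, while a page whose tokens give conflicting evidence lands in `"Unsure"`. Keeping a middle category lets borderline pages stay off the black list instead of forcing a binary decision.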
Journal articles
1. Kosala, R., Blockeel, H. (2000). Web Mining Research: A Survey. ACM SIGKDD Explorations Newsletter Archive, 2(1), 1-15.
2. Etzioni, Oren (1996). The World-Wide Web: Quagmire or Gold Mine? Communications of the ACM, 39(11), 65-68.
3. Robinson, G. (2003). A Statistical Approach to the Spam Problem. Linux Journal, 107.
4. 李秉原, Hui, S. C., Fong, A. C. M. (2003). A Structural and Content-based Analysis for Web Filtering. Internet Research: Electronic Networking Applications and Policy, 13(1), 27-37.
5. Torres, L., Vila, J. (2002). Automatic Face Recognition for Video Indexing Application. Pattern Recognition, 35(3), 615-625.
6. Arentz, W. A., Olstad, B. (2004). Classifying Offensive Sites Based on Image Content. Computer Vision and Image Understanding, 94, 295-310.
7. Goodwin, S., Vidgen, R. (2002). Content, Content. Everywhere... Time to Stop and Think? the Process of Web Content Management. Computing and Control Engineering Journal, 13(2), 66-70.
8. 李秉原, Hui, S. C., Fong, A. C. M. (2002). Neural Networks for Web Content Filtering. IEEE Intelligent Systems, 17(5), 48-57.
9. Kolari, P., Joshi, A. (2004). Web Mining: Research and Practice. IEEE Computational Science and Engineering, 6(4), 49-53.
Conference papers
1. Chan, Y., Harvey, R., Smith, D. (1999). Building Systems to Block Pornography.
2. 邱志傑, 王明習, 賴溪松 (2003). TANet不當資訊尋與分析.
3. 林宜隆, 李璘昱, 劉金和, 莊育秀, 許盛凱 (2003). 不當資訊防制政策與管理策略之初探.
4. 邱志傑, 王明智, 賴溪松 (2004). 不當資訊防制分析.
5. 王鐵雄, 陳思翰, 蔡顯明, 林俊男, 李新林 (2004). 從眾行為在不當資訊防制上的應用.
6. Meyer, T. A., Whateley, B. (2004). SpamBayes: Effective Open-source, Bayesian Based, Email Classification System.
7. 李龍豪, 陸承志, 黃威穎 (2005). 參數調校模擬於高效率的色情網頁分類機制之應用.
8. Smith, D., Harvey, R., Chen, Y., Bangham, A. (1999). Classifying Web Pages by Content.
9. Jicheng, W., Yuan, H., Gangshen, W., Fuyan, Z. (1999). Web Mining: Knowledge Discovery on the Web. 137-141.
10. Jiao, F., Gao, W., Duan, L., Cui, G. (2001). Detecting Adult Images Using Multiple Features. 378-383.
11. Duan, L., Cui, G., Gao, W., Zhang, H. (2002). Adult Image Detection Method Base-on Skin Color Model and Support Vector Machine. 797-780.
12. Bosson, A., Cawley, G. C., Chan, Y., Harvey, R. (2002). Non-retrieval: Blocking Pornographic Images. 50-60.
13. Liu, L., Chen, J., Song, H. (2002). The Research of Web Mining. 2333-2337.
14. Srivastava, J., Desikan, P., Kumar, V. (2002). Web Mining: Accomplishments and Future Directions. 51-70.
15. Schettini, R., Brambilla, C., Cusano, C., Ciocca, G. (2003). On the Detection of Pornographic Digital Images. 2105-2113.
16. Hammami, M., Chahir, Y., Chen, L. (2003). WebGuard: Web Based Adult Content Detection and Filtering System. 574-578.
Theses
1. 邱忠俊 (1999). 犯罪語言學與資料檢索應用觀念之研究--以網際網路情色文學為例 (Master's thesis). 中央警察大學 (Central Police University).
2. 楊良吉 (2001). 全球資訊網過濾軟題之研究.
3. 郭永明 (2001). 利用類神經網路決定膚色色彩空間之色情影像偵測.
4. 邱建明 (2004). 結合影像與文字辨識的網路色情過濾.
Books
1. Baeza-Yates, R., Ribeiro-Neto, B. (1999). Modern information retrieval. Addison-Wesley.
2. Casella, G., Berger, R. L. (2001). Statistical Inference.
3. Ross, S. M. (2004). Introduction to Probability and Statistics for Engineers and Scientists.
Other
1. Balkin, J. M., Noveck, B. S., Roosevelt, K. (1999). Filtering the Internet: A Best Practices Model.
2. Graham, P. (2002). A Plan for Spam.
3. Anthony (2003). SpamBayes Background Reading.
4. Robinson, G. (2004). Handling Redundancy in Email Token Probabilities, Version 0.94.
5. Robinson, G. (2004). Why Chi? Motivations for the Use of Fisher's Inverse Chi-square Procedure in Spam Classification, Version 0.93.
 
 
 
 