:::

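Title: Construction and Application of a Chinese OCR Document Retrieval Test Collection (中文OCR文件檢索測試集之製作與應用)
Journal: Journal of Educational Media & Library Sciences (教育資料與圖書館學)
Authors: Tsai, Mung-chu (蔡孟竹); Tseng, Yuen-hsien (曾元顯)
Publication date: 2003
Volume/issue: 40:3
Pages: 325-344
Subject keywords: Optical character recognition (OCR); Information retrieval; Test collection; Effectiveness evaluation; Chinese retrieval; Chinese document retrieval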
     This article describes the construction of a Chinese OCR retrieval test collection and its application in a retrieval experiment. We overcame the difficulty of obtaining past information needs for retrospective data and created 30 query topics that simulate real user needs. To obtain real OCR documents rather than simulated ones, we used OCR software to convert 8,439 full-text images into 8,439 digital OCR files; an evaluation of these documents shows an average recognition accuracy of about 70%. To identify the relevant documents for each query topic, we invited three judges to independently examine each of the 8,439 documents and judge its relevance to each topic. Verified statistically with Kendall's coefficient of concordance, the three judges' relevance judgments are highly consistent on 20 of the query topics, indicating sufficient consensus on the ground truth (that is, the relevant documents). Finally, in an experiment comparing 12 search strategies, we found that when recognition accuracy drops to about 70%, the retrieval effectiveness of OCR documents also drops to roughly 70%.
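     The abstract relies on Kendall's coefficient of concordance to check agreement among the three judges. The record does not say how the coefficient was computed, so the following is only a minimal sketch of the textbook formula W = 12S / (m²(n³ − n)) for m judges ranking n documents, assuming no tied ranks. The function name `kendalls_w` and the rank data are hypothetical and are not taken from the article.

```python
def kendalls_w(ranks):
    """Kendall's W for m judges who each rank the same n items.

    Assumes every judge assigns the ranks 1..n with no ties
    (the tie-corrected formula is not implemented here).
    """
    m = len(ranks)      # number of judges
    n = len(ranks[0])   # number of ranked items
    # Sum of the ranks each item received across all judges.
    totals = [sum(judge[i] for judge in ranks) for i in range(n)]
    mean_total = m * (n + 1) / 2
    # S: squared deviations of the rank sums from their mean.
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ranks from three judges over five documents (1 = most relevant).
identical = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
mixed = [[1, 2, 3, 4, 5], [2, 1, 3, 5, 4], [1, 3, 2, 4, 5]]

w_identical = kendalls_w(identical)
w_mixed = kendalls_w(mixed)
print(w_identical, w_mixed)
```

     W ranges from 0 (no agreement) to 1 (identical rankings). Relevance judgments on a coarse scale usually produce many ties, so a real analysis would rank the scores with averaged ties and apply the tie correction, which this sketch leaves out.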
 
 
 
 