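Construction and Application of a Chinese OCR Document Retrieval Test Collection (中文OCR文件檢索測試集之製作與應用) - Taiwan Citation Index, Humanities and Social Sciences (臺灣人文及社會科學引文索引資料庫)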

:::

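Title:	中文OCR文件檢索測試集之製作與應用 (Construction and Application of a Chinese OCR Document Retrieval Test Collection)
Journal:	教育資料與圖書館學 (Journal of Educational Media & Library Sciences)
Authors:	蔡孟竹 / 曾元顯
Authors (romanized):	Tsai, Mung-chu / Tseng, Yuen-hsien
Publication date:	2003
Volume:issue:	40:3
Pages:	325-344
Keywords:	Optical character recognition; Information retrieval; Test collection; Effectiveness evaluation; Chinese retrieval; OCR; Chinese document retrieval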
Citation counts:	Cited by: journals (4), doctoral dissertations (0), books (0), book chapters (0); excluding self-citations: 4; co-citations: 13; views: 26

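This article describes the construction of a Chinese OCR retrieval test collection and its use in an actual retrieval experiment. We overcame the difficulty of obtaining retrospective information needs and formulated 30 query topics that simulate user needs. To obtain real OCR documents, we used OCR software to convert 8439 full-text images into digital files and estimated the recognition rate at around 70%. To identify the relevant documents for each query topic, we invited three people to independently examine every document and judge whether it was relevant to the topic. Statistical verification with Kendall's coefficient of concordance showed that the three judges' relevance judgments were highly consistent on 20 query topics, indicating sufficient consensus on the gold standard (that is, the relevant documents). Finally, we compared the retrieval effectiveness of the OCR documents under 12 retrieval strategies and found that when the recognition rate falls to 70%, retrieval effectiveness also falls to roughly 70%.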

This article describes the process of constructing a Chinese OCR test collection and the application of this collection in a retrieval experiment. We have overcome the difficulty of obtaining past information needs for retrospective data and created 30 query topics that simulate real user needs. To obtain real OCR documents instead of simulated ones, we have converted 8439 full-text images into 8439 OCR test files. An evaluation of the OCR documents reveals an average recognition accuracy of about 70%. To obtain the relevant documents for each query, we invited 3 judges to examine each of the 8439 images and give a relevance score to each document for each topic. According to Kendall's coefficient of concordance, highly consistent judgments are obtained in 20 query topics. Finally, in our experiment with 12 search strategies, our results show that the retrieval effectiveness of OCR documents decreases to about 70% when the recognition accuracy is about 70%.
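The agreement check mentioned in the abstract can be illustrated with a small sketch of Kendall's coefficient of concordance (W). The abstract does not say how W was computed or what relevance scale the judges used, so everything below is an assumption for illustration: the function names `average_ranks` and `kendalls_w`, the 0-2 relevance scale, and the sample scores are all hypothetical. The sketch uses the standard tie-corrected formula W = 12S / (m²(n³ - n) - mΣT), where m is the number of judges, n the number of documents, S the sum of squared deviations of the per-document rank sums from their mean, and T the per-judge tie term Σ(t³ - t).

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores ascending (1 = lowest); tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Tie-corrected Kendall's W; ratings[j][i] is the score judge j gave document i."""
    m = len(ratings)          # number of judges
    n = len(ratings[0])       # number of documents
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(ranked[j][i] for j in range(m)) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie term: sum of (t^3 - t) over every group of t tied scores, for every judge.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every judge gives all documents the same score")
    return 12 * s / denom


# Hypothetical relevance scores (0 = not relevant, 2 = highly relevant)
# from three judges for six documents on one query topic.
judges = [
    [2, 0, 1, 0, 2, 1],
    [2, 0, 1, 1, 2, 0],
    [1, 0, 2, 0, 2, 1],
]
print(f"Kendall's W = {kendalls_w(judges):.3f}")
```

W ranges from 0 (no agreement) to 1 (identical rankings). In a setting like the one described, W would be computed once per query topic, and topics with low agreement could be flagged for review.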

Journal articles
1.	陳光華; 江玉婷 (2000-03). 中文資訊檢索測試集之設計與製作. 資訊傳播與圖書館學, 6:3, pp. 61-80.
2.	江玉婷; 陳光華 (1999-05). TREC現況及其對資訊檢索研究之影響. 圖書與資訊學刊, 29, pp. 36-59.
3.	曾元顯 (1998-12). An Approach to Retrieval of OCR Degraded Text. 圖書館學刊, 13, pp. 153-168.
4.	Salton, Gerard (1992). The State of Retrieval System Evaluation. Information Processing & Management, 28(4), pp. 441-449.
5.	曾元顯; 林瑜一 (1998-12). 模糊搜尋、相關詞提示與相關詞回饋在OPAC系統中的成效評估. 中國圖書館學會會報, 61, pp. 103-125.
6.	Voorhees, Ellen M. (2000). Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness. Information Processing and Management, 36(5), pp. 697-698.
7.	Keen, Michael (1998). Cyril W. Cleverdon. The Journal of Documentation, 54(3), p. 269.
8.	Voorhees, Ellen M.; Harman, Donna (2000). Overview of the Sixth Text Retrieval Conference (TREC-6). Information Processing and Management, 36(1), p. 8.

Conference papers
1.	Kanungo, Tapas; Resnik, Philip (1999). The Bible, Truth, and Multilingual OCR Evaluation. San Jose, CA. p. 87.
2.	Bulbul, Osama; Kanungo, Tapas; Marton, Gregory A. (1998). Performance Evaluation of Two Arabic OCR Products. Washington, DC. pp. 76-77.
3.	Cormack, Gordon V.; Palmer, Christopher R.; Clarke, Charles L. A. (1998). Efficient Construction of Large Test Collections. Melbourne, Australia. p. 282.
4.	Cleverdon, Cyril W. (1991). The Significance of the Cranfield Tests on Index Languages. p. 7.
5.	陳光華 (2001). 資訊檢索系統的評估: NTCIR會議. Taipei. p. 84.
6.	曾元顯 (2001). 回溯性資料數位化服務之規劃與建置. Taipei. pp. 255-274.
7.	Zobel, Justin (1998). How Reliable are the Results of Large-Scale Information Retrieval Experiments? Melbourne, Australia. p. 397.

Theses and dissertations
1.	江玉婷 (1999). 中文資訊檢索測試集設計與製作之研究. Taipei.

Books
1.	黃國光 (2000). SPSS與統計學原理剖析. Taipei.
2.	顔月珠 (1986). 實用無母數統計方法. Taipei.
3.	張勝溢 (1993). SPSS/PC進階篇. Taipei.

Other
1.	Voorhees, Ellen M.; Harman, Donna. Overview of the Ninth Text Retrieval Conference (TREC-9).
2.	Voorhees, Ellen M.; Harman, Donna. Overview of the Eighth Text Retrieval Conference (TREC-8).
3.	Text Retrieval Conference (TREC) Data-English Relevance Judgments.
4.	NTCIR Workshop.
5.	曾元顯; Oard, Douglas W. (2001). Document Image Retrieval Techniques for Chinese. Columbia, Maryland.
6.	Document Analysis and Recognition.

:::
