Title: Building a Historical Corpus Oriented Toward Synchronic and Diachronic Linguistic Research (建構一個以共時與歷時語言研究為導向的歷史語料庫)
Journal: International Journal of Computational Linguistics & Chinese Language Processing
Authors: 魏培泉; 譚樸森; 劉承慧; 黃居仁; 孫朝奮
Authors (romanized): Wei, Pei-chuan; Thompson, P. M.; Liu, Cheng-hui; Huang, Chu-ren; Sun, Chaofen
Publication date: 1997
Volume/issue: 2:1
Pages: 131-145
Subject keywords: Corpus; Lexical database; Part-of-speech; Mark-up; Tagging; Retrieval; Old Chinese; Middle Chinese; Early Mandarin Chinese
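     The Academia Sinica Ancient Chinese Corpus was built for linguistic research on ancient Chinese. It offers a large body of electronic texts suitable for research on ancient Chinese grammar and lexicon, together with a multifunction program that can search, count, and find collocations of the words in those texts. Following the course of grammatical development, the corpus is divided into three subcorpora, Old Chinese, Middle Chinese, and Early Mandarin Chinese, a division that should be convenient for both synchronic and diachronic research on ancient Chinese. In the Old Chinese subcorpus, a considerable number of texts have already been classified and annotated according to source text, author, genre, and so on. Many of these texts have also been segmented into words, and among the segmented texts several classical works have been fully tagged for part of speech. These segmentation and part-of-speech tagging results now form the basis of our Old Chinese lexical database.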
     The Academia Sinica Ancient Chinese Corpus is designed for linguistic research. The corpus contains ancient texts selected for their usefulness in grammatical and lexical studies, as well as an inspection program with keyword searching, statistics, and collocation functions. The corpus is divided into three subcorpora according to stages of grammatical development, so that both synchronic and diachronic studies can be performed on them. Their current sizes are as follows: (A) Old Chinese subcorpus (from pre-Qin to Pre-Han): 5,128,068 characters; (B) Middle Chinese subcorpus (from Late Han to the Six Dynasties): 8,101,662 characters; (C) Early Mandarin Chinese subcorpus (from Tang to Ching): 4,406,381 characters. A large portion of the texts in the Old Chinese subcorpus (4,497,051 characters) has been textually classified and marked up according to source book, author, text genre, etc. A substantial part (520,794 characters) of the same subcorpus has also been segmented into words, which have in turn been tagged for part of speech. The results of these two tasks form the basis of our Old Chinese Lexical Database.
 
 
 
 