Title: A Study on Using Concept Information for Mandarin Large Vocabulary Continuous Speech Recognition (使用概念資訊於中文大詞彙連續語音辨識之研究)
Journal: International Journal of Computational Linguistics & Chinese Language Processing
Authors: Hao, Po-han (郝柏翰); Chen, Ssu-cheng (陳思澄); Chen, Berlin (陳柏琳)
Publication date: 2014
Volume/Issue: 19:4
Pages: 47-59
Keywords: Speech recognition; Language model; Concept information; Model adaptation
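The language model is a key component of a speech recognition system. Its main function is usually to use the already-decoded history word sequence to predict which word is most likely to come next, helping the recognizer pick the most probable result from many confusable candidate word-sequence hypotheses. This paper aims to develop novel dynamic language model adaptation techniques that complement the conventional N-gram language model and compensate for its shortcomings; its main contributions are two. First, we propose the so-called Concept Language Model (CLM), whose main purpose is to approximate the concept the speaker intends to express, which is implicit in the history word sequence, and thereby to obtain word usage distribution information conditioned on that concept as a source of cues for dynamic language model adaptation. Second, we try different ways of estimating this concept language model, and we incorporate different degrees of proximity information into it to relax its bag-of-words assumption. The paper takes Mandarin large vocabulary continuous speech recognition (LVCSR) as the target task in order to compare the proposed adaptation techniques with other commonly used techniques. Experimental results show that, evaluated by character error rate (CER), our language model adaptation techniques all yield clear improvements over a baseline speech recognition system that uses only the N-gram language model.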
Language modeling (LM) is an integral part of automatic speech recognition (ASR): it helps ASR constrain the acoustic analysis, guide the search through multiple candidate word strings, and quantify the acceptability of the final output hypothesis given an input utterance. This paper investigates and develops language model adaptation techniques for use in ASR, and its main contribution is two-fold. First, we propose a novel concept language modeling (CLM) approach to rendering the relationships between a search history and an upcoming word. Second, the instantiations of CLM are constructed at different levels of lexical granularity, such as words and document clusters. In addition, we explore the incorporation of word proximity cues into the model formulation of CLM, getting around the "bag-of-words" assumption. A series of experiments conducted on a Mandarin large vocabulary continuous speech recognition (LVCSR) task demonstrates that our proposed language models can offer substantial improvements over the baseline N-gram system, and achieve performance competitive with, or better than, some state-of-the-art language model adaptation methods.
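The evaluation metric named in the abstract, character error rate (CER), can be illustrated with a minimal sketch. It assumes the common definition (character-level edit distance divided by the number of reference characters); the record does not describe the authors' scoring tool, so the function names `edit_distance` and `character_error_rate` and the whitespace handling below are illustrative assumptions, not the paper's implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference symbols to match an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance over characters / number of reference characters.

    Assumption: spaces are ignored, since Chinese text is scored per character.
    """
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of six reference characters.
    print(character_error_rate("語音辨識系統", "語音辯識系統"))
```

Under this definition, CER can exceed 1.0 when the hypothesis contains many insertions, which is why it is reported as a rate rather than an accuracy.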
 
 
 
 