Journal articles

1. Peddinti, V., Wang, Y., Povey, D., Khudanpur, S. (2018). Low latency acoustic modeling using temporal convolution and LSTMs. IEEE Signal Processing Letters, 25(3), 373-377.

Conference papers

1. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., Vesely, K. (2011). The Kaldi speech recognition toolkit. The IEEE 2011 Workshop on Automatic Speech Recognition and Understanding.
2. Povey, D., Peddinti, V., Galvez, D., Ghahremani, P., Manohar, V., Na, X., Wang, Y., Khudanpur, S. (2016). Purely sequence-trained neural networks for ASR based on lattice-free MMI. 17th Annual Conference of the International Speech Communication Association, 2751-2755.
3. Povey, D., Cheng, G., Wang, Y., Li, K., Xu, H., Yarmohammadi, M., Khudanpur, S. (2018). Semi-orthogonal low-rank matrix factorization for deep neural networks. 19th Annual Conference of the International Speech Communication Association, 3743-3747.
4. Povey, D., Hadian, H., Ghahremani, P., Li, K., Khudanpur, S. (2018). A time-restricted self-attention layer for ASR. IEEE International Conference on Acoustics, Speech and Signal Processing, 5874-5878.
5. Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Ochiai, T., Soplin, N. E. Y., Heymann, J., Wiesner, M., Chen, N., Renduchintala, A. (2018). ESPnet: End-to-end speech processing toolkit, 2207-2211.
6. Ko, T., Peddinti, V., Povey, D., Khudanpur, S. (2015). Audio augmentation for speech recognition. 16th Annual Conference of the International Speech Communication Association, 3586-3589.
7. Liao, Y.-F., Chang, C.-Y., Tiun, H.-K., Su, H.-L., Khoo, H.-L., Tsay, J. S., Tan, L.-K., Kang, P., Thiann, T.-G., Iunn, U.-G., Yang, J.-H., Liang, C.-N. (2020). Formosa Speech Recognition Challenge 2020 and Taiwanese Across Taiwan Corpus. 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 65-70.
8. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks, 369-376.
9. Dong, L., Xu, S., Xu, B. (2018). Speech-Transformer: A no-recurrence sequence-to-sequence model for speech recognition. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 5884-5888.
10. Dauphin, Y. N., Fan, A., Auli, M., Grangier, D. (2017). Language modeling with gated convolutional networks, 933-941.
11. Karita, S., Soplin, N. E. Y., Watanabe, S., Delcroix, M., Ogawa, A., Nakatani, T. (2019). Improving Transformer-based end-to-end speech recognition with connectionist temporal classification and language model integration. The 20th Annual Conference of the International Speech Communication Association, 1408-1412.
12. Kürzinger, L., Winkelbauer, D., Li, L., Watzel, T., Rigoll, G. (2020). CTC-segmentation of large corpora for German end-to-end speech recognition. 22nd International Conference on Speech and Computer, 267-278.
13. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., Polosukhin, I. (2017). Attention is all you need. 31st Annual Conference on Neural Information Processing Systems, 5998-6008.

Individual papers

1. Ba, J. L., Kiros, J. R., Hinton, G. E. (2016). Layer normalization (arXiv:1607.06450).
2. Kingma, D. P., Ba, J. L. (2014). Adam: A method for stochastic optimization (arXiv:1412.6980).
3. Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., Salakhutdinov, R. (2019). Transformer-XL: Attentive language models beyond a fixed-length context (arXiv:1901.02860).
4. Gulati, A., Qin, J., Chiu, C.-C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y., Pang, R. (2020). Conformer: Convolution-augmented Transformer for speech recognition (arXiv:2005.08100).
5. Park, D. S., Chan, W., Zhang, Y., Chiu, C.-C., Zoph, B., Cubuk, E. D., Le, Q. V. (2019). SpecAugment: A simple data augmentation method for automatic speech recognition (arXiv:1904.08779).
6. Lu, Y., Li, Z., He, D., Sun, Z., Dong, B., Qin, T., Wang, L., Liu, T.-Y. (2019). Understanding and improving Transformer from a multi-particle dynamic system point of view (arXiv:1906.02762).
7. Ramachandran, P., Zoph, B., Le, Q. V. (2017). Searching for activation functions (arXiv:1710.05941).
8. Sennrich, R., Haddow, B., Birch, A. (2015). Neural machine translation of rare words with subword units (arXiv:1508.07909).