Selected Publications

Thesis

  • Kai Yu (2006).
    Adaptive training for large vocabulary continuous speech recognition.

Journal Papers

  • Kai Sun, Qizhe Xie and Kai Yu.
    Recurrent Polynomial Network for Dialogue State Tracking.
    Dialogue & Discourse, vol. 7, no. 3, 65-88, 2016.
  • Tian Tan, Yanmin Qian and Kai Yu.
    Cluster Adaptive Training for Deep Neural Network Based Acoustic Model.
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 3, 459-468, 2016.
  • Kai Yu, Lu Chen, Kai Sun, Qizhe Xie and Su Zhu.
    Evolvable dialogue state tracking for statistical dialogue management.
    Frontiers of Computer Science, vol. 10, no. 2, 201-215, 2016.
  • Kai Yu, Kai Sun, Lu Chen and Su Zhu.
    Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking.
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, 2177-2188, 2015.
  • Yuan Liu, Yanmin Qian, Nanxin Chen, Tianfan Fu, Ya Zhang and Kai Yu.
    Deep Feature for Text-dependent Speaker Verification.
    Speech Communication, vol. 73, 1-13, 2015.
  • Kai Yu, Lu Chen, Bo Chen, Kai Sun and Su Zhu.
    Cognitive Technology in Task-oriented Dialogue Systems – Concepts, Advances and Future.
    Chinese Journal of Computers, vol. 37, 2014.
  • K. Yu, H. Zen, F. Mairesse and S. Young.
    Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis.
    Speech Communication, vol. 53, no. 6, 914-923, 2011.
  • K. Yu and S. Young.
    Continuous F0 modelling for HMM based statistical parametric speech synthesis.
    IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 5, 1071-1079, 2011.
  • K. Yu, M. J. F. Gales, L. Wang and P. C. Woodland.
    Unsupervised training and directed manual transcription for LVCSR.
    Speech Communication, vol. 52, 652-663, 2010.
  • S. Young, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann, B. Thomson and K. Yu.
    The hidden information state model: a practical framework for POMDP-based spoken dialogue management.
    Computer Speech and Language, vol. 24, no. 2, 150-174, 2009.
  • K. Yu, M. J. F. Gales and P. C. Woodland.
    Unsupervised adaptation with discriminative mapping transforms.
    IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 4, 714-723, 2009.
  • K. Yu and M. J. F. Gales.
    Bayesian adaptive inference and adaptive training.
    IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 6, 1932-1943, 2007.
  • K. Yu and M. J. F. Gales.
    Discriminative cluster adaptive training.
    IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 5, 1694-1703, 2006.
  • K. Yu and L. Ji.
    Karyotyping of CGH human metaphases using kernel nearest-neighbor algorithm.
    Cytometry, vol. 48, no. 4, 202-208, 2002.
  • K. Yu, L. Ji and X. Zhang.
    Kernel nearest-neighbor algorithm.
    Neural Processing Letters, vol. 15, no. 2, 147-156, 2002.
  • K. Yu, L. Ji, L. Wang and P. Xue.
    How to optimize OCT image.
    Optics Express, vol. 9, no. 1, 24-35, 2001.

Peer Reviewed Conference Papers

  • Maofan Yin, Sunil Sivadas, Kai Yu, Bin Ma. (2016)
    Discriminatively Trained Joint Speaker and Environment Representations for Adaptation of Deep Neural Network Acoustic Models.
    ICASSP 2016.
  • Sibo Tong, Hao Gu and Kai Yu. (2016)
    A Comparative Study of Robustness of Deep Learning Approaches for VAD.
    ICASSP 2016.
  • Yanmin Qian, Maofan Yin, Yongbin You and Kai Yu. (2015)
    Multi-Task Joint-Learning of Deep Neural Networks for Robust Speech Recognition.
    IEEE ASRU 2015.
  • Mengxiao Bi, Yanmin Qian, Kai Yu. (2015)
    Very Deep Convolutional Neural Networks for LVCSR.
    INTERSPEECH 2015.
  • Bo Chen, Zhehuai Chen, Jiachen Xu, Kai Yu. (2015)
    An Investigation of Context Clustering for Statistical Speech Synthesis with Deep Neural Network.
    INTERSPEECH 2015.
  • Nanxin Chen, Yanmin Qian, Heinrich Dinkel, Bo Chen, Kai Yu. (2015)
    Robust Deep Feature for Spoofing Detection - The SJTU System for ASVspoof 2015 Challenge.
    INTERSPEECH 2015.
  • Nanxin Chen, Yanmin Qian, Kai Yu. (2015)
    Multi-Task Learning for Text-dependent Speaker Verification.
    INTERSPEECH 2015.
  • Wengong Jin, Tianxing He, Yanmin Qian, Kai Yu. (2015)
    Paragraph Vector based Topic Model for Language Model Adaptation.
    INTERSPEECH 2015.
  • Qizhe Xie, Kai Sun, Su Zhu, Lu Chen and Kai Yu. (2015)
    Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers.
    SIGDIAL 2015.
  • Tianxing He, Xu Xiang, Yanmin Qian and Kai Yu. (2015)
    Recurrent Neural Network Language Model with Structured Word Embeddings for Speech Recognition.
    ICASSP 2015.
  • Tian Tan, Yanmin Qian, Maofan Yin, Yimeng Zhuang and Kai Yu. (2015)
    Cluster Adaptive Training For Deep Neural Network.
    ICASSP 2015.
  • Kai Sun, Lu Chen, Su Zhu and Kai Yu. (2014)
    A generalized rule based tracker for dialogue state tracking.
    IEEE SLT 2014.
  • Su Zhu, Lu Chen, Kai Sun, Da Zheng and Kai Yu. (2014)
    Semantic enhancement for dialogue domain extension with little data.
    IEEE SLT 2014.
  • Tianfan Fu, Yanmin Qian, Yuan Liu and Kai Yu. (2014)
    Tandem Deep Features for Text-Dependent Speaker Verification.
    INTERSPEECH 2014.
  • Suliang Bu, Yanmin Qian and Kai Yu. (2014)
    A Novel Dynamic Parameters Calculation Approach for Model Compensation.
    INTERSPEECH 2014.
  • Kai Sun, Lu Chen, Su Zhu and Kai Yu. (2014)
    The SJTU System for Dialog State Tracking Challenge 2.
    SIGDIAL 2014.
  • Wei Deng, Yanmin Qian, Yuchen Fan, Tianfan Fu and Kai Yu. (2014)
    Stochastic Data Sweeping for Fast DNN Training.
    ICASSP 2014.
  • Tianxing He, Yuchen Fan, Yanmin Qian, Tian Tan, and Kai Yu. (2014)
    Reshaping Deep Neural Network for Fast Decoding by Node-pruning.
    ICASSP 2014.
  • Suliang Bu, Yanmin Qian, Khe Chai Sim, Yongbin You, and Kai Yu. (2014)
    Second Order Vector Taylor Series Based Robust Speech Recognition.
    ICASSP 2014.
  • Yanmin Qian, Kai Yu and Jia Liu. (2013)
    Combination of Data Borrowing Strategies for Low-Resource LVCSR.
    IEEE ASRU 2013.
  • Kai Yu and Hainan Xu. (2013)
    Cluster Adaptive Training With Factorized Decision Trees For Speech Recognition.
    INTERSPEECH 2013.
  • H. Xu, Y. Fan and Kai Yu. (2012)
    Development of the 2012 SJTU HVR System.
    ICMI 2012.
  • Kai Yu. (2012)
    Review of F0 modelling and generation in HMM based speech synthesis.
    ICSP 2012.
  • Matthew Henderson, Milica Gasic, Blaise Thomson, Pirros Tsiakoulis, Kai Yu, Steve Young. (2012)
    Discriminative Spoken Language Understanding Using Word Confusion Networks.
    IEEE SLT 2012.
  • M. Gasic, P. Tsiakoulis, M. Henderson, B. Thomson, K. Yu, E. Tzirkel and S. Young. (2012)
    The effect of cognitive load on a statistical dialogue system.
    SIGDIAL 2012.
  • Pirros Tsiakoulis, Milica Gasic, Matthew Henderson, Jorge Prombonas, Blaise Thomson, Kai Yu, Steve Young. (2012)
    Statistical Methods for Building Robust Spoken Dialogue Systems in an Automobile.
    ICAHFE 2012.
  • Milica Gasic, Filip Jurcicek, Blaise Thomson, Kai Yu and Steve Young. (2011)
    On-line policy optimisation of spoken dialogue system via live interaction with human subjects.
    IEEE ASRU 2011.
  • A. W. Black, S. Burger, A. Conkie, H. Hastie, S. Keizer, O. Lemon, N. Merigaud, G. Parent, G. Schubiner, B. Thomson, J. D. Williams, K. Yu, S. Young and M. Eskenazi. (2011)
    Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results.
    SIGDIAL 2011.
  • F. Jurcicek, S. Keizer, M. Gasic, F. Mairesse, B. Thomson, K. Yu, and S. Young. (2011)
    Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk.
    INTERSPEECH 2011.
  • K. Yu and S. Young. (2011)
    Joint modelling of voicing label and continuous F0 for HMM based speech synthesis.
    ICASSP 2011.
  • L. Jia, K. Yu and B. Xu. (2011)
    Structured precision modelling with Cholesky basis superposition for speech recognition.
    ICASSP 2011.
  • B. Thomson, K. Yu, S. Keizer, M. Gasic, F. Jurcicek, F. Mairesse and S. Young. (2010)
    Bayesian dialogue system for the Let's Go spoken dialogue challenge.
    IEEE SLT 2010.
  • B. Thomson, F. Jurcicek, M. Gasic, S. Keizer, F. Mairesse, K. Yu and S. Young. (2010)
    Parameter learning for POMDP spoken dialogue models.
    IEEE SLT 2010.
  • K. Yu, B. Thomson and S. Young. (2010)
    From discontinuous to continuous F0 modelling in HMM-based speech synthesis.
    ISCA SSW7 2010.
  • K. Yu, H. Zen, F. Mairesse and S. Young. (2010)
    Context adaptive training with factorized decision trees for HMM-based speech synthesis.
    INTERSPEECH 2010.
  • M. Gales and K. Yu. (2010)
    Canonical state models for automatic speech recognition.
    INTERSPEECH 2010.
  • F. Jurcicek, B. Thomson, S. Keizer, F. Mairesse, M. Gasic, K. Yu and S. Young. (2010)
    Natural Belief-Critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems.
    INTERSPEECH 2010.
  • M. Gasic, F. Jurcicek, S. Keizer, F. Mairesse, B. Thomson, K. Yu and S. Young. (2010)
    Gaussian processes for fast policy optimisation of POMDP-based dialogue managers.
    SIGDIAL 2010.
  • S. Keizer, M. Gasic, F. Jurcicek, F. Mairesse, B. Thomson, K. Yu and S. Young. (2010)
    Parameter estimation for agenda-based user simulation.
    SIGDIAL 2010.
  • F. Mairesse, M. Gasic, F. Jurcicek, S. Keizer, J. Prombonas, B. Thomson, K. Yu and S. Young. (2010)
    Phrase-based statistical language generation using graphical models and active learning.
    ACL 2010.
  • K. Yu, F. Mairesse and S. Young. (2010)
    Word-level emphasis modelling in HMM-based speech synthesis.
    ICASSP 2010.
  • M. Gasic, F. Lefevre, F. Jurcicek, S. Keizer, F. Mairesse, B. Thomson, K. Yu, and S. Young. (2009)
    Back-off action selection in summary space-based POMDP dialogue systems.
    IEEE ASRU 2009.
  • F. Lefevre, M. Gasic, F. Jurcicek, S. Keizer, F. Mairesse, B. Thomson, K. Yu, and S. Young. (2009)
    K-nearest neighbor Monte-Carlo control algorithm for POMDP-based dialogue systems.
    SIGDIAL 2009.
  • F. Jurcicek, M. Gasic, S. Keizer, F. Mairesse, B. Thomson, K. Yu, and S. Young. (2009)
    Transformation-based learning for semantic parsing.
    INTERSPEECH 2009.
  • K. Yu, T. Toda, M. Gasic, S. Keizer, F. Mairesse, B. Thomson and S. Young. (2009)
    Probabilistic modelling of F0 in unvoiced regions in HMM based speech synthesis.
    ICASSP 2009.
  • F. Mairesse, M. Gasic, F. Jurcicek, S. Keizer, B. Thomson, K. Yu and S. Young. (2009)
    Spoken language understanding from unaligned data using discriminative classification models.
    ICASSP 2009.
  • S. Keizer, M. Gasic, F. Mairesse, B. Thomson, K. Yu and S. Young. (2008)
    Modelling user behaviour in the HIS-POMDP dialogue manager.
    IEEE SLT 2008.
  • C. K. Raut, K. Yu and M. J. F. Gales. (2008)
    Adaptive training using discriminative mapping transforms.
    INTERSPEECH 2008.
  • B. Thomson, K. Yu, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann and S. Young. (2008)
    Evaluating semantic-level confidence scores with multiple hypotheses.
    INTERSPEECH 2008.
  • B. Thomson, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann, K. Yu and S. Young. (2008)
    User study of the Bayesian Update of Dialogue State approach to dialogue management.
    INTERSPEECH 2008.
  • M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann, B. Thomson, K. Yu and S. Young. (2008)
    Training and evaluation of the HIS POMDP dialogue system in noise.
    SIGDIAL 2008.
  • K. Yu, M. J. F. Gales and P. C. Woodland. (2008)
    Unsupervised discriminative adaptation using discriminative mapping transforms.
    ICASSP 2008.
  • X. Liu, W. Byrne, M. J. F. Gales, A. Gispert, M. Tomalin, P. C. Woodland and K. Yu. (2007)
    Discriminative language model adaptation for Mandarin broadcast speech transcription and translation.
    IEEE ASRU 2007.
  • M. J. F. Gales, F. Diehl, C. K. Raut, M. Tomalin, P. C. Woodland and K. Yu. (2007)
    Development of a phonetic system for large vocabulary Arabic speech recognition.
    IEEE ASRU 2007.
  • K. Yu, M. J. F. Gales and P. C. Woodland. (2007)
    Unsupervised training with directed manual transcription for recognizing Mandarin broadcast audio.
    INTERSPEECH 2007.
  • M. J. F. Gales, X. Liu, R. Sinha, P. C. Woodland, K. Yu, S. Matsoukas, T. Ng, K. Nguyen, L. Nguyen, J.-L. Gauvain, L. Lamel, A. Messaoudi. (2007)
    Speech recognition system combination for machine translation.
    ICASSP 2007.
  • M. Tomalin, M. J. F. Gales, X. Liu, K. C. Sim, R. Sinha, L. Wang, P. C. Woodland and K. Yu. (2007)
    Improving speech transcription for Mandarin-English translation.
    ICASSP 2007.
  • K. Yu and M. J. F. Gales. (2006)
    Incremental adaptation using Bayesian inference.
    ICASSP 2006.
  • K. Yu and M. J. F. Gales. (2005)
    Bayesian adaptation and adaptively trained systems.
    IEEE ASRU 2005.
  • G. Evermann, H. Y. Chan, M. J. F. Gales, B. Jia, D. Mrva, P. C. Woodland and K. Yu. (2005)
    Training LVCSR systems on thousands of hours of data.
    ICASSP 2005.
  • M. J. F. Gales, B. Jia, X. Liu, K. C. Sim, P. C. Woodland and K. Yu. (2005)
    Development of the CUHTK 2004 Mandarin conversational telephone speech transcription system.
    ICASSP 2005.
  • X. Liu, M. J. F. Gales, K. C. Sim and K. Yu. (2005)
    Investigation of acoustic modeling techniques for LVCSR systems.
    ICASSP 2005.
  • K. Yu and M. J. F. Gales. (2004)
    Adaptive Training Using Structured Transforms.
    ICASSP 2004.
  • S. E. Tranter, K. Yu, G. Evermann and P. C. Woodland. (2004)
    Generating and evaluating segmentations for automatic speech recognition of conversational telephone speech.
    ICASSP 2004.

Technical Reports

  • K. Yu and M. J. F. Gales. (2006)
    Bayesian adaptation and adaptive training.
    Technical Report CUED/F-INFENG/TR542
  • K. Yu and M. J. F. Gales. (2004)
    Discriminative cluster adaptive training.
    Technical Report CUED/F-INFENG/TR486
  • S. E. Tranter, K. Yu, D. A. Reynolds, G. Evermann, D. Y. Kim and P. C. Woodland.
    An investigation into the interactions between speaker diarisation systems and automatic speech transcription.
    Technical Report CUED/F-INFENG/TR464