Research

Automatic Speech Recognition
 
Automatic speech recognition (ASR) converts the human speech waveform into text. Our research focuses on statistical approaches to ASR, with HMM-based acoustic modelling, statistical language modelling and decoding algorithms as the main areas. Research topics include, but are not limited to, adaptation, low-resource ASR, robust and multi-lingual ASR, deep learning, discriminative training and software engineering for ASR.
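
In standard form (a sketch of the common statistical formulation, not of any one system here): given an acoustic observation sequence O, decoding searches for the word sequence

    W^* = \arg\max_{W} p(O \mid W) \, P(W),

where the likelihood p(O | W) is supplied by the HMM-based acoustic model and the prior P(W) by the statistical language model.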
 
 
Statistical Speech Synthesis
 
Speech synthesis, or text-to-speech (TTS), is the artificial production of natural-sounding speech: a TTS system converts ordinary language text into speech. We mainly focus on Hidden Markov Model (HMM) based synthesis methods. The research goal is to synthesise highly expressive and flexible speech.
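
In outline, HMM-based synthesis reverses the recognition process: given the context sequence w derived from the input text and a trained HMM set \lambda, the most likely speech parameter trajectory is generated as

    o^* = \arg\max_{o} p(o \mid w, \lambda),

and the generated parameters are then converted into a waveform by a vocoder. This is the textbook formulation; individual systems differ in the details.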
 
 
Spoken Dialogue System
 
Spoken Dialogue System (SDS) research mainly focuses on the application of statistical approaches to speech understanding and dialogue management. SDS architecture, joint optimisation and system engineering are also studied. The aim is to build intelligent end-to-end systems, especially task-oriented ones, that explicitly deal with the uncertainty arising in human-machine interaction and correctly understand the user's intention.
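
As a concrete illustration of how such uncertainty can be modelled (a generic sketch, not a description of our systems), a POMDP-style statistical dialogue manager maintains a belief state b(s) over hidden dialogue states s and updates it after each system action a and observed user input o':

    b'(s') \propto P(o' \mid s') \sum_{s} P(s' \mid s, a) \, b(s)

Dialogue state tracking, the subject of several publications below, approximates exactly this update.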
 
 
Rich Audio Analysis
 
Rich Audio Analysis (RAA) focuses on the analysis and classification of non-textual information in human speech, such as speaker identity, emotion, noise and speaking style. In addition, pronunciation evaluation and oral communication skill evaluation are related research topics. The aim is to use intelligent speech technology to assist language learning and examination.
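
Schematically, most of these tasks reduce to classification over acoustic observations: given features O extracted from the audio, choose the class

    c^* = \arg\max_{c} P(c \mid O),

where c ranges over speakers, emotion categories, noise conditions, speaking styles and so on. This is a generic view rather than a description of any particular system.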

 


Selected Publications

2015

  • Book
    • Kai Yu. Statistical Models for Dealing with Discontinuity of Fundamental Frequency. In Keikichi Hirose and Jianhua Tao (eds.), Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis, Prosody, Phonology and Phonetics. Springer-Verlag, Berlin Heidelberg, 2015: 123-144.
  • Conference
    • Tianxing He, Xu Xiang, Yanmin Qian and Kai Yu. Recurrent Neural Network Language Model with Structured Word Embeddings for Speech Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015: 5396-5400.
    • Tian Tan, Yanmin Qian, Maofan Yin, Yimeng Zhuang and Kai Yu. Cluster Adaptive Training For Deep Neural Network. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015: 4325-4329. [IEEE Spoken Language Processing Student Travel Grant].

2014

  • Technical Report
    • Kai Yu and Lu Chen. Dialogue state tracking in statistical dialogue management. IEICE Technical Report, SP2014-108 (2014-12), 2014. [Invited Talk].
  • Journal
    • Kai Yu, Lu Chen, Bo Chen, Kai Sun and Su Zhu. Cognitive Technology in Task-oriented Dialogue Systems – Concepts, Advances and Future. Chinese Journal of Computers, 2014, 37. (In Chinese, online publication).
  • Conference
    • Kai Sun, Lu Chen, Su Zhu and Kai Yu. A Generalized Rule Based Tracker for Dialogue State Tracking. IEEE Spoken Language Technology Workshop (SLT), South Lake Tahoe, USA, 2014: 330-335. 
    • Su Zhu, Lu Chen, Kai Sun, Da Zheng and Kai Yu. Semantic Parser Enhancement for Dialogue Domain Extension with Little Data. IEEE Spoken Language Technology Workshop (SLT), South Lake Tahoe, USA, 2014: 336-341.
    • Zhehuai Chen and Kai Yu. An Investigation of Implementation and Performance Analysis of DNN Based Speech Synthesis System. 12th IEEE International Conference on Signal Processing (ICSP), Hangzhou, China, 2014: 577-582.
    • Sibo Tong, Nanxin Chen, Yanmin Qian and Kai Yu. Evaluating VAD For Automatic Speech Recognition. 12th IEEE International Conference on Signal Processing (ICSP), Hangzhou, China, 2014: 2308-2314.
    • Suliang Bu, Yanmin Qian and Kai Yu. A Novel Dynamic Parameters Calculation Approach For Model Compensation. 15th Annual Conference of the International Speech Communication Association (InterSpeech), Singapore, 2014: 2744-2748.
    • Tianfan Fu, Yanmin Qian, Yuan Liu and Kai Yu. Tandem Deep Features for Text-Dependent Speaker Verification. 15th Annual Conference of the International Speech Communication Association (InterSpeech), Singapore, 2014: 1327-1331.
    • Jianwei Niu, Yanmin Qian and Kai Yu. Acoustic Emotion Recognition using Deep Neural Network. The 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), Singapore, 2014: 128-132.
    • Kai Sun, Lu Chen, Su Zhu and Kai Yu. The SJTU System for Dialog State Tracking Challenge 2. The 15th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL), Pennsylvania, USA, 2014: 318-326.
    • Wei Deng, Yanmin Qian, Yuchen Fan, Tianfan Fu and Kai Yu. Stochastic Data Sweeping for Fast DNN Training. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 2014: 240-244.
    • Tianxing He, Yuchen Fan, Yanmin Qian, Tian Tan and Kai Yu. Reshaping Deep Neural Network for Fast Decoding by Node-pruning. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 2014: 245-249.
    • Suliang Bu, Yanmin Qian, Khe Chai Sim, Yongbin You and Kai Yu. Second Order Vector Taylor Series Based Robust Speech Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 2014: 1788-1792.
    • Yuan Liu, Tianfan Fu, Yuchen Fan, Yanmin Qian and Kai Yu. Speaker Verification with Deep Features. IEEE International Joint Conference on Neural Networks (IJCNN), Beijing, China, 2014: 747-753.

2013

  • Conference
    • Yanmin Qian, Kai Yu and Jia Liu. Combination of Data Borrowing Strategies for Low-Resource LVCSR. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Olomouc, Czech Republic, 2013.
    • Yanmin Qian and Jia Liu. MLP-HMM Two-Stage Unsupervised Training for Low-Resource Languages on Conversational Telephone Speech Recognition. 14th Annual Conference of the International Speech Communication Association (InterSpeech), Lyon, France, 2013.
    • Kai Yu and Hainan Xu. Cluster Adaptive Training With Factorized Decision Trees For Speech Recognition. 14th Annual Conference of the International Speech Communication Association (InterSpeech), Lyon, France, 2013.
    • Peilu Wang, Ruihua Sun, Hai Zhao and Kai Yu. A New Word Language Model Evaluation Metric For Character Based Languages. The 12th China National Conference on Computational Linguistics (CCL), Suzhou, China, 2013.

2012

  • Conference
    • Hainan Xu, Yuchen Fan and Kai Yu. Development of the 2012 SJTU HVR System. The 14th ACM International Conference on Multimodal Interaction (ICMI), Santa Monica, USA, 2012.
    • Khe Chai Sim, Shengdong Zhao, Kai Yu and Hank Liao. ICMI'12 Grand Challenge - Haptic Voice Recognition. The 14th ACM International Conference on Multimodal Interaction (ICMI), Santa Monica, USA, 2012.
    • Kai Yu. Review of F0 Modelling and Generation in HMM Based Speech Synthesis. The 11th International Conference on Signal Processing (ICSP), Beijing, China, 2012.