Welcome to SJTU X-LANCE Lab!

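Founding

X-LANCE Lab stands for the Cross Media (X-)Language Intelligence Lab. Originally founded in 2012, the lab took its current form in 2020 through the merger of the former SpeechLab and E-learning Lab. It currently has 5 faculty members and more than 80 Ph.D., master's and undergraduate students.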

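Undergraduate Internships

Undergraduate interns in the lab have achieved outstanding results. In total, 6 have been named Outstanding Bachelor's Graduates at the municipal or university level, 4 have received Outstanding Bachelor's Thesis awards, and 4 have received the CCF Outstanding Undergraduate award (given to only 2 students in the Department of Computer Science each year). They have published 31 papers in international conferences and journals, 11 of them as first author, and have attended international conferences 8 times.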

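Lab Resources

The lab has two dedicated parallel computing clusters, thousands of hours of annotated audio and hundreds of millions of text sentences, as well as research access to massive audio and text datasets.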

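Lab Philosophy

We emphasize training students in both engineering and research skills. We firmly believe that an excellent scientist must also be an excellent engineer, and an excellent engineer must also be an excellent scientist.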

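Lab Principles

We care about the character of our members: be honest and stay curious; be rigorous in your work, paying attention to detail and attitude; be diligent in your studies, proactive, and think independently.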

X-LANCE Lab Group

Automatic Speech Recognition

2 Ph.D | 6 Masters | 6 Bachelors

Automatic speech recognition (ASR) converts human speech waveforms into text. Our work focuses on statistical ASR approaches, mainly HMM-based acoustic modelling, statistical language modelling and decoding algorithms. Research topics include, but are not limited to, adaptation, low-resource ASR, robust and multilingual ASR, deep learning, discriminative training and software engineering for ASR.
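As an illustration of the decoding step mentioned above, the sketch below implements Viterbi decoding for a small discrete HMM in the log domain. It is a textbook example, not the lab's decoder: the two states (`sil`, `speech`), the quantised `low`/`high` frame observations and all probabilities are hypothetical toy values, whereas real ASR decoders search far larger graphs with continuous acoustic scores and pruning.

```python
import math

def log(p):
    # Log probability with log(0) mapped to -inf so impossible paths never win.
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, start, trans, emit):
    """Return the most likely state path for `obs` and its log probability."""
    # delta[s]: best log score of any path that ends in state s at the current frame
    delta = {s: log(start[s]) + log(emit[s][obs[0]]) for s in states}
    backptrs = []  # one dict per later frame: state -> best predecessor state
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # Pick the predecessor maximising (path score + transition into s).
            best_prev = max(states, key=lambda p: delta[p] + log(trans[p][s]))
            new_delta[s] = delta[best_prev] + log(trans[best_prev][s]) + log(emit[s][o])
            ptr[s] = best_prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical toy model: silence vs. speech frames with quantised energy.
STATES = ["sil", "speech"]
START = {"sil": 0.8, "speech": 0.2}
TRANS = {"sil": {"sil": 0.7, "speech": 0.3}, "speech": {"sil": 0.2, "speech": 0.8}}
EMIT = {"sil": {"low": 0.9, "high": 0.1}, "speech": {"low": 0.2, "high": 0.8}}

path, score = viterbi(["low", "high", "high", "low"], STATES, START, TRANS, EMIT)
print(path, math.exp(score))
```

For this toy input the decoder returns the path `['sil', 'speech', 'speech', 'sil']` with a probability of about 0.0199. Working in log probabilities avoids numerical underflow on long utterances, which is why practical decoders also score in the log domain.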

Statistical Speech Synthesis

1 Ph.D | 2 Masters | 2 Bachelors

Speech Synthesis is the technique of producing natural human speech. It mainly consists of Text-to-speech (TTS) and Voice Conversion (VC). A TTS system produces human speech from natural-language text. We follow the latest end-to-end techniques (e.g. Tacotron, WaveNet) to improve the quality and expressiveness of the generated waveform. A VC system converts a speech waveform from a source style to a target style (e.g. speaker, emotion). Our research aims to improve the naturalness of the converted speech and its similarity to the target.

Spoken Dialogue System

2 Ph.D | 0 Masters | 7 Bachelors

Spoken Dialogue System (SDS) research mainly focuses on applying statistical approaches to speech understanding and dialogue management. SDS architecture, joint optimisation and system engineering are also studied. The aim is to build intelligent end-to-end systems, especially task-oriented ones, that can explicitly handle the uncertainty arising in human-machine interaction and correctly understand users' intentions.
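One common way to make this uncertainty explicit, as in POMDP-based statistical dialogue managers, is to keep a belief state: a probability distribution over possible user goals that is updated as noisy observations arrive. The sketch below assumes the user goal does not change during the dialogue, so each update reduces to Bayes' rule, b'(g) ∝ P(o | g) b(g). The goal names and likelihood values are hypothetical stand-ins for the confidence scores an understanding module might provide.

```python
def update_belief(belief, likelihood):
    """One Bayesian belief update: b'(g) is proportional to P(o | g) * b(g)."""
    # Goals missing from `likelihood` are treated as incompatible (likelihood 0).
    unnormalised = {g: p * likelihood.get(g, 0.0) for g, p in belief.items()}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("observation has zero probability under every goal")
    return {g: p / total for g, p in unnormalised.items()}

# Hypothetical user goals, starting from a uniform belief.
belief = {"cheap_restaurant": 0.5, "expensive_restaurant": 0.5}
# Turn 1: a noisy recognition result favours "cheap".
belief = update_belief(belief, {"cheap_restaurant": 0.8, "expensive_restaurant": 0.2})
# Turn 2: weaker evidence pointing the same way.
belief = update_belief(belief, {"cheap_restaurant": 0.6, "expensive_restaurant": 0.4})
print(belief)
```

After the two turns the belief is roughly 0.86 for the cheap option and 0.14 for the expensive one: evidence accumulates across turns instead of the system committing to a single recognition hypothesis, and the remaining probability mass is what a dialogue policy could use to decide whether to confirm.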

Spoken Language Understanding

1 Ph.D | 1 Masters | 2 Bachelors

Spoken Language Understanding (SLU) serves as the interface between ASR and SDS, converting a sentence into a structured representation of the user's meaning. Unlike general-domain NLU, SLU (at the current state of technology) focuses only on specific application domains. Typically, SLU comprises three tasks: domain classification, intent detection and slot filling. Our main research interests include deep learning for SLU, SLU domain adaptation and transfer, SLU that is robust to ASR errors, deeper understanding, end-to-end SLU and so on.
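Slot filling is commonly framed as sequence labelling with BIO tags (`B-` begins a slot, `I-` continues it, `O` is outside any slot). Assuming that convention, the sketch below turns a tagged utterance into a simple semantic frame holding a domain, an intent and slot-value pairs. The `SemanticFrame` class, the slot names and the example utterance are hypothetical; in a real system the domain, intent and tags would come from trained models rather than being written by hand.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticFrame:
    # Hypothetical structured representation of one user utterance.
    domain: str
    intent: str
    slots: dict = field(default_factory=dict)

def bio_to_slots(tokens, tags):
    """Collect B-/I- tagged spans into a {slot_name: value} dict."""
    slots = {}
    name, words = None, []

    def flush():
        # Store the span being built, if any. A repeated slot name overwrites
        # the earlier value; this sketch keeps only one value per slot.
        if name is not None:
            slots[name] = " ".join(words)

    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            name, words = tag[2:], [tok]
        elif tag.startswith("I-") and tag[2:] == name:
            words.append(tok)
        else:
            # "O", or an I- tag that does not continue the open span, closes it.
            flush()
            name, words = None, []
    flush()
    return slots

tokens = "book a flight from new york to beijing tomorrow".split()
tags = ["O", "O", "O", "O", "B-from_city", "I-from_city", "O", "B-to_city", "B-date"]
frame = SemanticFrame("flight", "book_flight", bio_to_slots(tokens, tags))
print(frame)
```

Here the slots come out as `from_city` = "new york", `to_city` = "beijing" and `date` = "tomorrow". Keeping domain, intent and slots in one frame mirrors the three SLU tasks, so each can be predicted by its own model or jointly.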

Rich Audio Analysis

2 Ph.D | 2 Masters | 3 Bachelors

Rich Audio Analysis (RAA) focuses on analysing and classifying the non-textual information in human speech, such as speaker, emotion, noise and speaking style. Pronunciation evaluation and oral communication skill evaluation are related research topics. The aim is to use intelligent speech technology to assist language learning and examination.

Language Model

1 Ph.D | 3 Masters | 1 Bachelors

A Language Model (LM) models the probability distribution of human language, i.e. how likely a given word sequence is. LMs are widely used in natural language processing, speech recognition, machine translation, handwriting recognition and other applications. Our aim is to develop general LMs for both evaluation and generation. We currently focus on combining traditional statistical LMs with deep learning or reinforcement learning. We are also interested in structured LSTM LMs and large-vocabulary LM applications.
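The probability distribution an LM estimates can be made concrete with the simplest statistical model, a bigram LM, which approximates P(w_1, ..., w_n) as the product of P(w_i | w_{i-1}). The sketch below uses add-one (Laplace) smoothing so unseen word pairs still receive non-zero probability. The toy corpus and the `<s>`/`</s>` boundary markers are illustrative assumptions; practical systems use far larger corpora and better smoothing (e.g. Kneser-Ney) or neural LMs.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts, with <s>/</s> sentence markers."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()  # every token that can be predicted (words plus </s>)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return context_counts, bigram_counts, vocab

def bigram_prob(prev, word, model):
    # Add-one smoothing: (c(prev, word) + 1) / (c(prev) + |V|).
    # Assumes a closed vocabulary: `word` should be in the training vocab.
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

def sentence_logprob(sentence, model):
    """Log probability of a whole sentence under the bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(bigram_prob(p, w, model)) for p, w in zip(tokens[:-1], tokens[1:]))

model = train_bigram(["i like speech", "i like language", "we like speech"])
print(sentence_logprob("i like speech", model) > sentence_logprob("speech like i", model))
```

The well-formed sentence scores higher than its scrambled version, which is the property that lets an LM rank competing hypotheses in speech recognition or machine translation. For any context word, the smoothed probabilities over the vocabulary still sum to 1.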