
“Enriching Sound Interactions through Computer Audition” by Dr. Zhiyao Duan

On May 26, 2017, we are honored to have Dr. Zhiyao Duan visit and give a talk entitled “Enriching Sound Interactions through Computer Audition”.

Dr. Zhiyao Duan is an assistant professor and director of the Audio Information Research (AIR) lab in the Department of Electrical and Computer Engineering at the University of Rochester. He received his B.S. in Automation and M.S. in Control Science and Engineering from Tsinghua University, China, in 2004 and 2008, respectively, and received his Ph.D. in Computer Science from Northwestern University in 2013. His research interest is in the broad area of computer audition, i.e., designing computational systems that are capable of understanding sounds, including music, speech, and environmental sounds. Specific problems that he has been working on include automatic music transcription, audio-score alignment, source separation, speech enhancement, sound retrieval, and audio-visual analysis of music. He has published 40 peer-reviewed journal and conference papers. He co-presented a tutorial on automatic music transcription at the ISMIR conference in 2015. His research is funded by the National Science Foundation.

Sound is an important medium for us to interact with the world. Humans have a long history of designing tools and systems to create, modify, record and transmit sounds, which have greatly enriched our interactions. Designing intelligent computational systems that are able to understand various kinds of sound is the goal of computer audition. Its progress, such as that in speech recognition, is again quickly enriching our interactions. In this talk, I will present our effort in designing computer audition systems for non-speech signals. Specifically, I will talk about two ongoing research projects, one on automatic music transcription and one on sound retrieval. The former converts an acoustic piano performance into music notation at a high accuracy and readability, allowing musicians to analyze musically meaningful content in acoustic piano performances. The latter takes a vocal imitation as a query and returns a list of sounds that are similar to it, allowing ordinary people to go beyond text-based search for sounds.