Automated Lip-Reading: Extracting Speech from Video of a Talking Face
Project Team: Yue Wang (Linguistics, SFU), Ghassan Hamarneh (Computing Science, SFU), Paul Tupper (Mathematics, SFU), Dawn Behne (Psychology, The Norwegian University of Science and Technology), Joan Sereno (Linguistics, University of Kansas), Allard Jongman (Linguistics, University of Kansas).
In face-to-face conversation, listeners perceive speech using both the speaker's voice and the coordinated facial movements that accompany it. In noisy environments, seeing a speaker's facial movements makes speech easier to perceive. The same holds for multimedia: we rely on visual cues when the audio is poorly transmitted (e.g., during video conferencing) or when there is background noise. In the current era of social media, we increasingly encounter multimedia-induced challenges in which the audio signal in a video is of poor quality or misaligned with the image (e.g., on Skype). The next big question for speech scientists, and one relevant to all multimedia users, is what speech information can be extracted from a face, and whether the corresponding audio signal can be recreated from it to enhance speech.