LipNet

Article snapshot taken from [REDACTED] under a Creative Commons Attribution-ShareAlike license.
Deep learning model for audio-visual speech recognition

LipNet is a deep neural network for audio-visual speech recognition (AVSR). It was created by University of Oxford researchers Yannis Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas. The technique, outlined in a paper in November 2016, decodes text from the movement of a speaker's mouth. Traditional visual speech recognition approaches split the problem into two stages: designing or learning visual features, and prediction. LipNet was the first end-to-end sentence-level lipreading model to learn spatiotemporal visual features and a sequence model simultaneously. Audio-visual speech recognition has enormous practical potential, with applications such as improved hearing aids, support for the recovery and wellbeing of critically ill patients, and speech recognition in noisy environments, the last of which has been implemented, for example, in Nvidia's autonomous vehicles.
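
As an illustration of how a sequence model's per-frame output can be turned into text, the sketch below shows greedy decoding for connectionist temporal classification (CTC), the loss named in the abstract of the LipNet paper. It is a minimal sketch, not the authors' code: the alphabet, the "_" blank symbol, and the function names ctc_greedy_decode and one_hot_frames are assumptions made for this example, and the per-frame scores are hand-built rather than produced by a trained network.

```python
from typing import List, Sequence

# Hypothetical label set for this sketch: index 0 is the CTC blank ("_"),
# followed by a space and the lowercase letters.
ALPHABET = "_ abcdefghijklmnopqrstuvwxyz"
BLANK_INDEX = 0


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: str = ALPHABET,
    blank_index: int = BLANK_INDEX,
) -> str:
    """Pick the best label per frame, merge repeats, then drop blanks."""
    decoded: List[str] = []
    previous = None
    for scores in frame_scores:
        # Index of the highest-scoring label for this video frame.
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Emit a character only when the label changes and is not the blank.
        if best != previous and best != blank_index:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)


def one_hot_frames(labels: str, alphabet: str = ALPHABET) -> List[List[float]]:
    """Build toy per-frame scores that put all weight on the given labels."""
    return [[1.0 if symbol == label else 0.0 for symbol in alphabet]
            for label in labels]


if __name__ == "__main__":
    # Eight frames whose best labels are b, b, blank, i, i, blank, n, n.
    print(ctc_greedy_decode(one_hot_frames("bb_ii_nn")))  # prints "bin"
```

The blank label lets the decoder tell a repeated character apart from one character held across several frames: the toy frames "aa_a" decode to "aa", whereas "aaa" decodes to "a".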

References

  1. Assael, Yannis M.; Shillingford, Brendan; Whiteson, Shimon; de Freitas, Nando (December 16, 2016). "LipNet: End-to-End Sentence-level Lipreading". arXiv:1611.01599.
  2. "AI that lip-reads 'better than humans'". BBC News. November 8, 2016.
  3. "Home Elementor". Liopa.
  4. Vincent, James (November 7, 2016). "Can deep learning help solve lip reading?". The Verge.
  5. Quach, Katyanna. "Revealed: How Nvidia's 'backseat driver' AI learned to read lips". The Register.