Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.
Vu Digital's service uses predictive analytics, natural language processing, face recognition, object recognition, and audio/image detection to extract metadata from video, including transcripts of the audio and time-tagged references to on-screen text and to appearances of persons or images of interest, such as objects or logos. The core technology splits a video into two components: the audio and the video frames. Both components are then processed using speech-to-text transcription, text extraction from images, facial recognition, and image recognition. The output is not only the transcript of the video and image frames, but also metadata that is timestamped with frame references. For example, if a brand logo is found an hour and a half into a video, the metadata would include that time reference (01:30:00). This approach makes possible video classification and clustering, search engine indexing, and content personalization, including targeted advertisements.
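The idea of timestamped metadata with frame references can be illustrated with a minimal Python sketch. It is not Vu Digital's implementation: the `Detection` record, its field names, the `to_metadata` helper, the "ExampleBrand" label, and the constant frame rate of 30 frames per second are all assumptions made for illustration. The sketch only shows how a detection at a known offset could be turned into a time reference and a frame number.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One time-tagged item found in a video (hypothetical record layout)."""
    kind: str       # e.g. "logo", "face", "screen_text", "speech"
    label: str      # what was recognized
    seconds: float  # offset from the start of the video

    def timestamp(self) -> str:
        """Format the offset as HH:MM:SS, dropping fractional seconds."""
        hours, rem = divmod(int(self.seconds), 3600)
        minutes, secs = divmod(rem, 60)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"

    def frame_index(self, fps: float) -> int:
        """Frame number at this offset, assuming a constant frame rate."""
        return int(self.seconds * fps)


def to_metadata(detections, fps):
    """Turn detections into timestamped records, ordered by time."""
    return [
        {
            "kind": d.kind,
            "label": d.label,
            "time": d.timestamp(),
            "frame": d.frame_index(fps),
        }
        for d in sorted(detections, key=lambda d: d.seconds)
    ]


# A brand logo found an hour and a half into the video (assumed 30 fps).
logo = Detection(kind="logo", label="ExampleBrand", seconds=90 * 60)
print(to_metadata([logo], fps=30.0))
```

Under these assumptions, the logo from the example above is recorded with the time reference 01:30:00 and frame number 162000.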
Technologies and patents
Vu Video-to-Data (V2D) translates video images and audio to text, allowing video producers to tag their content with metadata that makes it more searchable. Vu Digital has patents pending for Video-to-Data and Vu Finder.
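How text metadata can make video searchable is sketched below with a simple inverted index. This is a generic technique, not a description of V2D or Vu Finder. The record layout (video identifier, time reference, extracted text), the function names, and the sample data are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase word to the (video_id, time) pairs where it occurs."""
    index = defaultdict(list)
    for video_id, time, text in records:
        for word in text.lower().split():
            index[word].append((video_id, time))
    return index


def search(index, query):
    """Return the locations whose text contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical text extracted from two videos, with time references.
records = [
    ("clip-1", "00:00:05", "welcome to the evening news"),
    ("clip-1", "00:12:40", "weather report for the weekend"),
    ("clip-2", "01:30:00", "ExampleBrand logo"),
]
index = build_index(records)
print(search(index, "weather report"))
```

A query returns both the video and the point in time where the matching text was found, which is what lets a viewer jump directly to the relevant moment.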