It seems that not a single day can pass without yet another method of surveilling humans entering the fray. This time it is a lip-reading AI that allegedly “outperforms professional lip readers and the best AI to date” (there are a few such systems, like LipNet and Watch, Attend and Spell (WAS)). According to the science media, the folks at DeepMind in London (who just recently called the AI hype nothing other than ‘alchemy’) took to YouTube and trained their AI on no less than 140,000 hours of video of talking heads.
Afterwards they filtered out all videos that did not meet the criteria for the deep learning stage (such as non-English speech) and cropped the remaining footage to the parts where the speakers' mouths were visible. This resulted in a large raw dataset of phonemes. Phonemes are the members of a small set of units considered to be the basic distinctive units of speech sound by which words and sentences are represented in human speech. A language typically has about 20 to 60 of them, and the set differs from language to language.
According to the paper submitted by the researchers, the AI was then shown 37 minutes of video, in which it correctly identified 59% of the spoken words from lip movement alone. We would be happy with that if it were only used as an aid for the deaf, but now that the technology exists, we think it is reasonable to assume that it will also be used for surveillance in the near future.
And if machines are able to read our lips, that won’t make our lives any easier, period. (Remember: there are OUR machines and THEIR machines, but the technology is shared 🙂 )