We have Alexa in three rooms in our house – living room, kitchen and bedroom – and they’re all used every day. I use it for work (calculations for VAT, invoices, scheduling), cooking (timers), shopping (lists), lights (off at night), the robot vacuum cleaner and lots of queries. Google Assistant on my Pixel phone is now my PA. Voice, through use and habit, has become part of my life – my frictionless interface – easy and convenient.
As one of the great triumphs of AI, voice is on our phones, in our cars and in our homes. Amazon, Google, Microsoft and Apple all see it as a strategic technological advance. We take years learning how to read and write, yet we listen and speak almost effortlessly, grammatical geniuses aged three.
So it was great to come across a readable book that dealt with the territory to date. The history of voice and chatbots is well covered as it did not spring out of nowhere but from centuries of maths, statistics and probability theory, then pioneers who applied the maths, and AI, to the recognition, understanding and generation of language and voice.
Vlahos explains why all the moral hysteria around the gender of voice assistants is misplaced. Far from being a patriarchal plot, Siri, Cortana, Alexa and Google Assistant were all extensively researched and all but one give users the choice of gender. It turns out that even in the womb, a woman’s voice is liked and trusted. We are not only wired for speech but for female speech. The research showed that female voices win hands down. There are also fascinating insights into the personas chosen, all very different.
The chapter on AI, machine learning, deep learning, backpropagation, supervised and unsupervised learning is told well, not too technical. The technology behind speech recognition, language understanding and speech generation is also readable. Good also to see the issue of information retrieval from both structured and unstructured data dealt with - search, knowledge graphs, the Stanford Question Answering Dataset.
He then moves on to the next level with a chapter on ‘conversation’, describing progress through competitions to win the Alexa Prize, Loebner Prize and Winograd Schema Challenge. These show how difficult it is to sustain conversation in chatbots, with the need for human scripting as well as an ensemble of programming and AI techniques. Above all you learn that the data gathering makes these systems better and better. As Vlahos says, “Voice AIs blur boundaries” of intimacy, privacy, mind and machine, fact and fiction, life and death.
One implication of voice becoming a dominant force may be the weakening of ad revenue as a business model. How do you get your voice heard? Mobile is accelerating voice, as are home devices and the demand for voice search and smart assistants. Voice is here to stay and although people think that AI has a heart of stone, chatbots and voice bring humanity to technology. It is an illusory humanity, of course, but it can represent and reflect us, making technology at least more humane, certainly more usable.
Not much has been written on ‘voice’ despite its dramatic rise in consumer technology, so for anyone who is involved in AI, chatbots or IoT and wants a feel for this particular strand of technology, this book mines the voice vein rather nicely.
It is a pity that voice is not being adopted more widely in online learning, beyond language learning. We have ‘voice recognition’ working in WildFire, an AI content generation tool that creates online learning in minutes not months.