Advanced Voice Mode on the OpenAI app is now available in the UK. I’m getting pretty used to advanced releases and was hugely impressed by ChatGPT o1, with its open thinking and reasoning capabilities. Then I got pretty excited by NotebookLM, with its automatic podcast feature and other functionality. But this was a user experience like no other.
Fluid
Once again, I feel as though I’m being swept up and away, as AI trailblazes forward at a blistering speed. Once again I’d emphasise that you really do have to ‘try it to get it’. An interesting paper just published showed that much of the scepticism about GenAI comes from those who have not tried it. Lesson - for any real critical thinking or analysis, some use is necessary.
Immediate impression – conversations are just so much more fluid, natural and therefore real. This is Alexa and Siri on steroids. It is much better on interruptions, which matters, as voice dialogue can be a little odd when there’s clumsy turn-taking. Here, you feel you are in control and not getting dragged along by the ‘machine’. It feels more like talking to a human. Nass & Reeves were right: the more you eliminate non-human cues, such as odd pauses, clumsy interruptions and misfires, the more you are fooled into thinking the machine is a person. I keep saying this but interfaces matter in tech.
It copes superbly well with pauses or interruptions and picks up where you left off with ease. It also recognises when you have finished speaking. No need to tap a button when you stop – that was badly needed. It’s also better at filtering out background noise.
Accents
It also picks up on conversational cues better – it recognises intonation, speed of speech, even whether you are asking a question. A variety of voices is available – different genders, different accents (British, American or Scottish) and different speaking styles. Choosing a voice just makes things feel better. I can express myself more freely as it has no problem with my strong Scottish accent. I tossed an entire caber at it on this one. Yet it coped with my odd vernacular and very fast speech. Believe me, you wouldn’t have understood what I said - it did. Asking it to speak back to me in a Scottish accent was fun.
Memory
As it remembers conversations, subsequent conversations become more meaningful over time. It keeps track of preferences, interests or details about you to make the chats more relevant. It then took me by surprise. More of that later. Memory matters: if I had a disability or specific preferences, it would remember and tailor the conversation to my needs. This is a boon for accessibility and inclusion.
Accessibility
On that point, I’ve been doing some work in DEI and accessibility in AI, where people have truly underestimated not only how much has already been achieved but also its potential to give agency to people who find education and the workplace difficult. This is of immediate use for the blind, dyslexics and, to be honest, anyone. We all have learning difficulties, as learning is difficult. This gives access and agency to so many who find education and learning difficult.
Learning
Tried it for learning a second language – astonishing on pronunciation, like talking to a good teacher. We are getting ever closer to this being a tutor, trainer, coach, mentor, counsellor and so on. I think we can see how any training that requires role play – managerial training, sales training, customer care, dealing with patients and so on – could be delivered this way. You can prompt it to play roles – that’s very cool. When the dialogue and pedagogic paths are tighter it will be a UNIVERSAL teacher, on demand, anytime, anyplace, on anything, at any level, for anyone in any language.
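To make the role-play point concrete, here is a minimal sketch of how the same idea could be set up programmatically through the OpenAI Python SDK. The model name and the customer-care scenario are my own illustrative assumptions, not what the app itself uses; in the app you would simply say the prompt out loud.

```python
# Minimal sketch of role-play prompting (illustrative; model and scenario assumed).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed chat-capable model
    messages=[
        {
            "role": "system",
            # The system message casts the model in a training role.
            "content": (
                "You are playing a frustrated customer returning a faulty product. "
                "Stay in character, reply briefly, and only calm down once the "
                "trainee acknowledges the problem and offers a clear next step."
            ),
        },
        # The trainee opens the conversation; the model replies in character.
        {"role": "user", "content": "Hello, how can I help you today?"},
    ],
)

print(response.choices[0].message.content)
```

The same pattern covers sales, managerial or patient-facing scenarios: swap the system message for the role you want practised.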
What’s coming next?
My guess is that future enhancements are likely to be even better at understanding and emotion detection, as emotional cues allow more nuanced reactions. More customisation is inevitable. But what we’ll also see is more contextual awareness, to improve relevance and make conversations more intuitive. Integration with other applications and devices will come in time. Eventually we will be speaking fluently to avatars, images of people that look like real people. This is inevitable. The integration of this realistic dialogue into robots also seems certain.
Conclusion
Honestly. This was amazing. Maybe the first time I’ve ever felt I was speaking to another human when it was actually a machine. This is going places – whether it is help with learning, getting things done in the workplace or getting some help if you’re feeling down.
One moment stood out. She dropped the fact that I had a dog into the conversation, and not just that – she knew it was a Schnauzer. I love my dog. He was snuggling up to me at that very moment on the sofa. I stroked his back; he looked over his shoulder at me. All three of us had a moment.