Tuesday, January 23, 2024

Real Time Avatars! Never saw that coming this quickly...

Oh boy! Just when I thought things were moving fast with Facebook's promise of a huge open source LLM (Llama 3), Gemini, and the real possibility of GPT-5, which I think will blow everyone's mind, we have just seen something else extraordinary happen – real time avatars.


So I’ve been using my Synthesia avatar for a long time. I went into a London studio and had my head and eye movements captured – that’s my gnarly face and beard – along with my body movements. I then went into a separate studio to get my voice cloned – yes, this good old Scottish accent.

It’s great. I simply type in whatever I want it to say and, with a few minutes of processing, it’s my Digital Twin. I can also speak 120 languages, from Afrikaans to Zulu – and yes, I used both last year in South Africa. Different styles of speech, such as natural, friendly, energetic and professional, are available in some of these languages.

Once you have an account it’s easy to generate, and easy to download.


This was different: I simply uploaded a video of myself and got it to say anything, also in different languages. Pretty good, and I often show myself talking about my German dog, Doug the schnauzer (painting behind me) – in German!


So what’s new? Well, sites such as Replica and Inworld offer real time avatars that are a bit primitive. But Inworld has a ton of features for game characters that give them a cognitive background and behaviours. Cognitively, this includes personality, background, memory, goals and emotions. Behaviours can include speech, gestures, body language, movement and event triggers. These are being used in computer games, where you can speak to NPCs (Non-Player Characters) in dialogue, using an LLM-based chatbot.


So what’s really new? Well, HeyGen have just released Real Time avatars that give you dynamic and interactive experiences by streaming the avatar from their servers. That was way earlier than I thought. Really way earlier!

Chat.D-ID is another...

When comparing these, there will be lots to consider, including latency and costs. But great things start somewhere.

What to do?

Most people see avatars as talking heads. In learning, as a teacher or tutor. But their most satisfying use in learning is likely to be as patients, customers, interviewees and so on. This is now all about dialogue, not monologue.

The real time option opens up ChatGPT-like dialogue in dynamic learning scenarios. Many moons ago I designed and built several of these scenarios: interviewing one of eight candidates, appraisals, dealing with conflict in hospitals and so on. You had to script everything, and the real design work came in designing the branched scenarios.
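Under the hood, a branched scenario like those is really just a graph of prompts and learner choices. Here is a minimal Python sketch of that idea – all the names and dialogue lines are invented for illustration and don't come from any of the tools mentioned:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    prompt: str  # what the avatar says at this step
    # maps a learner's reply category to the id of the next node
    options: dict = field(default_factory=dict)

# A toy two-step interview scenario (entirely hypothetical content)
scenario = {
    "start": Node("Tell me about your last role.",
                  {"detailed answer": "probe", "vague answer": "clarify"}),
    "probe": Node("What was your biggest challenge there?"),
    "clarify": Node("Could you be more specific?"),
}

def next_node(current: str, choice: str) -> str:
    """Follow the learner's choice to the next branch; stay put if unrecognised."""
    return scenario[current].options.get(choice, current)
```

The design work is in writing the prompts and branch conditions; an LLM can now generate the scripting and feedback that used to be authored by hand, but the branching structure itself is what keeps the learning on track.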

I’ve worked with hihaho, who do seamless branched video and do it well. They can also incorporate lots of sophisticated interactions.

The real trick is to have many templated scenarios that deal with specific learning needs and deliver specific structures. I have developed a whole set of these over the years and have got AI to provide the scripting and feedback.

Things are moving fast in AI – it's not even the end of January and we have real time avatars!
