Thursday, February 15, 2024

Sora and Gemini 1.5 - two mind-blowing releases within hours of each other


No sooner had I written about how important ‘context windows’ are for using AI in teaching and learning than, within 24 hours, Google announced their next release, Gemini 1.5 Pro, which blows the whole market open – and guess what the great innovation is? A MASSIVE INCREASE IN THE ‘CONTEXT WINDOW’. 

Then, within a few hours, another announcement – OpenAI released Sora, an absolutely INSANE text-to-video model. It creates realistic and novel scenes from nothing but text descriptions. This is a flip moment, as we all thought this was years off... the implications for learning are huge... crazy good video, lighting and movement. Not only that, we see something interesting way out on the horizon. The whole Hollywood and Netflix business is now up for grabs. Social media may well become the new source of entertainment and art.

The context window – what the model can ingest – has just gone through the roof, in fact several roofs. They plan to start with the standard 128,000-token context window, then scale up to 1 million tokens as they improve the model. This means it can take in huge numbers of tokens and is multimodal. It eats whole books for breakfast, along with collections of documents, full movies and whole series of podcasts.
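To get a feel for the scale, here is a back-of-the-envelope sketch in Python. The ~4 characters per token and 300 words per page figures are common rules of thumb for English prose, not exact tokenizer measures:

```python
# Rough sense of what a 1,000,000-token context window can hold.
# Assumption: ~4 characters per token for English text (a common
# rule of thumb, not an exact measure).

CHARS_PER_TOKEN = 4
CHARS_PER_WORD = 5        # average English word, excluding the space
WORDS_PER_PAGE = 300      # typical paperback page

context_tokens = 1_000_000
total_chars = context_tokens * CHARS_PER_TOKEN
total_words = total_chars / (CHARS_PER_WORD + 1)  # +1 for the space
total_pages = total_words / WORDS_PER_PAGE

print(f"~{total_words:,.0f} words, ~{total_pages:,.0f} pages")
# ~666,667 words, ~2,222 pages - several long novels in one prompt
```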

The examples are compelling, so here are just a few for each medium. I could have given tons more….

Text

It can ingest giant novels and then find exactly what you need. They took Victor Hugo’s five-volume novel “Les Misérables”, an astonishing 1,382 pages, sketched a scene and asked: “Look at the event in this drawing. What page is this on?” It got it right.
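As a sketch of what a query over a whole novel could look like in code: the snippet below assumes the google-generativeai Python SDK and access to a “gemini-1.5-pro” model, neither of which was generally available when this was written, so treat it as illustrative only.

```python
# Sketch: asking a question over an entire novel in a single prompt.
# Assumes the google-generativeai SDK and access to Gemini 1.5 Pro;
# the model name and availability are assumptions, not confirmed.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

with open("les_miserables.txt", encoding="utf-8") as f:
    novel = f.read()  # the whole five-volume text goes into the prompt

response = model.generate_content([
    novel,
    "Find the scene where Jean Valjean takes the bishop's silver. "
    "Quote the passage and summarise what happens next.",
])
print(response.text)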

The opportunities in learning are many:

1. Summarising any document, no matter how long

2. Finding something within an enormous text file

3. Huge sets of HR documentation turned into an accessible resource

4. Use by a tutorbot to answer student questions

5. Feedback on and marking of text assessments

Audio

I have recorded a large series of 30 podcasts on Great Minds on Learning. They’re an hour each, and initial tests suggest these could be ingested and used for learning.
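A plausible shape for this, again hedged: the File API upload and the model name below follow the SDK’s documented pattern but are assumptions here, and the filename is a placeholder.

```python
# Sketch: summarising a long podcast episode with a multimodal model.
# Assumes the google-generativeai File API accepts audio uploads and
# that Gemini 1.5 Pro is available - both assumptions at this point.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Hypothetical filename for one hour-long episode
episode = genai.upload_file("great_minds_on_learning_ep01.mp3")

response = model.generate_content([
    episode,
    "Summarise this episode in five bullet points, then list the "
    "thinkers discussed and their key ideas on learning.",
])
print(response.text)
```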

The opportunities in learning are again many:

1. Summarising tons of podcasts

2. Finding specific chunks of podcasts to answer a query

3. Interpreting communication skills

4. Feedback and marking of spoken assessments

Video

It gobbles up entire movies and you can ask questions about what happened in them. The Buster Keaton example interprets a pawn ticket taken from someone’s pocket. The model can answer complex questions about the video content, even when the query is a primitive line drawing.
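Here is a sketch of what querying a full movie might look like, mirroring that demo. Video uploads via the File API, the server-side processing wait and the model name all follow the SDK’s documented pattern but are assumptions here, and the filename is a placeholder.

```python
# Sketch: querying a full movie for a specific moment.
# Assumes video uploads via the google-generativeai File API and
# Gemini 1.5 Pro access; filename and model name are assumptions.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

movie = genai.upload_file("buster_keaton_film.mp4")
while movie.state.name == "PROCESSING":  # video is processed server-side
    time.sleep(10)
    movie = genai.get_file(movie.name)

response = model.generate_content([
    movie,
    "Find the moment a piece of paper is taken from a person's pocket. "
    "Give the timestamp and describe what the paper says.",
])
print(response.text)
```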

The opportunities in learning are once again many:

1. Allowing search of video for performance support on a specific task, then playing it back

2. Allowing the learner to ask for more detail on a specific event or task

3. Looking for a specific solution to a specific problem 

4. Interpreting a trainee’s performance from video, identifying successes and failures, with feedback on correcting and improving performance

5. Taking a lecture and annotating it with extra resources

6. Turning any video into a deeper learning experience

7. Interpreting video assessments where content & behaviour matters

Conclusion

These two releases alone will have a huge impact on learning. They bring video generation PLUS AI ingestion and interpretation of video into play. But we have to be careful. Video is an odd medium for learning. We tend to think it more powerful than it is, because of the transience effect. I covered this in detail in my book Learning Experience Design. This is NOT about the generation of media but about the generation of learning.

