Wednesday, February 14, 2024

AI gets massive memory upgrade - implications for AI in learning


Human memory

A strong feature of intelligence is memory. In humans this is complex, with several different systems interacting - sensory, episodic, semantic - along with encoding and retrieval mechanisms. It is not as if human memory is even that good. Our sensory memory is severely limited in range and timescale. Working memory holds only three or four manipulable items within a limited timescale. Long-term memory is fallible and degrades over time, sometimes catastrophically, as in dementia and Alzheimer's. The brain could accurately be described as a forgetting machine: we forget most of what we try to learn.

AI memory upgrade

The good news is that Gemini and ChatGPT have both had a memory upgrade, and Gemini's is massive. This really matters because, especially in learning applications, knowing what the learner has said previously makes a difference. Nor is this only a context window upgrade - those have been growing for some time - it is also persistence of memory: what the system remembers and what control you have over that memory.

First, it will eventually be able to remember who you are and the things about you that matter for learning, such as first language, age, existing skill sets, diagnosed learning difficulties such as dyslexia, and past exchanges. Pre-existing knowledge is the big one. One can supply this up front by feeding the system personal data, or it can 'keep in mind' what you have been telling it or what it can infer. You can also harvest data from formative assessment. This can reduce redundant exchanges and increase the efficacy, speed and quality of teaching and learning with AI tutors.
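The kind of persistent learner profile described above can be sketched in a few lines. The file name, the profile fields and the remember/recall helpers here are illustrative assumptions, not any vendor's actual memory API:

```python
# Sketch of a persistent learner profile an AI tutor might keep.
# Everything here (field names, file path, helpers) is hypothetical.
import json
from pathlib import Path

PROFILE_PATH = Path("learner_profile.json")

def remember(updates: dict) -> dict:
    """Merge new facts about the learner into a JSON file on disk."""
    profile = json.loads(PROFILE_PATH.read_text()) if PROFILE_PATH.exists() else {}
    profile.update(updates)
    PROFILE_PATH.write_text(json.dumps(profile, indent=2))
    return profile

def recall() -> dict:
    """Load whatever has been remembered so far."""
    return json.loads(PROFILE_PATH.read_text()) if PROFILE_PATH.exists() else {}

# Facts accumulate across separate exchanges rather than being re-asked.
remember({"first_language": "French", "diagnosed": "dyslexia"})
remember({"prior_knowledge": ["fractions"]})
print(recall()["first_language"])
```

The point of the sketch is simply that memory persisting between sessions removes the need for the redundant "tell me about yourself again" exchanges mentioned above.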

You will also be able to choose from a suite of privacy controls, effectively managing what ChatGPT remembers. For example, you may want it to remember a great deal for the purposes of a long learning experience, or just have a throwaway chat.

Human v Generative AI memory

Both human memory and generative AI involve storing and retrieving information. In human memory, this process is biological and involves complex neural networks. In generative AI, information is stored digitally and retrieved through different forms of neural networks on a different substrate.

We are similar but different. For example, just as we humans recognize patterns based on past experiences stored in our memory, generative AI models recognize patterns in the data they have been trained on. This ability is crucial for tasks like image recognition, language translation and generating coherent text in dialogue, as well as for the generation of images, audio and video.

Just as humans learn and adapt based on their memories and experiences, generative AI models learn from the data they are exposed to. This learning process is what enables these models to generate new content similar in style or substance to their training data, further shaped by training from humans. Newer models, used in automated cars, for example, take video feeds showing what a driver would see over millions of miles of driving to improve performance.

Both human memory and generative AI can generalize from past experiences to new situations. Humans use their memories to apply learned concepts to new scenarios, while generative AI uses its training to generate new outputs that it has never explicitly seen before. Human memory is also associative, meaning that one memory can trigger related memories. Generative AI models can mimic this by generating content based on associations learned from their training data. Both human memory and generative AI adapt and modify their responses or outputs over time, albeit differently. Humans learn from new experiences, while AI models can be retrained or fine-tuned with new data to change their outputs. The first is actually quite haphazard; the second more difficult but well defined.

Of course, just as human memory is not a perfect record and can change over time, generative AI also does not produce perfect replicas of its training data. Instead, it creates approximations that can sometimes include errors or novel creations. An interesting aspect of this flexibility, even fallibility of memory, is that just as human creativity is deeply linked to our experiences and memories, generative AI can also 'create' new content.

Context window

One concept fundamental to AI memory is the ‘context window’, the amount of text the model can consider at one time when generating a response in dialogue. It is the maximum span of recent input - words, characters or tokens - that the model can reference while generating output, like our working or short-term memory.

The size of the context window depends on the model. Early versions of GPT had small context windows: GPT-2 had a window of 1,024 tokens, GPT-3 roughly 2,000, GPT-3.5 around 4,000, while GPT-4 launched with 8,000 and 32,000-token variants, and newer models go much further. The size of this window determines how much previous text the model can 'remember' and use to inform its responses.
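As a rough illustration of what a fixed context window means in practice, here is a toy sketch that keeps only the most recent messages that fit within a token budget. Real models count subword tokens rather than words, so the whitespace-based count here is an approximation for illustration only:

```python
# Toy sketch: enforcing a context window by dropping the oldest messages.
# Real tokenizers count subword tokens; word-splitting is a rough proxy.

def count_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit inside the token budget."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # this and anything older falls outside the window
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    "Learner: I struggle with fractions.",
    "Tutor: Let's start with halves and quarters.",
    "Learner: Can you give me a harder example?",
]
print(fit_to_window(history, max_tokens=12))
```

With a budget of 12 'tokens' only the final message survives; the earlier exchange about struggling with fractions is silently lost, which is exactly the forgetting problem described below.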

This matters because if the input exceeds the model's context window, the model may lose track of earlier parts of the conversation or text. Conversely, a larger context window allows the model to maintain longer conversations or understand longer documents, providing more relevant and coherent responses. However, here is the downside: processing longer context windows requires more computational power and memory, and may also affect accuracy and the quality of the response. Early large context windows in Claude, for example, showed weaker recall of information buried in the middle of long inputs.

All of this matters in practical applications, especially in teaching and learning, as the context window affects tasks like conversation, content generation and text completion. For example, a larger context window allows the model to reference earlier parts of the conversation, making it more effective in maintaining contextually relevant and coherent discussions, obviously useful in teaching and learning, for both the machine tutor and the learner. There are techniques one can use to mitigate these limitations, such as a 'rolling window' or 'summarization' of previous content, but it is still a problem. It is, however, similar to the problem human teachers face when trying to remember where different students are, using their own very limited working and long-term memories.
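The two mitigations mentioned, a rolling window and summarization of previous content, can be sketched as follows. The summarize function here is a hypothetical placeholder; in a real system the model itself would be asked to produce the summary:

```python
# Sketch of two mitigations for a limited context window:
# 1. rolling window - keep only the most recent turns;
# 2. summarization - compress older turns into a short summary.
# `summarize` is a placeholder; a real system would call the model.

def rolling_window(turns: list[str], keep_last: int) -> list[str]:
    """Retain only the most recent turns of the conversation."""
    return turns[-keep_last:]

def summarize(turns: list[str]) -> str:
    """Placeholder: stands in for a model-generated summary."""
    return f"[Summary of {len(turns)} earlier turns]"

def compressed_history(turns: list[str], keep_last: int) -> list[str]:
    """Summarize old turns, keep recent ones verbatim."""
    if len(turns) <= keep_last:
        return turns  # everything still fits, nothing to compress
    older, recent = turns[:-keep_last], turns[-keep_last:]
    return [summarize(older)] + recent

turns = [f"turn {i}" for i in range(1, 6)]
print(compressed_history(turns, keep_last=2))
# → ['[Summary of 3 earlier turns]', 'turn 4', 'turn 5']
```

The trade-off mirrors human teaching: the rolling window forgets cleanly but completely, while summarization keeps the gist at the cost of detail.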

Cost

One major issue is cost. You can expand the context window, but the cost of processing all those extra tokens is high, which strengthens the case for retrieval-augmented generation (RAG) as an alternative.
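The RAG alternative can be sketched in miniature: rather than paying to stuff everything into the context window, store material externally and retrieve only the passages relevant to the current question. The word-overlap scoring below is a toy stand-in for the embedding similarity a real system would use:

```python
# Toy RAG sketch: retrieve only the relevant stored passage instead of
# putting the whole learner history into the context window.
# Word overlap stands in for real embedding-based similarity.
import re

def overlap_score(query: str, passage: str) -> int:
    """Count words shared by query and passage (toy relevance metric)."""
    words = lambda text: set(re.findall(r"\w+", text.lower()))
    return len(words(query) & words(passage))

def retrieve(query: str, passages: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k passages most relevant to the query."""
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:top_k]

notes = [
    "The learner is dyslexic and prefers short sentences.",
    "The learner mastered fractions last week.",
    "The learner speaks French as a first language.",
]
print(retrieve("What does the learner know about fractions?", notes))
# → ['The learner mastered fractions last week.']
```

Only the retrieved note is then placed in the prompt, so the context window, and the bill, stays small regardless of how much has been stored.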

Conclusion

Generative AI has a long history, from Hebb onwards, of mimicking the human brain, either directly or metaphorically. This is especially true of learning (a common word in AI) and the way neural networks evolved and work. They are not the same, indeed very different, but in both cases - the machine learning from humans and humans learning from the machine - memory really matters in teaching and learning.

In one sense learning theory is memory theory, if you define learning as a relatively permanent change in long-term memory, which is a pretty good, but still partial, definition. It is a constant battle with forgetting. Keep in mind, or in your memory, however, that despite these similarities, human memory and generative AI operate on fundamentally different principles and mechanisms. Human memory is a complex, biological and messy process, deeply intertwined with consciousness and emotions, while generative AI is a technological process governed by algorithms and data. Oddly, and maybe counterintuitively, the latter approach may result in better actual performance in teaching and learning, even generally. I think this type of informed input from learning science will really improve AI tutor systems. To be fair, simply increasing context windows and functionality will most likely have the same effect.

 
