Algorithms trump data
Love them or hate them, you use them, and algorithms are here
to stay. Algorithmic power drives Google, Facebook, Amazon, Netflix and many
other online services, including many of the professional services you use in
communications, finance, health, transport and so on. There is some
confusion here, as data is being touted as the next big thing, but data is dead
in the water if it is not interpreted and then used to change or do something. If data is
the new oil, algorithmic power is the new turbo-charged engine. Another important factor, one that has led to the renewed efficacy of algorithms, is the internet. To take our transport metaphor further, if algorithms are the new rockets, the internet is the new rocket fuel, supplying an endless stream of big data to go where no man has gone before... Houston - these metaphors are starting to break up!
What role for algorithms in learning?
We are now in the Age of Algorithms and so far, the most promising use of educational data is through
algorithms. Yet algorithms are faceless and anonymous, hidden from view. As users, we rarely know what role they play in our lives, if we're even aware of their agency at all. Like
icebergs, their power lies hidden beneath the surface, with only a user
interface visible above the waterline. So let’s make them a little more
visible.
There are many species of algorithm in learning, with a full five-level taxonomy here. First, there are algorithms embedded in the technology we use - mobiles, laptops, VR and so on. Then there are assistive algorithms, such as Google search, that help us find things. Then comes analytics, where we try to predict and improve things from data sets. Beyond this are hybrid adaptive systems that help teachers and organisations learn. It’s like using a satnav in your car. It knows where you’ve
come from, where you’re going and how to get you back when you go off course.
It may even know when you need a rest and whether you’re comfortable
driving on the motorway or would be best routed through other roads. Satnavs
are massively algorithmic and personalised, as is adaptive, algorithmic
teaching.
In a learning journey, something similar can be implemented: ensembles of algorithms analyse data about the student and the content, leading to real-time improvements in both. Note that they can do this in real time and also learn as they go, matching the most appropriate content to the student at any given time. This can lead to quicker course completion, lower drop-out rates, higher attainment and lower costs. Finally, there are fully autonomous learning systems, like autonomous cars, where the learner learns without the aid of a teacher.
How do they work?
(You can skip this if you have no interest in the background
maths.)
2500 years of algorithms
Euclid was the first to formally write down an algorithm, with Aristotle formalising syllogistic logic. But it is the Arab mathematician al-Khwarizmi who gave us the word 'algorithm', through the Latinised form of his name. We then have logicians like Boole and Frege, alongside probability theorists such as Pascal, Fermat, Laplace, Cardano, Bernoulli and Bayes. Algorithmic thinking and AI have not sprung up out of nowhere; they have had a two-and-a-half-millennia gestation period.
Bayes theorem
In 1763 a posthumously published essay by the Reverend
Thomas Bayes presented a single theorem that updated a probability when
presented with new evidence. This gives you the ability to keep folding new
evidence, new predictors and so on into a single updated probability. In
learning, this allows an algorithmic system to continue to update predictions
and recommendations for students and content configuration over time.
Interestingly, this often reduces the probability, as intuition, through
cognitive bias, often exaggerates probabilities through inadequate analysis.
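To make the mechanics concrete, here is a minimal sketch of a single Bayesian update, with invented numbers: a prior belief that a student has mastered a topic, revised after one correct quiz answer. The function name and all probabilities are illustrative assumptions, not taken from any real system.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return the posterior P(H | evidence) via Bayes theorem."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence

# Hypothetical numbers: prior belief in mastery is 0.3; a mastered student
# answers correctly 90% of the time, a non-mastered student guesses
# correctly 25% of the time.
posterior = bayes_update(prior=0.3,
                         p_evidence_given_h=0.9,
                         p_evidence_given_not_h=0.25)
print(round(posterior, 3))  # belief rises above the 0.3 prior

# The posterior can be fed back in as the next prior when the next
# piece of evidence arrives - this is the 'continual updating' above.
```

Each call takes seconds of arithmetic, which is why such updates can happen in real time as a learner works through a course.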
In addition to Bayesian data analysis there is the use
of a Bayesian network. This is a model with ‘known’ and ‘unknown’ probabilities
drawn from, say, student data, behaviour and performance. The network has nodes with
variables (known and unknown), and algorithms can both make decisions and even
learn within these networks. It is basically the application of Bayes theorem to
solve complex problems, such as the optimal path for personalised learning. The
network will therefore recommend the optimal content going forward.
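As a toy illustration of such a network, consider one hidden node ('mastery') whose value we never observe directly, connected to observed quiz answers. Inference by enumerating the hidden states is enough at this scale. All the probabilities below are invented for illustration; real systems fit them from student data.

```python
# A toy two-layer Bayesian network: hidden 'mastery' node -> observed answers.
P_MASTERY = 0.4                        # prior P(mastery)
P_CORRECT = {True: 0.85, False: 0.2}   # P(correct answer | mastery state)

def posterior_mastery(observations):
    """P(mastery | observed quiz results), enumerating both hidden states."""
    def likelihood(mastered):
        l = 1.0
        for correct in observations:
            p = P_CORRECT[mastered]
            l *= p if correct else (1 - p)
        return l
    num = likelihood(True) * P_MASTERY
    den = num + likelihood(False) * (1 - P_MASTERY)
    return num / den

print(posterior_mastery([True, True]))    # two correct answers: belief rises
print(posterior_mastery([True, False]))   # mixed evidence: belief falls
```

A production network would have many such nodes, one per concept or skill, and would use the resulting posteriors to pick the next piece of content.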
Enter another important name: Andrey Markov, a Russian
mathematician who introduced the Markov network. Whereas a Bayesian network is
directed and acyclic, a Markov network is undirected and can be cyclic.
Markov models can be used to determine what the learner gets as they attempt a
course, based on previous behaviours. You may be unaware, for example, that
major providers already use these techniques to present you with a web page
different from the one shown to others.
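A minimal sketch of the Markov idea applied to course content: states are content items and transition probabilities reflect what learners with a given history did next. The item names and probabilities here are invented; a real system would estimate the transition table from behaviour logs.

```python
import random

# Invented transition probabilities between content items.
TRANSITIONS = {
    "intro_video":    {"quiz_1": 0.7, "worked_example": 0.3},
    "worked_example": {"quiz_1": 0.9, "intro_video": 0.1},
    "quiz_1":         {"summary": 0.6, "worked_example": 0.4},
    "summary":        {"summary": 1.0},  # terminal state
}

def next_item(current, rng=random):
    """Sample the learner's likely next content item."""
    r, cumulative = rng.random(), 0.0
    for item, p in TRANSITIONS[current].items():
        cumulative += p
        if r < cumulative:
            return item
    return item  # guard against floating-point rounding

def most_likely_next(current):
    """The recommendation a simple Markov recommender would make."""
    return max(TRANSITIONS[current], key=TRANSITIONS[current].get)

print(most_likely_next("intro_video"))  # quiz_1
```

Because the next state depends only on the current one, the model stays cheap to update and query even with millions of learners.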
Quite separately, Corbett introduced a Bayesian
knowledge-tracing algorithm directly into the learning field. It is more
directly associated with data mining from, for example, learning management
systems, which produce large amounts of data about learner behaviour. This can
be used to come to a conclusion and make a decision about what is needed next. Note
that all of these approaches (and there are many more) are very different from
rule-based adaptive systems. The difference between these systems is explained
well in this paper by Jim Thompson.
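The core of Bayesian knowledge tracing is small enough to sketch. The model's four standard parameters (initial knowledge, learning rate, guess and slip) are fitted per skill in practice; the values below are illustrative assumptions only.

```python
# Illustrative parameters: P(known at start), P(learn per step),
# P(correct by guessing), P(wrong despite knowing).
P_INIT, P_LEARN, P_GUESS, P_SLIP = 0.2, 0.15, 0.25, 0.1

def bkt_update(p_known, correct):
    """Update P(skill known) after one observed answer, then apply learning."""
    if correct:
        evidence = p_known * (1 - P_SLIP)
        posterior = evidence / (evidence + (1 - p_known) * P_GUESS)
    else:
        evidence = p_known * P_SLIP
        posterior = evidence / (evidence + (1 - p_known) * (1 - P_GUESS))
    # The learner may also acquire the skill between opportunities.
    return posterior + (1 - posterior) * P_LEARN

p = P_INIT
for answer in [True, True, False, True]:   # a hypothetical answer sequence
    p = bkt_update(p, answer)
print(round(p, 3))   # estimated mastery after four attempts
```

When the estimate crosses a mastery threshold, the system can stop drilling that skill and move the learner on, which is precisely the 'what is needed next' decision described above.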
We should note that this field has 250 years of mathematical
thinking behind it and an enormous amount of mathematical complexity.
Nevertheless, having borne fruit in other online contexts, there is every reason
to think it will bear fruit in learning. Learning algorithms can embody
evidence-based learning theory, increasing the productivity
of teaching. But what really drives algorithmic, adaptive learning are the
advantages it affords the learner:
1. Gender, race,
colour, accent, social background
Algorithms are blind to the sort of social biases (gender,
race, colour, age, ethnicity, religion, accent, social background) we commonly
see, not only in society through sexism, racism and snobbery, but also in
teaching, where social biases are not uncommon. In education, it is useful to
distinguish between subtle and blatant biases: a teacher may be
perceived to be unbiased yet not be aware of their own biases. We know, for
example, that gender bias has a strong effect on subject choice and that both
gender and race affect teacher feedback. Algorithms can be free of such social
biases.
2. Free from
cognitive biases
Cognitive biases around ability versus effort, made clear by
the likes of Carol Dweck on fixed versus growth mindsets, clearly affect
teacher and learner behaviour leading to self-fulfilling predictions on student
attainment. Considerable bias in marking and grades has also been evidenced. There
may also be ingrained theories and practices that are out of date and now
disproven, such as learning styles, that heavily influence teaching.
Algorithms, built on sound theory and practice, can, over time and based on actual
evidence, try to eliminate such biases.
3. Never get tired,
ill, irritable or disillusioned
To teach is human and teacher performance is variable.
That is not a criticism of teachers but an observation about human nature and
behaviour. Algorithmic behaviour is only variable in the sense that it uses
variables. Algorithms are at the top of their game (albeit limited) 24/7/365.
Of course, one could argue that the affective, emotional side of learning is
not always provided by algorithmic learning. That is true, but good design can
ensure that it is a feature of delivery. Sentiment or emotion analysis by machine learning is making good progress. So even here, algorithmic techniques
around gesture recognition, attention and emotion are being researched and built.
4. Algorithms can do
things that brains cannot
Seems like a bold claim, but the number of variables and the
sheer formulaic power of an ensemble of algorithms are, in many areas, well
beyond the capability of the brain. In addition, the data feeds and data-mining
opportunities, as well as consistent and correct delivery of content, may also be
beyond the capability of many teachers. For the moment there are many tacit
skills in teachers that algorithms have not captured. That has to be recognised,
but the problem is that most teaching is not one-to-one, so those tacit skills
are difficult to apply across whole classes of learners, the norm in educational
and training institutions. That is not a reason for stopping, only a reason for
driving forward. We will see ever more sophisticated analysis of cognitive
behaviour, where the sheer number of cognitive misconceptions and problems
cannot be identified by a teacher but can be through careful AI analysis.
5. Personalises the
speed of learning
A group of learners can be represented by a distribution
curve. Yet suppose we use a system that is sensitive not just to the bulk of
learners but also to the leading and trailing tails? Algorithms treat each learner
as an individual and personalise the learning journey for that learner. You
are, essentially, streaming into streams of one. The consequence is the right
route for each individual, one that leads to learning at the speed of their ability at any
given time. The promise is that learners get through courses quicker. More than this, Bloom, in his famous 2-Sigma paper, showed the significant advantage of one-to-one teaching over other forms of instruction. We now have the opportunity to deliver on this researched promise. We already have evidence that this can be achieved at scale.
6. Prevents
catastrophic failure and drop-out
Slower learners do not get left behind in adaptive, AI-driven systems, or suffer catastrophic
failure, often in a final summative exam when it is too late, because the
system brings them along at a speed that suits them. The UK university system has a 16% drop-out rate, and the rate is much higher in the US. In schools, a considerable number of students fail to achieve even modest levels of attainment. This approach can lower drop-out,
something that has critical personal, social and financial consequences.
7. Personal reporting
Such systems can produce reports that really do match
personal attainment, through personal feedback that informs the learner's
motivation and progress through a course. Rather than standard feedback
and remedial loops, the learner can feel as though they really are being
tutored, as the feedback is detailed and the learning journey finessed to their
personal needs. Teachers also have a lot to gain from the feedback such systems provide. Early evidence suggests that good teachers, in combination with such systems, produce great results.
8. They learn
Teachers need to learn, though many would question the efficacy of
INSET days or current models of rushed or absent CPD. Algorithmic systems also
learn. It is a mathematical feature of machine learning that the system gets
better the more students take the course. We must be careful about
exaggerated claims in this area, but it is an area of intense research and
development. We are now at a level where adaptive systems themselves adapt as more and more students go through them. It is this ability to constantly and relentlessly learn and improve that may ultimately take AI beyond the ability of teachers to constantly adapt.
9. Course improvement
Courses are often repeated without a great deal of reflection
on their weaknesses, even their inaccuracies. Many studies of textbooks have shown
that they are strewn with mistakes. The same is true of exams and high-stakes assessment. Adaptive, algorithmic systems can be designed
to automatically identify erroneous questions, weak spots, good resources, even
optimal paths through a network of learning possibilities. They may even be able to identify cheating; we have seen many examples of data analysis being used to identify teacher and student cheating. One further possibility
is in courses that are semi-porous, where learners use an external resource,
say a Wikipedia page or video, and find it useful, thereby raising its ranking
in the network of available options for future learners. This is true of systems like WildFire.
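One simple way a system might flag suspect questions is from response data alone: an item that almost everyone fails is either very hard or broken, and deserves editorial review. The response matrix and threshold below are invented for illustration.

```python
# Invented response data: question -> list of 1/0 results, one per learner.
responses = {
    "q1": [1, 1, 1, 0, 1, 1],
    "q2": [0, 0, 1, 0, 0, 0],   # nearly everyone fails: possibly erroneous
    "q3": [1, 0, 1, 1, 0, 1],
}

def difficulty(results):
    """Proportion of learners answering correctly (very low = hard or broken)."""
    return sum(results) / len(results)

# Flag items below an (assumed) review threshold of 30% correct.
flagged = [q for q, r in responses.items() if difficulty(r) < 0.3]
print(flagged)  # ['q2']
```

Real item analysis goes further, for instance checking whether high-scoring learners fail an item more often than weak ones, but even this crude pass picks out questions worth a second look.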
10. Massively
scalable
Humans are not scalable but algorithms are massively scalable. We have already seen how Google,
Facebook, Amazon, Netflix, retailers and many other services use algorithmic
power to help you make better decisions and these operate at the level of
billions of users. In other words there is no real limit to their scalability.
If we can apply that personalisation of learning on a massive scale, education
could break free of its heavy cost burden.
Conclusion
The algorithmic, adaptive approach to learning promises to
provide things that live teachers cannot and could never deliver. All of the
above is being realised through organisations like CogBooks, who have built adaptive, algorithmic systems. This is important, as we cannot get
fixated on the oft-repeated mantra that face-to-face teaching is always a necessary
condition for learning - it is not. Neither should we simply stop at
seeing technology as merely something to be used by a teacher in a
classroom. It can be that, but it can also be more. This approach to technology-based
learning could be a massive breakthrough in terms of learning outcomes for
millions of learners. It already operates in the learning sphere, through
search, perhaps the most profound pedagogic change we have seen in the last
century. For me, it is only a matter of when it will be used in more formal
learning environments.
2 comments:
Really interesting and useful article, and glad to see on Twitter that you will write an article on this. I hope you will also cover the things that algorithms aren't good for - or perhaps the important aspects of learning that don't generate data that can be fed into algorithms.
Noted Seb's comments about learner agency too. I wonder if algorithms might get stuck with early assumptions like 'learning styles' did. A reluctance to change basic assumptions because it mucks up data already collected.
Agree Frances. Lots of things that algorithms don't do well. And as algorithms are designed by humans they can capture bad ideas and processes. Indeed, some systems already capture 'learning styles' in adaptive learning. Will be writing downside piece soon.