Donald Clark Plan B: July 2018

Friday, July 20, 2018

“Huge milestone in advancing artificial intelligence” (Gates) as AI becomes a team – big implications for learning?

An event took place this month that received little publicity but was described by Bill Gates as a “huge milestone in advancing artificial intelligence”. It could have profound implications for advances in AI. An OpenAI team of five neural networks, combined with a lot of other techniques, beat human teams in the game Dota2.

To understand the significance of this, we must understand the complexity of the task. A computer environment like Dota2 is astoundingly complex and it is played by teams of five, all of whom determine long-term success, exploring the complex environment, making decisions in real time, identifying threats, employing clever team strategies. It’s a seriously bad, chaotic, complicated and fast environment made even messier by the fact that you’re playing against some very smart teams of humans.

To win they created five separate neural networks, each representing a player. You need an executive layer to determine the weightings for each player’s actions and so set priorities. Using reinforcement learning techniques and playing itself millions of times (equivalent of 180 years of playtime per day!), the system learned fast. It is not yet at the level of the professional Dota2 teams, but it will get there.

What’s frightening is the speed of the training and actual learned competence across a ‘team’ in a team context. It needed to optimize team decisions and think about long-term goals, not short-term wins. This is way beyond single-player games like chess and Go.
The implications are huge as AI moves from being good at single, narrow tasks, to general tasks, using a modular approach.

Team AI in military

Let’s take the most obvious parallel. In a battle of robot soldiers versus humans, the robots and drones will simulate the mission, going through millions of possible scenarios, then, having learned all they need to know, attack in real time, problem solving, as a team, as they go. Autonomous warfare will be won by the smartest software, organic or not. This is deeply worrying, so worrying that senior AI luminaries, such as Musk, and people within Google, have signed a pledge this week saying they will not work on autonomous weapons. So let’s turn to the civilian world.

Team AI in robotics

I’ve written about shared learning in AI across teams of robots, which gives significant advantages in terms of speed and shared experience. Swarm and cloud robotics are very real. But let’s turn to something closer to home – business simulations.

Team AI in business

Imagine a system that simulates the senior management of a company – its five main Directors. It learns by gathering data about the environment – competitors, financial variables, consumer trends, cost modeling, instantaneous cost-benefit analysis, cashflow projections, resource allocation, recruitment policies, training. It may not be perfect but it is likely to learn faster than any group of five humans and make decisions faster, beating possible competitors. Would a team AI be useful, as an aid to decision making? Do you really need those expensive managers to learn this stuff and make these decisions?

Team AI in learning for individuals

Let’s bring it down to specific strategies for individuals. Suppose I want to maximize my ‘learning’ time. The model in the past has always been one teacher to one or many learners. Imagine pooling the expertise of many teachers to decide what you see and do next in your learning experience or journey? Imagine those teachers having different dimensions, cross-curricular and so on? This breaks the traditional teaching mould. The very idea of a single teacher may just be an artifact of the old model of timetabling and institutions. Multi-teaching, by best of breed systems, may be a better model.

Team AI and blended learning

One could see AI determine the optimal blend for a course, or for you as an individual. If we take Blended Learning (not Blended Teaching) as our starting point then, based on the learning task, the learner and the resources available from the AI teacher team, we could see guidance emerge on optimizing the learning journey for that individual. Could one of that team be a pedagogic expert who determines what particular type of teaching/learning experience you need at that particular moment?

Conclusion

Rather than seeing AI as a single entity and falsely imagining that it should simulate the behavior of a single ‘teacher’ we should see it as a set of different things that work as a team to help you learn. At the moment we use all sorts of AI to help you learn – Google, Recommendation engines, Engagement bots (Differ), Support bots (Jill Watson), Content creation (WildFire), Adaptive Learning (CogBooks), Spaced-practice (WildFire), even Wellbeing. Imagine bringing all of these together for you as an individual learner – coordinated, optimized, not a teacher but a team of teachers.

Wednesday, July 04, 2018

Data is not the new oil, more likely the new snakeoil….

Before jumping into data analytics projects one must really think hard about what 'data' actually is. The problem with many of these projects is that they can turn into 'data' projects and not business projects.
Oil is messy and dirty (crude) when it comes out of the ground but it is largely useful when refined into fractions. Up to 50% is used for petrol, 20% distillate fuel (heating oil and diesel fuel) and 8% jet fuel. The rest has many other useful purposes. The unwanted elements and compounds are a tiny percentage. Data is not the new oil. It’s stored in weird ways and places, is often old, useless, messy, embarrassing or secret, may be personal, observed, derived or analytic, may need to be anonymised or split into training sets, and is subject to GDPR. To quote that old malapropism, ‘data is a minefield of information’!

1. Data dumps

Data is really messy, with much of it in:

odd data structures
odd formats/encrypted
different databases

Just getting a hold of the stuff is difficult.

2. Defunct data

Then there’s the problem of relevance and utility, as much of it is:

old
useless
messy

In fact, much of it could be deleted. We have so much of the stuff because we haven’t known what to do with it, don’t clean it and don’t know how to manage it.

3. Difficult data

There are also problems around data that is:

embarrassing
secret

There may be very good reasons for not opening up historic data, such as emails and internal communications. It may open up sizeable legal and HR risks for organisations. Think Wikileaks email dumps. It’s not like a barrel of oil, more like a can of worms. Like oil spills, we also have data leaks.

4. Different data

Once cleaned, one can see that there are many different types of data. Unlike oil it has not so much fractions as different categories of data. In learning we can have ‘Personal’ data, provided by the person or actions performed by that person with their full knowledge. This may be gender, age, educational background, needs, stated goals and so on. Then there’s ‘Observed’ data from the actions of the user, their routes, clicks, pauses and choices. You also have ‘Derived’ data inferred from existing data to create new data and higher level ‘Analytic’ data from statistical and probability techniques related to that individual. Data may be created on the fly or stored.
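To make the four categories concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the LearnerRecord class, its field names and the use of dwell time as the observed signal are hypothetical, not taken from any real learning system.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class LearnerRecord:
    # 'Personal' data: supplied knowingly by the learner (hypothetical fields)
    personal: dict
    # 'Observed' data: captured from behaviour, here seconds spent per question
    observed_dwell_seconds: list = field(default_factory=list)

    def derived(self) -> dict:
        """'Derived' data: new values inferred from what is already stored."""
        dwell = self.observed_dwell_seconds
        return {"mean_dwell_seconds": mean(dwell) if dwell else None}


def analytic_z_score(record, cohort):
    """'Analytic' data: a statistic about one learner, here how many standard
    deviations their mean dwell time sits from the cohort average."""
    cohort_means = [
        r.derived()["mean_dwell_seconds"]
        for r in cohort
        if r.observed_dwell_seconds
    ]
    own = record.derived()["mean_dwell_seconds"]
    if own is None or len(cohort_means) < 2:
        return None  # too little data to say anything
    spread = stdev(cohort_means)
    if spread == 0:
        return 0.0
    return (own - mean(cohort_means)) / spread
```

Note that the derived and analytic values here are computed on the fly rather than stored, which is one of the two options mentioned above.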

5. Anonymised data

Just when you thought it was getting clearer, you also have ‘Anonymised’ data, which is a bit like oil of an unknown origin. It is clean of any attributes that may relate it to specific individuals. This is rather difficult to achieve as there are often techniques to back engineer attribution to individuals.
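One common way to see why anonymisation is hard is to count how many people share the same combination of seemingly harmless attributes (the idea behind k-anonymity). The sketch below is only an illustration: the rows and column names are made up, and a real check would need far more care.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same values for the
    given columns. A result of 1 means at least one person is unique on
    those columns and so potentially re-identifiable."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return min(counts.values()) if counts else 0


# Hypothetical 'anonymised' learner rows: no names, but still revealing
rows = [
    {"age_band": "20-29", "postcode_area": "EH1", "course": "Maths"},
    {"age_band": "20-29", "postcode_area": "EH1", "course": "Maths"},
    {"age_band": "30-39", "postcode_area": "EH1", "course": "Maths"},
]

k_postcode_only = smallest_group_size(rows, ["postcode_area"])
k_with_age = smallest_group_size(rows, ["age_band", "postcode_area"])
```

With postcode alone everyone hides in a group of three, but adding the age band leaves one learner in a group of one.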

6. Supervised data

In AI there’s also ‘Training’ data used for training AI systems and ‘Production’ data which the system actually uses when it is launched in the real world. This is not trivial. Given the problems stated above, it is not easy to get a suitable data set, which is clean and reliable for training. Then, when you launch the service or product the new data may be subject to all sorts of unforeseen problems not uncovered in the training.
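A crude first check on the gap between training and production data is to ask how much of the live data falls outside anything seen in training. This is a rough sketch under simple assumptions (a single numeric variable, hypothetical values), not a substitute for proper monitoring.

```python
def out_of_range_share(training_values, production_values):
    """Share of production values lying outside the min-max range
    that was seen in the training data."""
    if not training_values or not production_values:
        raise ValueError("both lists must contain at least one value")
    lo, hi = min(training_values), max(training_values)
    outside = [v for v in production_values if v < lo or v > hi]
    return len(outside) / len(production_values)


# Hypothetical example: session lengths in minutes
training = [10, 20, 30]
production = [5, 15, 25, 40]
share = out_of_range_share(training, production)
```

Here half of the production values fall outside the range the system was trained on.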

7. Paucity of data

But the problems don’t stop there. In the learning world, the data problem is even worse as there is another problem – the paucity of data. Institutions are not gushing wells of data. Universities, for example, don’t even know how many students turn up for lectures. Data on students is paltry. The main problem with the use of data in learning is that we have so little of the stuff. SCORM, which has been around for 20 plus years, literally stopped the collection of data with its focus on completion. This was the result of a stupid decision by a bunch of folk at ADL. This makes most data analytics projects next to useless. The data can be best handled in a spreadsheet. It is certainly not as large, clean and relevant as it needs to be to produce genuine insights.

Prep

Before entering these data analytics projects ask yourself some serious questions about 'data'. Data size, by itself, is overrated, but size still matters. Whether n = tens, hundreds, thousands or millions, the Law of Small Numbers still applies. Don’t jump until you are clear about how much relevant and useful data you have, where it is, how clean it is and in what databases.
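To get a feel for why n matters, here is a small sketch of the approximate 95% margin of error on a measured proportion, such as a pass rate. It assumes a simple random sample and the usual normal approximation (which is itself shaky for very small n), with a true rate of 50% chosen purely for illustration.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p measured on
    n learners, using the normal approximation."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (10, 100, 1000, 1_000_000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:>9,}: 50% plus or minus {moe * 100:.1f} points")
```

With ten learners the uncertainty is roughly 31 percentage points either way, with a thousand it is about 3, so a 'finding' from a small cohort may be nothing more than noise.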

New types of data may be more fruitful than legacy data. In learning this could be dwell time on questions, open input data, wrong answers to questions and so on.

More often than not, what you have as data is really a set of proxies for phenomena. Be careful here, as your oil may actually be snakeoil.

Conclusion

GDPR has made the management and use of data more difficult. All of this adds up to what I’d call the ‘data delusion’, the idea that data is one thing, easy to manage and generally useful in data analytics projects in institutions and organisations. In general, it is not. That's not to say you should ignore its uses - just don't get sucked into data analytics projects in learning that promise lots but deliver little. Far better to focus on the use of data in adaptive learning or small scale teaching and learning projects where relatively small amounts of data can be put to good use.