Tuesday, October 24, 2017

Kirkpatrick evaluation: kill it - happy sheet nonsense, well past its sell-by date

Kirkpatrick has for decades been the only game in town in the evaluation of corporate training, although he is hardly known in education. In his early Techniques for evaluating training programmes (1959) and Evaluating training programmes: The four levels (1994), he proposed an approach to the evaluation of training that became a de facto standard. It is a simple and sensible schema but has not stood the test of time. First up - what are Kirkpatrick's four levels of evaluation?
Four levels of evaluation
Level 1 Reaction
At the reaction level one asks learners, usually through ‘happy sheets’, to comment on the adequacy of the training, the approach and perceived relevance. The goal at this stage is simply to identify glaring problems. It is not to determine whether the training worked.
Level 2 Learning
The learning level is more formal, requiring pre- and post-tests. These allow you to identify those who had existing knowledge, as well as those who missed key learning points by the end. It is designed to determine whether the learners actually acquired the identified knowledge and skills.
Level 3 Behaviour
At the behavioural level, you measure the transfer of the learning to the job. This may need a mix of questionnaires and interviews with the learners, their peers and their managers. Observation of the trainee on the job is also often necessary. It can include an immediate evaluation after the training and a follow-up after a couple of months. 
Level 4 Results
The results level looks at improvement in the organisation. This can take the form of a return on investment (ROI) evaluation. The costs, benefits and payback period are fully evaluated in relation to the training deliverables. 
JJ Phillips has argued for the addition of a separate, fifth ‘Return on Investment (ROI)’ level, which essentially compares the fourth level of the standard model to the overall costs of training. However, ROI is not really a separate level, as it can be included in Level 4. Kaufman has argued that it is merely another internal measure and that, if there were a fifth level, it should be external validation from clients, customers and society. In fact, there have been other evaluation methods with even more levels, completely over-engineering the solution.
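To see why ROI hardly merits a level of its own, note that the Phillips-style calculation is simple arithmetic. Here is a minimal sketch in Python, with entirely hypothetical figures (none of these numbers come from a real programme):

    # Phillips-style ROI arithmetic with hypothetical figures.
    costs = 50_000        # total cost of the training programme (£)
    benefits = 65_000     # monetised annual benefits attributed to it (£)

    roi_percent = (benefits - costs) / costs * 100   # net benefit as % of cost
    payback_months = costs / (benefits / 12)         # months to recoup the cost

    print(f"ROI: {roi_percent:.0f}%")                # ROI: 30%
    print(f"Payback: {payback_months:.1f} months")   # Payback: 9.2 months

The hard part, of course, is not the division but attributing credible monetary benefits to the training in the first place.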
Criticism
Level 1 - keep 'em happy
Traci Sitzmann’s meta-studies (68,245 trainees, 354 research reports) ask ‘Do satisfied students learn more than dissatisfied students?’ and ‘Are self-assessments of knowledge accurate?’ Self-assessment turns out to be only moderately related to learning; it captures motivation and satisfaction, not actual knowledge levels. She recommends that self-assessments should NOT be included in course evaluations and should NOT be used as a substitute for objective learning measures.
Favourable reactions on happy sheets do not guarantee that the learners have learnt anything, so one has to be careful with these results. This data merely measures opinion.
Learners can be happy and stupid. One can express satisfaction with a learning experience yet still have failed to learn; learners may have enjoyed the experience simply because the trainer told good jokes and kept them amused. Conversely, learning can occur and job performance improve even though the participants thought the training was a waste of time. Learners often learn under duress, through failure or through experiences which, although difficult at the time, prove to be useful later.
Happy sheet data is often flawed, as it is neither sampled nor representative. In fact, it is often a skewed sample from those who have pens, are prompted, or particularly liked or disliked the experience. In any case, it is too often applied after the damage has been done: the data is gathered, but by that time the cost has been incurred. More focus on evaluation prior to delivery, during analysis and design, is more likely to eliminate inefficiencies in learning.
Level 2 - Testing, testing
Level 2 recommends measuring the difference between pre- and post-test results, but pre-tests are often ignored. In addition, end-point testing is often crude, usually testing the learner’s short-term memory. With no adequate reinforcement to push knowledge into long-term memory, most of it will be forgotten, even if the learner did pass the post-test.
Tests are often primitive and narrow, testing knowledge and facts, not real understanding and performance. Again, Level 2 is inappropriate for informal learning.
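For illustration, a pre-/post-test comparison need not be complicated. Here is a minimal sketch with made-up scores, using the common ‘normalised gain’ measure (one option among many; it is not part of Kirkpatrick's model):

    # Hypothetical (pre, post) test scores in % for a small cohort.
    scores = [(40, 70), (55, 80), (25, 60)]

    for pre, post in scores:
        raw_gain = post - pre
        # Normalised gain: fraction of the available headroom actually gained.
        norm_gain = raw_gain / (100 - pre)
        print(f"pre={pre}% post={post}% raw=+{raw_gain} normalised={norm_gain:.2f}")

Even this simple measure is only as good as the test itself; if the post-test probes short-term recall, a healthy gain can still evaporate within weeks.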
Level 3 – Good behaviour
At this level the transfer of learning to actual performance is measured. Many people can perform tasks without being able to articulate the rules they follow; conversely, many can articulate a set of rules well but perform poorly at putting them into practice. This suggests that, ultimately, Level 3 data should take precedence over Level 2 data. However, this is complicated, time-consuming and expensive, and often requires the buy-in of line managers with no training background, as well as their time and effort. In practice it is highly relevant but usually ignored.
Level 4 - Does the business
The ultimate justification for spending money on training should be its impact on the business. Measuring training in relation to business outcomes is exceedingly difficult. However, the difficulty of the task should, perhaps, not discourage efforts in this direction. In practice Level 4 is often ignored in favour of counting courses, attendance and pass marks.
General criticisms
First, Kirkpatrick is the first to admit that there is no research or scientific background to his theory. This is not quite true, as it is clearly steeped in the behaviourism that was current when it was written. It is summative, ignores context and ignores methods of delivery. Some therefore think Kirkpatrick asks all the wrong questions: the task is to create the motivation and context for good learning and knowledge sharing, not to treat learning as an auditable commodity. It is also totally inappropriate for informal learning.
Senior managers rarely want all four levels of data; they want more convincing business arguments. It's the training community that tells senior management they need Kirkpatrick, not the other way round. In this sense it is over-engineered: the four linear levels are too much. All the evidence shows that Levels 3 and 4 are rarely attempted, as all the effort and resource focuses on the easier-to-collect Levels 1 and 2. Some therefore argue that it is not necessary to do all four levels. Given the time and resources needed, and the organisation's demand for relevant data, it is surely better to go straight to Level 4. In practice, Level 4 is rarely reached, as fear, disinterest, time, cost, disruption and weak statistical skills militate against this type of analysis.
The Kirkpatrick model can therefore be seen as often irrelevant, costly, long-winded and statistically weak. It rarely involves sampling, and both the collection and the analysis of the data are crude and often not significant. As an over-engineered, 50-year-old theory, it is badly in need of an overhaul (and not just by adding another level).
Models and messages
Models such as ADDIE, Maslow's pyramid, VAK learning styles, Myers-Briggs and Kirkpatrick send the wrong message. They seem scientific and certain when they are neither. Kirkpatrick gives the illusion of certainty, but as Will Thalheimer showed, Kirkpatrick didn't come up with the four-level model; he used Katzell's work. Read Kirkpatrick's paper - it is there on the first page. It is not a researched model; it was lifted from someone else and was well marketed. Kirkpatrick mentions Katzell in his first 1956 paper but never again after 1960. The Kirkpatrick model is not only badly researched, it is downright misleading. It simplifies, suggesting a model that starts with learner perceptions and proceeds in a linear fashion to business impact, but as the earlier levels are irrelevant, people set off at Level 1 and the journey is so long they never get to Level 4. It's easy doing smile sheets, hard to measure business impact.
Alternatives
Evaluation should be done externally; the rewards to internal evaluators for producing a favourable evaluation report vastly outweigh the rewards for producing an unfavourable one. There are also lots of shorter, sharper and more relevant approaches: Brinkerhoff's Success Case Method, Daniel Stufflebeam's CIPP Model, Robert Stake's Responsive Evaluation, Kaufman's Five Levels of Evaluation, CIRO (Context, Input, Reaction, Outcome), PERT (Program Evaluation and Review Technique), Alkin's UCLA Model, Provus's Discrepancy Model and Eisner's Connoisseurship Evaluation Model. However, Kirkpatrick is too deeply embedded in the culture of training, a culture that tends to get stuck with theories that are often 50 years, or more, old.
Evaluation is all about decisions, so it makes sense to customise it to decisions and decision makers. If one asks ‘To what problem is evaluation a solution?’, one may find that it is costs, low productivity, staff retention, customer dissatisfaction and so on. In a sense, Kirkpatrick may stop relevant evaluation.
Conclusion
Kirkpatrick’s four levels of evaluation have soldiered on for nearly 60 years because, like much training theory, the model is the result of strong marketing - now by his son James Kirkpatrick - and has become fossilised in ‘train the trainer’ courses. It has no real research or empirical background, is over-engineered and linear, and focuses too much on less relevant Level 1 and 2 data, drawing effort away from the more relevant Level 4. Time to kill Kirkpatrick.
Bibliography
Kirkpatrick, D. (1959). Techniques for evaluating training programmes.
Kirkpatrick, D. (1994). Evaluating training programmes: The four levels.
Kirkpatrick, D. and Kirkpatrick J.D. (2006). Evaluating Training Programs (3rd ed.). San Francisco, CA: Berrett-Koehler Publishers
Phillips, J. (1996). How much is the training worth? Training and Development, 50(4), 20-24.
Kaufman, R. (1996). Strategic Thinking: A Guide to Identifying and Solving Problems. Arlington, VA. & Washington, D.C. Jointly published by the American Society for Training & Development and the International Society for Performance Improvement
Kaufman, R. (2000). Mega Planning: Practical Tools for Organizational Success. Thousand Oaks, CA. Sage Publications.
Sitzmann, T., Brown, K. G., Casper, W. J., Ely, K., & Zimmerman, R. (2008). A review and meta-analysis of the nomological network of trainee reactions. Journal of Applied Psychology, 93, 280-295.
Sitzmann, T., Ely, K., Brown, K. G., & Bauer, K. N. (in press). Self-assessment of knowledge: An affective or cognitive learning measure? Academy of Management Learning and Education.

Gagne's 9 dull Commandments - why they cripple learning design...

50 year old theory
It is over 50 years since Gagne, a closet behaviourist, published The Conditions of Learning (1965). In 1968 came his article Learning Hierarchies, then Domains of Learning in 1972. Gagne’s theory has five categories of learning: intellectual skills, cognitive strategies, verbal information, motor skills and attitudes. OK, I quite like these - better than the oft-quoted Bloom trilogy (1956). Then something horrible happened.

Nine Commandments
He claimed to have found the Nine Commandments of learning: a single method of instruction that applies to all five categories of learning, the secret code for divine instructional design. Follow the linear recipe and learning will surely follow.

1 Gaining attention
2 Stating the objective
3 Stimulating recall of prior learning
4 Presenting the stimulus
5 Providing learning guidance
6 Eliciting performance
7 Providing feedback
8 Assessing performance
9 Enhancing retention and transfer to other contexts

Instructional designers often quote Gagne and these nine steps in proposals for e-learning and other training courses, but let me present an alternative version of this list:

1 Gaining attention
Normally an overlong animation, corporate intro or dull talking head, rarely an engaging interactive event. You need to grab attention, not make the learner sit back, in both chair and mind.
2 Stating the objective
Now bore the learner stupid with a list of learning objectives (really trainerspeak). Give the plot away and remind them of how boring this course is going to be.
3 Stimulating recall of prior learning
Can you think of the last time you considered the details of the Data Protection Act?
4 Presenting the stimulus
Is this a behaviourist I see before me? Yip. Click on Mary, Abdul or Nigel to see what they think of the Data Protection Act - cue speech bubble... or worse, some awful game where you collect coins or play the role of Sherlock Holmes....
5 Providing learning guidance
We’ve finally got to some content.
6 Eliciting performance
True/false or multiple-choice questions, each with at least one really stupid option (cheat list for MC here).
7 Providing feedback
Yes/no, right/wrong, correct/incorrect…try again.
8 Assessing performance
Use your short-term memory to choose options in the final multiple-choice quiz.
9 Enhancing retention and transfer to other contexts
Never happens! The course ends here, you’re on your own mate….

Banal and dull
First, much of this is banal - get their attention, elicit performance, give feedback, assess. It’s also an instructional ladder that leads straight to Dullsville, a straitjacket that strips away any sense of build and wonder, almost guaranteed to bore more than enlighten. What other form of presentation would give the game away at the start? Would you go to the cinema and expect to hear the objectives of the film before it starts?
It’s time we moved on from this dated theory, using what we’ve learnt about the brain and the clever use of media. We have AI-driven approaches, such as WildFire and CogBooks, that personalise learning...

And don’t get me started on Maslow, Mager or Kirkpatrick!

Wednesday, October 18, 2017

AI-driven tool produces high quality online learning for global company in days not months

The company has a target of two thousand apprentices by 2020 and a sizeable £2 million-plus pot from the Apprenticeship Levy. This money has, by law, to be spent on training. The Head of Apprenticeships in this global company is a savvy manager, and the team already has a track record in the delivery of online learning. So they decided to deliver a large portion of that training using online learning.
Blended Learning
Our first task was to identify what was most useful in the context of Blended Learning. It is important to remember that Blended Learning is not Blended TEACHING. The idea is to analyse the types of learning, types of learners, context and resources to identify your optimal blend - not just a bit of classroom and a bit of online stuff, stuck together like Velcro and called ‘blended’. In this case the company will be training a wide range of apprentices over the coming years; apprenticeships are a major part of its recruitment strategy, important to the company and to the young people joining it.
Learning
The apprentice ‘frameworks’ identify knowledge, behaviours and competences as the three desired types of learning, and all of these have to be assessed. The first project, therefore, looked at the ‘knowledge’ component. This was substantial, as few new apprentices have much in the way of knowledge in this sector. Behaviours and competences need to be primed and supported by underlying knowledge.
Assessment
Additionally, assessment matters in apprenticeships, both formatively, as the apprentices progress, and summatively, at the end. Assessment is a big deal as funding, and the successful attainment of the apprentice, depends on objective and external assessment. It can’t be fudged.
Context
These young apprentices will be widely distributed in retail outlets and other locations, here and abroad. They may also work weekends and shifts. One of our goals was to provide training where and when it was needed, on-demand, at times when workload was low. Content, Level 3 and Level 2, had to be available 24/7, on a range of devices, as tablets were widespread and mobile increasingly popular.
Solution
WildFire was chosen, as it could produce powerful online content that is:

  • Highly retentive
  • Aligned with assessment
  • Deliverable on all devices
  • Quick to produce
  • Low cost

Using an AI-driven content creation tool, we produced 158 modules (60 hours of learning) in days, not months. After producing the Level 3 courses, we could quickly produce the Level 2 courses and load them onto the LMS for tracking user performance. The learner uses high-retention open input rather than weak multiple-choice questions. The AI-driven content creation tool not only produced the high-quality online content quickly, it produced links out to additional supplementary content that proved extremely useful for further learning. It only accepts completion when 100% competence is achieved, and the learner has to persevere in a module until that happens.
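To make that completion rule concrete, here is a toy sketch of an open-input mastery loop. This is purely illustrative - not WildFire's actual code or API - and the questions and the run_module helper are made up for the example:

    # Toy open-input mastery loop (illustrative only - not WildFire's code).
    questions = {
        "What does GDPR stand for?": "general data protection regulation",
        "Who enforces data protection in the UK?": "information commissioner's office",
    }

    def run_module(qs):
        remaining = dict(qs)
        while remaining:                    # loop until every item is answered correctly
            for prompt, answer in list(remaining.items()):
                reply = input(prompt + " ").strip().lower()
                if reply == answer:         # open input, not multiple choice
                    del remaining[prompt]   # mastered - drop it from the pool
                else:
                    print("Not quite - this one will come round again.")
        print("Module complete: 100% competence reached.")

    run_module(questions)

A real system would accept paraphrases and partial matches rather than exact strings; the point is simply that the module cannot be completed by guessing through options.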
Conclusion
The team, both the commissioning manager and the project manager, were really up for this. First, the use of new AI-driven tech excited them. Second, the process turned out to be quick and relatively hassle-free; we produced so much content so quickly that it ran ahead of the organisation's ability to test it! Nevertheless, we got there, met very tight deadlines and came out the other side feeling that this really was a game changer. Third, we were all proud of the output. It's great working with a project manager who sees problems as simply things to be solved. We had to manage expectations on both sides, as this approach and process was very new.

AI is the new UI. Google has long been used in learning and AI shapes almost all online experiences – Facebook, Twitter, Amazon, Netflix and so on. AI can now be used to shape online experiences in learning. It can create high-quality content in minutes not months, at a fraction of the cost, from a document, PPT, podcast or video. I think this changes the game in the e-learning market.
For more detail or a demonstration contact here

Saturday, October 14, 2017

Is there one book you’d recommend as an introduction to AI? Yes. Android Dreams by Toby Walsh

Although there are books galore on AI, from technical textbooks to potboilers, few are actually readable. Nick Bostrom’s ‘Superintelligence’ is dense and needed a good edit; ‘The Future of the Professions’ is too dense; ‘The Rise of the Robots’ is good but a bit dated and lacks depth; and ‘Weapons of Math Destruction’ is a one-sided and exaggerated contrarian tract. At last there’s an answer to that question, “Is there one book you’d recommend as an introduction to AI?” That book is Android Dreams by Toby Walsh.
I met Toby Walsh in Berlin; he’s measured and a serious researcher in AI, so I was looking forward to this book and wasn’t disappointed. The book, like the man, is neither too utopian nor too dystopian. He rightly describes AI as an IDIOT SAVANT, and this sets the tone for the whole book. In general, you could summarise his position as: AI is overestimated in the short term, underestimated in the long term. He sees AI as having real limitations; progress in robotics, and even the much-lauded deep learning, have their Achilles’ heels - back-propagation being one.
On ethics he focuses not on the surface criticisms about algorithmic bias but on whether weaponised AI is a threat - it is, and it’s terrifying. I loved it when he skewered the Frey & Osborne Oxford report and its claim that 47% of jobs are under threat from AI. He explains why they got so many things wrong by going through a series of job types, explaining why robots will not be cutting your hair or serving your food in restaurants any time soon. He also takes a healthy potshot at academics and teachers who think that everyone else’s jobs are at risk, except their own.

The book has all the hallmarks of being written by an expert in the field, with none of the usual exaggeration or ill-informed negativity that many commentators bring to AI. AI is not one thing, it is many things - he explains that well. AI can be used for good as well as evil - he explains that well. AI is probably the most important tech development since language, writing and printing - he explains that well. Worth reading, if only for some of his speculative predictions: driverless cars, your doctor will be a computer, Marilyn Monroe back in the movies, computer recruitment, talking to rooms, AI/robot sports, ghost ships, planes and trains, TV news made without humans, a personal bot that lives on after you die. This review was partly written using AI. Really.