Kirkpatrick has for decades been the only game in town in the evaluation of training, although he is hardly known in education. In his early Techniques for evaluating training programmes (1959) and later Evaluating training programmes: The four levels (1994), he proposed an approach to the evaluation of training that became a de facto standard. It is a simple and sensible schema, but has it stood the test of time?
Level 1 Reaction
At the reaction level one asks learners, usually through ‘happy sheets’, to comment on the adequacy of the training, the approach and its perceived relevance. The goal at this stage is simply to identify glaring problems. It is not to determine whether the training worked.
Level 2 Learning
The learning level is more formal, requiring a pre- and post-test. This allows you to identify those who had existing knowledge, as well as those who, by the end, had missed key learning points. It is designed to determine whether the learners actually acquired the identified knowledge and skills.
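To make the pre/post comparison concrete, here is a minimal sketch in Python. The names, scores and thresholds are all invented for illustration, not part of Kirkpatrick's model; the point is simply that the same two scores flag both existing knowledge and missed learning points:

```python
# Hypothetical pre/post-test comparison for Level 2 evaluation.
# All names, scores and thresholds are invented for illustration.

PRIOR_KNOWLEDGE_THRESHOLD = 80  # pre-test score suggesting existing knowledge
PASS_THRESHOLD = 70             # post-test score below this flags missed learning points

learners = {
    "Amir":  {"pre": 85, "post": 90},
    "Beth":  {"pre": 40, "post": 78},
    "Carol": {"pre": 35, "post": 55},
}

for name, scores in learners.items():
    gain = scores["post"] - scores["pre"]
    if scores["pre"] >= PRIOR_KNOWLEDGE_THRESHOLD:
        note = "existing knowledge; training may have been unnecessary"
    elif scores["post"] < PASS_THRESHOLD:
        note = "missed key learning points; follow-up needed"
    else:
        note = "learning gain achieved"
    print(f"{name}: pre={scores['pre']}, post={scores['post']}, gain={gain:+d} ({note})")
```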
Level 3 Behaviour
At the behavioural level, you measure the transfer of the learning to the
job. This may need a mix of questionnaires and interviews with the learners,
their peers and their managers. Observation of the trainee on the job is also often
necessary. It can include an immediate evaluation after the training and a
follow-up after a couple of months.
Level 4 Results
The results level looks at improvement in the organisation. This can
take the form of a return on investment (ROI) evaluation. The costs, benefits
and payback period are fully evaluated in relation to the training
deliverables.
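As a rough illustration of the arithmetic involved (the figures and the simple one-year calculation below are assumptions for the sake of example, not a prescribed Kirkpatrick method), an ROI and payback evaluation might look like this:

```python
# Hypothetical Level 4 / ROI arithmetic. All figures are invented;
# in practice, isolating the benefit attributable to the training
# from everything else going on in the business is the hard part.

training_cost = 50_000    # total cost: design, delivery, learner time off the job
monthly_benefit = 6_000   # estimated monthly gain attributed to the training

annual_benefit = monthly_benefit * 12
roi_percent = (annual_benefit - training_cost) / training_cost * 100
payback_months = training_cost / monthly_benefit

print(f"ROI over one year: {roi_percent:.0f}%")           # 44%
print(f"Payback period: {payback_months:.1f} months")     # 8.3 months
```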
J. J. Phillips has argued for the addition of a separate, fifth ‘Return on Investment (ROI)’ level, which essentially compares the fourth level of the standard model to the overall costs of training. However, ROI is arguably not a separate level, as it can be included in Level 4. Kaufman has argued that it is merely another internal measure and that, if there were a fifth level, it should be external validation from clients, customers and society.
Criticism
Level 1 - keep 'em happy
Traci Sitzmann’s meta-analyses (68,245 trainees, 354 research reports) ask ‘Do satisfied students learn more than dissatisfied students?’ and ‘Are self-assessments of knowledge accurate?’ Her findings: self-assessment is only moderately related to learning; it captures motivation and satisfaction, not actual knowledge levels. She recommends that self-assessments should NOT be included in course evaluations and should NOT be used as a substitute for objective learning measures.
So favourable reactions on happy sheets do not guarantee that the learners have learnt anything; one has to be careful with these results, as the data merely measures opinion. Learners can be happy and stupid. One can express satisfaction with a learning experience yet still have failed to learn. For example, you may have enjoyed the experience just because the trainer told good jokes and kept you amused. Conversely, learning can occur and job performance improve, even though the participants thought the training was a waste of time. Learners often learn under duress, through failure or through experiences which, although difficult at the time, prove to be useful later.
Happy sheet data is often flawed, as it is neither sampled nor representative. In fact, it is often a skewed sample from those who have pens, are prompted, or who strongly liked or disliked the experience. In any case, it is too often applied after the damage has been done: the data is gathered, but by that time the cost has been incurred. More focus on evaluation prior to delivery, during analysis and design, is more likely to eliminate inefficiencies in learning.
Level 2 - Testing, testing
Level 2 recommends measuring the difference between pre- and post-test results, but pre-tests are often ignored. In addition, end-point testing is often crude, usually testing the learner’s short-term memory. With no adequate reinforcement and push into long-term memory, most of the knowledge will be forgotten, even if the learner did pass the post-test.
Tests are often primitive and narrow, testing knowledge and facts rather than real understanding and performance. Level 2 is also inappropriate for informal learning.
Level 3 – Good behaviour
At this level the transfer of learning to actual performance is measured. Many people can perform tasks without being able to articulate the rules they follow. Conversely, many people can articulate a set of rules well but perform poorly at putting them into practice. This suggests that, ultimately, Level 3 data should take precedence over Level 2 data. However, this is complicated, time-consuming and expensive, and often requires the buy-in of line managers with no training background, as well as their time and effort. In practice it is highly relevant but usually ignored.
Level 4 - Does the business
The ultimate justification for
spending money on training should be its impact on the business. Measuring
training in relation to business outcomes is exceedingly difficult. However,
the difficulty of the task should, perhaps, not discourage efforts in this
direction. In practice Level 4 is often ignored in favour of counting courses,
attendance and pass marks.
General criticisms
First, Kirkpatrick himself is the first to admit that there is no research or scientific background to his theory. This is not quite true, as it is clearly steeped in the behaviourism that was current when it was written. It is summative, ignores context and ignores methods of delivery. Some therefore think Kirkpatrick asks all the wrong questions: the task is to create the motivation and context for good learning and knowledge sharing, not to treat learning as an auditable commodity. It is also totally inappropriate for informal learning.
Senior managers rarely want all four levels of data; they want more convincing business arguments. It is the training community that tells senior management it needs Kirkpatrick, not the other way round. In this sense the model is over-engineered: the four linear levels are too much. All the evidence shows that Levels 3 and 4 are rarely attempted, as all of the effort and resource focuses on the easier-to-collect Levels 1 and 2. Some therefore argue that it is not necessary to do all four levels. Given the time and resources needed, and the demand from the organisation for relevant data, it is surely better to go straight to Level 4. In practice, Level 4 is rarely reached, as fear, disinterest, time, cost, disruption and low skills in statistics militate against this type of analysis.
The Kirkpatrick model can therefore be seen as often irrelevant, costly, long-winded and statistically weak. It rarely involves sampling, and both the collection and the analysis of the data are crude and often not significant. As an over-engineered, 50-year-old theory, it is badly in need of an overhaul (and not just by adding another level).
Alternatives
Evaluation should be done externally. The rewards to internal evaluators for producing a favourable evaluation report vastly outweigh the rewards for producing an unfavourable one. There are also lots of shorter, sharper and more relevant approaches: Brinkerhoff’s Success Case Method, Daniel Stufflebeam's CIPP Model, Robert Stake's Responsive Evaluation, Kaufman's Five Levels of Evaluation, CIRO (Context, Input, Reaction, Outcome), PERT (Program Evaluation and Review Technique), Alkin's UCLA Model, Provus's Discrepancy Model and Eisner's Connoisseurship Evaluation Model. However, Kirkpatrick is too deeply embedded in the culture of training, a culture that tends to get stuck with theories that are 50 years, or more, old.
Evaluation is all about decisions, so it makes sense to customise it to the decisions and decision makers it serves. If one asks ‘To what problem is evaluation a solution?’, one may find that the answer is costs, low productivity, staff retention, customer dissatisfaction and so on. In this sense, Kirkpatrick may actually stop relevant evaluation.
Conclusion
Kirkpatrick’s four levels of evaluation have soldiered on for over 50 years because, like much training theory, they are the product of strong marketing (now by his son James Kirkpatrick) and have become fossilised in ‘train the trainer’ courses. The model has no real research or empirical background, is over-engineered and linear, and focuses too much on less relevant Level 1 and 2 data, drawing effort away from the more relevant Level 4.
Bibliography
Kirkpatrick, D. (1959). Techniques for evaluating training programmes.
Kirkpatrick, D. (1994). Evaluating training programmes: The four levels.
Kirkpatrick, D. and Kirkpatrick, J. D. (2006). Evaluating Training Programs (3rd ed.). San Francisco, CA: Berrett-Koehler Publishers.
Phillips, J. (1996). How much is the training worth? Training and Development, 50(4), 20-24.
Kaufman, R. (1996). Strategic Thinking: A Guide to Identifying and Solving Problems. Arlington, VA and Washington, DC: American Society for Training & Development and the International Society for Performance Improvement.
Kaufman, R. (2000). Mega Planning: Practical Tools for Organizational Success. Thousand Oaks, CA: Sage Publications.
Sitzmann, T., Brown, K. G., Casper, W. J., Ely, K., & Zimmerman, R. (2008). A review and meta-analysis of the nomological network of trainee reactions. Journal of Applied Psychology, 93, 280-295.
Sitzmann, T., Ely, K., Brown, K. G., & Bauer, K. N. (in press). Self-assessment of knowledge: An affective or cognitive learning measure? Academy of Management Learning and Education.
9 comments:
Donald,
Thanks for a very useful review of Kirkpatrick and pointers to many evaluation models I hadn't heard of.
- I'm pretty sure you have mistyped Brinkerhoff as Binkerhoff?
Thanks Alex - sorted.
Ohh Donald, I was hoping for a happy sheet for your 50 blogs in etcetc. Anyway, some good stuff, so thanks.
The fact is, in many situations, businesses like to hear this sort of language from their trainers, and the underlying ROI concept is intuitively attractive to managers. While I don't have any problem whatsoever with thinking about ROI as one angle on staff development, there's a very rich history of learning evaluation in organizational contexts that, as you point out, is being overlooked.
Hi Irwin
I agree, managers do like ROI but ROI was not a Kirkpatrick concept. ROI is a financial concept, originally used to measure, mathematically, the return on a money investment given the balance of profits and losses over a given period. Managers are right to ask for fiscal evaluations, if they are possible. Wholly agree Irwin that there's been, and continues to be, some excellent work in this area but, by and large, evaluation in training is weak and not at all meaningful or objective.
Hello, I have added your blog to my reading list. This is a great resource. Thank you.
The reason training administrators never achieve Level 4 is that they don't own the yardstick that measures it. Business managers are the judges of what yields business results, not training managers.
Insightful critique, Donald - I found myself nodding enthusiastically at many of your points. But I find your conclusions a tad harsh. The main problem with Kirkpatrick is too many people regard it as "the only game in town", when it is one (just one) of a range of models and tools that have their place in different situations.
Brinkerhoff's Success Case Method (for example) is brilliant, but I wouldn't recommend it for every situation. The same with Kirkpatrick. I don't think we have to dismiss it altogether, just put it in the toolbox alongside all the other options. And I'm not convinced we need a single new all-embracing model to replace it (not sure you are advocating that, but that's one reading of an "overhaul").
Further discussion of situational approaches here: http://www.airthrey.com/papers.htm
Interestingly enough, the Kirkpatrick duo are now pushing their inverted model. You should now start at level 4 and assess what the business need is and the ROI. Then you plan the training.
Or, to put it another way, do what most credible L&D organisations have been doing for years.