Some online learning designers write their assessment items first, as they can match the assessment to the objectives or competences before being distracted by the detail of the content. This avoids the trap of writing test items that simply test atomic facts and words from the presented text.
Many test items are quite simply not fit for purpose: they don’t really assess, they test the wrong thing, or they are so badly designed that they mislead or annoy learners. Bad test item writers tend to simply pluck out key words from the text and test the recognition or the meaning of those terms. Good test item writers understand good design and the need to test deeper understanding, and this is a skill that few subject matter experts possess. Here are a few rules to follow if you’re new to this game:
1. Test understanding, not facts
What’s easy is to simply extract all the nouns, objects and quantities, then test for recall. The trick is to push beyond this to test understanding. It’s not the ‘what’ but the ‘why’ that often matters, yet remains untested. What’s hard is to write questions that really do test the learner’s actual knowledge and ability to apply that knowledge. Professor Dylan Wiliam calls great questions ‘hinge questions’, as they diagnose poor understanding. He explains how one can use these questions as powerful verbal test items in a classroom, where it is difficult to diagnose 30 kids quickly, but the principle applies just as well to online learning. Here’s an example:
The ball sitting on a table is not moving. It’s not moving because:
A. No forces are pushing or pulling on the ball.
B. Gravity is pulling down, but the table is in the way.
C. The table pushes up with the same force that gravity pulls down.
D. Gravity is holding it on to the table.
E. There’s a force inside the ball keeping it from rolling off the table.
This question not only catches common misconceptions, it distinguishes between those who have understood the ‘physics’ and those who have not. There are forces at play, two in fact, so C is correct. The other options introduce common misconceptions about the absence of forces in stationary objects (A), single forces (B and D) or inner forces (E).
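In an online course, you can go a step further and store the diagnosis alongside each option, so a wrong answer reports which misconception it reveals rather than just a mark. Here is a minimal sketch in Python; the structure and names are purely illustrative, not from any particular LMS or authoring tool:

    # Each distractor is tagged with the misconception it diagnoses, so a
    # wrong answer tells you *why* the learner is wrong, not just that they are.
    hinge_question = {
        "stem": "The ball sitting on a table is not moving. It's not moving because:",
        "options": {
            "A": ("No forces are pushing or pulling on the ball.",
                  "believes stationary objects have no forces acting on them"),
            "B": ("Gravity is pulling down, but the table is in the way.",
                  "thinks only a single force is acting"),
            "C": ("The table pushes up with the same force that gravity pulls down.",
                  None),  # correct: two balanced forces
            "D": ("Gravity is holding it on to the table.",
                  "thinks only a single force is acting"),
            "E": ("There's a force inside the ball keeping it from rolling off the table.",
                  "invents an inner force inside the object"),
        },
        "answer": "C",
    }

    def diagnose(choice):
        # Return the misconception a wrong answer reveals, or None if correct.
        _, misconception = hinge_question["options"][choice]
        return misconception

Run over a whole class’s answers, this gives you a tally of misconceptions rather than just a score.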
There are four components that make great questions:
Make them THINK
Give helpful FEEDBACK
Keep it CONVERSATIONAL
Be POSITIVE
A really bad question will allow the learner to guess the right answer or simply test recognition from a list. A good question will make the learner think; a great question will make them look away, even close their eyes, to recall what they know, do something in their head and move towards the answer. It will push the learner.
Let’s try this:
A bat and ball cost £1.10.
The bat costs one pound more than the ball.
How much does the ball cost?
20p
15p
10p
5p
This question really is a test of a learner’s grasp of algebra and ‘ratios’ in maths. It does a great job, as it requires you to know how to apply a principle in mathematical thinking. The answer is not 10p, as many choose. Think about it. If the ball were 10p, the bat, at a pound more, would be £1.10, making a total of £1.20. The right answer is 5p. If the ball were 5p, the bat, at a pound more, would be £1.05, making a total of £1.10.
That’s a great question.
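For readers who want the algebra: working in pence, let b be the cost of the ball. The bat costs b + 100, and together they cost 110, so b + (b + 100) = 110, giving 2b = 10 and b = 5. The ball costs 5p and the bat £1.05.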
Remember that a great question also needs great feedback. In fact, it is better to call these not questions but ‘test items’, as you need not only the question but also the feedback. So let’s think about explanatory feedback for this question, feedback that allows the learner to learn from the question.
The bat costs one pound more than the ball.
How much does the ball cost?
20p
15p
10p
5p
20p -> Try again.
15p -> Try again.
10p -> Try again.
5p -> Correct.
This doesn’t really give any help to the learner who has jumped to the wrong conclusion or hasn’t thought deeply enough about the question. We need to provide HELPFUL feedback.
The bat costs one pound more than the ball.
How much does the ball cost?
20p
15p
10p
5p
20p -> 20p+120p=140p. Try again.
15p -> 15p+115p=130p. Try again.
10p -> 10p+110p=120p. Try again.
5p -> Correct. 5p+105p=110p.
The feedback here tries to unpack the nature of the problem
for the learner and gives an explanation as to why they are wrong.
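If you are building items like this in an online system, it helps to keep the question and its per-option feedback together in one structure. A minimal sketch in Python; the field names are my own invention, not from any particular authoring tool:

    test_item = {
        "stem": ("A bat and ball cost £1.10. The bat costs one pound more "
                 "than the ball. How much does the ball cost?"),
        "options": ["20p", "15p", "10p", "5p"],
        "answer": "5p",
        # explanatory feedback per option, not a bare right/wrong
        "feedback": {
            "20p": "20p+120p=140p. Try again.",
            "15p": "15p+115p=130p. Try again.",
            "10p": "10p+110p=120p. Try again.",
            "5p": "Correct. 5p+105p=110p.",
        },
    }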
Now let’s add another touch, that human element. Make it CONVERSATIONAL. In conversation, you wouldn’t ask a question like this and simply feed back the statement ‘10p+110p=120p. Try again.’ You’d make it a bit more conversational and friendly.
20p -> If the ball was 20p, the bat, at £1 more, would be 120p. The ball (20p) plus the bat (120p) gives a total of 140p. Try again.
15p -> If the ball was 15p, the bat, at £1 more, would be 115p. The ball (15p) plus the bat (115p) gives a total of 130p. Try again.
10p -> If the ball was 10p, the bat, at £1 more, would be 110p. The ball (10p) plus the bat (110p) gives a total of 120p. Try again.
5p -> Correct. If the ball was 5p, the bat, at £1 more, would be 105p. The ball (5p) plus the bat (105p) gives a total of 110p.
Now be a little more POSITIVE:
20p -> Sorry. Most people get this wrong on their first attempt. Let’s see why this is wrong. If the ball was 20p, the bat, at £1 more, would be 120p. The ball (20p) plus the bat (120p) gives a total of 140p. Think about how the 110p total splits between the ball and the bat. Try again.
15p -> Sorry. Most people get this wrong on their first attempt. If the ball was 15p, the bat, at £1 more, would be 115p. The ball (15p) plus the bat (115p) gives a total of 130p. Think about how the 110p total splits between the ball and the bat. Try again.
10p -> Most people get this wrong first time around and choose this answer. If the ball was 10p, the bat, at £1 more, would be 110p. The ball (10p) plus the bat (110p) gives a total of 120p. Think about how the 110p total splits between the ball and the bat. Try again.
5p -> Well done. If the ball was 5p, the bat, at £1 more, would be 105p. The ball (5p) plus the bat (105p) gives a total of 110p.
You get the idea. Make test items meaningful, not mundane, and be helpful, generous and conversational in the feedback. This type of approach to test items will make learners feel good and propel them forward, rather than acting as a barrier or making them feel like failures.
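One last practical note on this example: since the four feedback messages share a pattern, you could generate them in code rather than hand-write each one. A rough, purely illustrative sketch in Python:

    def feedback(ball):
        # All amounts in pence; the bat always costs £1 (100p) more than the ball.
        bat = ball + 100
        if ball + bat == 110:
            return (f"Well done. If the ball was {ball}p, the bat, at £1 more, "
                    f"would be {bat}p. The ball ({ball}p) plus the bat ({bat}p) "
                    f"gives a total of 110p.")
        return (f"Sorry. Most people get this wrong on their first attempt. "
                f"If the ball was {ball}p, the bat, at £1 more, would be {bat}p, "
                f"giving a total of {ball + bat}p, not 110p. Think about how the "
                f"110p total splits between the ball and the bat. Try again.")

    for pence in (20, 15, 10, 5):
        print(f"{pence}p -> {feedback(pence)}")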
2. Binary choice language
We don’t ask each other questions in normal language using the terms True and False, so consider more natural terms such as Yes/No or Agree/Disagree.
Margaret Thatcher was the first British woman Prime Minister.
Yes/No or Agree/Disagree
3. Binary choice mistakes
Binary choice questions are often too long, or turn into complex tests of logic by expressing two or more ideas. Using NOT in a binary choice simply leads to a logic test, so it is best avoided. Finally, and I have seen this, asking the learner to try again is stupid, as on a two-option question it simply gives away the answer.
4. Edit out unnecessary words in options
How many Earths would fit into the Sun?
approximately 1000
approximately 10,000
approximately 100,000
approximately 1,000,000
This is a good question, a real test of your knowledge of the relative sizes of the Earth and Sun. However, take the repeated word out of each of the options, as there is no need to make the learner go to the extra effort of reading the same word four times.
Approximately how many Earths would fit into the Sun?
1000
10,000
100,000
1,000,000
And with meaningful feedback…
1000 -> Sorry. It is easy to underestimate the size of the Sun. The Sun has a radius about 100 times that of the Earth. Try again.
10,000 -> Sorry. It is easy to underestimate the size of the Sun. The Sun has a radius about 100 times that of the Earth. Try again.
100,000 -> Sorry. It is easy to underestimate the size of the Sun. The Sun has a radius about 100 times that of the Earth. Try again.
1,000,000 -> That’s right. In fact, you could fit 1.3 million Earths inside the Sun! It’s huge.
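This kind of repetition is easy to catch automatically. Here is a small, purely illustrative check in Python (the function is my own, not from any library) that flags words repeated at the start of every option; such words belong in the stem:

    def repeated_leading_words(options):
        # Return the run of words shared at the start of all options.
        split_options = [option.split() for option in options]
        shared = []
        for words in zip(*split_options):
            if len(set(words)) == 1:
                shared.append(words[0])
            else:
                break
        return shared

    options = ["approximately 1000", "approximately 10,000",
               "approximately 100,000", "approximately 1,000,000"]
    print(repeated_leading_words(options))  # ['approximately'] -> move it into the stem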
5. Make the question grammatically agree with all answers
In golf, one under par is a:
Eagle
Birdie
Bogey
Albatross
The word ‘a’ agrees with two of the options but not the other two. It’s grammatically wrong, as well as almost giving away the right answer!
What do you call one under par in golf?
Eagle
Birdie
Bogey
Albatross
6. Avoid negative questions
Avoid having NOT in the question (Which of the following is not…) unless you really are testing a competence that involves a common error or misconception. The ‘not’ can easily be missed by the learner, so capitalize it if used. These questions often simply reinforce irrelevant negative information, rather than testing much that is useful.
7. Avoid making the longest option the answer
Go through the assessment items and click on just the longest option. You’d be surprised how often these are weighted towards the correct answer, as designers often write the right answer first, then come up with easier, shorter distractors. Poundstone showed that this is common, even in high-stakes tests. Options should be roughly the same length, grammatically similar and presented in random order.
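Both halves of that rule lend themselves to simple automation in an authoring pipeline. A rough sketch in Python, with an invented function name, assuming options are held as a list of strings:

    import random

    def check_and_shuffle(options, answer):
        # Warn if the correct option is the longest of the set.
        if max(options, key=len) == answer:
            print("Warning: the correct option is the longest; "
                  "balance the option lengths.")
        shuffled = options[:]        # copy, so the master item is untouched
        random.shuffle(shuffled)     # randomise the order of presentation
        return shuffled

    options = ["1000", "10,000", "100,000", "1,000,000"]
    print(check_and_shuffle(options, "1,000,000"))  # triggers the warning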
8. Avoid ‘All of the above’ and ‘None of the above’
These are often imprecise tests, and you don’t know what actually misled the learner if they get it wrong. They are also easy to write, so there is a tendency for poor test writers to have these as their ‘right’ answer, a fact shown by Poundstone’s analysis of thousands of test items.
9. All options should be believable
Answers that look obviously wrong are pointless or, even worse, patronizing and condescending. Here are two questions from BBC Bitesize:
What is magma?
A chocolate ice cream
Molten rock
Bubbles of gas
What are the tiny air sacs in the lungs called?
Ravioli
Bronchioles
Alveoli
I rest my case. A lesser but similar error is to make one or more of the options too obviously right or wrong.
10. Test your test items
First, test your test items with an expert, namely an experienced test-item writer. It is a skill that takes some time to master. Just because you know your subject does not mean that you know how to assess and write good test items. Second, try the questions out with real target learners and ask them to ‘voice’ what they’re thinking. They invariably say things like ‘I don’t understand this question… What do you mean by…’. Don’t argue, just change it.
Conclusion
Questions encourage not just the acquisition of knowledge but also critical thinking. Questions stimulate curiosity. Questions can intrigue and pull the learner towards exploring a subject in more detail. Good questions can be learning experiences in themselves, not just test items. They can be used to stimulate thought but also, through good feedback, explain to the learner why they have a misconception, even act as a way of getting a point across. Questions drive learners forward. Good questions diagnose strengths and weaknesses. You don't know what you don't know, and questions uncover the often uncomfortable truth that you know less than you thought you did. Finally, questions can assess and determine whether a learner is competent against certain criteria.
Questions really do matter in learning, which is why it is important to design them so that they are fit for purpose. Professional examination bodies all too frequently write questions that are poorly designed and, in some cases, unbelievably, impossible to answer. Experts in their subjects often write poor test items, as it is difficult to put yourself in the shoes of a novice, and assessment skills are different from subject matter expertise. In online learning, even professional vendors often produce badly designed test items, due to a lack of interactive design expertise. In many ways the test items are more important than the content.
3 comments:
Thank you for your interesting post. I wonder if it would be possible to add another point to your feedback: explaining some possible reasons for the mistake. For example:
10p -> Most people get this wrong first time around and choose this answer. Let's look at the reasoning that leads to this answer. On a quick reading of the statement, we take the cost of the bat to be £1. Since the total is £1.10, we deduce that the ball costs £1.10 - £1 = 10p. The error is that the statement does not say that the bat costs £1, but £1 more than the ball.
Moreover, if the ball was 10p, the bat, at £1 more, would be 110p, and the ball (10p) plus the bat (110p) gives a total of 120p. Think about how the 110p total splits between the ball and the bat. Try again.
Hi Donald, excellent post which I stumbled upon after reading another great post of yours - on the use of animation in elearning. This covers quite a few of the common issues I see from inexperienced test writers (particularly use of 'All of the above' - which, when used, is ALWAYS the correct answer - dead give-away). I have also found that often 'professional' vendors make many of these errors (and are sometimes the worst offenders...)
What I found exceptionally valuable in your post was the inclusion of examples demonstrating good (and not so good) practice, and most importantly - explanations pointing out why they are good / not so good. Practising what you're preaching by providing meaningful examples and 'feedback'.
Testing your test items with a representative sample from the actual target audience using 'think out loud' usability / observation techniques is also a critical but often overlooked practice - not just with test items but elearning in general. The mentality of building, getting stakeholder approval / sign off then pushing it out (or perhaps inflicting it on people) is unfortunately all too common. Thanks for highlighting some important points and provoking better practice. I'll be sharing this one.
Thanks for the great and informative post.
So true that good questions are difficult to write.
I must confess that as I read your post, I quickly spotted examples from my own teaching that needed to be repaired before unleashing them on students in an online environment.
It is humbling to read, but I feel that I will write better questions as a result.