Wednesday, August 26, 2015

Top 10 stupid mistakes in the design of Multiple Choice questions

Some online learning designers write their assessment items first, so that they can match the assessment to the objectives or competences before being distracted by the detail of the content. This avoids the trap of writing test items that simply test atomic facts and words lifted from the presented text.
Many test items are quite simply not fit for purpose: they don’t really assess anything, they test the wrong thing, or they are so badly designed that they mislead or annoy learners. Bad test item writers tend to pluck key words out of the text and test recognition or the meaning of those terms. Good test item writers, and this is a skill that few subject matter experts possess, understand good design and the need to test deeper understanding.
Here are a few rules to follow if you’re new to this game:
1. Test understanding, not facts
What’s easy is to simply extract all the nouns, objects and quantities and then test for recall. The trick is to push beyond this to test understanding. It’s often the ‘why’, not the ‘what’, that matters yet remains untested. What’s hard is to write questions that really do test the learner’s actual knowledge and their ability to apply that knowledge. Professor Dylan Wiliam calls great questions ‘hinge questions’, because they diagnose poor understanding. He explains how to use them as powerful verbal test items in a classroom, where it is difficult to diagnose 30 kids quickly, but the principle applies just as well to online learning. Here’s an example:
The ball sitting on a table is not moving. It’s not moving because:
A. No forces are pushing or pulling on the ball.
B. Gravity is pulling down, but the table is in the way.
C. The table pushes up with the same force that gravity pulls down.
D. Gravity is holding it on to the table.
E. There’s a force inside the ball keeping it from rolling off the table.
This question not only catches common misconceptions, it distinguishes between those who have understood the physics and those who have not. There are forces at play, two in fact, so C is correct. The other options reflect common misconceptions: that no forces act on stationary objects (A), that only a single force is involved (B & D), or that an inner force is at work (E).
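If it helps to see the underlying physics written out, here is a quick sketch of my own working, not part of the original item. The ball sits still because the two forces acting on it balance:
weight pulling down: W = mg
normal force from the table pushing up: N = mg
net force: N - W = 0
Because the net force is zero, the ball does not accelerate, which is exactly what option C describes.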
There are four components that make great questions:
Make them THINK
Give helpful FEEDBACK
Keep it CONVERSATIONAL
Be POSITIVE
A really bad question will allow the learner to guess the right answer or simply test recognition from a list. A good question will make the learner think, a great question will make them look away, even close their eyes to recall what they know, do something in their head and move towards the answer. It will push the learner.
Let’s try this:
A bat and ball cost £1.10.
The bat costs one pound more than the ball.
How much does the ball cost?
20p
15p
10p
5p       
This question really is a test of a learner’s grasp of algebra and ‘ratios’ in maths, and it does a great job, as it requires you to apply a principle in mathematical thinking. The answer is not 10p, as many choose. Think about it. If the ball were 10p, the bat, being a pound more, would be £1.10, making a total of £1.20. The right answer is 5p. If the ball were 5p, the bat, being a pound more, would be £1.05, making a total of £1.10. That’s a great question.
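For anyone who wants the working written out, here is a quick algebra sketch of my own, with x standing for the cost of the ball in pence:
x + (x + 100) = 110
2x + 100 = 110
2x = 10
x = 5
So the ball costs 5p and the bat costs 105p, which is exactly £1 more.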
Remember also that a great question needs great feedback. In fact, it is better to call these not questions but ‘test items’, as you need not only the question but also the feedback. So let’s think about explanatory feedback for this question, feedback that allows the learner to learn from the question itself.
A bat and ball cost £1.10.
The bat costs one pound more than the ball.
How much does the ball cost?
20p
15p
10p  
5p   
   
20p    ->  Try again.
15p    ->  Try again.   
10p    ->  Try again.
5p      ->  Correct.
This doesn’t really give any help to the learner who has jumped to the wrong conclusion or hasn’t thought deeply enough about the question. We need to provide HELPFUL feedback.
A bat and ball cost £1.10.
The bat costs one pound more than the ball.
How much does the ball cost?
20p
15p
10p  
5p      
20p    ->  20p+120p=140p  Try again.
15p    ->  15p+115p=130p  Try again.   
10p    ->  10p+110p=120p  Try again.
5p      ->  Correct. 5p+105p=110p
The feedback here tries to unpack the nature of the problem for the learner and gives an explanation as to why they are wrong.
Now let’s add another touch, that human element. Make it CONVERSATIONAL. In conversation, you wouldn’t ask a question like this and simply feed back the statement ‘10p+110p=120p Try again.’ You’d make it a bit more conversational and friendly.
20p    ->  If the ball were 20p, the bat, at £1 more, would be 120p. The ball (20p) plus the bat (120p) gives a total of 140p. Try again.
15p    ->  If the ball were 15p, the bat, at £1 more, would be 115p. The ball (15p) plus the bat (115p) gives a total of 130p. Try again.
10p    ->  If the ball were 10p, the bat, at £1 more, would be 110p. The ball (10p) plus the bat (110p) gives a total of 120p. Try again.
5p      ->  Correct. If the ball is 5p, the bat, at £1 more, is 105p. The ball (5p) plus the bat (105p) gives a total of 110p.
Now be a little more POSITIVE:
20p    ->  Sorry. Most people get this wrong on the first attempt. Let’s see why this one doesn’t work. If the ball were 20p, the bat, at £1 more, would be 120p. The ball (20p) plus the bat (120p) gives a total of 140p. Think about how the 110p total splits between the ball and the bat. Try again.
15p    ->  Sorry. Most people get this wrong on the first attempt. If the ball were 15p, the bat, at £1 more, would be 115p. The ball (15p) plus the bat (115p) gives a total of 130p. Think about how the 110p total splits between the ball and the bat. Try again.
10p    ->  Most people get this wrong first time around and choose this answer. If the ball were 10p, the bat, at £1 more, would be 110p. The ball (10p) plus the bat (110p) gives a total of 120p. Think about how the 110p total splits between the ball and the bat. Try again.
5p      ->  Well done. If the ball is 5p, the bat, at £1 more, is 105p. The ball (5p) plus the bat (105p) gives a total of 110p.
You get the idea. Make test items meaningful, not mundane, and be helpful, generous and conversational in the feedback. This approach to test items will make learners feel good and propel them forward, rather than acting as a barrier or making them feel like failures.
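If you are building items in code rather than in an authoring tool, one possible way to keep the question, the options and the per-option feedback together is a small structure like the Python sketch below. It is purely illustrative; the field names and the respond() helper are my own invention, not any particular tool’s format.

# Illustrative sketch only: the structure and the respond() helper below are
# hypothetical, not the format of any particular authoring tool.
item = {
    "question": ("A bat and ball cost £1.10. The bat costs one pound more "
                 "than the ball. How much does the ball cost?"),
    "answer": "5p",
    "feedback": {
        "20p": "Sorry. If the ball were 20p, the bat would be 120p, giving 140p in total. Try again.",
        "15p": "Sorry. If the ball were 15p, the bat would be 115p, giving 130p in total. Try again.",
        "10p": "Most people choose this first. If the ball were 10p, the bat would be 110p, giving 120p in total. Try again.",
        "5p": "Well done. The ball at 5p plus the bat at 105p gives exactly 110p.",
    },
}

def respond(choice):
    # Look up the explanatory feedback for whichever option the learner picked.
    return item["feedback"][choice]

print(respond("10p"))

The point is simply that every option, right or wrong, carries its own explanation rather than a single generic ‘Try again’.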
2. Binary Choice language
We don’t ask each other questions in everyday language using the terms True and False, so consider more natural terms such as Yes/No or Agree/Disagree.
Margaret Thatcher was the only British woman Prime Minister.
Yes / No      or      Agree / Disagree
3. Binary choice mistakes
Binary choice questions are often too long, or turn into complex tests of logic because they express two or more ideas at once.
Using NOT in a binary choice simply turns it into a logic test, so it is best avoided. Finally, and I have seen this done, asking the learner to try again on a binary choice is stupid, as the only remaining option gives the answer away.

4. Edit out unnecessary words in options
How many earths would fit into the sun?
approximately 1000
approximately 10,000
approximately 100,000
approximately 1,000,000
This is a good question, a real test of your knowledge of the relative sizes of the Earth and Sun. However, take the repeated word out of the options and move it into the question, as there is no need to make the learner go to the extra effort of reading the same word four times.
Approximately how many earths would fit into the sun?
1000
10,000
100,000
1,000,000
And with meaningful feedback….
1000 ->  Sorry. It is easy to underestimate the size of the Sun. The Sun’s radius is roughly 100 times that of the Earth. Try again.
10,000 ->  Sorry. It is easy to underestimate the size of the Sun. The Sun’s radius is roughly 100 times that of the Earth. Try again.
100,000 ->  Sorry. It is easy to underestimate the size of the Sun. The Sun’s radius is roughly 100 times that of the Earth. Try again.
1,000,000 ->  That’s right. In fact, you could fit 1.3 million Earths inside the Sun! It’s huge.
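To see why the radius comparison leads to that answer, here is a quick bit of working of my own, not part of the original feedback. Volume scales with the cube of the radius, so a radius ratio of roughly 100 gives:
100 × 100 × 100 = 1,000,000 Earth volumes
and with the more precise radius ratio of about 109:
109 × 109 × 109 ≈ 1,300,000
which is where the 1.3 million figure in the correct-answer feedback comes from.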
5. Make question grammatically agree with all answers
In golf, one under par is a:
Eagle
Birdie
Bogie
Albatross
The word ‘a’ agrees with two of the options but not the other two. It’s grammatically wrong, as well as almost giving away the right answer!
What do you call one under par in golf?
Eagle
Birdie
Bogie
Albatross
6. Avoid negative questions
Avoid having NOT in the question (‘Which of the following is not…’) unless you really are testing a competence that involves a common error or misconception. The ‘not’ can easily be missed by the learner, so capitalize it if it must be used. These questions often simply reinforce irrelevant negative information rather than testing much that is useful.
7. Avoid making the longest option the answer
Go through the assessment items and pick just the longest option each time. You’d be surprised how often these turn out to be the correct answer, as designers often write the right answer first, then come up with easier, shorter distractors. Poundstone showed that this was common, even in high-stakes tests. Options should be roughly the same length, grammatically similar and presented in a random order.
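Randomising the order is trivial to automate. Most quiz engines do it for you, but if you are generating items in code, something along the lines of this Python sketch (purely illustrative) is all it takes:

# Illustrative sketch: shuffle the options so their order differs each time
# the question is presented.
import random

options = ["Eagle", "Birdie", "Bogie", "Albatross"]
random.shuffle(options)  # shuffles the list in place
print(options)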
8. Avoid ‘All of the above’ and ‘None of the above’
These are often imprecise tests and you don’t know what actually misled the learner if they get it wrong. They are also easy to write, so there is a tendency for poor test writers to have these as their ‘right’ answer, a fact shown by Poundstone’s analysis of thousands of test items.
9. All options should be believable.
Answers that look obviously wrong are pointless or, even worse, patronizing and condescending. Here are two questions from BBC Bitesize:
What is magma?
A chocolate ice cream
Molten rock
Bubbles of gas

What are the tiny air sacs in the lungs called?
Ravioli
Bronchioles
Alveoli
I rest my case. A lesser but similar error is to make one or more of the options too obviously right or wrong.
10. Test your test items
First, test your test items with an expert, namely an experienced test-item writer. It is a skill that takes time to master; just because you know your subject does not mean that you know how to assess it and write good test items. Second, try the questions out with real target learners and ask them to voice what they’re thinking. They invariably say things like ‘I don’t understand this question… What do you mean by…’. Don’t argue, just change it.
Conclusion
Questions encourage not just the acquisition of knowledge but also critical thinking. Questions stimulate curiosity. Questions can intrigue and pull the learner towards exploring a subject in more detail. Good questions can be learning experiences in themselves, not just test items. They can be used to stimulate thought and also, through good feedback, to explain to the learner why they hold a misconception, even to get a point across. Questions drive learners forward. Good questions diagnose strengths and weaknesses. You don't know what you don't know, and questions uncover the often uncomfortable truth that you know less than you thought you did. Finally, questions can assess and determine whether a learner is competent against certain criteria.
Questions really do matter in learning; that’s why it is important to design them so that they are fit for purpose. Professional examination bodies all too frequently write questions that are poorly designed and, in some cases, unbelievably, impossible to answer. Experts in their subjects often write poor test items, as it is difficult to put yourself in the shoes of a novice, and assessment skills are different from subject matter expertise. In online learning, even professional vendors often produce badly designed test items, due to a lack of interactive design expertise. In many ways the test items matter more than the content.


3 comments:

Unknown said...

Thank you for your interesting post. I wonder if it would be possible to add another point to your feedback: explaining some possible reasons for the mistake. For example:

10p -> Most people get this wrong first time around and choose this answer. Let's look at the reasoning that leads to it. Reading the statement quickly, we assume the bat costs £1. Since the total is £1.10, we deduce that the ball costs £1.10 - £1 = 10p. The error is that the statement does not say the bat costs £1, but £1 more than the ball.

Moreover, if the ball were 10p, the bat, at £1 more, would be 110p, and the ball (10p) plus the bat (110p) would give a total of 120p. Think about how the 110p total splits between the ball and the bat. Try again.

Unknown said...

Hi Donald, excellent post which I stumbled upon after reading another great post of yours - on the use of animation in elearning. This covers quite a few of the common issues I see from inexperienced test writers (particularly use of 'All of the above' - which, when used, is ALWAYS the correct answer - dead give-away). I have also found that often 'professional' vendors make many of these errors (and are sometimes the worst offenders...)

What I found exceptionally valuable in your post was the inclusion of examples demonstrating good (and not so good) practice, and most importantly - explanations pointing out why they are good / not so good. Practising what you're preaching by providing meaningful examples and 'feedback'.

Testing your test items with a representative sample from the actual target audience using 'think out loud' usability / observation techniques is also a critical but often overlooked practice - not just with test items but elearning in general. The mentality of building, getting stakeholder approval / sign off then pushing it out (or perhaps inflicting it on people) is unfortunately all too common. Thanks for highlighting some important points and provoking better practice. I'll be sharing this one.

jk said...

Thanks for the great and informative post.
So true that good questions are difficult to write.
I must confess that as I read your post, I quickly recognized examples from my own teaching that needed to be repaired before unleashing them on students in an online environment.
It is humbling to read, but I feel that I will write better questions as a result.