Monday, June 25, 2018

AI and assessment

I used my fingerprint to access this Mac to write this piece, my iPhone uses face recognition, and when I travel, face recognition identifies me as I leave and enter the country. I am constantly being ‘assessed’ using AI. As the pendulum swings towards online learning, it makes sense to use AI in online examinations. Yet, so far, almost the only use of AI in learning assessment is in checks for cheating – plagiarism checkers.
AI is not perfect but neither are humans. Human performance falls when marking large numbers of essays: markers make mistakes, carry biases based on names and gender, cognitive biases, and biases about what is acceptable in terms of critique and creativity. This is not about replacing teacher assessment, it’s about automating some of that work to allow teachers to teach and provide more targeted, constructive feedback and support. It’s about optimising teachers’ time. It is also about opening up the huge potential of online assessment, on the not inconsiderable grounds of convenience, quality and cost.
1. Identification
Live or recorded monitoring (proctoring) is used to watch the candidate. You can also monitor feeds, use a locked-down browser, freeze the screen, block cut and paste, and limit external access. Video, including 360-degree cameras, and audio are also used to detect possible cheating. Webcams can be used to scan for suspicious objects, pick up background noise and run face recognition.
Coursera holds a patent on keystroke recognition. They get you to type in a sentence, then measure two things: dwell time on each key and time between keystrokes, giving each candidate a unique typing signature, so that exam input can be verified as yours.
In addition, they scan your photo ID, a driver’s licence or passport. Proctoring companies use machine learning to adapt to student behaviour, improving the analysis with each exam. Their facial recognition, eye movement tracking and auditory analysis identify suspicious behaviour, with incident reports and session activity data generated at the end of each exam. Multi-factor authentication – ID and photo capture, facial recognition and keystroke analysis – are all used to verify student identity.
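To make the keystroke idea concrete, here is a minimal sketch of how a typing signature might be compared – an illustration of the general technique, not Coursera’s patented implementation. The feature names and the 25ms threshold are assumptions for the example.

```python
# Illustrative sketch of keystroke-dynamics verification (NOT Coursera's
# actual implementation). Each sample is a list of (key, press_ms, release_ms)
# events recorded while the candidate types a known sentence.

def typing_features(sample):
    """Return per-key dwell times and inter-key flight times."""
    dwells = [release - press for _, press, release in sample]
    flights = [sample[i + 1][1] - sample[i][2] for i in range(len(sample) - 1)]
    return dwells, flights

def distance(sample_a, sample_b):
    """Mean absolute difference between two equal-length typing samples."""
    da, fa = typing_features(sample_a)
    db, fb = typing_features(sample_b)
    diffs = [abs(x - y) for x, y in zip(da + fa, db + fb)]
    return sum(diffs) / len(diffs)

def same_typist(enrolled, observed, threshold_ms=25):
    """Accept if the observed sample is close to the enrolled signature."""
    return distance(enrolled, observed) < threshold_ms
```

A real system would enrol several samples per candidate and learn the threshold per user, but the core signal – dwell and flight times – is as simple as this.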
All of these techniques and others are improving rapidly and it is clear from these real examples that AI is already useful in enabling more convenient, cheaper and on-demand identification and assessment. 
2. Interface (voice)
Learners largely use keyboards, whether physical or virtual to write. This is the norm at home and in the workplace. Yet assessment is still largely by writing with a pen. This creates a performance problem. On most writing and critical thinking tasks one needs to be able to ‘rewrite’ (reorder, delete, add, amend) text. Writing with a pen encourages the opposite – the memorisation of blocks of text, even entire essays.
We have already seen how keystroke patterns can be used to identify candidates, but voice is also rapidly becoming a normal form of interaction with computers – around 10% of Google searches are now by voice, Siri and Cortana are common tools, as are home devices such as Amazon’s Alexa and Google Home. The advantages of voice for assessment are clear: a natural, frictionless interface; speaking is a more universal skill than writing; and it eliminates literacy problems where literacy is not the purpose of the assessment. Voice also helps assess within 3D environments such as VR, where you can navigate and interact wholly by voice. We have a system in WildFire which is wholly voice-driven, within or outside VR. VR is another form of interface in assessment (more on this later in the article).
3. Retrieval as formative assessment
Formative testing has a solid research base. It shows that testing as a form of retrieval is one of the most effective methods of study. A meta-study by Adesope et al. (2017) shows the superiority of testing over reading and other forms of study.
However, most online learning relies heavily on multiple-choice questions, which have become the staple of much e-learning content. These have been shown to be effective, as almost any type of test item is effective to a degree, but they have also been shown to be less effective than open response, as they test recognition from a list, not whether something is actually known. MCQs are a relic of the early days of automated marking, when templates could be used around boxes to visually or machine-read ticks and crosses. There are many problems with multiple-choice questions: the answer is given; they require recognition rather than retrieval; guessing gives a 25%/33% chance of being right; distractors can be remembered; cheating works; and surface structure seriously distorts efficacy.
Kang et al. (2007) showed that, with 48 undergraduates reading academic journal-quality material, open input is superior to multiple-choice (recognition) tasks. Multiple-choice testing had an effect similar to that of re-reading, whereas open input resulted in more effective student learning. McDaniel et al. (2007) repeated this experiment in a real course with 35 students enrolled in a web-based Brain and Behavior course at the University of New Mexico. The open-input quizzes produced more robust benefits than multiple-choice quizzes. ‘Desirable difficulties’ is a concept coined by Elizabeth and Robert Bjork to describe the desirability of creating learning experiences that trigger effort, deeper processing, encoding and retrieval, to enhance learning. The Bjorks have researched this phenomenon in detail to show that effortful retrieval and recall is desirable in learning, as it is the effort taken in retrieval that reinforces and consolidates that learning.
A multiple-choice question is a test of recognition from a list. It does not elicit full recall from memory. Studies comparing multiple choice with open retrieval show that when more effort is demanded of students, they have better retention. As open response takes cognitive effort, the very act of recalling knowledge also reinforces that knowledge in memory. Active recall develops and strengthens memory; it improves the process of recall in ways that passive study – reading, listening and watching – does not. Active recall, pulling something out of memory, is therefore more effective in terms of future performance.
AI can help assess alternatives to MCQs by opening up the possibilities of open input. Meaning matters, so it makes sense to assess through open response, where meaningful recall is stimulated. Interestingly, even when the answer is not known, the act of trying to answer is itself a powerful form of learning – a stronger reinforcer, indeed, than the original exposure.
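One practical hurdle for open input is marking: typed answers contain typos and near-misses that exact string matching would wrongly reject. A minimal sketch of tolerant open-response marking, using fuzzy string similarity – the 0.8 threshold is an illustrative assumption, not a researched value:

```python
from difflib import SequenceMatcher

# Minimal sketch of tolerant open-response marking: a typed answer is
# accepted if it is close enough to the expected term, so minor typos
# don't defeat a retrieval-based (open input) question.

def mark_open_response(answer, expected, threshold=0.8):
    """Return True if the answer is a close-enough match to the expected term."""
    ratio = SequenceMatcher(None, answer.strip().lower(),
                            expected.strip().lower()).ratio()
    return ratio >= threshold
```

Production systems go further, accepting synonyms and using semantic analysis, but even this level of tolerance makes open input viable at scale.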
4. Automatic creation of assessments
We have developed an AI content creation service in WildFire that not only creates online learning content but also assessments at the same time. AI techniques create the assessment in the same open-text-input form as the learning experience, as outlined above. In addition, we can detect a great deal of detail about user behaviour while learners take the assessment, and the difficulty and some input parameters of the assessment can be varied using global variables. This approach is important for the great mass of low-level, low-stakes assessment, whether formative or summative.
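The simplest version of automatic assessment creation is cloze deletion: blanking out key terms so the learner must retrieve and type them. A naive sketch of the idea – this is an illustration, not WildFire’s actual technique; real systems identify key terms with NLP rather than word length:

```python
import re

# Naive sketch of turning learning content into an open-input assessment:
# pick the longest content words as likely key terms and blank them out,
# so the learner must type each missing term rather than recognise it.

def make_cloze(text, n_blanks=2):
    words = re.findall(r"[A-Za-z]+", text)
    # Choose the n longest distinct words as candidate key terms.
    targets = sorted(set(words), key=len, reverse=True)[:n_blanks]
    question = text
    for term in targets:
        question = re.sub(rf"\b{term}\b", "_" * len(term), question, count=1)
    return question, targets
```

Combined with the tolerant marking above, one pass over a piece of content yields both the learning experience and its assessment.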
5. Algorithmic spaced practice
The timing of formative assessment is also important, as Roediger (2011) has shown, with an expanding pattern recommended, i.e. lengthening the period between tests or self-tests as time passes. This is one of the most effective study techniques we know, yet many seem to be trapped in the world of taking notes, reading, underlining and re-reading. The way to enhance this technique is to use an algorithm to determine the pattern of practice and push practice events to individual learners. We do this in WildFire.
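An expanding-interval scheduler can be sketched in a few lines – this is an assumption about how such an algorithm might look, not WildFire’s actual algorithm. Each successful retrieval roughly doubles the gap before the next practice event; a failure resets the item to a short interval:

```python
from datetime import date, timedelta

# Sketch of an expanding-interval spaced practice scheduler (illustrative,
# not any product's actual algorithm). Successful recall lengthens the gap;
# failure restarts the item on a short interval.

def next_review(last_review, interval_days, recalled):
    """Return (next review date, new interval) for one practice item."""
    if recalled:
        interval_days = max(1, interval_days * 2)   # lengthen the gap
    else:
        interval_days = 1                           # start again tomorrow
    return last_review + timedelta(days=interval_days), interval_days
```

A push system then simply surfaces every item whose next review date has arrived for that learner.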
6. Plagiarism
The most common use of AI in learning is in plagiarism checkers – oddly, by far the most common use of AI in assessment. The quality assurance surrounding assessment often relies on this one tool to verify authorship. There are many tools in this area, ranging from free to expensive, including those built into platforms such as Blackboard. Turnitin also has WriteCheck, a service that allows students to submit their own work for checking. What is odd is that the main use of AI in HE is trying to catch cheats.

Interestingly, given that plagiarism is a genie that is well and truly out of the bottle, we are still stuck with essays as a rather monolithic form of assessment, especially in Higher Education. The good news is that the AI techniques used in plagiarism checkers are increasingly being used to allow learners to submit drafts of essays for reflection and improvement. It is in the provision of feedback on submitted text, through formative assessment, that learning takes place. Comparisons across the essays submitted by one student may also reveal inconsistencies that need further investigation.
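At their core, plagiarism checkers measure text overlap. A sketch of the underlying idea – word n-gram (Jaccard) similarity between two submissions; real checkers add stemming, paraphrase detection and huge web-scale reference indexes:

```python
# Sketch of the core of a plagiarism check: word n-gram overlap (Jaccard
# similarity) between two submissions. Real checkers go far beyond this,
# but shared-phrase overlap is the underlying signal.

def ngrams(text, n=3):
    """Set of word n-grams in the text (case-insensitive)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(doc_a, doc_b, n=3):
    """Jaccard similarity over word trigrams: 0.0 (no overlap) to 1.0 (identical)."""
    a, b = ngrams(doc_a, n), ngrams(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

The same similarity machinery, pointed at a student’s own earlier drafts rather than a reference corpus, supports the formative draft-feedback use described above.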
Essays are sometimes appropriate assignments, if one wants long-form critical thought. But in many subjects shorter, more targeted assignments and testing are far better. There are many formative assessment techniques out there, and essays are just one of them. Short-answer questions, open response, formative testing and adaptive testing are just some of the alternatives.
7. Essay marking
Essay and short open-answer marking is possible using AI-assisted software. The software takes lots of real essays, along with their human-marked grades, and looks for features that distinguish essays of one grade from those of another. In this sense, the software learns from human traits and outputs and tries to mimic them when presented with new cases. The features the software picks up on vary, but can include the presence or absence of particular words and phrases, and so on. So it is NOT the machine or algorithms on their own doing the work; it’s a process of looking at what human experts did when they marked lots of essays.
Machine grading gives you a score, but it also gives you a probability, namely a confidence rating. This is important, as you can use it to retrain the algorithm on essays scored with low confidence. Automated essay scoring (AES) systems also try to give scores for each dimension in the scoring rubric, not just an overall grade.
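A toy sketch of the approach – an illustration of the general idea, not any vendor’s system. Essays are reduced to simple surface features, a new essay takes the grade of its nearest human-marked neighbours, and the confidence rating reflects how far those neighbours agree. (Real AES uses far richer features and models.)

```python
# Toy sketch of machine grading with a confidence rating. The features
# here (length, vocabulary size, average word length) are deliberately
# crude stand-ins for the richer features real AES systems learn.

def essay_features(text):
    words = text.split()
    avg_len = sum(len(w) for w in words) / max(1, len(words))
    return (len(words), len(set(words)), avg_len)

def grade(essay, marked, k=3):
    """marked: list of (essay_text, human_grade). Returns (grade, confidence)."""
    feats = essay_features(essay)
    dist = lambda f: sum((a - b) ** 2 for a, b in zip(feats, f)) ** 0.5
    nearest = sorted(marked, key=lambda m: dist(essay_features(m[0])))[:k]
    grades = [g for _, g in nearest]
    best = max(set(grades), key=grades.count)
    confidence = grades.count(best) / len(grades)   # neighbour agreement
    return best, confidence
```

Low-confidence essays – where the nearest human-marked neighbours disagree – are exactly the ones to route back to human markers and use for retraining.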
8. Adaptive assessment
Delivering assessments that adapt to the learner’s performance is known as adaptive testing. The advantage is that fewer test items are needed to assess ability. Iterative algorithms select questions from a database and deliver them according to the learner’s ability, starting with a medium-difficulty item. WildFire has used this in chatbot-delivered assessments, where sprints of questions are delivered in a more naturalistic dialogue format.
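The simplest form of adaptive item selection is an up/down staircase – a minimal sketch of the idea, not a full item-response-theory model and not WildFire’s actual implementation. Start at medium difficulty, step up after a correct answer, step down after an incorrect one:

```python
# Minimal sketch of adaptive item selection: a simple up/down staircase
# over difficulty levels 1-5 (illustrative; real systems use IRT models).

def next_difficulty(current, correct, lo=1, hi=5):
    """Move one level up on a correct answer, one level down on an error."""
    step = 1 if correct else -1
    return min(hi, max(lo, current + step))

def run_sprint(answers, start=3):
    """Given a list of correct/incorrect results, return difficulties asked."""
    asked, level = [], start
    for correct in answers:
        asked.append(level)
        level = next_difficulty(level, correct)
    return asked
```

The staircase converges on the level where the learner answers about half the items correctly, which is why far fewer items are needed than in a fixed test.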
9. Context
3D environments, either on 2D screens or in VR, have opened up the possibility of assessment within a simulated context. This is particularly useful for physical and vocational tasks. VR systems also offer multi-learner environments with voice and tutor control. This is rapidly becoming a total simulation environment, where both psychological and physical fidelity can match the assessment goals.
Many competences can only be measured by someone doing something, yet most exams come nowhere near measuring competences. For many vocational and practical tasks – real skills – simulation is head and shoulders above traditional paper exams. Your performance can really be measured; your assessment can be your performance – complete it and you’ve passed. This is already a reality in many simulations, flight sims and so on. It can also be true of many other skills.
Recertification for inspections is one practical example. I’ve been involved in a simulation on domestic house gas inspection that simulates scenarios so well it’s now used as a large part of the assessment, saving huge amounts of money in the US. You’re free to move around the house, check for gas leaks, do all the necessary measurements using the right equipment – a completely open training and assessment environment. With Oculus Rift it is far more realistic than a 2D screen showing a 3D simulation.
Of course, VR is not in itself AI, although it opens up possibilities for AI within these simulated assessment environments.
10. Online proctoring
All of the above enable online assessment, or proctoring: especially online identification, but also the many online developments around interface, input, retrieval, creation, marking and context. The MOOC providers have been doing this, and refining their models, over a number of years. It is already a reality for providers such as Udacity and Coursera, where paying for grading of assignments, online exams and Nanodegrees (with job promises and money back if you don’t get a job) have been implemented. It is undeniable that most forms of delivery are moving online, whether retail or financial, but also in learning. This increase in demand for online learning needs to be matched by an increase in online assessment. The knotty problems associated with online assessment benefit greatly from AI.