There is a battle underway, with assessors in education and institutions on one side and Generative AI on the other. It often descends into a cat-and-mouse game, but there are too many mice, and the mice are winning. Worse still is when it descends into a tug-of-war, tech showdown or brutal legal collision.
We clearly need to harmonise assessment with our evolving technology and navigate our way through AI and assessment to avoid friction, with some practical solutions to minimise disruption. Assessment is at a crossroads.
Legal confrontations between assessors and the assessed will do great damage to institutions. When Sandra Borch, Norway’s Minister of Research and Higher Education, cracked down on plagiarism, taking a student to the Supreme Court, another student uncovered serious plagiarism in her Master’s thesis – she had to resign. A more distasteful case was Claudine Gay, President of Harvard, who had to resign after a campaign against her by the hedge-fund investor Bill Ackman. The whole thing turned toxic as people uncovered plagiarism in the work of his academic wife. Because many US kids have lawyers as parents, a slew of cases is hitting the courts, putting the reputations of schools and colleges at risk. This is not the way forward.
Problem
Assessment stands right in the middle of the earthquake zone as the two tectonic plates of AI and traditional education collide and rub up against each other. This always happens with new technology: printing, photocopying, calculators, internet, smartphones… now AI.
We are currently in the position where the mass use of AI is common, because it is a fabulous tool, but is being used on the SLY. There is widespread use of unsanctioned AI tools to save time. Learners and employees are using it in their hundreds of millions, yet educational institutions and organisations are holding out by ignoring the issue, or doing little more than issuing a policy document. AI is therefore seeping into organisations like water, a rising tide that never ebbs.
The problem has just gotten way more challenging as AI agents are here (automation). The Claude 3.5 Sonnet upgrade has just gone 'agentic', in a very practical way. It can use computers the way people do. It can look at your screen, go find stuff, analyse stuff, complete a chain of tasks, move your cursor, click on buttons and type text. To be clear, it understands and interacts with your computer just as you are doing now. This shift in ability means it can do open-ended tasks such as sitting tests, doing assignments, even open research.
In truth…
We need to be more honest about these problems and stop shouting down from moral high horses, such as ‘academic integrity’. Human nature determines that if people find they are being played, or have an imperative that is different, they will take paths of least resistance. E.O. Wilson’s observation that we have “Palaeolithic emotions, medieval institutions and godlike technology” explains why people take shortcuts, through fear of failure, financial consequences, even panic. It is pointless to continually say 'They shall not pass' if the system has below-par teaching and assessment is poorly constructed. We cannot simply blame students for systemic failures. Education must surely be a domain where teachers and learners are in some sort of harmony.
It doesn't help that we delude ourselves about the past. Cheating has been and still is common. We’re kidding ourselves if we think parents don’t do stuff for their kids at school and in universities. Essay mills and individuals writing assessments, even Master’s theses, exist in their tens of thousands in Nairobi and other places, feeding European and US middle-class kids with essays. Silk cloths with full essays go back to Confucian times. Cheating technology can be bought on the internet, with button cameras and full comms into invisible earpieces. Cheating is everywhere. That's not to condone it, just to recognise that it is ALWAYS a problem, not just an AI problem.
In truth, we also have to be honest and accept that assessment is far too ‘text’ based. Much of it does not assess real skills or performance – even critical thinking. Writing an essay in an exam does not test critical thinking. No one writes critically starting top left, finishing bottom right – that’s why students memorise essays and regurgitate them in exams. Essay setting is easy, actual assessment is hard. We also have to be honest and accept that most educators designing and delivering assessment know little about it.
In the workplace, few take assessment seriously. At best it is multiple choice or e-learning thinly peppered with MCQs. L&D doesn’t take assessment seriously because they are not driven by credentials, nor do they make much effort to evaluate training. MCQs have well-known flaws: you can guess (a 1 in 4 chance with four options); distractors are often poor or simply distract; items are difficult to write and easy to design badly; they are often too factual or unreal; they require little cognitive effort; and they can be gamed (pick the longest option, spot the opposites etc.). An additional problem is that online authoring tools lock us into MCQs.
Assessments are odd. People settle for 70-80% (often an arbitrary threshold) as tests are seen as an end-point. They should have pedagogic import and give the learners momentum, yet there is nothing meaningful on improvement in most assessment and marking. Even with high scorers, full competence is rarely the aim, as a high mark is seen as enough. The aim is to pass the test, not master the subject.
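To put a rough number on the guessing problem, here is a minimal sketch using a simple binomial model. The 10-question test and the pass mark of 7 correct are hypothetical figures chosen for illustration; only the 1 in 4 chance comes from the point above.

```python
from math import comb


def p_at_least(n_questions: int, n_options: int, min_correct: int) -> float:
    """Probability of getting at least `min_correct` answers right by pure guessing.

    Assumes every question has `n_options` equally likely options and that
    guesses are independent (a binomial model).
    """
    p = 1 / n_options
    return sum(
        comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(min_correct, n_questions + 1)
    )


def expected_score(n_questions: int, n_options: int) -> float:
    """Average number of correct answers a pure guesser gets."""
    return n_questions / n_options


# Hypothetical test: 10 four-option questions, pass mark of 7 correct.
print(f"Expected marks from guessing: {expected_score(10, 4)}")
print(f"Chance of passing by guessing: {p_at_least(10, 4, 7):.4f}")
```

Under these assumptions pure guessing rarely clears a high pass mark, but it hands every learner around a quarter of the marks for free, and every weak distractor a learner can rule out pushes that baseline higher.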
Plagiarism checkers do not work, so DO NOT use detectors. They produce too many false positives and consistently accuse non-native speakers of cheating. Neither should you depend on your gut – that is just as bad, if not worse. Students KNOW this tech better than you. They will always be one step ahead and, even if they are not, there will be places and tools they can use to get round you.
Neither does setting traps for the mice work, such as "Include in your work citations from <fictional name>". Once the trap is known, learners can use the tech itself to spot the fictional name.
In a study by Scarfe et al., GenAI submissions were seeded into the exam system for five undergraduate modules across all years of BSc Psychology in a UK University. 94% of the AI submissions went undetected, with the AI submissions getting grades half a grade boundary higher than real students.
In a recent survey of Harvard students, some had made different course choices because of AI, while others reported a sense of purposelessness in their education. They were struck by the fact that they were often learning things that will be done differently in a world of AI. It is not just assessment that needs to change but also what is taught to be assessed. Assessment is a means to an end, assessing what is known, learnt or taught. If what we need to know, learn or teach changes, so should assessment.
Solutions
Calculators generate numbers, GenAI generates text. We now live in a post-generative AI world where this is the norm. Most writing is also now digital, so why are so many exams written by hand?
Most writing in the workplace is not a postmodern critique of Macbeth but fairly brief, bureaucratic and banal, getting things done by email, comms, plans, docs and reports. Management is currently laden with admin and GenAI promises to free us from admin to focus on what matters - the goal. It is here to stay because there is a massive need in the real world to raise productivity on speed and quality and not get bogged down in redrafting or pretending you are Proust. Why expect everyone to be writers of brilliant prose, when the goal is simply to get things done?
1. We have to move beyond text-only assessment into more performance-based assessments. Kids go to school at 5 and come out up to 20 years later having done little else other than read, write and comment on text. There is this illusion that one can assess skills through text – which is plainly ridiculous. Accept that people use GenAI to improve their writing beyond their own threshold. Encourage them to use AI to help them make their writing more concise through summarisation. Allow them to critique their own work through AI.
2. Build 'pedagogy' into creating assessments with AI. We have done this by taking the research on, say, transfer and action, then building that into the AI assessment creation process. You get better, more relevant assessment items, along with relevant rubrics.
3. Also build good assessment design practices into creating assessments with AI. There are clear DOs and DON’Ts in the design of assessment items. Build these into the AI creation process. Go further and match assessments to assessment quality standards. Believe me, this can be done in AI.
4. Match assessments more closely to what was actually taught. This alignment can be done using AI, including identifying gaps, checking for representative coverage and spotting weaknesses in emphasis. The documents and transcripts used in teaching and/or the curriculum can be used by AI to create better quality assessments.
5. Do more pre-assessments. David Ausubel said “The most important single factor influencing learning is what the learner already knows.” I totally agree, yet it is rarely done. This gives assessment real pedagogic import – it propels or feeds forward into the learning process and helps teachers. These can be created quickly by AI.
6. Let's have more retrieval practice. This can be created quickly by teachers, even learners themselves. We know that this works better than underlining notes and highlighting. Making learners put in the effort to recall ideas and solutions in their own minds helps get stuff into long-term memory.
7. Move away from MCQs towards short open text, assessed by AI. Open text is intrinsically superior as it demands recall from memory rather than identification and discrimination (the correct answer in an MCQ is there on the paper or screen). Open response more accurately reflects actual knowledge and skills.
8. Move away from authoring tools that lock you into fixed, templated MCQ assessment items. They also template things into rather annoying cartoon content with speech bubbles etc. Here's 25 ways I think bad design makes content and assessments suck.
9. Use more scenario and simulation assessment. They match real world skills, have more relevance and can set more sophisticated assessments on decision making and other skills. AI can create such scenario and sims assessment and the content you need to populate the scenarios.
10. On formative assessment, 'test out' more. Testing out means allowing people to progress if they pass the test. They may be able to skip modules, even the entire course. This should be the norm in compliance training, or training full stop, where people get the same courses year after year.
11. Get AI to create mnemonics and question-based flashcards for learners to self-assess, practise and revise, and to create personalised spaced-practice assessment, so they can get learning embedded.
12. On formative assessment, use more AI designed and delivered adaptive and personalised learning. Adaptive learning can be: 1) PRE-COURSE: student data or pre-tests define pathways (never use learning styles or preferences). 2) IN-COURSE: continuous adaptation, which needs specialised AI software (this is difficult but systems can do it). 3) POST-COURSE: shared data across courses/programmes, plus adaptive assessment and adaptive retention, such as personalised spaced practice and performance support.
13. AI-created avatars can be built into assessments. These can be customers in sales or customer care training; employees in leadership, management and specific skills such as recruitment interviewing; or patients in medical education, where you can interact with people in assessments to provide realism.
14. Automate marking. Most lecturers, teachers and trainers have heavy workloads, and many rightly complain about this, so automate the marking and focus on teaching. Automated marking will also give you insights into individual performance and gaps. The Henkel study (2024), in a series of experiments in different domains across grade levels spanning ages 5-16 (Key Stage 2/3/4), showed that AI was as good at marking as humans.
15. Use audio to deliver assessment results, along with feedback and encouragement. This is more personal and motivating for the learner. It also forces the assessor to be more articulate and precise.
16. Use AI post-assignment or post-assessment techniques, such as generated audio questions that interrogate the learner's understanding of their own work. Their audio answers could be transcribed by AI, even assessed by AI.
17. Make assessments more accessible on language and content. Far too many assessments have overly academic language, or assessment items that are too abstract, when they could be turned into better expressed and more relevant prose and problems. Use AI to critique them and translate overly academic language into readable, vocational prose.
18. AI has revolutionised accessibility through text-to-speech and speech-to-text. It has now provided text and speech chatbots and automatically created podcasts (free NotebookLM). AI has also given us live captioning and real-time transcription. For dyslexics (5–15% of population), T2S & S2T, spell checks, Grammar Assistants, Predictive Text and voice dictation have been incredibly useful in reducing fear and embarrassment. AI can do wonders in making assessment more accessible.
19. Use AI for data analysis on your assessment data. It is as simple as loading up a spreadsheet and asking questions you want answered.
20. Stop being so utopian. Most people at school and University will not become researchers and academics. Don’t assess them as if that is ‘their’ goal.
21. What are the skills left over for education to focus on after you get GenAI into common use in education, the workplace and life? The pat answer is too often - soft skills. I disagree. Move deeper into hard expertise and skills, with a broad perspective, that can be enhanced, magnified, even executed and automated by AI.
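The personalised spaced practice mentioned in point 11 can be sketched as a simple expanding-interval scheduler. This is only a minimal illustration using a Leitner-style box scheme, chosen here for simplicity and not something specified in the talk; the intervals and the names `Flashcard`, `review` and `due_cards` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals in days, one per box (an assumption for illustration).
INTERVALS = [1, 3, 7, 14, 30]


@dataclass(frozen=True)
class Flashcard:
    question: str
    answer: str
    box: int = 0          # index into INTERVALS; higher means better known
    due: date = date.min  # new cards are due immediately


def review(card: Flashcard, correct: bool, today: date) -> Flashcard:
    """Return the card rescheduled after one self-assessment attempt.

    A correct recall moves the card up one box (longer gap before the next
    review); a miss sends it back to the first box (shortest gap).
    """
    box = min(card.box + 1, len(INTERVALS) - 1) if correct else 0
    return Flashcard(card.question, card.answer, box,
                     today + timedelta(days=INTERVALS[box]))


def due_cards(cards: list[Flashcard], today: date) -> list[Flashcard]:
    """Cards the learner should be quizzed on today."""
    return [c for c in cards if c.due <= today]


# Usage: one learner, two AI-generated cards, one review session.
today = date(2025, 1, 1)
deck = [Flashcard("What is retrieval practice?", "Recalling from memory"),
        Flashcard("What is testing out?", "Skipping content by passing a test")]
deck[0] = review(deck[0], correct=True, today=today)
print([c.question for c in due_cards(deck, today)])
```

Because each learner's cards move between boxes according to their own answers, the schedule becomes personalised without any extra effort from the teacher.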
Conclusion
AI is moving steadily away from prompting to automation. This is happening in writing, coding, spreadsheet analysis, image creation, even moving image creation. It is happening with avatars and in real-time advanced dialogue as speech.
Just as calculators generate numbers, GenAI generates text. We need to recognise this and change what we expect of learners. Turning it into a cat-and-mouse game will not work: there are too many mice and they're winning.
PS
This is a sort of summary of my talk in Berlin at the ATP European Assessment Conference.