Wikipedia is, for me, the digital wonder of the world. A free, user-generated repository of knowledge, open to edits, across many languages, increasing in both breadth and depth. It is truly astonishing. But it has recently become a victim of its own success. As it scaled, it became difficult to manage. Human editorial processes have not been able to cope with the sheer volume of additions, deletions, vandalism, rights violations, resizing of graphics, dead links, updating lists, blocking proxies, syntax fixing, tagging and so on.
So would it surprise you to learn that an army of bots is, as we sleep, working on all of these tasks and many more? It surprised me.
There are nearly 3000 bot tasks identified for use in Wikipedia. So many that there is a Bot Approvals Group (BAG), with a Bot Policy that covers all of these, whether fully or partially automated, helping humans with editorial tasks.
The policy rules are interesting. Your bot must be harmless and useful, must not consume resources unnecessarily, must perform only tasks for which there is consensus, must carefully adhere to relevant policies and guidelines, and must use informative, appropriately worded messages in any edit summaries or messages left for users.
So far so good, but the danger is that some bots malfunction and cause chaos. This is why their bot governance is strict and strong. What is fascinating here is the glimpse we have into the future of online entities, where large amounts of dynamic data have to be protected, while being allowed to be used for human good. The Open Educational Resources people don’t like to mention Wikipedia. It is far too populist for their liking, but it remains the largest, most useful knowledge base we’ve ever seen. So what can we learn from Wikipedia and bots?
AI and Wikipedia
Wikipedia, as a computer-based system, is far superior to humans and even print, as it has perfect recall, unlimited storage and 24/7 performance. On the other hand it hits ceilings, such as the ability of human editors to handle the traffic. This is where well-defined tasks can be automated, as previously mentioned. It is exactly how AI is best used: solving very specific, well-defined, repetitive tasks that occur 24/7 at scale. This leaves the editors free to do their job. Note that these bots are not machine learning AI. They are pieces of software that filter and execute tasks, but the lessons for AI are clear.
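To make that distinction concrete, here is a minimal sketch of the kind of narrow, rule-based task such a bot might carry out. It is not taken from any real Wikipedia bot. The function name `fix_syntax`, the two whitespace rules and the wording of the edit summary are all my own assumptions, chosen only to show a well-defined task that reports what it did in plain language.

```python
import re

# Hypothetical rules for illustration only; not from any real Wikipedia bot.
RULES = [
    ("collapsed repeated spaces", re.compile(r"(?<=\S) {2,}(?=\S)"), " "),
    ("removed trailing whitespace", re.compile(r"[ \t]+$", re.MULTILINE), ""),
]


def fix_syntax(text):
    """Apply each rule in turn; return the new text and an edit summary."""
    applied = []
    for description, pattern, replacement in RULES:
        new_text, count = pattern.subn(replacement, text)
        if count:
            applied.append(f"{description} ({count})")
            text = new_text
    # An empty summary means nothing changed, so no edit needs to be made.
    summary = "Bot: " + "; ".join(applied) if applied else ""
    return text, summary


if __name__ == "__main__":
    page = "Wikipedia  is a free encyclopedia.  \nAnyone can edit it."
    fixed, summary = fix_syntax(page)
    print(fixed)
    print(summary)
```

There is no learning here, just fixed rules applied the same way every time. The edit summary is there so that a human editor can see at a glance what the bot changed.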
At WildFire, we use AI to select related content to supplement learning experiences. This is a worthy aim, and there is no real editorial problem, as it is still entirely under human control: we can check, edit and fix any problems. Let me give you an example. Our system automatically creates links to Wikipedia but, as AI is not conscious or cognitive in any sense, it makes the occasional mistake. So in a medical programme, where the nurse had to ask the young patient to ‘blow’, while a lancet was being used to puncture his skin repeatedly in an allergy test, the AI automatically created a link to the page for cocaine. Oops! Easily edited out, but you get the idea. In the vast majority of cases it is accurate. You just need a QA system that catches the false positives.
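A QA step of that kind can be sketched in a few lines. This is not WildFire's actual system. The sample data, the function name `review_links` and the idea of comparing a page's topic tags with the subject of the course are hypothetical, used only to show how a false positive like ‘blow’ could be held back for a human to check.

```python
from dataclasses import dataclass


@dataclass
class LinkCandidate:
    term: str          # word or phrase found in the learning content
    page: str          # Wikipedia page the linker proposed
    page_topics: set   # topic tags for that page (hypothetical data)


def review_links(candidates, course_topics):
    """Split candidates into auto-accepted links and ones needing a human check."""
    accepted, flagged = [], []
    for candidate in candidates:
        # Accept only when the page shares at least one topic with the course.
        if candidate.page_topics & course_topics:
            accepted.append(candidate)
        else:
            flagged.append(candidate)
    return accepted, flagged


if __name__ == "__main__":
    course = {"medicine", "allergy"}
    candidates = [
        LinkCandidate("lancet", "Blood lancet", {"medicine", "medical equipment"}),
        LinkCandidate("blow", "Cocaine", {"drugs", "stimulants"}),
    ]
    accepted, flagged = review_links(candidates, course)
    print("accepted:", [c.page for c in accepted])
    print("for human review:", [c.page for c in flagged])
```

The point of the design is that nothing is silently dropped. Doubtful links go to a person, who makes the final call.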
Wikipedia has to handle this sort of ambiguity all the time. This is not easy for software. The Winograd Schema Challenge offers $25,000 for software that can handle its awkward sentences with 90% accuracy, and the nearest anyone has got is 58%. Roger Schank used Groucho Marx jokes! Software and data are brittle. They don’t bend, they break, which is why they still need a ton of human checking, advising and oversight.
This is a model worth copying: governance on the use of AI (let’s just call it autonomous software). Wikipedia, with its Bot Approvals Group and Bot Policy, offers a good example, within an open-source context, of good governance over data. It draws the line between bots and humans but keeps humans in control.
The important lesson here is that the practitioners themselves know what has to be done. They are good people doing good things to keep the integrity of Wikipedia intact, as well as keeping it efficient. AI is like the god Shiva: it both creates and destroys. The problem with the dozens of ethics groups springing up is that all they see is the destruction. AI can be a force for good, but not if it is automatically viewed through an ideological, deficit model. It seems, at times, as though there are more folk on ethics groups than actually doing anything on AI. Wikipedia shows us the way here: a steady, realistic system of governance that quietly does its work, while allowing the system to grow and retain its efficiencies, with humans in control.