Thursday, February 22, 2024

Rather than deepfakes, censorship and surreal ethical fakes are the problem?

Google have just shot themselves in the foot with their release of Gemini. Social and mainstream media have been flooded with pieces showing that their text and image generation behaves like some crazed activist teenager.

When asked to create images of a German soldier it created black, Asian and female faces, so keen was it to be ‘diverse’. I won’t give other examples, but it almost looks as if the Gemini LLM is mocking its creators for being so stupid. It's as if the language model fed back to them the craziness of their own internal ideological echo chamber and culture.

This image shows what happens when idiotic guardrailing overrides common sense. We get the imposition of narrow moralising on reality, and reality loses. More than this, straight-up utility loses. The tool gets a reputation for being as flaky as its moralist moderators. This is not like the six-fingers problem, a weakness in the technology itself; it is the direct result of human interference.

What is Guardrailing?

Guardrailing is a complex business and needs to be carefully calibrated. So what is ‘Guardrailing’? It attempts, like road barriers, to provide safeguards and constraints that stop responses producing harmful, inappropriate or biased output. These three words are important.
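To make the idea concrete, here is a minimal sketch in Python of where such a layer sits: checks wrap the model call on both sides. Every name in it is hypothetical; this is an illustration of the general pattern, not any vendor's actual API.

# A minimal, purely illustrative sketch of a guardrail layer.
# All names here are hypothetical, not any vendor's actual API.

BLOCK_MESSAGE = "Request declined by content policy."

def check_input(prompt: str) -> bool:
    # Stand-in policy check; real systems use trained classifiers, not keyword tests.
    return "forbidden" not in prompt.lower()

def check_output(text: str) -> bool:
    return "forbidden" not in text.lower()

def guarded_generate(prompt: str, model) -> str:
    if not check_input(prompt):      # screen the request before it reaches the model
        return BLOCK_MESSAGE
    response = model(prompt)         # the underlying text or image model
    if not check_output(response):   # screen what comes back
        return BLOCK_MESSAGE
    return response

# Usage with a dummy model standing in for the real one:
print(guarded_generate("Write a limerick", lambda p: f"Model answer to: {p}"))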

Harm

No one wants child porn, porn or content that causes real physical and 'extreme' psychological harm to be produced. This has long been a feature of ethics debates around the line that should be drawn within freedom of expression (not just speech). We need to err on the side of freedom of expression, as it is enshrined in our democratic society as essential. That throws the definition back to the word ‘harm’, as defined legally, not by someone who ‘feels’ as though they have been harmed or thinks that harm is synonymous with offence.

Inappropriate

This is different from harm in that it depends on a broad social consensus around what is appropriate, a much harder line to draw. Should you allow swearing (as in some literature), nudity in any form (think classical art) and so on? Difficult, but not the real problem, as there is, largely, a social consensus around this.

Biased

This is a dangerous term, and where things can go way off balance. I use that word ‘balance’ because we cannot have a small group imposing its own personal view of what constitutes balance on these systems. This is clearly what happened in Google’s case. In trying to eliminate what they saw as bias they managed to impose their own extreme biases on the system.

How is it implemented?

It always starts with policy statements and guidelines. This is where things can go badly wrong, if the people writing the guidelines apply their own personal, ideological or activist beliefs – whether from the right, the left or wherever. This is a huge lever as it affects everything. This is clearly where things went wrong at Google. A small number of moralisers have imposed their own views on the system. They need to be removed from the company.

The guidelines are then implemented using content filters, often prompts directing output towards supposedly unbiased generation. The problem here is that if your guidelines are too constrained you eat not just into freedom of speech but also into functionality. It simply doesn’t do things, like reply or create an image. Google have gone back to the filters to recalibrate.
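As a rough illustration of the calibration problem, here is a toy filter with a crude keyword-based risk score (a real system would use a trained classifier, and the term list here is made up): tighten the threshold too far and perfectly ordinary requests get refused.

# Illustrative only: a toy risk score and threshold, showing how an
# over-tight filter turns into refusals of ordinary requests.

RISKY_TERMS = {"soldier", "weapon", "violence"}   # hypothetical policy list

def risk_score(prompt: str) -> float:
    words = prompt.lower().split()
    hits = sum(1 for w in words if w.strip(".,") in RISKY_TERMS)
    return hits / max(len(words), 1)

def allow_prompt(prompt: str, threshold: float) -> bool:
    # True means the prompt is passed through to the model.
    return risk_score(prompt) < threshold

prompt = "Draw a 1943 German soldier in uniform"
print(allow_prompt(prompt, threshold=0.5))   # permissive calibration: allowed
print(allow_prompt(prompt, threshold=0.1))   # over-constrained calibration: refused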

Prompt modification means they take your prompt and then add other criteria, like 'diverse', 'inclusive' and other positively discriminatory descriptions. This is most likely what happened here, as the outputs are so obviously and crazily inappropriate.
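Here is a sketch of what such silent rewriting might look like. The injected terms and the function are hypothetical; Google has not published Gemini's actual implementation, so this is only an illustration of the technique.

# Illustrative prompt modification: the user never sees the rewritten request.

INJECTED_TERMS = ["diverse", "inclusive", "of various ethnicities and genders"]

def modify_prompt(user_prompt: str) -> str:
    # The request actually sent to the image model is the original prompt
    # plus the injected descriptors.
    return user_prompt + ", " + ", ".join(INJECTED_TERMS)

print(modify_prompt("a German soldier in 1943"))
# -> "a German soldier in 1943, diverse, inclusive, of various ethnicities and genders"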

Moderation is another technique. This is a bad idea as it is slow, expensive and subject to the vagaries of the moderators. You are far better off automating and calibrating that automation, although there are several exceptions, such as porn, child porn, extreme violence and other actually harmful content.
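A sketch of that kind of routing, with hypothetical categories and thresholds: automate the common cases and reserve human review for the genuinely harmful ones. The exact cut-offs are the calibration decision.

# Illustrative routing logic only; category names and thresholds are made up.

ESCALATE_CATEGORIES = {"child_abuse", "extreme_violence"}   # always a human decision

def route_content(category: str, confidence: float) -> str:
    # category and confidence would come from an automated classifier.
    if category in ESCALATE_CATEGORIES:
        return "human_review"        # slow and expensive, reserved for real harm
    if confidence >= 0.9:
        return "auto_block"          # classifier is confident: block automatically
    if confidence <= 0.2:
        return "auto_allow"          # classifier is confident it's fine
    return "allow_and_log"           # uncertain: allow, log it for recalibration

print(route_content("profanity", 0.15))          # auto_allow
print(route_content("extreme_violence", 0.95))   # human_review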

You can also curate the training data. This is less of a problem, as the data is so large that it tends to eliminate extremes. Indeed, some of the problems created by Google have come from ignoring clear social norms that the training data would have produced. Apply a narrow definition of identity and you destroy realism.
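For illustration only, curation can be as simple as dropping documents that a policy function flags and keeping the rest; the marker and corpus below are placeholders, and real pipelines use classifiers and human-written rules rather than a keyword check.

# Toy data curation: filter a corpus against an exclusion policy.

BANNED_MARKERS = {"example-of-excluded-content"}   # hypothetical marker

def keep_document(doc: str) -> bool:
    text = doc.lower()
    return not any(marker in text for marker in BANNED_MARKERS)

corpus = [
    "An ordinary news article about farming.",
    "A document containing example-of-excluded-content that the policy drops.",
]
curated = [doc for doc in corpus if keep_document(doc)]
print(len(corpus), "->", len(curated))   # 2 -> 1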

There is also user testing. This has really surprised me. I know that Gemini was tested widely among Google employees before release, as I know people who did it. The problem could be that Google tends to employ a certain narrow demographic, so that testing is massively biased. This is almost certainly true in this case. Or, more likely, the image generation wasn’t actually tested, or was tested only with their own weird ‘ethics’ team.

You can also put in user constraints that apply to single requests and/or context, such as mentioning famous names in image generation. That’s fine.
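A per-request constraint can be as simple as a check against a name list before generation; the names below are just hypothetical examples of what such a list might contain, not any product's actual rule set.

# Illustrative per-request constraint: refuse image prompts naming public figures.

BLOCKED_NAMES = {"taylor swift", "barack obama"}   # hypothetical entries

def violates_name_constraint(prompt: str) -> bool:
    lowered = prompt.lower()
    return any(name in lowered for name in BLOCKED_NAMES)

print(violates_name_constraint("a portrait of Taylor Swift"))   # True: refuse this single request
print(violates_name_constraint("a portrait of a folk singer"))  # False: generate as normal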

Conclusion

I warned about this happening three years ago in a full article, and again in a talk at the University of Leeds.

Who would have thought that, rather than political deepfakes being the problem, censorship and surreal ethical fakes by flaky moralists would flood social media? Guardrailing is necessary, and a good thing, but only when it reflects a broad social consensus, not when it is controlled by a few fruitcakes in a tech company. You can’t please all of the people all of the time, but pleasing a small number at the expense of the majority is suicidal. Guardrails are essential; imprisoning content or allowing the generation of massively biased content under the guise of activism is not.

The good news is that this has happened. I mean that. In fact, it was bound to happen, as there was always going to be a clash between the moralisers and reality. We learn from our mistakes and Google will rethink and re-release. That's how this type of innovative software works. Sure, they seem hopeless at testing, as five minutes of actual testing would have revealed the problem. But we are where we are. The great news is that it knocks a lot of the bonkers ethical guardrailing into touch. 



