This is big news.
In what is seen as a critical test case, SDNY Judge Colleen McMahon has dismissed the idea that training an LLM is copying. The ruling (without prejudice) did not pass judgement on what I'm about to say; it merely stated the arguments and provided the explanatory detail, which I think is sound.
Generative AI ‘synthesises’; it does not copy. This is central. It’s a bit like our brains: we see, hear and read stuff, but memory isn’t copying, it’s a process of synthesis, and recall is reconstructive. If you believe in the computational theory of mind, as I do, this makes sense (many don't).
What is even more interesting is the conclusion that the datasets are so large that no one piece is likely to be plagiarised. That, I think, is the correct conclusion. It would take 170,000 years for a person to read the GPT-4 dataset, reading 8 hours a day. Any one piece is quantifiably minuscule.
On the idea that regurgitated data has appeared: it would appear that this problem has (almost) been solved, with provenance identified by some systems, such as GPT o1. In other words, don't worry — it was largely an artefact of early systems.
I was always sure that these cases would result in this type of ruling, as the basic law of copyright depends on copying, and that is not what is happening here. All freshly minted content is based on past content to a degree, and here it is not just a matter of degree (it’s minuscule) but also of the methods used. A complex case, but the right rationale.
I think we're seeing many of the ethical objections to AI fade somewhat. There are still issues, but we're moving past the rhetorical phase of straw men and outrage into detailed analysis and examination. This is an important Rubicon to have crossed. Many so-called 'ethical' issues are simply issues that need to be worked through, rather than waved as flags of opposition. We are seeing the resolution of these issues. Time to move on.