OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.
I don't get why this is an issue. Assuming they purchased a legal copy that it was trained on then what's the problem? Like really. What does it matter that it knows a certain book from cover to cover or is able to imitate art styles etc. That's exactly what people do too. We're just not quite as good at it.
A copyright holder has the right to control who has the right to create derivative works based on their copyright. If you want to take someone's copyright and use it to create something else, you need permission from the copyright holder.
The one major exception is Fair Use. It is unlikely that AI training is a fair use. However this point has not been adjudicated in a court as far as I am aware.
LLMs don’t create anything new. They have limited access to what they can be based on, and all the assumptions they make are based on that data. They do not learn new things or present new ideas, only ideas that have already been done and are present in their training data.
If you copy the copyrightable elements of another work, you have created a derivative work. That work needs to be transformative in order to be eligible for its own copyright, but being transformative alone is not enough to make it non-infringing.
There are four fair use factors. Transformativeness is only considered by one of them. That is not enough to make a fair use.
this is so fucking stupid though. almost everyone reads books and/or watches movies, and their speech is developed from that. the way we speak is modeled after characters and dialogue in books. the way we think is often from books. do we track down what percentage of each sentence comes from what book every time we think or talk?
Aye, but I'm thinking the whole notion of copyright is banking on the fact that human beings are inherently lazy and not everyone will start churning out books in the same universe or style. And if they do, it takes quite some time to get the finished product and they just get sued for it. It's easy, because there's a single target.
So there's an extra deterrent to people writing and publishing a new harry potter novel, unaffiliated with the current owner of the copyright. Invest all that time and resources just to be sued? Nah...
Issue with generating stuff with 'puters is that you invest way less time, so the same issue pops up for the copyright owner, they're just DDoS-ed on their possible attack routes. Will they really sue thousands or hundreds of thousands of internet randos generating harry potter erotica using an LLM? Would you even know who they are? People can hide money away in Switzerland from entire governments, I'm sure there are ways to hide your identity from a book publisher.
It was never about the content, it's about the opportunities the technology provides to halt the gears of the system that works to enforce questionable laws. So they're nipping it in the bud.
this brings up the question: what is a book? what is art? if an "AI" can now churn out the next harry potter sequel and people literally can't tell that it's not written by JK Rowling, then what does that mean for what people value in stories? what is a story? is this a sign that we humans should figure something new out, instead of reacting according to an outdated protocol?
yes, authors made money in the past before AI. now that we have AI and most people can get satisfied by a book written by AI, what will differentiate human authors from AI? will it become a niche thing, where some people can tell the difference and they prefer human authors? or will there be some small number of exceptional authors who can produce something that is obviously different from AI?
i see this as an opportunity for artists to compete with AI, rather than say "hey! no fair! he can think and write faster than me!"
Well, poor literature has always existed, which some might not even dignify to call literature. Are writers of such things threatened by LLMs? Of course they are. Every new technology has brought with it the fear of upending somebody's world. And to some extent, every new technology has indeed done just that.
Personally, and... this will probably be highly unpopular, I honestly don't care who or what created a piece of art. Is it pretty? Does it satisfy my need for just the right amount of weird, funny and disturbing to stir emotions or make me go 'heh, interesting!'? Then it really doesn't matter where it comes from. We put way too much emphasis on the pedigree of art and not on the content. Hell, one very nice short story I read was the greentext one about humans being AI and escaping from the simulation. Wonder how many would scoff at calling art something that came out of 4chan?
Maybe this is the issue? Art is thought of as a purely human endeavour (also birds do it, and that one pufferfish that draws on the seabed, but they're "dumb" animals so they don't count, right? hell, there's even a jumping spider that does some pretty rad dances). And if code in a machine can do it just as well (can it? let it - we'll be all the better for it. can't it? let it be then - no issue) then what would be the significance of being human?