A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.
OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.
One of the first things I ever did with ChatGPT was ask it to write some Harry Potter fan fiction. It wrote a short story about Ron and Harry getting into trouble. I never said the word McGonagal and yet she appeared in the story.
There is enough non-copywrited Harry Potter fan fiction out there that it would not need to be trained on the actual books to know all the characters. While I agree they are full of shit, your anecdote proves nothing.
The anecdote proves nothing because the model could potentially have known of the McGonagal character without ever being trained on the books, since that character appears in a lot of fan fiction. So their point is invalid and their anecdote proves nothing.