A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.
OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series
I am sure they have patched it by now, but at one point I was able to get ChatGPT to give me copyrighted text from books by asking for ever larger quotations. It seemed more willing to do this with books that were out of print.
Yeah, it refuses to give you the first sentence from Harry Potter now.
Which is kinda lame; you can find that on thousands of webpages, many of which the system indexed.
If someone was looking to pirate the book there are way easier ways than issuing thousands of queries to ChatGPT. Type "Harry Potter torrent" into Google and you will have them all in 30 seconds.