ChatGPT can train itself on your book. Deal with it.

Brian Merchant defends the Luddites:

They were not opposed to progress, and certainly not to technology; most were skilled technicians themselves, who spent their days working on machines at home or in small shops. It is true that the Luddites hammered certain machines to pieces, but it wasn’t technology itself they were protesting — it was the bosses that were using those machines to cut their pay and shepherd them into factories.

That sure seems like a distinction without a difference to me. Merchant is saying that the Luddites didn't oppose new technology, they merely opposed other people using new technology in ways they disliked. OK.

It turns out this is all in service of defending modern writers who are outraged because ChatGPT has been trained on their books:

The reason that, 200 years later, so many creative workers are angry and unnerved by AI is not that they fear it will become so good, so powerful that they may as well up and quit writing, drawing, or acting. It’s that, like the Luddites, they are painfully aware how bosses will use AI against them. To most working authors (and artists, screenwriters, illustrators, and so on) the fear over AI is not philosophical; it is economic, and it is existential.

The only way that bosses can use AI against writers is if AI, in fact, becomes so good and so powerful that it performs as well as humans. There's really no difference here.

In any case, the writers are supposedly put out by the fact that ChatGPT and similar apps have been trained on their copyrighted works. But so what? Copyright is just what it sounds like: it prevents you from copying a work and selling it without permission. It doesn't prevent you from reading a book and reviewing it, even if you haven't purchased it. Likewise, it doesn't prevent either a human or a computer from ingesting a work in order to index it or summarize it or do research with it.

ChatGPT cannot spit out a copy of a book upon demand. Or, if it can, it can be legally enjoined from doing so. But merely reading a book in order to get better at its job? There's nothing either wrong or illegal about that.

45 thoughts on “ChatGPT can train itself on your book. Deal with it.”

  1. Steve_OH

    I don't think the concern is ChatGPT reading books in order to get better at its job. The concern is ChatGPT creating derivative (vs. transformative) works without consent of the copyright holder.

    1. Brett

      The distinction should be in the prompts. Someone saying, "Give me a picture of Donald Duck" and then selling it would be a clear copyright violation using AI. Whereas mere inspiration--"Give me a picture in the style of Disney's Donald Duck illustrations"--not so much.

    2. Murc

      Derivative work is perfectly legal and always has been.

      Half the existing comic book heroes were developed on the basis "take this already existing character and change just enough to make it legal."

      1. Steve_OH

        If something has been changed "just enough to make it legal," then it is transformative, not derivative. That's the difference.

  2. Ken Rhodes

    "Copyright is just what it sounds like: it prevents you from copying a work and selling it without permission. It doesn't prevent you from reading a book and reviewing it, even if you haven't purchased it."

    That's funny, isn't it? My parents paid some good money to send me to college so I could be forced (occasionally against my will) to read books and learn from them.

  3. aldoushickman

    "Copyright is just what it sounds like: it prevents you from copying a work and selling it without permission."

    Not exactly--it prevents you from copying a work without a license; there are certain exceptions from liability (such as fair use), and the test for those takes into consideration the impact on the copyright holder's ability to make money, but that's not the exclusive test. And you can certainly violate copyright without "selling" the work--you could make a zillion copies and give them away for free, for example, and still be breaking the law.

    ". . . it doesn't prevent either a human or a computer from ingesting a work in order to index it or summarize it or do research with it."

    That's less clear. Computers do not "ingest" a work like a person does; uploading a library of material to train an LLM involves making a copy of the underlying materials, and that may or may not be consistent with the license. (Esp. since training an AI isn't a use anybody would have thought of when creating copyrighted materials more than a year or two ago.) So it's an open question whether it's within the scope of whatever licenses might be applicable, or, if not, whether it constitutes some sort of fair use.

    1. Anandakos

      It's hard to believe that a "fair use" would be to create jillions of competing "works" which incorporate by extension the learned experiences of the producers of the original works.

    2. Crissa

      If it's not possible to teach an LLM, it's not possible to use a word find tool on the text, either.

      That's the problem here.

      Using the trained data, though, to mimic someone would be basically fraud, I think. It implies approval that doesn't exist.

      1. aldoushickman

        "If it's not possible to teach an LLM, it's not possible to use a word find tool on the text, either."

        Maybe--that's certainly an argument I'd expect the LLM people to make, at least.

        But mechanical arguments about the process of using a work might not encompass everything relevant, such as the purpose of that use. After all, I may have a perfectly legal ability under, say, my license with Netflix to stream a movie in my house with a couple of friends, but I probably would get in copyright trouble if I used my Netflix account to stream a movie in a movie theater for 500 people (even if I claimed they were all my pals).

    3. timpmcdermott

      I seem to remember that sometime in the '90s, the Supreme Court ruled that reading something into a computer's memory was copying, and a violation of copyright.

      IANAL, and that was a long time ago, so I might be wrong. But I have a vivid memory of being incensed at the stupidity of deciding that copying something into volatile memory, and doing nothing with it, was a copyright violation.

  4. daddyj

    Let's say Kevin Drum is a famously popular blogger, who earns enough from ad revenue to be able to go out in the desert and experiment with astrophotography.

    I have no doubt an LLM could be trained to "Respond daily to today's events like Kevin Drum." And then, since it's so cheap, twenty more iterations of Drum(nn) could be spooled up.

    Why would anybody read your blog if the market is flooded with robot impersonators of you?

    Ditto with a first-time author who strikes it big. The future Stephen King must be content with his proceeds from Carrie, because there will be dozens, even hundreds of competitors writing the next Carrie.

    I feel like you are not really grokking the imperatives of capitalism here, Kevin.

    1. geordie

      Except that I read Kevin's opinions because they are reasoned and unpredictable. ChatGPT, on the other hand, can only take what it has seen before and try to predict what other people would say, with a tad bit of randomness thrown in. On the surface those appear to be quite similar things, but they aren't. Fiction writers, though--there I think you are right--will have an enormous problem at some point (right now the context window is too small). That being said, the publishing industry has always been radically unjust. Stephen King is a very good writer, but he sells books because people know that his brand is well-told stories that are interesting. There are already millions of mediocre writers publishing Kindle books. They only barely compete with King.
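
      (For the mechanically minded, here is roughly what "predict what other people would say, with a tad bit of randomness" means. A minimal sketch: the candidate words and scores are invented for illustration, but temperature sampling is how these models typically pick the next word.)

      import math, random

      # Toy next-token step: the model assigns a score (logit) to each
      # candidate word given the context; "temperature" controls how much
      # randomness goes into the pick.
      logits = {"rain": 2.1, "snow": 1.3, "sun": 0.2}  # invented scores
      temperature = 0.8                                # lower = more predictable

      weights = {w: math.exp(s / temperature) for w, s in logits.items()}
      total = sum(weights.values())
      probs = {w: v / total for w, v in weights.items()}

      next_word = random.choices(list(probs), weights=list(probs.values()))[0]
      print(probs, "->", next_word)

      Run it a few times and the pick changes; push the temperature toward zero and it nearly always says "rain."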

  5. royko

    We're kind of in a new era where things don't exactly fit our familiar paradigms. Or maybe they do. It depends on your point of view. Which is where things get messy.

    When I read a bunch of books and then go write a book, obviously all the books I've read affect my language and how I write books. I've learned from them, and I'm applying that knowledge. But it's one piece of an entire organic language process that incorporates a lot of factors -- all my conversations, my experiences, my thoughts. There's a lot of things being included into the mix that determine what my writing style is. We consider all of that to belong to me, because I lived it. I can't just copy a book, and I can't create a rip off of a book (as subjectively defined by what's derivative.) But I can read a bunch of books and create a book in that style, and that's OK because a) that's just how humans operate, b) I'm always going to introduce something unique to the mix, and c) I'm a human, so I can only read so much and write so much.

    When we get to AI, it's messier. What they create is purely based on a statistical algorithm of what they've been fed. There are no ideas, no incorporated experiences, nothing else brought to the mix. If you did it with one book as input, it would obviously violate copyright. But our copyright law doesn't have any sort of mechanism to handle statistically smooshing together tens or hundreds of thousands of books, not because it isn't a problem but because humans can't do that. What is the AI bringing to the table? Is the statistical formula enough for the work to be considered "original"? All of the richness you get is a result of the text it's been fed. And AI is capable of reading everything and cranking out enough output to put every human writer out of a job.

    On the one hand, maybe you feel what the statistical algorithm is doing really isn't any different from what we do when we think, it's just a mechanical form of thinking, so it should be treated just like human writing. On the other hand, a work based exclusively on other works, with nothing else added, would be considered derivative under our (social, not legal) understanding of the term, and the fact that creating a derivative work from 10,000 books instead of 1 or 2 isn't covered by law is a bug, not a feature. It depends on your point of view.

    There is a danger to AI destroying the human profession of writing (aside from the unemployed writers): with nothing human to train on, the AI output can't evolve. It needs human input to the mix; otherwise it becomes a feedback loop that will degrade over time. You can't train this current form of AI on AI text. So, for the same reason we have copyright to protect authors, we probably need protections against AI assimilation that also protect those authors.
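
    (That feedback-loop worry can be illustrated with a toy simulation--a normal distribution standing in for "human text," invented numbers, no claim about any real model. Each generation is fitted only to a small sample of the previous generation's output, and the spread tends to drift and shrink:)

    import random, statistics

    # Stand-in for human writing: samples from a fixed distribution.
    data = [random.gauss(0.0, 1.0) for _ in range(1000)]

    # Each "generation" is trained only on a small sample of the previous
    # generation's output. The estimated spread random-walks and, over
    # many generations, tends to collapse toward zero -- the degradation
    # described above.
    for gen in range(1, 31):
        mu = statistics.mean(data)
        sigma = statistics.stdev(data)
        data = [random.gauss(mu, sigma) for _ in range(10)]
        if gen % 5 == 0:
            print(f"generation {gen}: spread = {sigma:.3f}")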

    1. Joseph Harbin

      "I'm always going to introduce something unique to the mix."

      I think the value of AI will be to streamline/automate what is not unique. What is unique is the value that humans bring. If a work has nothing unique to offer, why involve a human writer? For a creative work (i.e., one that also includes non-unique elements), a human could write it entirely. But if the human writer can outsource the non-unique elements to an AI program, that should allow the work to be completed more quickly, or provide the writer more time to add other unique/creative elements. AI is a tool, and ultimately should allow for more creativity, not less.

      As a writer (fiction, mostly "yet to be published"), I don't find AI to be a threat. I agree with Stephen King in the L.A. Times article.

      I have years of experience making crossword puzzles. There are computer programs (not AI) that are useful tools for constructors. I think the path toward more automated xword construction may be like the path toward using AI for writing in general (fiction and nonfiction may have different arcs, however).

      Before the 1990s, almost all puzzles were constructed by hand. The basic tools included pencils with erasers, graph paper, and references (e.g., dictionaries). Printed word lists were used (some specialized; e.g., all 8-letter words ending -OUS). Puzzle construction took a lot of time. In the '90s, computer programs started becoming more popular. Now, with perhaps a few exceptions, all (quality) puzzles are constructed by humans using a program. What a program does is save time. A constructor can see dozens or hundreds of options every step of the way. But the decision on what words to use remains with the constructor. That's why better puzzles are made by better constructors, even if using the same tools. (A good puzzle-maker will customize the tools, as well.)
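
      (To make that concrete: the core of a construction aid is just matching a word list against a partially filled slot. A minimal sketch--the word list here is a stand-in; real constructors use large curated lists:)

      import re

      WORDS = ["GENEROUS", "GLORIOUS", "GRIEVOUS", "ORANGE", "OREGANO"]  # stand-in list

      def candidates(pattern, words=WORDS):
          """List possible fills for a slot; '?' marks an open square."""
          rx = re.compile("^" + pattern.replace("?", ".") + "$")
          return [w for w in words if rx.match(w)]

      print(candidates("?????OUS"))  # the old "8-letter words ending -OUS" list
      print(candidates("G???IOUS"))  # options for a partially filled slot

      The constructor still chooses among the candidates; the program just makes the list instant instead of an evening's work.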

      There are other components to puzzle-making: concept, cluing, etc. The better puzzles generally have more "unique" / creative elements. It's the combination of tools and tool-users that make for the most notable creative works.

      I think this quote from decades ago holds:

      "Computers are incredibly fast, accurate and stupid; humans are incredibly slow, inaccurate and brilliant; together they are powerful beyond imagination."

  6. Brett

    Merchant used to write a column for Gizmodo where he basically scare-mongered about whatever automation news there was that day. Not surprising to see him writing a column romanticizing Luddites - folks who benefitted from the previous wave of automation but now wanted to use violence to attack the next wave that didn't make them money.

    If I invented a machine that allowed for technicians to provide superior health care than doctors, and doctors responded by shooting up the clinics where that care was being provided, would Merchant sympathize with the doctors?

    1. Murc

      Depends. Were the doctors provided with remunerative employment elsewhere, or were they punted into the streets and told "better re-train, or learn to ask 'would you like fries with that?'"

      If it's the latter, then my sympathies are absolutely with the doctors shooting up your devil machine that turned skilled work into poverty.

      1. Brett

        I would hope they would help them deal with the transition (and I would absolutely support the doctors if they were lobbying for that), but even without that I'd still say "support the new system that's better for those receiving the treatments" is the morally superior position.

        1. Murc

          "Robbing men of their livelihood" is almost never the morally superior position. You want to sell your devil machine, the labor it displaces gets provided for FIRST, not "maybe sometime, maybe never." Otherwise smashing it up is a hundred percent justified.

    2. kaleberg

      The Industrial Revolution caused living standards to fall dramatically. It wasn't until the violent Chartist movement started threatening the stability of England AND the King came in on the side of the workers and threatened to pack the House of Lords with labor reformers that living standards stopped falling in the 1830s. They didn't surpass pre-industrial levels until the 1850s. The Queen started honoring centenarians in the 1950s.

      In other words: (1) The Luddites were right. If the industrialists had been willing to share in the gains from the Industrial Revolution, it would have been a different matter. (2) Violence was necessary to spread the benefits of industrialization to more than a small coterie. Reform required an existential threat.

      As to your question about superior health care, a lot depends on who benefited. If it only benefited a small number of rich people and screwed the doctors, I'd say they should consider shooting up the machines or the hospital company managers. Democracy offers alternative approaches, but bosses have a long history of shooting up workers and then whining when they shoot back.

  7. Pittsburgh Mike

    Way back when, radio stations could broadcast music with minimal attention paid to copyright. IIRC, you didn't have to pay a per-song royalty as a station, as long as you followed the constraints of the copyright law, which included things like not playing too many songs in a row from a given album, and avoiding other things that would make recording an album off the radio easy. On top of that, if you ever tried to record something off the radio, you know that the hit to quality was severe.

    But when Internet Radio came along, you could easily automate recording everything, and of course, the quality of an MP3 stream didn't degrade no matter how many times it was copied. The DMCA made allowances for that, and for on-demand streaming as well.

    Similarly, there's a qualitative difference between me reading some books and then writing about a topic, and a computer assimilating many, many books and then generating text about these books without tiring, forever. Some of that text might be so close to the original that it would be viewed as an infringement of copyright, but that would no doubt depend upon the query to the LLM.

    IANAL, but my guess is that LLMs don't violate copyright today, but I think that some limits need to be placed on what an LLM can do without paying some sort of royalty. An LLM's adjustment of its weights based on a book is closer to making a copy than me just reading the book.
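
    (The simplest version of "adjusting weights based on a book" looks something like this--statistics derived from the text rather than the text itself, which is exactly why the question is hard. A toy sketch, with an invented sentence standing in for a book:)

    from collections import Counter, defaultdict

    # "Training" in miniature: derive bigram statistics from a text. What
    # gets stored are numbers shaped by the book, not the book itself.
    text = "the cat sat on the mat".split()  # stand-in for a book

    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1

    print(dict(counts["the"]))  # {'cat': 1, 'mat': 1}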

    1. kaleberg

      Radio airplay was considered promotional, so the copyright holders would benefit as long as free play led to music sales. Streaming leads to very few music sales, so it makes sense to demand per-play payment. In fact, it should be much higher than it is now. (Weirdly, people still want to buy music. That's why vinyl has made something of a comeback.)

      P.S. I don't remember that too many songs restriction. There were FM stations that announced when they were going to play entire albums for people to tape. Taping, back then, required expertise and expensive gear, but the kind of person who would tape an album off the air was likely to play it for friends who were then more likely to buy their own copy.

  8. Doctor Jay

    My biggest problem with this post is this:

    The only way that bosses can use AI against writers is if AI, in fact, becomes so good and so powerful that it performs as well as humans.

    This is not accurate. AI can be used as a "replacement level" alternative, which makes it a credible threat against copywriters and people with no names to speak of. It doesn't have to be as good, because it's much, much cheaper.

    Nobody thinks it's going to threaten Stephen King. But it could easily threaten technical writers for knockoff products.

    1. csherbak

      This has already begun with some niche (obscure?) ebooks on offer at Amazon. They are ostensibly 'written' by someone, but between the obvious cut-and-paste stealing from other works and the not-so-scientific analysis suggesting they've been ChatGPT'd, they are cutting into the 'Stephen King'-level (for the niche) authors.

      The (very) low margins of eBooks make it lucrative to do, and the niche market obscures the fact that the 'author' is unknown. People who know the author pool, or at the very least recognize the lineage of an author, are wise to the sham. But not so for the great unwashed. (Or washed, if you don't like this niche. Occult/NewAge/Astrology/Tarot/Herbalism.)

      Is this a problem for writers at the level of Mr. Drum? Probably not. But if you are in a 'hack' niche with a lot of demand (I'm thinking romance of various stripes, and newage--rhymes with sewage), it's a biggish problem for you.

    2. aldoushickman

      There are also tones of "well, it's only the *bad* writers who are going to get replaced, and they probably deserve it because if they wanted to keep their jobs they could/should just be better writers" in the argument.

      Which is problematic for a few reasons (not least because it's mean-spirited), at least one of which is that there is actually a pretty big market for garbage writing*--just look at most TV and the bestsellers on the shelves at airports--and it's not clear to me that automating some significant chunk of that presents a better world for consumers.

      I mean, the limiting factor on consumption of media generally isn't price, so shaving a few bucks off the production costs of a novel because you replaced the author with an LLM doesn't seem to benefit anybody aside from the publisher's shareholders, about whose welfare I am not sure I particularly care.
      ______________
      *yes, I am a snob.

  9. jeffreycmcmahon

    Yeah, this is one of Mr. Drum's more obtuse posts, it really seems like he's missing an important aspect of what's going on here that others have more clearly described.

  10. JimFive

    I think you're conflating two different issues.

    The issue in the WGA writers strike is the concern that the writer will come up with an idea and then the studios will feed it into an AI to get a script without crediting and paying the writer.

    The other issue is the concern of established writers that their writing is being ingested by the LLM creators and then the model is spitting out effectively derivative works.

  11. Nominal

    "It doesn't prevent you from reading a book and reviewing it, even if you haven't purchased it. Likewise, it doesn't prevent either a human or a computer from ingesting a work in order to index it or summarize it or do research with it."

    ChatGPT (and every other LLM) isn't "indexing, summarizing, or researching" works. It is regurgitating portions of works. What it does is essentially copy SOOOOOOO MAAAAANY works that when it regurgitates, it's difficult to know exactly where it all comes from. That's possibly "transformative," but if you did that with movies or music it wouldn't be considered fair use, and I'm not sure why it should be for written works.

    1. Murc

      That's absolutely not how it works. Like, straight-up it isn't, unless you're saying "when the algorithm uses any word, it is copying and pasting it from SOMEWHERE."

      You're also simply wrong about music. Sampling is a longstanding practice and perfectly legal.

      1. aldoushickman

        "Sampling is a longstanding practice and perfectly legal."

        No, this is generally only true if the sampler gets a license from the original copyright holder, which is how almost all sampling in music works. You generally cannot sample somebody else's song into your derivative work and call it "fair use"--that's not what fair use is.

    2. Joseph Harbin

      @Nominal

      "Chat ... is regurgitating portions of works. ...when it regurgitates it's difficult to know exactly where it comes from. ...if you did that with movies or music it wouldn't be considered fair use..."

      It absolutely happens with movies and music all the time. That is exactly how movies and music are made.

      Sometimes the regurgitation is too on the nose, and those become notable court cases. But no good artist does anything without a wide range of influences and countless sources that the artist adapts into new works.

      I'd invite you to take a look at some of Kirby Ferguson's "Everything is a Remix" series, which provides many examples of where creative works come from (spoiler: they come from copying/transforming/combining the creative works of others).

      https://www.youtube.com/@KirbyFerguson

      In a comment below, I also linked to one of his older videos, which is a good intro.

  12. KJK

    Just stumbled upon this sorting through the news:

    https://abcnews.go.com/Technology/wireStory/visual-artists-fight-back-ai-companies-repurposing-work-102824248

    Same issue as Kevin's article, and it is very troubling on a personal level and for society in general. Not sure how one can claim a copyright infringement without showing a specific instance of such AI-produced, commercially used artwork that has infringed upon the original copyrighted artwork it was based upon. I don't know if it is a copyright infringement, or a breach of a licensing agreement, to feed such copyrighted material into an AI.

  13. Murc

    "That sure seems like a distinction without a difference to me. Merchant is saying that the Luddites didn't oppose new technology, they merely opposed other people using new technology in ways they disliked. OK."

    How is that not a meaningful distinction? This is in fact a very, VERY meaningful distinction.

    I mean, fuck. You want to talk about the age of industrialization? That was REPLETE with examples of "this technology would be pretty okay if it weren't being put to evil uses." Mechanical looms were pretty great... right up until mill owners decided to run them so fast and so constantly they routinely killed children. Assembly-line production was pretty great... right up until it was used to smash to bits the bodies of many generations of men working twelve-hour shifts on starvation wages.

    We did not realize the benefits from these things until that technology was FORCED to serve men, and not the other way around.

    And yes, you know what? In our current economy, the combination of "you must work in order to eat" and "we will develop technology that will make the only thing you know how to do well obsolete" means people do in fact have a legitimate objection to the machines.

  14. Joseph Harbin

    "Picasso had a saying. He said, "Good artists copy, great artists steal." That was Steve Jobs in 1996 openly admitting the core Apple philosophy when it came to technology innovation. Apple didn't create things so much as take technologies that were already created and combine them in a new way. What did Apple do with what it "re-created"? "We patented it," Jobs said. (Later in his life, Jobs was singing a different tune about tech theft, saying he was "willing to go thermonuclear war" on Android because it was stealing from Apple.)

    Anyone with exalted ideas about creativity, originality, and what it is that artists (including writers) do ought to check out Kirby Ferguson's series "Everything is a Remix" on YouTube. This short video is a good intro, and the link will get you to many others he's done over the years.

    https://www.youtube.com/watch?v=zd-dqUuvLk4

    His thesis is that all creative works are made using the following three elements: Copy, Transform, Combine. Take existing works (book/story, movie, song, etc.), add creativity, put them together again, and you have a new creative work. That's the formula for all creative works including those you might call the most "original."

    What elevates the great artists is not that they're not using (or "stealing") the works of others, but that they are superior in transforming those works into something greater.

    Take Bob Dylan. The Kirby Ferguson video at the link compares Dylan's sources and Dylan's work. The theft is pretty obvious. White artists are often accused of cultural appropriation. Do they steal from Black artists? Of course, they do. Just as Black artists steal from white artists and from each other. It's creativity at work. Copy, Transform, Combine.

    In the literary world, there was probably no greater thief than William Shakespeare. He didn't plot his dramas (for the most part). He "stole" the plots from existing works, ones that are now forgotten. It's Shakespeare's works we remember, not because they were "original," but because they possess the spark of genius he added in transforming them.

    Great artists use whatever is available, inc. creative works and technology. No doubt LLMs & AI will have an impact on writers, but I doubt the future will be shedding (many) tears for anyone put out of work. More likely, they'll be thankful for the artists who mastered the tools of their trade and created works that no one previously had thought possible.

  15. azumbrunn

    Why do we have such stringent copyright enforcement these days? Because the tech and entertainment industries lobbied government to tighten the screws on copyright in order to raise their profits.

    I remember reading about a case: somebody wrote a novel with the story of "Gone with the Wind" but from a Black perspective, and was sued by the owners of the copyright. In my opinion this would be fair use, but if I don't misremember, the court found otherwise.

    This is really quite similar to the training of a chatbot using novels or articles etc. I see a poetic justice in the fact that the tech industry's own lobbying has now come to bite them in the ass.

  16. pjcamp1905

    Please.

    Copyright prohibits the use of an ENTIRE book. Reviewers can't do that and neither can ChatGPT. Since ChatGPT is not AI but only a mixmaster, it is quoting other people's work without attribution. That also puts it beyond fair use.

    The law is not crafted to deal with this sort of situation, but until it is, or a court weighs in, I lean toward the authors' position.

  17. kaleberg

    The writers' complaint isn't so much about copyright as about contract terms. When, let's say, Stephen King writes a creepy novel, he gives his publisher the right to print copies of what he has written. If the publisher wants another creepy novel in that style, he can negotiate with Stephen King or find another writer who can write in that style. If the original Stephen King novel is successful, odds are many publishers will be looking for such writers or trying to contract with Stephen King.

    When someone purchases a copy of a literary work, they get the right to use it to enhance their understanding of language, storytelling, and creepy horror lore. After such influence, the reader is free to write his or her own works, as long as they are different enough to be considered novel. (Some publishers will tolerate fan fiction, particularly if it is distributed freely, but others won't.)

    Now, the publishers want more than just the right to distribute copies of the original work. They want to analyze the style and produce an arbitrary number of derivative works in that style. The publisher wants to buy a completed work AND to be able to use it as a template for future works. That is potentially much more valuable than mere reproduction rights, and authors should be paid more for it.

    If someone asks you to autograph some memorabilia, that doesn't give them the right to use your signature on a contract or check. It doesn't give them the right to make copies of the signed item and resell them. Rights are often partitioned like this. Technically, a translation of a work is a creative act. Odds are most of the words will be different in the target language. Despite this, the translation rights for a work are sold separately, and the original author is paid for that right.

    Suppose a video game company really likes Stephen King's new novel and wants to use it as the basis for a video game. If they simply hired a team of writers to come up with something suitably creepy, suspenseful and so on, they could be in the clear assuming the new work is distinctive enough. If they want to use the same settings, the same characters and many similar story elements, they would have to secure the rights. They'd have to pay even more if they wanted to put the Stephen King name on it.

    Just because a right is new, it doesn't mean it is free for the taking. Naturally, the people with economic and political power would rather not pay for their new right, but that doesn't mean they should be allowed to simply appropriate it.

    1. D_Ohrk_E1

      That's about right.

      But there are lots of writers (and artists) upset that their works are being absorbed by AI.

      At some point, Congress and SCOTUS will end up settling on a derivative-works doctrine that defines what is fair.

  18. kaleberg

    For another familiar example, consider clip art. Most collections allow you to use the art in works intended for reproduction, except in other clip art collections.

  19. cooner

    "The only way that bosses can use AI against writers is if AI, in fact, becomes so good and so powerful that it performs as well as humans."

    Yeah I'm gonna call bulls--t on this. Those bosses are looking for the cheap way out, and if they perceive that AI output is "good enough," they'll happily fire their employees and let the AI crank out crap instead. (Now whether this crap is ultimately "good enough" that they don't lose their readers/audience/market/whatever … they don't think that far ahead, and by the time it becomes obvious, the damage to all their creative former workers has already been done.)

    For a long time now I've said that my biggest fear about AI isn't that it will become better than me or that it will take over humanity. It's that the capitalist overlords, driven on by industry hucksters and credulous boosters like Kevin, will overestimate AI's actual capabilities and decimate entire industries in their effort to hold onto more of their money.
