Make illegally trained LLMs public domain as punishment

It's all made from our data, anyway, so it should be ours to use as we want
So banks will be public domain when they're bailed out with taxpayer funds, too, right?
At the same time, if a bank goes under, that means they owe more than they own, so "ownership" of that entity is basically worthless. In those cases, a bailout of the customers does nothing for the owners, because the owners still get wiped out.
The GM bailout in 2009 also involved wiping out all the shareholders, the government taking ownership of the new company, and the government spinning off the newly issued stock.
The AIG bailout basically required the company to issue new stock, diluting the existing owners down to 20% of the company while the government owned the other 80%, and the government made a big profit when it exited that transaction and sold the stock off to the public.
So it's not super unusual. Government can take ownership of companies as a condition of a bailout. What we generally don't want is the government owning a company long term, because there's some conflict of interest between its role as regulator and its interest as a shareholder.
Public domain wouldn't be the right term for banks being publicly owned. At least for the normal usage of Public Domain in copyright. You can copy text and data, you can't copy a company with unique customers and physical property.
Oh good point. I'm not actually sure what the phrase would be.. Publicly owned?
Just FYI, with the bank bailouts in the US, the banks paid the bailout back to the government with interest, meaning the govt actually made a profit off the bailout. There’s a lot of things wrong with both banks and the govt, but generally this is not one of them. https://www.propublica.org/article/the-bailout-was-11-years-ago-were-still-tracking-every-penny
Super interesting, learned something new today. Thanks!
I mean, that sometimes did happen.
Germany propped up the Commerzbank after 2007 by essentially buying a large part of it, and managed to sell several tranches with a healthy profit.
Same is true for Lufthansa during COVID.
Banks are redundant, so is the stock market. These institutions do not need to, and should not be private. They are level playing fields in the economy, not participants trying to tilt the board for taking over the game.
No, "the banks" wouldn't be what the AI would be trained on, it would be the private info of individuals the banks do business with.
A similar argument can be made about nationalizing corporations which break various laws, betray public trust, etc etc.
I'm not commenting on the virtues of such an approach, but I think it is fair to say that it is unrealistic, especially for countries like the US which fetishize profit at any cost.
Yes, mining companies should all be nationalised for digging up the country's ground and putting carbon in the country's air.
We essentially do have the death penalty for corporations, it's called being declared a criminal organisation.
You must be fun at parties.
this comment doesn't make any sense
You must be new here.
I don't think it should be a "punishment." It should be done on principal.
Not sure making their LLMs public domain would really hurt their principal, their secret sauce is in the code around the model.
And yes, I do recognize that you meant "principle".
That's not true though. The models themselves are hella intensive to train. We already have open source programs to run LLMs at home, but they are limited to smaller open-weights models. Having a full ChatGPT model that can be run by any service provider or home server enthusiast would be a boon. It would certainly make my research more effective.
It's not punishment. LLMs do not belong to them, they belong to all of humanity. Tear down the enclosing fences.
This is our common heritage, not OpenAI's private property
It doesn't matter anyway, we still need the big companies to bankroll AI. So it effectively does belong to them whatever we do.
Hopefully at some point people can get the processor requirements to something sane and AI development opens up to us all.
It could also contain non-public domain data, and you can't declare someone else's intellectual property as public domain just like that, otherwise a malicious actor could just train a model with a bunch of misappropriated data, get caught (intentionally or not) and then force all that data into public domain.
Laws are never simple.
Forcing a bunch of neural weights into the public domain doesn't make the data they were trained on also public domain, in fact it doesn't even reveal what they were trained on.
So what you're saying is that there's no way to make it legal and it simply needs to be deleted entirely.
I agree.
There's no need to "make it legal", things are legal by default until a law is passed to make them illegal. Or a court precedent is set that establishes that an existing law applies to the new thing under discussion.
Training an AI doesn't involve copying the training data, the AI model doesn't literally "contain" the stuff it's trained on. So it's not likely that existing copyright law makes it illegal to do without permission.
It wouldn't contain any public-domain data though. That's the thing with LLMs, once they're trained on data the data is gone and just added to the series of weights in the model somewhere. If it ingested something private like your tax data, it couldn't re-create your tax data on command, that data is now gone, but if it's seen enough private tax data it could give something that looked a lot like a tax return to someone with an untrained eye. But, a tax accountant would easily see flaws in it.
Right, like I did. They're safeguarding Disney and other places like that now. It's just the little guys who get screwed.
intellectual property doesn't really exist in most of the world. they don't give a shit about it in india, bangladesh, vietnam, china, the philippines, malaysia, singapore...
it's arbitrary law that is designed to protect corporations and it's generally unenforceable.
it’s arbitrary law that is designed to protect corporations and it’s generally unenforceable.
It's arbitrary, but it was designed to protect individuals, but it has been morphed to protect corporations. If we reset the law back to the original copyright act of 1790 w/ a 14-year duration, it would go a long way toward removing power from corporations. I think we should take it a step further and perhaps make it 10 years, with an optional extension for another 10 years if you can show need (i.e. you're an indie dev and your game is finally making a splash after 8 years).
So true. IP only helps the corps and slows tech development. Contracts, ndas, and trade secrets are all you really need to keep your ideas safe. If you want your country to develop fast, get rid of any IP laws.
But they're not developing AI in those countries they're developing it mostly in the US. In the US copyright law is enforced.
There is a lot of AI development happening in China. Doubao (from Bytedance, the same company behind TikTok), DeepSeek and Qwen are some examples of Chinese LLMs.
India only has openhathi, dhenu, bhashini, krutrim and like a dozen other LLMs, so I cannot see how you could think they aren't developing AI. This is a wildly wrong claim lol
Although I'm a firm believer that most AI models should be public domain or open source by default, the premise of "illegally trained LLMs" is flawed. Because there really is no assurance that LLMs currently in use are illegally trained to begin with. These things are still being argued in court, but the AI companies have a pretty good defense in the fact analyzing publicly viewable information is a pretty deep rooted freedom that provides a lot of positives to the world.
The idea of... well, ideas, being copyrightable, should shake the boots of anyone in this discussion. Especially since when the laws on the book around these kinds of things become active topic of change, they rarely shift in the direction of more freedom for the exact people we want to give it to. See: Copyright and Disney.
The underlying technology has more than enough good uses that banning it would simply cause it to flourish wherever it is not banned, which means, as usual, that everyone but the multinational companies loses out. The same would happen with stricter copyright, as only the big companies have the means to build their own models with their own data. The general public is set up for a lose-lose against these companies as it currently stands. Only by requiring the models to be made available to the public do we ensure that the playing field doesn't tip further in their favor, to the point where AI technology only exists to benefit them.
If the model is built on the corpus of humanity, then humanity should benefit.
OpenAI hasn’t disclosed the datasets that ChatGPT is trained on, but in an older paper two databases are referenced; “Books1” and “Books2”. The first one contains roughly 63,000 titles and the latter around 294,000 titles.
These numbers are meaningless in isolation. However, the authors note that OpenAI must have used pirated resources, as legitimate databases with that many books don’t exist.
Should be easy to defend against, outright trivial: OpenAI, just tell us what those Books1 and Books2 databases are. Where you got them from, the licensing contracts with publishers that you signed to give you access to such a gigantic library. No need to divulge details, just give us information that makes it believable that you licensed them.
...crickets. They pirated the lot of it otherwise they would already have gotten that case thrown out. It's US startup culture, plain and simple, "move fast and break laws", get lots of money, have lots of money enabling you to pay the best lawyers to abuse the shit out of the US court system.
For OpenAI, I really wouldn't be surprised if that happened to be the case, considering they still call themselves "OpenAI" despite being the most censored and closed source AI models on the market.
But my comment was more aimed at AI models in general. If you are assuming they indeed used non-publicly posted or gathered material, and did so directly themselves, they would indeed not have a defense to that. Unfortunately, if a second hand provided them the data, and did so under false pretenses, it would likely let them legally off the hook even if they had every ethical obligation to make sure it was publicly available. The second hand that provided it to them would be the one infringing.
If that assumption turns out to be a truth (Maybe through some kind of discovery in the trial), they should burn for that. Until then, even if it's a justified assumption, it's still an assumption, and most likely not true for most models, certainly not those trained recently.
the AI companies have a pretty good defense in the fact analyzing publicly viewable information is a pretty deep rooted freedom that provides a lot of positives to the world
They are not "analyzing" the data. They are feeding it into a regurgitating mechanism. There's a big difference. Their defense is only "good" because AI is being misrepresented and misunderstood.
I agree that we shouldn't strive for more strict copyright. We should fight for a much more liberal system. But as long as everyone else has to live by the current copyright laws, we should not let AI companies get away with what they're doing.
I've never really delved into the AI copyright debate before, so forgive my ignorance on the matter.
I don't understand how an AI reading a bunch of books and rearranging some of those words into a new story, is different to a human author reading a bunch of books and rearranging those words into a new story.
Most AI art I've seen has been... Unique, to say the least. To me, they tend to be different enough to the art they were trained in to not be a direct ripoff, so personally I don't see the issue.
Not to mention patent laws are bullshit.
There are law offices that exist specifically to fuck with people over patent and copyright law.
There's also cases where people use copyright and patent law to hold us back. I can't find the article but some religious jerk patented connecting a sex toy to a computer via USB. Thankfully someone got around this law with bluetooth and cell phones. Otherwise I imagine the camgirl and LDR market for toys would've been hit with products 10 years sooner.
They are not “analyzing” the data. They are feeding it into a regurgitating mechanism. There’s a big difference. Their defense is only “good” because AI is being misrepresented and misunderstood.
I really kind of hope you're kidding here. Because this has got to be the most roundabout way of saying they're analyzing the information. Just because you think it does so to regurgitate (which I have yet to see any good evidence for, at least for the larger models), does not change the definition of analyzing. And by doing so you are misrepresenting it and showing you might just have misunderstood it, which is ironic. And doing so does not help the cause of anyone who wishes to reduce the harm from AI, as you are literally giving ammo to people to point to and say you are being irrational about it.
Banning AI is out of the question. Even the EU accepts that and they tend to be pretty ban heavy, unlike the US.
But it's important that we have these discussions about how copyright applies to AI so that we can actually get an answer and move on. Right now it's a legal quagmire that no one really wants to get involved in except the big companies. If a small group of university students want to build an AI right now, they can't, because acquiring training data is a legal nightmare, the Twilight Zone of law.
AI is outright unregulated in the EU unless and until you actually use it for something where it becomes relevant. Then you've got, at the lower end, labelling requirements (if your customer service is an AI chat, say that it's an AI chat), up to heavy, heavy requirements when you use it for stuff like sifting through job applications. The burden of proof that the AI isn't e.g. racist is on you. Or, for that matter, when you use it to reject health insurance claims; I think we saw some news lately out of the US about what can happen when you do that.
OpenAI's copyright case isn't really good to make the legal situation any clearer: We already know that using pirated content to train stuff isn't legal because you're not looking at it legitimately. The case isn't about the "are computers allowed to learn from public sources just as humans are" question.
Imaginary property has always been a tricky concept, but the law always ends up just protecting the large corporations at the expense of the people who actually create things. I assume the end result here will be large corporations getting royalties from AI model usage or measures put in place to prevent generating content infringing on their imaginary properties and everyone else can get fucked.
It's like what happened with Spotify. The artists and the labels were unhappy with the copyright infringement of music happening with Napster, Limewire, Kazaa, etc. They wanted the music model to be the same "buy an album from a record store" model that they knew and had worked for decades. But, users liked digital music and not having to buy a whole album for just one song, etc.
Spotify's solution was easy: cut the record labels in. Let them invest and then any profits Spotify generated were shared with them. This made the record labels happy because they got money from their investment, even though their "buy an album" business model was now gone. It was ok for big artists because they had the power to negotiate with the labels and get something out of the deal. But, it absolutely screwed the small artists because now Spotify gives them essentially nothing.
I just hope that the law that nothing created by an LLM is copyrightable proves to be enough of a speed bump to slow things down.
"Given they were trained on our data, it makes sense that it should be public commons – that way we all benefit from the processing of our data"
I wonder how many people besides the author of this article are upset solely about the profit-from-copyright-infringement aspect of automated plagiarism and bullshit generation, and thus would be satisfied by the models being made more widely available.
The inherent plagiarism aspect of LLMs seems far more offensive to me than the copyright infringement, but both of those problems pale in comparison to the effects on humanity of masses of people relying on bullshit generators with outputs that are convincingly-plausible-yet-totally-wrong (and/or subtly wrong) far more often than anyone notices.
I liked the author's earlier very-unlikely-to-be-met-demand activism last year better:
...which at least yielded the amusingly misleading headline OpenAI ordered to delete ChatGPT over false death claims (it's technically true - a court didn't order it, but a guy who goes by the name "That One Privacy Guy" while blogging on linkedin did).
They're spitting out propaganda and misinformation mostly from what I can see. If anything, it should get a refund.
-Outside of coding / debugging tasks (and that's hit or miss)
I'd rather they were destroyed, but practically speaking that's impossible, and this sounds like the next best idea to me.
I used whisper to create subs of a video, and in a section with relaxing instrumental music it filled the subtitles, on repeat, with
La scuola del Dr. Paret è una tecnologia di ipnosi non verbale che si utilizza per risultati di un'ipnosi non verbale (in English: "Dr. Paret's school is a non-verbal hypnosis technology that is used for results of a non-verbal hypnosis")
Clearly stolen from this Dr. Paret YouTube channel where he's selling hypnosis lessons in Italian. Probably in one or multiple videos he had subs stating this over the same relaxing instrumental music that I used, and the model assumed the sound corresponded to that text
They don't mean your data, silly. They don't give a fuck about that.
They mean other huge corporations' data.
I want to have a personal llm that learns all my interests from my files and websites visited. I just want to ask it stuff that I don't have to remember.
I'm working on something along these lines for myself, I think of it like using AI as a filter to create a bubble of good Internet around me
I think that'd be ok, even with this proposal, as long as you don't sell that LLM for public use. It's fine if I draw a picture of Mickey Mouse in my notebook, but if I try to sell that picture I could get in legal trouble.
So basically Microsoft's Recall if it was actually good. I've wanted that for a long time https://lemmy.dbzer0.com/comment/12921637
Possibly but just not Microsoft anything ever.
The environmental cost of training is a bit of a meme. The details are spread around, but basically, Alibaba trained a GPT-4 level-ish model on a relatively small number of GPUs... probably on par with a steel mill running for a long time, a comparative drop in the bucket compared to industrial processes. OpenAI is extremely inefficient, probably because they don't have much pressure to optimize GPU usage.
Inference cost is more of a concern with crazy stuff like o3, but this could dramatically change if (hopefully when) bitnet models come to fruition.
Still, I 100% agree with this. Closed LLM weights should be public domain, as many good models already are.
Doesn't Open AI just have the same efficiency issue as computing in general due to hardware from older nodes?
What are bitnet models and what does that change in a nutshell?
What are bitnet models and what does that change in a nutshell?
Read the pitch here: https://github.com/ridgerchu/matmulfreellm
Basically, using ternary weights, all inference-time matrix multiplication can be replaced with much simpler additions and subtractions. This is theoretically more efficient on GPUs, and astronomically more efficient on dedicated hardware (as adders take up a fraction of the silicon area that multipliers do). This would be particularly fantastic for, say, local inference on smartphones or laptop ASICs.
The catch is no one has (publicly) risked a couple of million dollars to test it with a large model, as (so far) training it isn't more efficient than "regular" LLMs.
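To make the ternary-weight idea concrete, here is a minimal sketch in plain Python. It is not the code from the linked repo; the function names, the tiny weight matrix and the input vector are all made up for illustration. It only shows that when every weight is -1, 0 or +1, a matrix-vector product gives the same answer with no multiplications at all.

```python
# Hypothetical illustration: names, sizes and values are assumptions for this
# sketch, not taken from any real bitnet implementation.

def dense_matvec(weights, x):
    """Ordinary matrix-vector product: one multiplication per weight."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def ternary_matvec(weights, x):
    """Same product for weights restricted to -1, 0, +1, using only add/subtract."""
    out = []
    for row in weights:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi   # +1 weight: add the input
            elif w == -1:
                acc -= xi   # -1 weight: subtract the input
            # 0 weight: the input is skipped entirely
        out.append(acc)
    return out


# A tiny ternary weight matrix and an input vector.
W = [[1, 0, -1, 1],
     [-1, 1, 0, 0],
     [0, -1, -1, 1]]
x = [3, 5, 2, 7]

print(dense_matvec(W, x))    # [8, 2, 0]
print(ternary_matvec(W, x))  # [8, 2, 0]
```

Real implementations pack the weights into a couple of bits each and run this on vectorized hardware, so take this as the arithmetic idea only, not a performance claim.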
Doesn’t Open AI just have the same efficiency issue as computing in general due to hardware from older nodes?
No one really knows, because they're so closed and opaque!
But it appears that their models perform relatively poorly for their "size." Qwen is nearly matching GPT-4 in some metrics, yet is probably an order of magnitude smaller, while Google/Claude and some Chinese models are also pulling ahead.
At the current kWh per token, an LLM query uses about 100x the energy of a regular Google search query. That's where the environmental meme came from. Also, Nvidia plans to manufacture enough chips to require global electricity production to increase by 20-30%.
Another clown dick article by someone who knows fuck all about ai
Calling something illegal in spite of or in absence of precedent is a time-honored tactic - though not a particularly persuasive one.
AI is just a plagiarism machine with thousands of copyrighted materials that "trained" it, which they paid nothing for.
Are you threatening me with a good time?
First of all, whether these LLMs are "illegally trained" is still a matter before the courts. When an LLM is trained it doesn't literally copy the training data, so it's unclear whether copyright is even relevant.
Secondly, I don't think that making these models "public domain" would have the negative effects that people angry about AI think it would. When a company is running a closed model internally, like ChatGPT for example, the model is never available for download in the first place. It doesn't matter if it's public domain or not because you can't get a copy of it. When a company releases an open-weight model for public use, on the other hand, they usually encumber them with some sort of license that makes them harder for competitors to monetize or build on. Making those public-domain would greatly increase their utility. It might make future releases less likely, but in the meantime it'll greatly enhance AI development.
The LLM does reproduce copyrighted data though.
Not 1:1, overfitted images still have considerable differences to their original. If you chose "reproduce" to make that point, that's why OP clarified it wasn't literally copying training data, as the actual data being in the model would be a different story. Because these models are (in simplified form) a bunch of really complex math that produces material, it's a mathematical inevitability that it produces copyrighted material, even for calculations that weren't created due to overfitting. Just like infinite monkeys on infinite typewriters will eventually reproduce every piece of copyrighted text.
But then I would point you to the camera on your phone. If you take a copyrighted picture with that, you're still infringing. But was the camera created with the intention to appropriate material captured by the lens? Which is why we don't blame the camera for that, we blame the person that used it for that purpose. AI users have an ethical obligation not to steer the AI towards generating infringing material.
How?
*it can produce data identical to data that has been copyrighted before
This is a terrible idea. Very easy to circumvent, doesn't actually help the training sources.
To speak of AI models being "made public domain" is to presuppose that the AI models in question are covered by some branch of intellectual property. Has it been established whether AI models (even those trained on properly licensed content) even are covered by some branch of intellectual property in any particular jurisdiction(s)? Or maybe by "public domain" the author means that they should be required to publish the weights and also that they shouldn't get any trade secret protections related to those weights?
Unlikely, I'd say. In EU jurisdictions copyright requires creative authorship, not "sweat of the brow", which is why databases aren't included by default, and why they have their own protection regime.
Quote, emphasis mine:
In the meaning of the European Union Directive 96/9/EC on the legal protection of databases, the term database refers to a collection of independent works, data or other materials, which have been arranged in a systematic or methodical way, and have been made individually accessible by electronic or other means. In the meaning of the Directive the data or materials:
- must not be linked, or must be capable of separation without losing their informative content;
- must be organised according to specific criteria, which means that only planned collections are covered;
- must be individually accessible – mere storage of data is not covered by the term database.
In AI models the organisation is inferred from the data, it's not planned into the database. The first bullet point is on less shaky ground: a summary an AI can make of a book can reasonably be regarded as "informative content", and nothing about db protections says that they have to store full works; it could also be references, citations, etc.
Wouldn't that give people who use it for bad things easier access? It should be made illegal to create them if they don't legally have access to that data
The "illegally trained LLMs" they're talking about are trained on copyrighted data that they didn't have permission to use, this isn't about LLMs that have been trained to do illegal things. OpenAI (chatgpt) is being sued because there is a lot of evidence that they used copyrighted content for training, like NY Times articles. OpenAI is so profitable that they'll probably see these lawsuits as a business expense and keep doing it. Most people won't sue anyway...
Only if they were trained on public material.
Correct
Doesn't seem like this helps out all the writers / artists that the LLM stole from.
Yes!
Nice one
So if I make a better car using customer feedback, are the rights to the car really theirs because it was their opinions that went partially into the end product?
IP is a joke anyway. If you put information out into the world you don't own it. Sorry, you can't have it both ways. You can't simultaneously support torrenting movies (I do, and I assume you do too), while also claiming you own your comments on the internet and no one can "pirate" them.
Sure, but saying the corpos can't privatize the output of their AI is consistent with that viewpoint.
I don't support torrenting movies
I really don't care about AI used on designs for generic products.
I mean, if we really are following the spirit of copyright, no-one at OpenAI or the other companies developed matrix and vector multiplication (operations existing in the public domain because Platonism is a thing).
Edit: oh my, I guess the consensus is that stealing the work of mathematicians is ok (or more, classifying our constructions as discoveries).
You can't patent math, though you can copyright a specific explanation of math concepts.
If Open AI (or any AI company) is including copyrighted works in their solution, that's a copyright violation and should be treated as such. But if they're merely using the information from a copyrighted work but not violating the copyright itself, they're fine.
That's rather the irony - mathematics takes a great deal of work and creativity. You can't copyright mathematical work; but, put a set of lines together and shade in the polygons created and suddenly it becomes copyrightable. Somehow one is a creative work whose author requires protection, and the other is volunteered for involuntary public service.
The reason mathematics cannot be copyrighted is that it's a "discovery" rather than a "creation" (very much a point of view, and far from irrefutable fact). In mathematics, one should be aware that the concept and its explanation (proof) are much the same thing.
All in all, the argument is either mathematical work should fall under copyright (an abhorrent idea), or copyright should be abolished as it rarely (if ever) does much good.
Simple operations like vector multiplication are not works for the purposes of copyright law. If you invented an entirely new form of math, complete with novel formulae, you could conceivably assert patent rights and/or copyright over it, especially if you published a textbook. It would be more relevant, however, to discuss complex algorithms, such as for data compression. Those can certainly be patented. And, when implemented as a computer program, can certainly be copyrighted.
But if you're just defining one simple operation, yeah, you're unlikely to be able to assert any rights over it.
Ehh no, you genuinely can't patent any form of mathematics.
Mathematics falls under "exists in nature" (if you are a Platonist) or "abstract ideas" (which catches even clear-thinking Constructivists). So it's excluded from patents and copyrights no matter how complex the system.
Textbooks usually belong to the publisher (academics commonly have to pirate their own papers), so that's usually a bust.
You might be able to patent an algorithm associated with a branch of mathematics, but that's trickier than you think. Blank slate development can, and does, happen (see Compaq's reimplementation of IBM's BIOS). You're banking on it not being reverse-engineerable (spoiler, don't take that bet if you've published your proofs!).
Your data is worthless. Only Linux type zealots (conspiracy theorists) harp on that. Ever copied a meme and shared it elsewhere?
Negative reputation troll.
Stay in your hugbox bro.
Not only that, but copyright applies to copying, not reading, which is what it’s doing.
It won't really do anything though. The model itself is whatever. The training tools, data and resulting generations of weights are where the meat is. Unless you can prove they are using unlicensed data from those three pieces, open sourcing it is kind of moot.
What we need is legislation to stop it from happening in perpetuity. Maybe just ONE civil case win to make them think twice about training on unlicensed data, but they'll drag that out for years until people go broke fighting, or stop giving a shit.
They pulled a very public and out in the open data heist and got away with it. Stopping it from continuously happening is the only way to win here.
Oh no, not the pubes! Get those curlies outta here!
Best correction ever. Fixed. ♥️
Legislation that prohibits publicly-viewable information from being analyzed without permission from the copyright holder would have some pretty dramatic and dire unintended consequences.
Not really. The same way you can't sell live and public performance music for profit and not get sued. Case law right there, and the fact it's performance vs publicly published doesn't matter. How the owner and originator classifies or licenses it is the defining classification. It's going to be years before anyone sees this get a ruling in court though.
It's already illegal in some form. Via piracy of the works and regurgitating protected data.
The issue is mega Corp with many rich investors vs everyone else. If this were some university student their life would probably be ruined like with what happened to Aaron Swartz.
The US justice system is different for different people.
If we can't train on unlicensed data, there is no open-source scene. Even worse, AI stays but it becomes a monopoly in the hands of the few who can pay for the data.
Most of that data is owned and aggregated by entities such as record labels, Hollywood, Instagram, reddit, Getty, etc.
The field would still remain hyper competitive for artists and other trades that are affected by AI. It would only cause all the new AI based tools to be behind expensive censored subscription models owned by either Microsoft or Google.
I think forcing all models trained on unlicensed data to be open source is a great idea but actually rooting for civil lawsuits which essentially entail a huge broadening of copyright laws is simply foolhardy imo.
Unlicensed from the POV of the trainer, meaning they didn't contact or license content from someone who didn't approve. If it's posted under Creative Commons, that's fine. If it's otherwise posted that it's not open in any other way and not for corporate use, then they need to contact the owner and license it.
But wouldn't that mean making it open source, then it not functioning properly without the data while open, would prove that it is using a huge amount of unlicensed data?
Probably not "burden of proof in a court of law" prove though.
Making it open source doesn't change how it works. It doesn't need the data after it's been trained. Most of these AIs are just figuring out patterns to look for in the new data it comes across.
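A toy sketch of the "doesn't need the data after it's been trained" point, in plain Python. Everything here is an assumption for illustration: a two-parameter straight-line fit stands in for a model, and the function names and data points are invented. It says nothing about whether a large model memorizes parts of its training set; it only shows that once the weights exist, predictions run without the training data being present.

```python
# Hypothetical illustration: a least-squares line fit standing in for "training".

def fit_line(points):
    """Fit y = a*x + b by least squares and return the two 'weights' (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict(weights, x):
    """The fixed 'structure' of the model: only the weights vary."""
    a, b = weights
    return a * x + b


training_data = [(0, 1), (1, 3), (2, 5), (3, 7)]
weights = fit_line(training_data)

# Throw the training data away; only the two numbers in `weights` remain.
del training_data

print(weights)               # (2.0, 1.0)
print(predict(weights, 10))  # 21.0
```

Swapping in different weights with the same predict function gives a different model with the same structure.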
in civil matters, the burden of proof is actually usually just preponderance of evidence and not beyond a reasonable doubt. in other words to win a lawsuit, you only need to have more compelling evidence than the other person.
Just a little note about the word "model", in the article it's used in a way that actually includes the weights, and I think this is the usual way of using it! If you change the weights, you get a different model, though the two models will have the same structure.
Anyway, you make good points!