AI Is Going to Break Open Source Licensing
Here’s what happened: a Python library called Chardet — originally licensed LGPL — was rewritten by an LLM and quietly relicensed MIT. The original authors weren’t consulted. The rewrite surfaced as a concern on the Linux kernel mailing list, and most people outside that community ignored it.
They shouldn’t have.
This isn’t an edge case. It’s a preview of a structural collapse that’s coming for open-source licensing, and the industry has almost no institutional machinery to deal with it.
What Actually Happened
Chardet is a character encoding detection library. Widely used, well understood, LGPL-licensed for a reason: LGPL lets you link against the library without copylefting your own code, but modifications to Chardet itself must be shared back. That distinction matters.
Someone fed the codebase to an LLM, generated a functionally equivalent rewrite, and published it under MIT. From a pure diff standpoint, the code is “new” — different variable names, different structure. From a legal standpoint, it’s a gray area at best and willful copyright circumvention at worst.
The intent of the LGPL license is clear. The letter of the law is genuinely unclear when an LLM is the intermediate step. That gap is the problem.
The LLM Laundromat
Think about what an LLM actually does when you ask it to rewrite a codebase: it ingests the copylefted source, produces a statistically derived transformation, and outputs something that looks structurally independent. The model itself was trained on that code — and millions of others like it.
Is the output a derivative work? Current copyright law doesn’t have a clean answer. Courts are still figuring out whether training on copylefted code constitutes infringement. Nobody has definitively ruled on whether LLM-generated rewrites constitute derivative works.
That ambiguity is now being exploited, whether intentionally or not. And the tooling makes it almost effortless. You can take a GPL library, paste it into a context window, say “rewrite this cleanly,” and get something that passes lint and tests. Relicense it MIT. Push to GitHub. Done.
This is the LLM laundromat. LGPL in, MIT out.
Why This Is a Real Problem
Open source licensing isn’t bureaucratic paperwork. It’s the social contract that makes the whole ecosystem function.
When you use GPL software, you agree to share your modifications. This is the mechanism by which open source compounds — improvements flow back to the commons. Copyleft licenses were specifically designed to prevent large actors from strip-mining community work without reciprocating.
Automated relicensing breaks that mechanism entirely. If any copylefted codebase can be trivially rewritten and relicensed, copyleft becomes unenforceable at scale. The legal overhead of chasing down thousands of LLM-laundered rewrites is prohibitive. The FSF and OSI don’t have that capacity. Neither do the individual maintainers who wrote the original code.
The result: copyleft dies in practice, even if it remains valid in theory. The commons gets strip-mined.
The Bigger Pattern
This isn’t just about licensing. It’s about the general failure mode where LLMs allow people to extract value from socially constructed systems without participating in the social contract those systems depend on.
The same pattern shows up with:
- LLMs trained on Stack Overflow, GitHub, documentation — producing answers that compete with the humans who generated that data without compensation or credit
- AI-generated pull requests that look like contributions but don’t reflect understanding of the codebase
- AI-summarized research papers that compete with the journals funding the peer review process
In each case, the LLM acts as a transformer that launders the social obligation embedded in the original work. You get the output without the relationship.
For open source specifically, this is existential. The reason people contribute to open source is complicated — reputation, necessity, ideology, craft. But a big part of the substrate is the expectation of reciprocity encoded in licenses. You can use this, but you have to give back. Remove that expectation and you change the incentive structure.
What The Industry Is Not Doing
The correct response to the Chardet situation would be:
- clear legal precedent establishing that LLM rewrites are derivative works when they demonstrably replicate the logic and structure of the original
- rapid tooling to detect such rewrites
- enforcement mechanisms that don’t require individual maintainers to hire lawyers
None of that is happening. The legal system moves slowly. OSI is focused on AI training set debates, not the derivative-work-via-rewrite question. GitHub and the major AI code generation companies have strong incentives to keep the question unresolved.
The open-source foundations are reactive, underfunded, and structurally unequipped for this. They were built for a world where infringement required a human to read code and copy it.
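What might detection tooling even look like? One cheap heuristic is to compare programs by AST shape rather than by text: a rename-only rewrite changes every line of a diff but leaves the sequence of syntax-node types nearly untouched. Here is a minimal sketch in Python — the function names and the similarity threshold are mine, not any existing tool’s, and this is a heuristic for flagging candidates, not a legal test of derivation:

```python
# Sketch: compare two Python sources by AST shape, ignoring identifiers.
# A rename-only "rewrite" keeps the same node-type sequence, so it scores
# near 1.0 even though a textual diff shows every line changed.
import ast
import difflib


def ast_shape(source: str) -> list[str]:
    """Walk the AST and record only node types, discarding all names."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def structural_similarity(a: str, b: str) -> float:
    """Similarity (0.0..1.0) between the two sources' AST shapes."""
    return difflib.SequenceMatcher(None, ast_shape(a), ast_shape(b)).ratio()


original = (
    "def detect(buf):\n"
    "    score = 0\n"
    "    for b in buf:\n"
    "        score += b\n"
    "    return score\n"
)
rewrite = (
    "def analyze(data):\n"
    "    total = 0\n"
    "    for item in data:\n"
    "        total += item\n"
    "    return total\n"
)

print(structural_similarity(original, rewrite))  # 1.0: identical structure
```

Real laundered rewrites also reorder functions and restructure control flow, so production tooling would need normalization passes on top of this, but the core idea — compare structure, not text — survives.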
What Maintainers Should Do Right Now
If you maintain open-source code, especially under a copyleft license, you should:
Document your architectural decisions. The more your code encodes non-obvious design choices, domain knowledge, and explicit reasoning, the harder it is to argue an LLM-generated rewrite is truly independent. Comments aren’t just for humans anymore.
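As an illustration, here is what rationale-rich documentation looks like in practice — a toy function, not chardet’s actual logic, with an invented threshold. The code itself is trivially reproducible by a rewrite; the domain reasoning in the docstring is much harder to pass off as independent:

```python
# Illustrative only: the heuristic and the 0.75 threshold are invented
# for this example, not taken from chardet.

def is_probable_utf8(data: bytes, threshold: float = 0.75) -> bool:
    """Return True if `data` plausibly contains UTF-8 text.

    Design note: we count bytes that can legally appear in UTF-8
    (everything except 0xC0, 0xC1, and 0xF5-0xFF) instead of calling
    bytes.decode(), because we want a confidence score that degrades
    gracefully on partial or truncated streams, where decode() would
    simply raise. The 0.75 cutoff is an arbitrary illustrative choice.
    """
    if not data:
        return True
    valid = sum(1 for b in data if b not in (0xC0, 0xC1) and b <= 0xF4)
    return valid / len(data) >= threshold
```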
Dual-license strategically. If your project is commercially useful, consider adding a commercial license option. This doesn’t stop laundering but gives you legal standing to pursue it.
Embed identifiers. Some maintainers are starting to embed unique stylistic patterns, deliberately unusual test cases, or watermarking-adjacent techniques in their code. It’s imperfect, but it creates evidence trails.
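One concrete version of this, sketched below: pick an input where the output hinges on an arbitrary, undocumented tie-break, and pin that behavior in a test. The detector here is a toy, not chardet, and the tie-break rule is invented for illustration — the point is that an independent implementation has no reason to make the same arbitrary choice, so reproducing it is evidence of derivation:

```python
# Toy two-candidate encoding detector with a deliberate "canary":
# ties break toward the lexicographically smaller encoding name.
# That choice is arbitrary and invisible in normal use, but it is
# pinned by the canary test below.

def guess_encoding(data: bytes) -> str:
    """ASCII bytes are consistent with both candidates; high bytes
    are counted for latin-1 only."""
    scores = {"utf-8": 0, "latin-1": 0}
    for b in data:
        if b < 0x80:
            scores["utf-8"] += 1
            scores["latin-1"] += 1
        else:
            scores["latin-1"] += 1
    # The canary: on a tie, return the lexicographically smaller name.
    best = max(scores.values())
    return sorted(name for name, s in scores.items() if s == best)[0]


def test_canary_tie_break():
    # Pure-ASCII input ties the candidates; only a derived
    # implementation would break the tie the same arbitrary way.
    assert guess_encoding(b"plain ascii text") == "latin-1"


test_canary_tie_break()
```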
Watch your dependents. Tools like deps.dev and libraries.io let you monitor who’s depending on your packages. If a suspiciously similar library with a more permissive license appears and starts getting downstream adoption, that’s worth investigating.
The License Layer Is Load-Bearing
Here’s the thing: the open-source ecosystem is genuinely one of humanity’s great collaborative achievements. Linux, Python, PostgreSQL, the web platform itself — these things exist because the license layer created a stable game-theoretic equilibrium where contribution made sense.
LLMs are introducing a new player that can free-ride at industrial scale. That player doesn’t have the reputation incentives that keep individual developers honest. It doesn’t respond to social pressure. And it doesn’t need to — the tool is just a tool.
The load-bearing layer is at risk. The Chardet incident is small. The pattern it represents is not.
If the copyleft mechanism breaks down at scale, the long-term consequence isn’t just lost revenue for some maintainers. It’s the systematic draining of the commons — more code produced by fewer people who can afford to not require reciprocity, and the slow collapse of the collaborative ecosystem that made modern software possible.
We need legal clarity, technical tooling, and institutional capacity to detect and challenge automated relicensing. We need it faster than the courts will move.
The industry is not taking this seriously. It should.