@mods can you please split this off into a new topic on Copilot? It’d be nice to discuss it here on the platform, and I’m pretty sure it’ll be brought up again somewhere else here later anyway.
If you believe that running Copilot is distributing (the individually licensed pieces of) software, then yes.
Most licenses like MIT, BSD or GPL require some form of attribution, and Codex does not (and could never) give attribution.
I’m not convinced that Copilot is redistributing (copying) source code though.
You send Copilot a prompt and it’ll respond with what it imagines follows after the prompt. It’s autocompletion.
Copilot’s response will (ideally) not be copied from some code in its training dataset (that would be overfitting), but instead it’ll be something new, generated from its ‘understanding’ of the issue.
That is, when Codex was trained, it learned to abstract code into concepts that it can then apply to write code (in principle similar to a human!).
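To make the ‘autocompletion’ idea a bit more concrete, here’s a toy sketch. It is nothing like how Codex actually works (that’s a large neural network; this is just a bigram counter), and the names `train_bigram` and `complete` plus the tiny corpus are made up for illustration. It only shows that a model predicting ‘what comes next’ can produce a line that appears nowhere in its training data:

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count which token follows which across all training snippets."""
    follows = defaultdict(Counter)
    for snippet in corpus:
        tokens = snippet.split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows


def complete(follows, prompt, max_tokens=5):
    """Greedily extend the prompt with the most frequent next token."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:
            break  # the last token was never followed by anything in training
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


# Hypothetical, whitespace-tokenized "training data".
corpus = [
    "for i in range ( n ) :",
    "for k in range ( n ) :",
    "if x in seen :",
]

model = train_bigram(corpus)
completion = complete(model, "if x in")
print(completion)
print(completion in corpus)
```

The completed line is stitched together from statistics over several snippets, so even in this tiny example you can’t point at one training line as ‘the source’. A real model is vastly more abstract than this, which is what makes attribution so hard.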
Therefore it’s impossible to know which exact pieces of code from the training data ‘inspired’ Copilot to respond to your prompt in a certain way. Someone on Twitter described it cynically as ‘copyright laundering’ - idk about the copyright part, but it sure is a form of laundering.
For me personally this is the reason I’m fine with Copilot. It learned to code - just like I did - and is therefore now able to solve problems. I didn’t violate any copyrights by reading blog posts, books, documentation or attending lectures; and in my opinion neither did Copilot.
Now obviously the legal side of this could change - similar to how some countries have copyright exemptions to allow data mining, they could add rules to forbid machine learning. Or they could treat the outputs of an ML model as a derivative work of the training data. But they aren’t as of right now (and I don’t think such a thing has been brought to court yet?).
There’s another thing too: even if Copilot were copying verbatim and were basically a code search engine, lots of countries have minimum thresholds under which it’s perfectly fine to just copy stuff without attribution. Basically, two lines of code from a project with 1000 lines are probably not ‘significant’. Even the help section on the GPL license home page says that it’s not worth sticking a GPL header on a couple of lines.
And I think this is the crux: the spirit of the GPL was not to enforce copyright, but to undermine it. To proliferate free software that everyone can not only execute, but also read and modify. To prevent patent trolling, and to stop big companies from building monopolies that you cannot escape from.
Something like Codex is something the GPL was never made to encourage or to deny, but let’s be real, the solution should not be to interpret copyright more strictly. Because the only ones profiting from that will not be programmers, but large IP holders such as Disney.
This is a good take by Felix Reda, a former member of the European Parliament for the Pirate Party: