Disclaimer: I’m not a lawyer; this article, therefore, just reflects my personal views on the new discussion about Microsoft’s role in the Open Source community.
I’m writing down my first impressions of the discussion while waiting for my flight to depart. So, as an additional disclaimer: this might be somewhat vague, but I promise to post an update on the matter by the end of the week.
There was a lot of fuss around the blog post by Denver Gingerich and Bradley M. Kuhn for the Software Freedom Conservancy. In this post, they accused Microsoft of essentially abusing GitHub-hosted Open Source repositories, and by extension their authors, by using the source code to train an AI model that powers a commercial product called GitHub Copilot. Copilot’s purpose is to assist its users in writing code. Impressive demos show that, given a plain-language description of a task, it can generate source code in almost any programming language.
According to the Software Freedom Conservancy, this means not only that Microsoft might have used copylefted code to train its models, but also that Copilot users might end up with copylefted code in their projects by accepting Copilot’s suggestions.
I understand how frustrating it is for Open Source contributors to see their projects being exploited by large corporations. The last case that got a lot of attention in the community was Elastic vs. Amazon, which eventually led Elastic to switch licenses and Amazon to build a fork of the last Elastic version released under a permissive license.
The very first criterion of the Open Source Definition emphasizes that an Open Source license “shall not restrict any party from selling[…] the software”. A license can, however, require that modifications be released under the same license (copyleft) and/or that making the software available “as a service” triggers the same obligations as distributing it (the case the AGPLv3 covers).
However, I see no basis for restricting the use of Open Source software for training. In my view, what Microsoft does here is the scalable version of someone learning from Open Source code and then starting a consulting business.
The article’s second accusation is that copylefted code might end up in proprietary software projects because developers use code that Copilot’s AI model has effectively “copied”.
While I am a strong advocate for intellectual property and Open Source licenses, I still don’t see a violation here. Authors should have full control over how their software as a whole is licensed; small snippets implementing comparatively simple algorithms, however, might not reach the threshold of originality required for copyright protection.
In my experience, GitHub Copilot is mainly used, as the name suggests, as a co-pilot for tedious glue code or well-known algorithms. It’s basically an automated version of copying code from Stack Overflow. So I still cannot see what the big deal is.
Again: I’m not a lawyer. Matthew Butterick is, and in his blog post on the topic he does see a license violation. So, on the legal question, you should trust him more than me. I am only describing how I personally understand the licensing situation!
That being said, Open Source projects using GitHub might face other issues.
Git is a distributed source control system. Technically speaking, each team member of a software project, whether open or closed source, has a copy of the entire repository on their computer. Using GitHub as a “mirror” for these repositories, and as a hub to and from which developers push and pull each other’s changes to the code base, is therefore just a matter of convenience. Unlike SourceForge, which mainly offered centralized SVN and CVS repositories (state of the art back then), GitHub’s source code hosting would be trivial to migrate away from if things went south. In most cases, it would even be possible to find a replacement if GitHub suddenly disappeared.
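To illustrate how little is tied to GitHub at the Git level, here is a minimal Python sketch that copies a repository’s complete history, with all branches and tags, to another host using `git clone --mirror` and `git push --mirror`. The helper names `mirror_commands` and `mirror_repository`, the `dry_run` flag, and both URLs are hypothetical; the sketch assumes `git` is installed and that the target repository already exists and is empty. It only moves what lives in Git itself; issues, pull request discussions and other platform data stay behind.

```python
from __future__ import annotations

import subprocess


def mirror_commands(source_url: str, target_url: str, workdir: str) -> list[list[str]]:
    """Build the two git commands that copy every ref (branches, tags, notes)."""
    return [
        # Bare clone that fetches all refs, not just the default branch.
        ["git", "clone", "--mirror", source_url, workdir],
        # Push every ref from the bare clone to the new host.
        ["git", "-C", workdir, "push", "--mirror", target_url],
    ]


def mirror_repository(
    source_url: str, target_url: str, workdir: str, dry_run: bool = True
) -> list[list[str]]:
    """Hypothetical helper: print the commands (dry run) or execute them."""
    commands = mirror_commands(source_url, target_url, workdir)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the migration at the first failing step.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical URLs; replace them with the real source and target repositories.
    mirror_repository(
        "https://github.com/example/project.git",
        "https://git.example.org/example/project.git",
        "project.git",
    )
```

With `dry_run=True` (the default) the sketch only prints the commands, so you can review them before running them against real repositories. A mirror clone from GitHub may also include read-only `refs/pull/*` references, which some target hosts may reject; treat this as a starting point rather than a complete migration tool.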
However, GitHub, as a company, creates an ecosystem around source code management that causes severe vendor lock-in. The lock-in is so strong because GitHub combines several components that reinforce it:
GitHub is a Social Network. Besides the technical peculiarities of source code control, GitHub is a central hub for developers to meet, discover and collaborate. This is probably the one thing that gives GitHub the monopolistic position it has among developers. It’s just the same network effect that we see with other Social Media platforms.
In theory, Social Media could well be designed in a decentralized way, too, if we gave up the convenience of a one-stop shop to meet everyone in the industry. Usenet is one example of such a decentralized design.
However, every attempt to decentralize a (general-purpose) Social Media network has failed to attract enough users for network effects to kick in. Probably the best-known (though not the most popular) example is Mastodon: it works, but only very few people use it.
Interestingly, the hurdles of a decentralized Social Media platform and the missing convenience of a centralized one-stop shop deter not only the “general public” but also the technically skilled people who make up GitHub’s typical user base.
Leaving GitHub would therefore mean giving up on a marketing and community channel.
GitHub provides critical infrastructure as a service. Again, technically speaking, GitHub’s offering as a Git hosting company is trivial. Its automation features, however, are not. More and more projects rely on GitHub Actions for their build and/or deployment processes.
Therefore, leaving GitHub would mean finding, configuring and running such tools yourself. Sure, Jenkins and others exist, but for complex setups, migrating all these workflows would not only be a pain in the neck; running the necessary infrastructure would probably also require more resources.
GitHub is a project/community management tool. Wiki pages, an issue tracker, and GitHub Pages are tempting for Open Source projects because they are tools to manage a project and a community professionally. And, thanks to GitHub, free of charge.
Leaving GitHub here would mean moving documentation, project management, and lessons learnt to another tool. Needless to say, this would be a non-trivial task. We saw the implications when Open Source projects, Django for example, moved in the other direction, onto GitHub.
That said, I am not arguing that Open Source projects can never replace GitHub. I’m not even saying that they shouldn’t.
But: so far, even considering the recent discussion about GitHub Copilot, I think Microsoft is doing an excellent job for the Open Source community, most notably (though not only) by providing it with a good product: GitHub. Microsoft is a profit-oriented company, however, so its strategy could change “in our sole discretion” at any time, probably even without prior notice. And those who have been using computers a bit longer might remember that Microsoft, in particular, has certainly not always been the saviour of the Open Source community it now wants to be perceived as.
We should value that for as long as they keep doing a good job for the community, and we can keep using GitHub to connect. That is a good thing.
Strategically speaking, however, it is probably not the best idea, whether for an Open Source project or a company, to rely on GitHub for each and every part (Social, Actions, Project Management) at the same time. It is appealing, but we used to have Open Source projects such as Mailman, Trac, or Jenkins for these needs. In a side-by-side comparison with GitHub from today’s perspective, all of them look significantly less appealing to most developers. Still, they served us very well for a long time, and they were probably on par with GitHub’s functions when GitHub gradually added these features to its platform. Since then, however, these tools have not kept pace.
I would not accuse GitHub of abusing Open Source code for Copilot. In fact, Microsoft (at the moment!) does a good job of supporting the OSS community. Rather than leaving GitHub for that reason, I’d like to see more innovation in Open Source tools and community-owned platforms for community and project management.