Generative AI has spread into nearly every business sector. While tools for transcription and content creation are readily accessible to all, AI’s transformative potential extends far deeper. Its influence on coding and software development raises profound questions about the future of multiple industries.
Addressing how AI can be best adopted without hampering creativity or overstepping the line when it comes to copyright or licensing laws is one of the major challenges facing software developers today. For instance, the Intellectual Property Office (IPO), the Government body responsible for overseeing intellectual property rights in the UK, confirmed recently that it has been unable to facilitate an agreement for a voluntary code of practice which would govern the use of copyright works by AI developers.
The perfect match of AI and OSS
Today, most AI models are being trained on open source software (OSS) projects, because these can be accessed without the restrictions associated with proprietary software. This is something of a perfect match. It provides AI with an ideal training environment: the models are given access to a huge number of standard code bases running in infrastructures around the world. At the same time, OSS is exposed to the acceleration and improvement that running with AI can provide.
Developers, too, are benefiting massively from AI. For example, they can ask questions, get answers and, whether the answer is right or wrong, use it as a basis to create something to work with. This major productivity gain is helping to refine coding at a rapid rate. Developers are also using AI to complete mundane tasks quickly, get inspiration or find alternatives to something they thought was a perfect solution.
Total certainty and transparency
However, it’s not all upside. The integration of AI into OSS has complicated licensing. The GNU General Public Licenses (GPL) are a series of widely used free software, or copyleft, licences (there are others too) that guarantee end users four freedoms: to run, study, share, and modify the software. Under these licences, any modified version of the software that is distributed needs to be released under the same licence. If code is licensed under the GPL, any distributed modification of it also needs to be GPL licensed.
There lies the issue. There must be total transparency with regard to the code an AI model has been trained on. Without it, it’s impossible to determine the appropriate licensing requirements for the code the model produces, or even how to license it in the first place. This makes traceability paramount if copyright infringement and other legal complications are to be avoided. Additionally, there are ethical questions. For example, if a developer has taken a piece of code and modified it, is it still the same code?
So the pressing issue is this: what practical steps can developers take to protect themselves against the risks in the code they produce? Also, what role can the rest of the software community – OSS platforms, regulators, enterprises and AI companies – play in helping them do that?
Where foundations come in to offer guidance
Integrity and confidence in traceability matter more when it comes to OSS because everything is out in the open. A mistake or oversight might still happen in proprietary software but, because it happens in a closed system, the chances of exposure are practically zero. Developers working in OSS are operating in full view of a community of millions. They need certainty with regard to a piece of source code’s origin – was it written by a human, or by AI?
There are foundations in place. The Apache Software Foundation has a directive that says developers shouldn’t simply take source code produced by AI. They can be assisted by AI, but the code they contribute is the responsibility of the developer. If it turns out that there is a problem, then it’s the developer’s issue to resolve. We have a similar protocol at Aiven. Our guidelines state that our developers may use only pre-approved, constrained Generative AI tools and that, in every case, developers are responsible for the outputs, which need to be scrutinised and analysed rather than simply taken as they are. This way we can ensure we are complying with the highest standards.
Beyond this, organisations using OSS can also play a role, taking steps to manage their own risks in the process. This includes establishing an internal AI Tactical Discovery team – a team set up specifically to focus on the challenges and opportunities created by AI. We wrote more about this in a recent blog but, in this case, it would involve a project specifically designed to critique OSS code bases, using tools like Software Composition Analysis to analyse the AI-generated codebase and compare it against known open source repositories and vulnerability databases.
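To make the idea of comparing AI-generated code against known open source code more concrete, here is a minimal sketch of one way such a comparison could work. It is not how any particular Software Composition Analysis tool is implemented. The function names (`normalise`, `fingerprints`, `overlap`), the three-line window and the naive comment stripping are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import re


def normalise(code: str) -> list[str]:
    """Strip comments and whitespace so trivial edits do not hide a match.

    Assumption: Python-style '#' comments only. This naive rule would also
    cut a '#' that appears inside a string literal.
    """
    lines = []
    for raw in code.splitlines():
        line = re.sub(r"#.*$", "", raw)   # drop trailing comments
        line = re.sub(r"\s+", "", line)   # drop all whitespace
        if line:
            lines.append(line)
    return lines


def fingerprints(code: str, window: int = 3) -> set[str]:
    """Hash every run of `window` consecutive normalised lines."""
    lines = normalise(code)
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def overlap(candidate: str, known: str, window: int = 3) -> float:
    """Fraction of the candidate's fingerprints that also occur in known code."""
    cand = fingerprints(candidate, window)
    if not cand:
        return 0.0  # candidate shorter than one window: nothing to compare
    return len(cand & fingerprints(known, window)) / len(cand)


# Hypothetical example data: a snippet from a known OSS repository and an
# AI-generated snippet that differs only in spacing and a comment.
known_oss = "def add(a, b):\n    total = a + b\n    return total\n"
ai_output = "def add(a,b):  # sum\n    total = a+b\n\n    return total\n"
unrelated = "print('hi')\nx = 1\ny = 2\n"

print(overlap(ai_output, known_oss))   # high overlap: flag for human review
print(overlap(unrelated, known_oss))   # no shared fingerprints
```

A high overlap score is only a prompt for a developer to look at the matched source and its licence; it does not decide on its own whether anything has been infringed.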
Creating a root of trust in AI
While it is happening, creating new licensing and laws around the role of AI in software development will take time, not least because consensus is required on the specifics of that role and the terminology used to describe it. This is made more challenging because AI development, and its application in code bases, moves at a much quicker pace than the efforts to put parameters in place to control it.
When it comes to assessing whether AI has provided copied OSS code as part of its output, factors such as proper attribution, licence compatibility, and the availability of the corresponding open source code and modifications are absolutely necessary. It would also help if AI companies started adding traceability to their source code. This would create a root of trust that has the potential to unlock significant benefits in software development.
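As an illustration of what checking those factors might look like in practice, the sketch below models a provenance record for a snippet of AI output that has been matched to an open source origin, and reviews it for attribution, licence and source availability. The `Provenance` fields, the `review` function and the allowed-licence set are hypothetical. The set stands in for a policy an organisation's legal team would define and is not a statement about which licences are legally compatible.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical traceability record attached to a matched snippet."""
    snippet_id: str
    origin: str                  # "human", "ai" or "ai-assisted"
    source_url: Optional[str]    # where the matched OSS code is published
    licence: Optional[str]       # SPDX identifier of the matched code
    attribution: Optional[str]   # credit line carried with the snippet


def review(record: Provenance, allowed_licences: set[str]) -> list[str]:
    """Return the open problems that a human reviewer needs to resolve."""
    problems = []
    if not record.attribution:
        problems.append("missing attribution")
    if record.licence is None:
        problems.append("unknown licence")
    elif record.licence not in allowed_licences:
        problems.append(f"licence {record.licence} not allowed by project policy")
    if not record.source_url:
        problems.append("corresponding source not available")
    return problems


# Assumed project policy, for illustration only.
policy = {"MIT", "Apache-2.0"}

clean = Provenance("s1", "ai-assisted", "https://example.org/repo", "MIT", "Example authors")
risky = Provenance("s2", "ai", None, "GPL-3.0-only", None)

print(review(clean, policy))
print(review(risky, policy))
```

If AI tools emitted a record like this alongside their suggestions, the developer who remains responsible for the contribution would have something concrete to check before accepting it.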