The Good, the Bad, and the Ugly of AI-Generated Code
Generation vs Completion, an Important Distinction
First and foremost, there is a massive difference between letting an AI write all your code and letting an AI help complete your code. In the former, you provide a prompt to ChatGPT or Le Chat and let it generate all the code for you. In the latter, you write your code and let Copilot or Codeium complete the rest for you. You can think of this latter mode as IntelliSense in Visual Studio Code, just with more power and flexibility. These are the extremes; in between there is a continuum that blends prompting and auto-completion to different degrees.
The Good
Copilot or Codeium completes your code as you type it. In about 40% of cases, it is just what you need. In another 40%, the code is not good enough, but a little effort on your part can make it so. The remaining 20% of the time, throw the code away! This is still a massive saving of keystrokes for simple tasks. This is where the true power of AI lies: generating your boilerplate code so you can concentrate on the important bits.
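As a sketch of what that boilerplate saving looks like in practice (the class and its fields are invented for illustration), you type the first couple of lines and the completion engine drafts the rest:

```python
from dataclasses import dataclass

# You write the decorator, class name, and fields; a completion tool
# will typically draft the rest, including the method body, from the
# signature and docstring alone.
@dataclass
class Invoice:
    customer: str
    amount: float
    paid: bool = False

    def outstanding(self) -> float:
        """Return the amount still owed on this invoice."""
        return 0.0 if self.paid else self.amount
```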
Codeium reports a conservative estimate of around 5.16 hours saved per month when using their AI. Not bad, considering the total time a developer actually spends coding in a month.
I find reading English much easier than reading code, regardless of the programming language. If an AI can give me an accurate summary of what a piece of code does, that saves me time and effort. CodeScene and Blackbox AI do just that. Conversely, to get accurate code completion, I often write comments detailing why and how the code does something. Either way, AI helps programmers document their code, something they are notoriously bad at: What’s with the aversion to documentation in the industry?
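To make that concrete, here is the style of intent comment I mean (the function and its behaviour are invented for this example); the more precise the why and the how, the better the suggested completion tends to be:

```python
from datetime import date
from typing import Optional

# Parse an ISO 8601 date string, returning None instead of raising,
# because callers treat a missing or malformed date as "not yet scheduled".
def parse_schedule_date(raw: str) -> Optional[date]:
    try:
        return date.fromisoformat(raw)
    except ValueError:
        return None
```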
An AI can write some basic unit tests for a piece of code I have written. If I generate three such tests, I have greater test coverage than before and have saved time.
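For example, given a trivial helper like the hypothetical slugify below, the basic tests an AI will happily draft look like this (runnable with pytest):

```python
def slugify(title: str) -> str:
    """The function under test: turn a title into a URL slug."""
    return "-".join(title.lower().split())

# The kind of basic unit tests an AI typically drafts:
def test_slugify_lowercases():
    assert slugify("Hello World") == "hello-world"

def test_slugify_collapses_whitespace():
    assert slugify("  spaced   out  ") == "spaced-out"

def test_slugify_empty_string():
    assert slugify("") == ""
```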
The Bad
Hallucination is a major problem for generative AI, and it occurs frequently when an AI is asked to write code. For example, I asked a popular AI to “write me a page with a contact email form that can be deployed using Nikola with a reCAPTCHA protecting it” and it suggested this command:

nikola plugin_deps --install contact-form reCaptcha

However, the plugin_deps command does not exist in Nikola v8.3.0, and neither the contact-form nor the reCaptcha plugins appear in the list of supported Plugins for Nikola.
A more serious issue is that ChatGPT Hallucinations Can Be Exploited to Distribute Malicious Code Packages. Attackers find a commonly hallucinated package name and publish a malware-loaded package under that very name, hoping that prompt engineers will run the suggested commands without researching them first.
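One cheap piece of that research can be automated. The sketch below (my own, not taken from the article above) queries PyPI's public JSON API before anything gets installed; note that mere existence proves nothing, since the attack is precisely to register the hallucinated name, so vet the author and release history rather than the name alone:

```python
import json
import sys
import urllib.request
from typing import Optional
from urllib.error import HTTPError

def pypi_metadata(package: str) -> Optional[dict]:
    """Fetch a package's metadata from the PyPI JSON API (None if unknown)."""
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{package}/json") as r:
            return json.load(r)
    except HTTPError:
        return None  # 404: the name may be a hallucination

if __name__ == "__main__":
    for name in sys.argv[1:]:
        meta = pypi_metadata(name)
        if meta is None:
            print(f"{name}: not on PyPI - possibly hallucinated")
        else:
            info = meta["info"]
            # Existence is not safety: check the author, homepage,
            # and release history before trusting the package.
            print(f"{name}: version {info['version']}, author {info.get('author') or 'unknown'}")
```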
Prompt engineers are supposed to be able to tweak prompts to get spectacular results. The literature shows, however, that human-optimised prompts are unreliable at extracting better performance from language models. Ironically, the best approaches use AI to optimise the prompts, and the results are often very unintuitive. In one case, a prompt optimised to improve an LLM’s maths performance read more like a quote from Star Trek (see AI Prompt Engineering Is Dead):
Command, we need you to plot a course through this turbulence and locate the source of the anomaly. Use all available data and your expertise to guide us through this challenging situation.
Banning AI is unlikely to be successful. Enough developers will have access to external tools, and they will use them; all a ban achieves is removing your visibility of that use. Stack Overflow has a Policy: Generative AI (e.g., ChatGPT) is banned in place, yet there is no easy way to detect code written by an AI.
Finally, there is Devin, the AI engineer platform, which notoriously lied about its performance and capabilities. It is therefore critical that you and your team evaluate any AI tool before using it in earnest.
Companies seeking to replace all their senior developers with junior developers armed with AI are heading for a large amount of trouble. AI has no context and therefore needs a human to validate its output. If that human is clueless, the output will be just as clueless.
The Ugly
Where is the data? You must understand whether the AI learns from your data or not. If it does, your only safe choice is to run the AI locally (or in a cloud you control). This incurs a cost, but can be beneficial, since you can train it on your proprietary code in addition to the code it was trained on. If you use a hosted model instead, you risk leaking your code into the model, where someone else could get at it.
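As a minimal sketch of what running the AI locally can look like, assuming a model served on your own machine by Ollama (whose REST API listens on localhost:11434 by default; the model name is illustrative), nothing in this exchange leaves infrastructure you control:

```python
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "codellama") -> str:
    """Send a prompt to a locally hosted model via Ollama's /api/generate."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["response"]

print(ask_local_model("Write a Python function that reverses a string."))
```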
Copyright of the generated code is an old problem, not an AI problem. If a developer copies code from Stack Overflow or from a module's documentation, they are adding copyrighted code to the code base, and practically every developer does this. Licensed code is more problematic. If the AI was trained on closed-source data (which you do not own) or on licences that are too open (the GNU General Public License v3.0, for example), then you risk breaking the terms of those licences. Again, this is nothing new. It can go the other way too: many generative AI models (particularly, but not exclusively, the free tiers) use your code and your reaction to the generated code to refine their models, which has already led to high-profile inadvertent leaks: Samsung Software Engineers Busted for Pasting Proprietary Code Into ChatGPT.
Security of the generated code is also an old problem, as the 2019 blog post Preventing the Top Security Weaknesses Found in Stack Overflow Code Snippets shows. That is not AI-generated code, but it is still insecure code that has been copied and pasted into closed-source production code. AI, being trained on existing code, can generate insecure code because some of the examples it was trained on are insecure, and it lacks the deep semantic understanding needed to reliably produce secure code.
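A classic illustration of the pattern (the table and function names are hypothetical): the insecure form below circulates widely in the kind of code models are trained on, while the parameterised form is what a reviewer or SAST tool should push you towards:

```python
import sqlite3

def find_user_insecure(conn: sqlite3.Connection, username: str):
    # Typical of copied snippets: user input interpolated straight into
    # SQL, wide open to injection (try username = "x' OR '1'='1").
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_secure(conn: sqlite3.Connection, username: str):
    # The parameterised query a review should insist on: the driver
    # escapes the value, so the input can never alter the SQL.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchall()
```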
The security problem is already mitigated by techniques such as pair programming and code reviews, as well as by static application security testing (SAST) and dynamic application security testing (DAST) solutions. Securing AI-Generated Code and How SAST Tools Secure AI-generated Code delve into the matter some more.
Performance of the generated code will always be tricky to measure. AI has no idea of your context and specifics, so any code it generates is as generic and common as it can make it. To optimise code, you need exactly the context that the AI lacks.
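A toy illustration of what context buys you (the data and bounds are invented for the example): with no knowledge of your data, the generic comparison sort is the only safe answer, whereas the domain fact that every value sits in a small integer range unlocks a linear-time counting sort:

```python
import random

scores = [random.randint(0, 100) for _ in range(1_000_000)]

# The generic answer: a comparison sort, O(n log n), correct for any input.
ranked_generic = sorted(scores)

# The contextual answer: we *know* scores are integers in 0..100,
# so a counting sort runs in O(n) - a fact the code alone cannot reveal.
def counting_sort_0_100(values):
    counts = [0] * 101
    for v in values:
        counts[v] += 1
    return [v for v in range(101) for _ in range(counts[v])]

ranked_contextual = counting_sort_0_100(scores)
assert ranked_generic == ranked_contextual
```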
Finally, the paper SWE-bench: Can Language Models Resolve Real-World GitHub Issues? states, in its abstract:
To this end, we introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories […] The best-performing model, SWE-agent + GPT4, can solve a mere 12.47% of the issues.
This is not really encouraging.
So many tools… There is a proliferation of AI tools aimed at developers, DevOps, and DevSecOps. They all promise the moon and have data to show they can get you there. How do they interoperate? How do you choose one over the rest? Are there standards, so you can swap one for another? It’s the Wild West out there.
Conclusion
So, with everything said, it is very unlikely that AI will replace a programmer. But a programmer using AI might well take the job of one who does not.
The only way to remain secure is to have a company-wide AI policy, pick the right AI tools, and keep to best practices for secure code development. AI is a powerful tool, but the benefits of its use can be outweighed by the pitfalls of its misuse. How your business engages with AI is a critical factor in whether your technology team becomes an advocate for AI code generation or develops an adversarial relationship with it.
If you want help, we can offer guidance in all those aspects. Please do reach out at Contact Us | Firmamentum Consulting.
PS: This article was written before the DORA State of DevOps 2024 report came out. The report has a large section on AI, which we will cover in future posts.
PPS: I was interviewed as part of the Security Accelerator Podcast on just this topic. You can get the podcast from Amazon Music, Apple, and Spotify.