
Salma Alam-Naylor
March 28, 2025
In October 2024, the DORA research programme published the 2024 Accelerate State of DevOps Report, which, for the first time, included a section on how AI adoption is affecting software development at an individual, team and product level. With the recent emergence of the new “vibe coding” meta, a term introduced by Andrej Karpathy in February 2025, where AI enables people to ship apps from idea to production in record time, I wanted to take the time to reflect on the report and discuss whether AI is really making us more productive as software developers.
Most developers are relying on AI
The report found that almost 76% of participants are “relying on” some form of AI tooling in their daily responsibilities as software developers. This can include writing, optimising, documenting and debugging code, explaining unfamiliar code, writing tests, data analysis, and summarising information. “[D]evelopers who trust gen AI use it more”, but almost 40% of participants “reported having little or no trust in AI”.
I do not trust AI. I have over 11 years of professional industry experience, and have been making websites in some form or another for almost 30 years. With all this considered, AI code generation has only ever made me feel less productive. I prefer to understand every single line of code I ship in my applications: it makes everything easier to debug, fix, and extend. I have found that AI-generated code is often sloppy, unnecessarily complex, and a lot of the time, just plain wrong. For me, AI code generation is akin to mindlessly copy-pasting code snippets from Stack Overflow, and we all know how that goes. It usually takes me longer to understand AI-generated code than to write my own.
You may be sacrificing software delivery metrics by relying on AI
DORA’s software delivery metrics provide an effective way of measuring the outcomes of software delivery processes. They are split into two categories: throughput and stability. The report found that “AI adoption is negatively impacting software delivery performance”, and the “negative impact on delivery stability is larger”.
Sacrificing throughput with AI
Throughput measures the velocity of software changes, that is, how quickly and how frequently software teams can ship changes to production. Throughput is all about how efficient and responsive a team can be.
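To make throughput concrete: DORA measures it with metrics like deployment frequency and change lead time. As a rough sketch, here’s how a team might compute both from a hypothetical log of changes (the Change shape and its field names are my own illustration, not something defined by the report):

```typescript
// Hypothetical shape for a shipped change; the field names are illustrative.
interface Change {
  committedAt: Date; // when the change was first committed
  deployedAt: Date;  // when it reached production
}

// Deployment frequency: deployments per day over an observation window.
function deploymentFrequency(changes: Change[], windowDays: number): number {
  return changes.length / windowDays;
}

// Change lead time: average hours from commit to production.
function meanLeadTimeHours(changes: Change[]): number {
  const hours = changes.map(
    (c) => (c.deployedAt.getTime() - c.committedAt.getTime()) / 36e5 // ms to hours
  );
  return hours.reduce((sum, h) => sum + h, 0) / hours.length;
}
```

Both numbers suffer when changes land as huge batches: fewer, bigger deployments mean lower frequency and longer lead times while reviewers wade through them.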
The report hypothesises that “the fundamental paradigm shift that AI has produced in terms of respondent productivity and code generation speed may have caused the field to forget one of DORA’s most basic principles — the importance of small batch sizes.” Since AI code generation will often spit out huge batches of code in one fell swoop, pull requests are getting larger. Larger pull requests and changes are much more difficult and time-consuming to review thoroughly for edge cases and potential issues.
Speaking from experience, code reviewers are more likely to skim over large changes and miss important details; amidst an already busy workday, walls of impenetrable code are much more difficult for a real human brain to process and interpret than smaller changes. Whilst combing through that +11456 -7892 code review, you’re probably thinking about all those lines of code you need to write yourself in order to stay “productive”.
Of course, there are tools that provide AI-assisted code reviews. But if we, as developers, do not trust AI to produce maintainable and reviewable code, why should we trust AI to review it? If you find yourself constantly reviewing large pull requests with a lot of AI generated code, you’re probably sacrificing throughput. You might not be shipping as fast.
Sacrificing stability with AI
Stability measures the quality of software changes delivered, and the team’s ability to fix bugs. Stability is measured using a combination of change fail percentage, which is the percentage of deployments to production that require hot-fixes or rollbacks, and failed deployment recovery time, which is how long it takes to restore service after a failed deployment. A lower change fail percentage means the team has a reliable delivery process, and a lower recovery time indicates more resilient and responsive systems.
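As a rough illustration of those two stability metrics, here’s how they could be computed from a hypothetical deployment log (again, the Deployment shape and field names are invented for this sketch):

```typescript
// Hypothetical record of a production deployment; field names are illustrative.
interface Deployment {
  deployedAt: Date;
  failed: boolean;     // required a hot-fix or rollback
  recoveredAt?: Date;  // when service was restored, if it failed
}

// Change fail percentage: share of deployments that failed.
function changeFailPercentage(deployments: Deployment[]): number {
  const failures = deployments.filter((d) => d.failed).length;
  return (failures / deployments.length) * 100;
}

// Failed deployment recovery time: average hours from failure to recovery.
function meanRecoveryTimeHours(deployments: Deployment[]): number {
  const recovered = deployments.filter((d) => d.failed && d.recoveredAt);
  const hours = recovered.map(
    (d) => (d.recoveredAt!.getTime() - d.deployedAt.getTime()) / 36e5 // ms to hours
  );
  return hours.reduce((sum, h) => sum + h, 0) / hours.length;
}
```

If AI-generated changes are failing in production more often, both of these numbers creep upwards, and that is exactly the stability cost the report warns about.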
The report suggests that it’s “possible that we’re gaining speed through an over-reliance on AI for assisting in the process or trusting code generated by AI a bit too much.” I would argue that this speed is an illusion of speed, and the concept of “over-reliance” is key here, especially in the context of the new vibe coding meta. Vibe coding is about being dependent on AI code generation: describe the app or code you want, and let an LLM take care of it for you. Ask it for a few edits. Ship it.
The report states that a developer’s productivity “is likely to increase by approximately 2.1% when an individual’s AI adoption is increased by 25%”. Vibe coding, and any sort of AI adoption, is attractive because it feels fast; it feels more productive. Now, I’m not proposing that professional software developers are going all-in on vibe coding, but increased adoption of AI and reported productivity increases pose a risk to software stability. We think we’re going faster, but we may be shipping poorer quality software and more broken changes, because we’re putting too much trust in AI-generated code.
Reports from across the industry show that vibe coders are shipping unstable applications, suffering outages and critical security vulnerabilities in their apps, losing months of work that was never under version control, and, anecdotally, struggling to finish their vibe-coded apps at all. If professional software development teams do not stay vigilant with their use of AI, they run the risk of sacrificing stability.
Plus, if you’re moving fast, but always having to pick up the pieces caused by over-zealous AI code generation, are you really being more productive?