Is Agentic AI the Key to Automating Human Work?


I’ve been contemplating the concept of agentic AI for a while now. The idea that we can create specialised AI agents, each mastering a specific task, and have them work together iteratively struck me as a powerful way to harness AI for complex problems. After all, that’s how teams of humans accomplish amazing outcomes, right?

I first encountered Auto-GPT over a year ago and was intrigued enough to join its Discord community. At that time, autonomous AI agents were still an early-stage idea, and Auto-GPT was pioneering an approach in which an AI system takes a high-level goal, breaks it into its own tasks and then works through them autonomously. The idea resonated with me and reinforced my belief in the transformative potential of specialised AI agents across many industries.

As I continued to explore this concept, Andrew Ng’s discussions on agentic AI further deepened my understanding, highlighting how these specialised agents could revolutionise the way we approach complex problems.

While the broad potential of agentic AI to automate and optimise a wide range of jobs is exciting, my focus naturally sharpens when I consider its application to software engineering. This focus ties back to my ongoing Master’s research, where I’m exploring the broader impact of AI on software engineering practices. In this context, agentic AI stands out as particularly powerful, offering the possibility of fundamentally changing how we develop, test, and maintain software.

What is Agentic AI?

Agentic AI involves multiple specialised AI agents working collaboratively to achieve complex goals. Imagine a team of experts, each with a specific focus: one agent excels at writing code, another at testing, and a third at conducting security reviews. These agents don’t operate in silos; instead, they communicate, iterate, and refine their outputs together, much like a well-coordinated human team.

Andrew Ng, a prominent AI thought leader, emphasises in his talks “What’s Next for AI Agentic Workflows” and “Andrew Ng on AI Agentic Workflows and Their Potential for Driving AI Progress” that agentic workflows are inherently iterative, whereas LLMs are currently used mostly zero-shot: the model is prompted once and produces its answer in a single pass, with no chance to review or revise it. Iterating lets an agentic system plan, draft, check and revise its work, and this yields significantly better outcomes. For instance, on the HumanEval coding benchmark, zero-shot GPT-3.5 achieved a 48% success rate and zero-shot GPT-4 reached 67.6%, while multi-agent systems such as AutoDev and AgentCoder reached 91.5% and 96.3%, respectively. This stark contrast shows how far agentic workflows can push beyond what models achieve when prompted zero-shot.
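Why should iteration help this much? One deliberately idealised way to see it, under our own assumptions rather than anything from Ng’s talks: suppose a single attempt succeeds with probability p, attempts are independent, and the system can reliably tell when an attempt has failed (for example, by running tests) and try again, up to k attempts. Overall success is then 1 - (1 - p)^k. The sketch below only illustrates that arithmetic. Real agent attempts are neither independent nor perfectly checked, so it does not explain the benchmark figures above.

```python
def success_with_retries(p: float, k: int) -> float:
    """P(at least one of k independent, perfectly checked attempts succeeds).

    Idealised model for illustration only: assumes independence between
    attempts and a failure check (e.g. tests) that never misjudges.
    """
    if not 0.0 <= p <= 1.0 or k < 1:
        raise ValueError("need 0 <= p <= 1 and k >= 1")
    return 1.0 - (1.0 - p) ** k


# Single-attempt success of 0.48 (the zero-shot GPT-3.5 figure above),
# followed by up to k checked attempts.
for k in (1, 2, 3, 5):
    print(k, round(success_with_retries(0.48, k), 3))
```

Under these assumptions, three checked attempts at p = 0.48 would give about 0.86. The model is useful mainly for showing why the ability to detect failures matters: without a reliable check, extra attempts add nothing.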

Vision Agent: An Open Source Agentic AI from Landing AI

One practical implementation of agentic AI is Vision Agent, an open-source project developed by Andrew Ng and his team at Landing AI. It specialises in computer vision, enabling different AI agents to collaborate on complex visual processing tasks such as object detection, image classification, and anomaly detection.

The Vision Agent framework allows for the modular combination of AI agents, each focusing on a specific aspect of the task. For example, one agent might specialise in edge detection, while another focuses on object classification. By allowing these agents to work together, Vision Agent can deliver more accurate and robust results than a single AI model working alone. The framework’s open-source nature also makes it accessible for developers and researchers, encouraging collaboration and further innovation in the field of agentic AI.
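The modular idea can be sketched as a chain of agents, each consuming the previous agent’s output. The Python below is a toy, hypothetical illustration and not Vision Agent’s API. `edge_agent` marks pixels that differ sharply from a neighbour, `classifier_agent` labels the image as containing an object if enough edges were found, and `pipeline` wires them together. A real system would use trained models at each stage.

```python
from typing import Callable

Image = list[list[int]]       # grayscale pixel values, 0-255
EdgeMap = list[list[bool]]


def edge_agent(img: Image, threshold: int = 50) -> EdgeMap:
    """Mark a pixel as an edge if it differs sharply from its right or lower neighbour."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = abs(img[y][x] - img[y][x + 1]) if x + 1 < w else 0
            down = abs(img[y][x] - img[y + 1][x]) if y + 1 < h else 0
            edges[y][x] = max(right, down) > threshold
    return edges


def classifier_agent(edges: EdgeMap, min_edges: int = 4) -> str:
    """Toy classifier: call it an object if enough edge pixels were found."""
    count = sum(row.count(True) for row in edges)
    return "object" if count >= min_edges else "background"


def pipeline(data: object, agents: list[Callable]) -> object:
    """Feed each agent's output into the next agent in the list."""
    for agent in agents:
        data = agent(data)
    return data


# A dark 5x5 image containing a bright 3x3 square, and an all-dark image.
square = [[200 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
blank = [[0] * 5 for _ in range(5)]
print(pipeline(square, [edge_agent, classifier_agent]))  # object
print(pipeline(blank, [edge_agent, classifier_agent]))   # background
```

Because each stage only agrees on its input and output types, swapping in a better edge detector or classifier means changing a single entry in the list.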

MetaGPT: A Multi-Agent Framework

MetaGPT was introduced by Hong et al. in the paper “MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework” and is also discussed in the same team’s later paper, “Data Interpreter: An LLM Agent For Data Science”. This multi-agent collaborative framework is designed for software development. It simulates the software development lifecycle with multiple AI agents, each specialising in a particular role such as project management, code writing, or testing. The approach emulates a human software development team, in which various specialists collaborate to complete a project.

However, like many multi-agent systems, MetaGPT encounters challenges related to coordinating multiple agents and managing communication overhead. Despite these challenges, MetaGPT illustrates the potential of agentic AI to mirror and enhance human-like collaboration in software development.
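One common way to contain that communication overhead is publish-subscribe. As we read it, this is also the approach the MetaGPT paper takes with its shared message pool and role-based subscriptions: each agent publishes its output once, and only the roles subscribed to that kind of output receive it. The Python below is our own simplified, hypothetical sketch of the pattern. The `MessagePool` and `Role` classes, the topic names and the stubbed role behaviour are invented for illustration and are not MetaGPT’s code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    topic: str    # kind of artefact, e.g. "requirements" or "design"
    content: str


class MessagePool:
    """Shared pool: every message is recorded once; only topic subscribers get a copy."""

    def __init__(self) -> None:
        self.history: list[Message] = []
        self.subscribers = defaultdict(list)  # topic -> list of subscribed roles

    def subscribe(self, topic: str, role: "Role") -> None:
        self.subscribers[topic].append(role)

    def publish(self, msg: Message) -> None:
        self.history.append(msg)
        for role in self.subscribers[msg.topic]:
            role.inbox.append(msg)


class Role:
    """A hypothetical agent that watches one topic and publishes another."""

    def __init__(self, name: str, watches: str, produces: str, pool: MessagePool) -> None:
        self.name, self.produces, self.pool = name, produces, pool
        self.inbox: list[Message] = []
        pool.subscribe(watches, self)

    def step(self) -> bool:
        """Process one inbox message; return False if there was nothing to do."""
        if not self.inbox:
            return False
        msg = self.inbox.pop(0)
        # Stub for an LLM call that turns the upstream artefact into this role's output.
        self.pool.publish(Message(self.name, self.produces, f"{self.produces} for {msg.content}"))
        return True


pool = MessagePool()
roles = [
    Role("ProductManager", "idea", "requirements", pool),
    Role("Architect", "requirements", "design", pool),
    Role("Engineer", "design", "code", pool),
    Role("QAEngineer", "code", "test_report", pool),
]
pool.publish(Message("User", "idea", "a to-do app"))
while any(role.step() for role in roles):
    pass  # keep stepping until no role has pending work
for m in pool.history:
    print(f"{m.sender:>14} -> {m.topic}")
```

Roles never address each other directly, so adding a new specialist only requires subscribing it to the topics it cares about rather than rewiring every existing agent.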

AgentCoder: Multi-Agent Code Generation

AgentCoder, presented by Huang et al. in the paper “AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation”, represents another significant advance in agentic AI. It is built specifically for code generation and splits the work across three agents: a programmer agent writes the code, a test designer agent writes test cases for it, and a test executor agent runs those tests and feeds any failures back to the programmer for another round. This generate, test and refine loop is what drives the quality of the code produced.

In evaluations, AgentCoder demonstrated exceptional performance, achieving a 96.3% success rate on the HumanEval benchmark (with GPT-4 as the underlying model) by leveraging the strengths of multiple specialised agents working together. This result underscores the promise of agentic AI across the software development process, from code generation to testing and optimisation.

AutoDev: Automated AI-Driven Development

Microsoft’s AutoDev is another cutting-edge framework that exemplifies the potential of agentic AI. As described in the paper “AutoDev: Automated AI-Driven Development” by Tufano et al., AutoDev is designed to autonomously plan and execute complex software engineering tasks. It enables AI agents to perform a wide range of operations within a codebase, including file editing, building, testing, and git operations, all within a secure, user-controlled environment. The framework’s effectiveness was demonstrated on the HumanEval dataset, where it achieved a Pass@1 score of 91.5% for code generation and 87.8% for test generation.
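Pass@1, the metric quoted above, comes from the paper that introduced HumanEval (Chen et al., 2021, “Evaluating Large Language Models Trained on Code”). pass@k is the probability that at least one of k sampled solutions passes a problem’s unit tests. Given n ≥ k samples of which c pass, it is estimated without bias as 1 - C(n - c, k) / C(n, k). A minimal Python sketch of that estimator follows; the function names are our own.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem, given n samples of which c pass the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset of the samples contains a passing one
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; `results` holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(round(pass_at_k(10, 3, 1), 3))  # 0.3 (with one draw, pass@1 is just c / n)
print(round(pass_at_k(10, 3, 2), 3))  # 0.533
```

With a single sample per problem, pass@1 reduces to the fraction of problems solved. As pure arithmetic, 150 of HumanEval’s 164 problems is about 91.5%. We have not checked which sampling setup AutoDev’s authors used.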

AutoDev goes beyond simple code suggestions, integrating deeply with the development environment to manage tasks that typically require significant human intervention. Its ability to handle complex, multi-step tasks autonomously marks a significant leap forward in the capabilities of AI-driven development tools.
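A core ingredient of this kind of integration is a gatekeeper between the agent and the codebase. The agent proposes commands, only those the user has allowed are executed inside an isolated workspace, and every action is logged. The Python below is a hypothetical sketch of that idea. The `Workspace` and `Dispatcher` classes, the command names and the path check are our own inventions, not AutoDev’s interface, and the tools are stubs rather than real builds, test runs or git calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Workspace:
    """In-memory stand-in for an isolated repository checkout."""
    files: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


def write_file(ws: Workspace, path: str, text: str) -> str:
    # Refuse paths that could escape the workspace.
    if path.startswith("/") or ".." in path.split("/"):
        raise PermissionError(f"path escapes workspace: {path}")
    ws.files[path] = text
    return f"wrote {path}"


def run_tests(ws: Workspace) -> str:
    # Stub: a real tool would run the project's test suite inside the sandbox.
    return f"ran tests over {len(ws.files)} file(s)"


def git_commit(ws: Workspace, message: str) -> str:
    # Stub: a real tool would call git inside the sandbox.
    return f"committed {len(ws.files)} file(s): {message}"


TOOLS: dict[str, Callable[..., str]] = {"write": write_file, "test": run_tests, "git": git_commit}


class Dispatcher:
    """Executes agent-proposed commands only if the user's configuration allows them."""

    def __init__(self, ws: Workspace, allowed: set[str]) -> None:
        self.ws, self.allowed = ws, allowed

    def execute(self, command: str, *args: str) -> str:
        if command not in TOOLS:
            result = f"error: unknown command {command!r}"
        elif command not in self.allowed:
            result = f"denied: {command!r} is not permitted"
        else:
            try:
                result = TOOLS[command](self.ws, *args)
            except PermissionError as exc:
                result = f"denied: {exc}"
        self.ws.log.append(f"{command} -> {result}")  # audit trail for the user
        return result


ws = Workspace()
agent = Dispatcher(ws, allowed={"write", "test"})  # the user has not enabled git
print(agent.execute("write", "src/app.py", "print('hello')"))  # wrote src/app.py
print(agent.execute("write", "../secrets.txt", "x"))  # denied: path escapes workspace: ../secrets.txt
print(agent.execute("test"))  # ran tests over 1 file(s)
print(agent.execute("git", "initial commit"))  # denied: 'git' is not permitted
```

Keeping the allow-list outside the agent means a misbehaving or confused model can propose anything, but only what the user has enabled actually runs, and the log records every attempt.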

Why Agentic AI Matters

The implications of agentic AI are profound, extending far beyond a single industry. If we can truly harness the potential of agentic AI, the automation of many jobs may be closer than we realise. These AI systems, capable of handling interconnected tasks across various domains, could lead to significant shifts in how work is structured and executed. By distributing tasks among specialised agents, entire workflows could be automated, potentially reducing the need for human intervention in areas that were once considered too complex for AI to manage.

In software engineering, this evolution shifts the focus from human-led coding to human-led prompting. As agentic AI systems take on more of the actual coding, testing, and reviewing, the role of engineers may pivot toward describing requirements in precise, effective ways to guide these AI agents. This change underscores the importance of developing strong skills in articulating and refining prompts (prompt engineering), since the ability to communicate effectively with AI will become increasingly critical in this new paradigm.

Looking Ahead

The journey toward fully realising the potential of agentic AI is just beginning. The advancement of frameworks like Auto-GPT, AutoDev, AgentCoder and MetaGPT represents a significant step forward in this direction. As we continue to develop and refine these technologies, we’re likely to see even more transformative impacts on the way we approach software engineering.