A Whole New World


In the years I’ve been building software, I’ve lived through more than a few waves. My first taste of code was in the mid‑80s, typing BASIC into a Commodore 64 where you couldn’t even save your work to a hard drive. In the mid‑90s, scripting in mIRC and hand‑rolling simple HTML sites felt cutting edge. The early 2000s were all about desktop apps, then web apps that suddenly got a lot more dynamic - yet for a while, Flash was still the only way to refresh part of a page without the user hammering the browser’s refresh button. Then AJAX arrived and changed that.

After that came an explosion of tools and technologies. On the application side, we reached for caches, queues, NoSQL databases, and event streams to make distributed systems possible and keep them performing under load. On the delivery and infrastructure side, DevOps pipelines and automated static analysis tools helped us ship faster, automated testing gave us confidence in what we were releasing, and cloud and infrastructure‑as‑code let us scale in a far more programmatic way. Mobile in the 2010s brought a whole new set of tools and constraints with it - you had to really think about payloads, because pushing huge amounts of data over mobile networks just wasn’t a great idea.

These are the kinds of shifts we point to when we talk about why software engineering is a career where there’s always more to learn. New technologies arrive constantly, and if you want to build the right solutions with the tools available today, you have to keep up, experiment, and learn how to put those tools to work in meaningful ways.

What’s exciting now is that with Gen AI, we’ve added a whole new kind of component to that toolkit: the LLM. We can weave it into our systems in all sorts of ways - as a helper inside a feature, as the thing that orchestrates tools and workflows, or as the layer that sits in front of everything and talks to users. By its very nature, though, it’s non-deterministic and often unpredictable. That forces us to rethink how we design software end-to-end, from architecture and implementation through to testing, deployment and the way we run these systems in production.

From deterministic to non‑deterministic systems

In a recent conversation between Martin Fowler and Gergely Orosz, Martin puts his finger on this very point: how significant the introduction of non-determinism into our systems really is. He leans on his wife’s world of structural engineering, where one must think in terms of tolerances and deliberately build extra capacity into a bridge or a building, because materials like wood, concrete and steel all vary. You can never assume two pieces of timber will behave identically; instead, you learn as much as you can about the materials and then design around that uncertainty. I think he’s right that we’ll need a similar mindset when we work with non‑deterministic AI components: understanding the “tolerances” of that uncertainty and resisting the temptation to skate too close to the edge, especially on the security side.

In my AI Native DevCon talk six months ago, Am I Still a Software Engineer If I Don’t Write the Code?, I shared a slide titled “New tools, new problems, new solutions” to illustrate some of the new problem spaces opening up for us as engineers. This post zooms in on just one of those boxes: designing non-deterministic systems.


The rise of the AI application layer

Up until recently, most of my research has focused on what AI is doing to software engineering as a discipline: how it changes the process of building software, what it does to our day‑to‑day experience as engineers, and how it shifts where we spend our time when we have an AI assistant sitting beside us in the IDE. But I’m just as intrigued by what it means for the software itself. The kinds of systems we can now build. The architectures we reach for. The new constraints we run into and the new classes of problems we have to solve when a non‑deterministic component sits in the middle of everything.

That curiosity has led me to focus more on the AI application layer - the part of the stack where models, tools and products actually meet real users. And there are strong signals that this focus is well‑placed. Andrew Ng, in a recent Batch editorial, pointed out that while huge amounts of money and attention are flowing into infrastructure and foundation models, the AI application layer is comparatively under‑invested. There’s a lot of value still to be created there, and that value will come from people who know how to design, build and operate these new kinds of systems.

We’re already starting to attach more specific labels to those people - titles like AI Engineer, AI Application Developer, or AI Application Architect - folks who live closer to that layer. But I don’t think of that as a separate profession. Just as we once had to learn our way around caches, queues, mobile constraints and cloud tooling, this is simply the next set of tools and patterns we need to get fluent in. We’re still software engineers, and these are tools, patterns, and ways of thinking that we’ll be better off knowing, whether or not we ever put “AI” in our job titles.

This post is my working map of the skills and concepts I think matter for software engineers who want to build in this new LLM / agentic paradigm.

How the work shifts

Before diving into the detail, it helps to name the kinds of work that shift when you bring LLMs and agents into the mix. There is still solution design and architecture, but instead of just deciding where your service boundaries lie, you’re deciding what belongs in deterministic code vs a model, where to introduce retrieval or agents, and how to build in safety and human oversight from the start.

There is still engineering, but more and more it means stitching together models, tools, data stores, workflows and observability into something coherent and operable. A lot of the hard work now is in learning the new tooling and patterns well enough that you can keep systems explainable and debuggable, even when some core components are probabilistic.

And there is still validation, but it looks very different from traditional unit and integration testing. You need new ways to evaluate behaviour over time, catch regressions when models or prompts change, and decide what “good enough” means for systems that will never be perfectly predictable.

AI Engineering Competency Map

What follows is a set of skill areas and capabilities you can explore if you want to get serious about building systems with LLMs and agents at their core. This is simply my current view, shaped by what I’m reading, what I’m building, and what I’m seeing across the industry, not a set of hard rules or a checklist to complete. It’s deliberately broad, not exhaustive, and almost certain to evolve as the tools, patterns and best practices do.

1. Models, Providers & Core Stack

Understanding which models exist, what they can do, and the core stack used to work with them.

1.1 Model providers

Knowing the major commercial and open model providers and how to integrate their APIs.
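Most commercial providers expose a very similar chat-style request shape, so it is worth internalising that shape before reaching for any particular SDK. Here is a minimal sketch of building such a payload; the model name and the helper function are illustrative assumptions, not any provider’s official client:

```python
def build_chat_request(model: str, system: str, user: str) -> dict:
    """Assemble the chat-completion payload shape most providers accept.

    Illustrative only: a real integration would send this via the
    provider's SDK or an HTTP POST to their chat endpoint.
    """
    return {
        "model": model,  # a hypothetical model identifier goes here
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": 0.2,  # lower values narrow the output distribution
    }

request = build_chat_request(
    "example-model",
    "You are a concise assistant.",
    "Summarise retrieval-augmented generation in one sentence.",
)
```

Keeping payload construction behind one function like this also makes it easier to swap providers later without touching the rest of the codebase.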

1.2 Model capabilities & selection

Choosing appropriate models based on capability, risk and constraints.

1.3 Core implementation stack

Using programming languages and runtimes suitable for AI-enabled backends and agents.

2. Knowledge Preparation & Retrieval (RAG)

Preparing data and retrieving it so agents can ground their answers in real information.

2.1 Knowledge preprocessing

Transforming messy input into clean, LLM-ready text.

2.2 Chunking strategies

Breaking documents into useful pieces for retrieval.
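The simplest baseline here is fixed-size chunks with a little overlap, so sentences that straddle a boundary are not lost. A hedged sketch (sizes are in characters; real pipelines often chunk by tokens or by document structure instead):

```python
def chunk_text(text: str, size: int = 400, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks, each overlapping the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```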

2.3 Embeddings & vector search

Representing text as vectors and searching semantically.
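Under the hood, semantic search is just nearest-neighbour lookup over vectors. A toy sketch with hand-made two-dimensional vectors; real systems use embedding models that produce hundreds of dimensions, plus a vector store to make the search scale:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for same direction, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)
    return ranked[:k]
```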

2.4 Hybrid retrieval & reranking

Combining different retrieval techniques to get better results.
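One widely used way to combine, say, a keyword ranking with a vector ranking is reciprocal rank fusion: each list votes for a document with a weight that decays by rank. A minimal sketch (the constant 60 is the commonly cited default):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; documents near the top of any list score highest."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A reranking pass, often a cross-encoder scoring the fused shortlist against the query, typically follows this step.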

2.5 Knowledge graphs & structured stores

Using structured knowledge to support reasoning and answering.

2.6 Common vector and search backends

Using production-ready services for retrieval.

3. Context & Conversation Management

Deciding what the model sees, how it sees it, and how to cope with context limits.

3.1 Context engineering

Designing prompts and context to give the model what it needs and nothing it does not.

3.2 Context window architecture

Managing the limited context window as a resource.

3.3 Compaction and summarisation

Compressing history while preserving what matters.
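One common shape for this: keep the system prompt and the most recent turns verbatim, and collapse everything older into a single summary message. A sketch with a placeholder summariser; in practice the summary itself usually comes from another LLM call:

```python
def compact_history(messages: list[dict], keep_recent: int = 4, summarise=None) -> list[dict]:
    """Collapse older turns into one summary message; keep the system prompt
    and the last `keep_recent` turns untouched."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    if len(turns) <= keep_recent:
        return messages  # nothing worth compacting yet
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarise(old) if summarise else f"[summary of {len(old)} earlier messages]"
    return system + [{"role": "system", "content": summary}] + recent
```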

3.4 Structured outputs & schemas

Ensuring outputs are machine-friendly and predictable.
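At minimum this means asking the model for JSON and validating it before anything downstream trusts it, with a clear failure mode that lets you retry with a corrective prompt. A stdlib-only sketch; libraries like Pydantic, or providers’ native structured-output modes, do this far more thoroughly:

```python
import json

def parse_structured(raw: str, required: dict[str, type]) -> dict:
    """Parse a model's JSON reply and check required fields and their types.

    Raises ValueError so the caller can retry the call with the error
    message appended to the prompt.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply was not valid JSON: {exc}") from exc
    for field, expected_type in required.items():
        if field not in data or not isinstance(data[field], expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    return data
```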

4. Agent Reasoning & Orchestration Patterns

How agents think, break down work and orchestrate multiple steps or tools.

4.1 Prompt chaining

Breaking complex tasks into explicit, ordered LLM calls.
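The core pattern is easy to express: each step’s prompt is applied to the previous step’s output, which lets you inspect and test every intermediate result. A sketch with a stubbed model function; any real `llm` callable that maps a prompt to a completion would slot in:

```python
def run_chain(llm, steps: list[str], initial_input: str) -> str:
    """Apply each step's instruction to the previous step's output, in order."""
    text = initial_input
    for instruction in steps:
        text = llm(f"{instruction}\n\n{text}")
    return text

def fake_llm(prompt: str) -> str:
    # Deterministic stand-in for a model call: echoes the input with a marker.
    return prompt.split("\n\n")[-1] + "!"
```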

4.2 Routing

Selecting the right model, tool, or agent for a given request.
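Routing can itself be done by an LLM classifier, but a deterministic first pass is often good enough and far cheaper. A keyword-based sketch, with handler names made up for illustration:

```python
def route(query: str, routes: dict[str, list[str]], default: str) -> str:
    """Pick the first handler whose keywords appear in the query."""
    q = query.lower()
    for handler, keywords in routes.items():
        if any(keyword in q for keyword in keywords):
            return handler
    return default  # fall through to a general-purpose handler

ROUTES = {
    "billing_agent": ["invoice", "refund", "charge"],
    "support_agent": ["error", "crash", "broken"],
}
```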

4.3 Parallelisation

Running independent tasks concurrently to improve throughput.
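Because most of the latency in these systems is spent waiting on model calls, independent calls should be issued concurrently rather than in sequence. An `asyncio` sketch with a stubbed async model call:

```python
import asyncio

async def fan_out(llm, prompts: list[str]) -> list[str]:
    """Issue independent model calls concurrently; results come back in order."""
    return await asyncio.gather(*(llm(p) for p in prompts))

async def fake_llm(prompt: str) -> str:
    await asyncio.sleep(0)  # a real call would await a network response here
    return prompt.upper()

results = asyncio.run(fan_out(fake_llm, ["summarise a", "summarise b"]))
```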

4.4 Planning & goal management

Creating and adjusting plans to meet explicit goals.

4.5 Goal setting & monitoring

Defining success criteria and checking whether they are met.

4.6 Advanced reasoning techniques

Using structured reasoning styles to improve accuracy.

5. Tools, Skills & External Systems

Connecting agents to external capabilities and designing those capabilities well.

5.1 Tool and function calling

Letting the model invoke deterministic operations.
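However a provider surfaces it, tool calling ends with your code dispatching a model-chosen name plus JSON arguments to a deterministic function. A registry sketch; the weather tool is a made-up stub:

```python
TOOLS: dict = {}

def tool(fn):
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub; a real tool would call a weather API

def dispatch(call: dict) -> str:
    """Execute a model-requested call of the form {"name": ..., "arguments": {...}}."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"unknown tool: {call['name']}"  # feed back so the model can recover
    return fn(**call["arguments"])
```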

5.2 Tool ecosystems and MCP

Organising tools into discoverable, reusable ecosystems.

5.3 Enterprise and SaaS integration

Connecting agents to real systems to actually do work.

6. Multi-Agent Systems & Inter-Agent Communication

Using multiple specialised agents that collaborate over well-defined protocols.

6.1 Role-based multi-agent design

Assigning clear responsibilities to different agents.

6.2 Collaboration patterns

Structuring how multiple agents work together.

6.3 Inter-agent communication standards

Using standard protocols so agents from different frameworks can talk.

6.4 A2A-style discovery & interaction

Finding and calling remote agents reliably.

7. Memory & Learning

Giving agents continuity over time and allowing them to improve.

7.1 Short-term memory

Tracking state within a session or workflow.

7.2 Long-term memory

Persisting information across sessions and tasks.

7.3 Learning and adaptation

Letting systems improve from feedback and data.

8. Safety, Robustness & Human Partnership

Keeping systems safe, resilient and aligned with people.

8.1 Guardrails and content safety

Preventing harmful, non-compliant or out-of-scope behaviour.
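The deterministic layer of a guardrail stack can be as simple as pattern checks on inputs and outputs, running before or alongside a moderation model. A sketch that screens for a banned pattern; the regex is purely illustrative:

```python
import re

def guard(text: str, banned_patterns: list[str]) -> tuple[bool, str]:
    """Return (allowed, reason); block the response if any pattern matches."""
    for pattern in banned_patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return False, f"matched banned pattern: {pattern}"
    return True, "ok"

# Example: refuse to emit anything that looks like a US social security number.
SSN_PATTERN = r"\b\d{3}-\d{2}-\d{4}\b"
```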

8.2 Exception handling & recovery

Dealing gracefully with errors and degraded conditions.
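Transient failures (rate limits, timeouts, the occasional malformed reply) are normal in these systems, so retries with backoff belong at the lowest layer. A minimal sketch; real clients usually add jitter and distinguish retryable from fatal errors:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller's fallback take over
            time.sleep(base_delay * (2 ** attempt))
```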

8.3 Human-in-the-loop collaboration

Designing for human oversight and joint work.

8.4 Security & access control

Keeping data and capabilities properly protected.

9. Resource & Priority Management

Using time, money and compute wisely while choosing what to do first.

9.1 Resource-aware optimisation

Balancing quality against time and cost.

9.2 Task and goal prioritisation

Deciding which task or goal the agent should work on next.

10. Evaluation, Monitoring & Operations

Making sure systems work, stay healthy and improve over time.

10.1 Evaluation and metrics

Measuring whether the system is actually good.
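In practice this usually starts with a golden set of inputs and a judge, which might be an exact-match check, a heuristic, or another LLM, plus a pass rate you can track across model and prompt changes. A sketch of that harness shape, with all names illustrative:

```python
def evaluate(system, cases: list[dict], judge) -> float:
    """Run each golden case through the system and return the pass rate."""
    passed = sum(1 for case in cases if judge(system(case["input"]), case["expected"]))
    return passed / len(cases)
```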

10.2 Monitoring & observability

Watching live systems and catching problems early.

10.3 LLMOps / AgentOps

Running AI systems as first-class production services.

11. Frameworks, Platforms & Tooling

Using the ecosystems that make all of the above practical.

11.1 Orchestration and agent frameworks

Building complex workflows without reinventing the wheel.

11.2 Cloud AI platforms

Using managed services for models and agents.

11.3 Evaluation & monitoring tools

Leveraging specialised tools for analysing behaviour.

11.4 Document and data tooling

Supporting ingestion and preprocessing at scale.

Where this leaves us

This list is intentionally dense and a little overwhelming, because the space itself is. You don’t need to become an expert in every item here; getting comfortable with even a handful of them will open up new kinds of systems to design and new questions to wrestle with, in a space where many of the patterns and “best practices” are still being written. In a world where more and more of our stacks include components that are, by design, non-deterministic, we have a rare opportunity: to get curious early, experiment while things are still fluid, and help redefine what good engineering looks like - especially around inference, orchestration and agents. That means bringing the discipline, judgement and curiosity that make these systems something people can trust.