
Module 2: Application & Agent Architectures

Designing an LLM application is not just about picking a model; it's about picking (and often combining) the right architecture pattern along a spectrum that runs from a single LLM call to fully autonomous multi-agent swarms.

This choice introduces a trade-off between predictability and agency: the higher you climb, the more freedom your system gains, but the harder it becomes to anticipate or constrain every step.

(Figure: predictability vs. agency trade-offs)

Below is a distilled map, guidance on when to stop at a workflow vs. when to move to an agent, and concrete patterns you can apply in Langfuse-instrumented projects.


The Architecture Ladder

Rule of thumb – only move as far up the ladder as you need:

  • Workflows (R0-R4) shine when you value predictability, testability, low latency, and tight context control.
  • Agents (R5-R6) shine when the path is unknown a priori, tooling decisions are dynamic, or the user expects open-ended autonomy.

Agent Loop

At runtime, the agent-environment loop cycles through three checkpoints on every iteration:

  1. Action – the agent (LLM call) decides what to do next.
  2. Environment – the real or simulated world responds.
  3. Feedback / Stop – the system evaluates whether to continue, hand control back to a human, or terminate.

This micro-loop is the operational core of every architecture at the top of the ladder (R5-R6). It is also where the predictability-vs-agency trade-off materialises in practice: with more agency you allow the loop to run longer and mutate state in unforeseen ways, which demands stronger feedback and stop conditions to stay reliable. In Langfuse each pass through this loop becomes a traced span, giving you the visibility to debug, evaluate, and put guardrails around autonomous behaviour.
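
To make the three checkpoints concrete, here is a minimal sketch of such a loop in Python. It assumes the Langfuse Python SDK's observe decorator (v3-style import); llm_decide and environment_step are hypothetical stand-ins for a real model call and a real tool runtime.

```python
from langfuse import observe  # Langfuse Python SDK (v3-style import); needs LANGFUSE_* env vars

MAX_TURNS = 5  # hard stop condition that bounds how long the loop may run

@observe()  # traced as a span on every iteration
def llm_decide(state: dict) -> dict:
    """Hypothetical stand-in for the LLM call that picks the next action."""
    # A real implementation would prompt a model with the task and history.
    if state["history"]:
        return {"type": "final_answer", "content": "42"}
    return {"type": "tool_call", "tool": "search", "query": state["task"]}

@observe()  # traced as a span on every iteration
def environment_step(action: dict) -> str:
    """Hypothetical stand-in for the environment: a tool run, API call, or simulator."""
    return f"result of {action['tool']}"

@observe()  # the whole loop becomes one trace containing the spans above
def run_agent(task: str) -> str:
    state = {"task": task, "history": []}
    for _ in range(MAX_TURNS):
        action = llm_decide(state)              # 1. Action
        if action["type"] == "final_answer":    # 3. Feedback / Stop
            return action["content"]
        observation = environment_step(action)  # 2. Environment
        state["history"].append((action, observation))
    return "stopped: turn budget exhausted"     # stop condition fired

print(run_agent("What is 6 x 7?"))
```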

With this framing in mind, let’s zoom in on a set of canonical patterns you can lift straight into your own projects. Each pattern is a ready-made blueprint that balances the trade-offs outlined above in a slightly different way — pick the one that maps best to your constraints on cost, latency, and reliability.


Canonical Patterns

A good way to reason about these architectures is to treat them as reusable patterns – standard blueprints that describe how LLM calls, tool invocations, and memory slots can be composed to solve a recurring class of problems. Patterns provide a shared vocabulary, speed up design by letting you reuse proven approaches, and make the trade-offs (cost, latency, predictability) explicit when moving from a rigid workflow to a more agentic solution.

| Pattern | Typical Use-Case | Key Pros | Key Cons |
| --- | --- | --- | --- |
| Prompt Chaining | Deterministic multi-step doc generation | Easy to debug | Rigid, brittle when input drifts |
| Routing / Handoff | Tier-1 support → specialised prompts | Cheap requests go to smaller models | Mis-routing tanks quality |
| Parallelisation | Map-reduce summarisation, guardrails | Reduces latency | Cost × N, aggregation complexity |
| Evaluator–Optimizer | "Draft → critique → revise" loops | Builds quality offline or online | Adds tokens & delay |
| Orchestrator–Workers | Retrieval + synthesis workflows | Clear separation of concerns | Needs robust state passing |
| Tool-Calling ReAct | One-shot Q&A with calculator / web | Simple mental model | Parsing / hallucination risk |
| Planning Agent | Multi-file code-refactor, research | Deeper reasoning | Planning errors snowball |
| Reflection | Self-consistency, safety checks | Cuts hallucinations | Extra calls and $$ |
| Memory-Augmented | Long customer sessions | Personalised UX | Memory staleness / cost |
| Multi-Agent Swarm | Brainstorming, negotiation sims | Diverse reasoning | Hardest to debug |

(Source: Phil Schmid)
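
To illustrate the first row, here is a minimal prompt-chaining sketch. call_llm is a hypothetical stand-in for a single model call through your provider's SDK; the defining property is that the number and order of steps are fixed in code, which is why the pattern is easy to debug and why it turns brittle when inputs drift.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for one LLM call via your provider's SDK."""
    return f"<model output for: {prompt[:40]}...>"

def write_report(topic: str) -> str:
    # Step 1: produce an outline
    outline = call_llm(f"Write a bullet-point outline for a report on {topic}.")
    # Step 2: the next prompt is fixed in advance and consumes step 1's output
    draft = call_llm(f"Expand this outline into a full draft:\n{outline}")
    # Step 3: polish; the chain's shape never changes at runtime
    return call_llm(f"Tighten the prose and fix grammar in:\n{draft}")

print(write_report("LLM agent architectures"))
```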

Selecting the Right Approach

Choosing between a lightweight workflow and a fully fledged agent is rarely a one-off decision. Instead, think of it as an iterative search: start simple, measure, and only add complexity when the data tells you the current architecture has topped out.

  1. Define “good” first. Accuracy? Cost? Latency? Trust?
  2. Prototype as R1 (single call). Measure offline with Langfuse datasets.
  3. When the metric plateaus, move up to R2, then R3.
  4. Adopt agents only if the task cannot be expressed as a bounded graph.

Langfuse provides the tracing you need at every step of this search: every node or tool invocation you build becomes a traced span that you can later debug, evaluate, and cost-optimise.
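
For steps 2 and 3, the sketch below shows one way to measure an R1 prototype offline against a Langfuse dataset. It assumes the Langfuse Python SDK and an existing dataset named faq-goldens (a hypothetical name); my_app and the exact-match metric are deliberately simple stand-ins.

```python
from langfuse import Langfuse  # assumes LANGFUSE_* credentials are set in the environment

langfuse = Langfuse()

def my_app(question: str) -> str:
    """The R1 prototype under test: a single LLM call (stubbed here)."""
    return "42"

# Pull a dataset of input / expected-output pairs and measure offline.
# "faq-goldens" is a hypothetical dataset name; create your own in Langfuse first.
dataset = langfuse.get_dataset("faq-goldens")

correct = 0
for item in dataset.items:
    output = my_app(item.input)
    correct += int(output == item.expected_output)  # crude exact-match metric

print(f"accuracy: {correct / len(dataset.items):.0%}")
# If this number plateaus, move up the ladder (R2 → R3) before reaching for agents.
```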

📚 Further reading:

  • Building Effective Agents, blog post, by Anthropic
  • Hugging Face Agents Course, course, by Hugging Face
  • How We Built Ellipsis (or: Lessons from 27 months building LLM coding agents), blog post, by Nick Bradford
  • Agentic Pattern, blog post, by Phil Schmid
