Module 6: AI Agents Deep Dive – From Concept to Production
Duration: 90-120 minutes | Level: Deep-Dive | Audience: Cloud Architects, Platform Engineers, AI Engineers | Last Updated: March 2026
6.1 What Is an AI Agent?
An AI agent is a system that goes beyond responding to prompts. It plans, reasons, uses tools, and takes autonomous actions to accomplish goals on behalf of a user. Where a chatbot answers questions, an agent gets things done.
The Agent Equation
At its core, every AI agent is composed of four building blocks:
Agent = LLM + Memory + Tools + Planning
| Component | Role | Example |
|---|---|---|
| LLM | The reasoning engine – understands intent, generates plans, makes decisions | GPT-4o, Claude, Llama |
| Memory | Retains context across steps and sessions – short-term and long-term | Conversation history, vector store, Redis |
| Tools | External capabilities the agent can invoke to interact with the world | Search APIs, databases, code interpreters, REST endpoints |
| Planning | Decomposes complex goals into executable steps and decides what to do next | ReAct loop, plan-and-execute, reflection |
Chatbot vs Agent – The Key Differences
| Dimension | Traditional Chatbot | AI Agent |
|---|---|---|
| Interaction model | Single-turn Q&A or scripted dialog | Multi-step autonomous workflow |
| Decision making | Pattern matching or intent classification | LLM-based reasoning and planning |
| Tool access | None or hardcoded integrations | Dynamic tool selection and invocation |
| State | Stateless or session-scoped | Persistent memory across sessions |
| Autonomy | Zero – waits for user input each turn | Can execute multi-step plans independently |
| Error handling | Returns "I don't understand" | Re-plans, retries, asks clarifying questions |
| Output | Text responses | Text, actions, API calls, file generation, code execution |
Why Agents Are the Next Evolution
The evolution follows a clear trajectory: scripted chatbots that match patterns, assistants that answer questions, copilots that suggest actions, and agents that plan and act on their own.
Each step adds capability that the previous generation could not achieve. Agents represent the transition from responsive AI (wait for a question, answer it) to proactive AI (receive a goal, plan and execute it).
Agent vs Copilot vs Assistant – Definitions
These terms are used interchangeably in industry marketing, but they have distinct meanings for architects:
| Term | Definition | Autonomy Level | Example |
|---|---|---|---|
| Assistant | An LLM that answers questions and follows instructions within a single conversation | Low – responds to direct requests | Azure OpenAI Chat, ChatGPT |
| Copilot | An AI embedded in a workflow that suggests actions but requires human approval | Medium – suggests, human decides | GitHub Copilot, M365 Copilot |
| Agent | An AI system that autonomously plans and executes multi-step tasks using tools | High – plans and acts independently | AutoGen agents, Azure AI Agent Service |
If the AI only talks, it is an assistant. If it suggests actions in your workflow, it is a copilot. If it takes actions on its own, it is an agent. The boundaries are fluid – many production systems blend all three.
6.2 The Agent Architecture
The Core Agent Loop
Every agent – regardless of framework – follows the same fundamental loop. The LLM acts as the brain that observes the environment, reasons about what to do, selects a tool, executes an action, observes the result, and decides whether to continue or return a final answer.
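As a minimal sketch of that loop (framework-agnostic; every name here is illustrative, not a real library's API):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Minimal sketch of the core loop: observe, reason, act, repeat."""
    llm: Callable[[str], str]                 # reasoning engine: context -> decision
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    max_steps: int = 5                        # hard bound so the loop always terminates

    def run(self, goal: str) -> str:
        context = [f"Goal: {goal}"]
        for _ in range(self.max_steps):
            decision = self.llm("\n".join(context))
            if decision.startswith("CALL "):  # the LLM chose a tool, e.g. "CALL pricing S3"
                name, _, arg = decision[5:].partition(" ")
                observation = self.tools[name](arg)            # execute the action
                context.append(f"Observation: {observation}")  # feed the result back
            else:
                return decision               # the LLM chose to return a final answer
        return "Stopped: step budget exhausted"
```

A stubbed `llm` that first requests a tool call and then answers is enough to exercise the loop end to end.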
The ReAct Pattern – Reasoning + Acting
The ReAct (Reasoning and Acting) pattern is the most widely adopted agent architecture. It interleaves chain-of-thought reasoning with tool use in an explicit, inspectable loop.
Thought: I need to find the current price of Azure SQL Database S3 tier.
Action: search_azure_pricing(service="Azure SQL Database", tier="S3")
Observation: The S3 tier costs approximately $200/month for 100 DTUs.
Thought: Now I need to compare this with the vCore-based model.
Action: search_azure_pricing(service="Azure SQL Database", model="vCore", tier="General Purpose 2 vCores")
Observation: General Purpose 2 vCores costs approximately $370/month.
Thought: I have both data points. I can now provide the comparison.
Final Answer: The DTU-based S3 tier costs ~$200/month while the vCore General Purpose 2-core tier costs ~$370/month...
Each iteration follows the Perception - Reasoning - Action - Observation cycle:
| Phase | What Happens | LLM's Role |
|---|---|---|
| Perception | Receive user input or observation from previous tool call | Parse and understand current state |
| Reasoning | Think about what to do next (chain-of-thought) | Generate a "Thought" step |
| Action | Select and invoke a tool with appropriate parameters | Generate a structured tool call |
| Observation | Receive tool result and incorporate into context | Parse tool output, update understanding |
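Part of what makes ReAct inspectable is that the trace format is mechanically parseable. A small regex-based sketch (illustrative only, not a library API) that extracts the tool name and parameters from an `Action:` line in the format shown above:

```python
import re

# Matches lines like: Action: search_azure_pricing(service="Azure SQL Database", tier="S3")
ACTION_RE = re.compile(r'Action:\s*(\w+)\((.*)\)')
ARG_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def parse_action(line: str) -> tuple[str, dict[str, str]]:
    """Extract (tool_name, keyword_arguments) from one ReAct Action line."""
    match = ACTION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Not an Action line: {line!r}")
    name, arg_str = match.groups()
    return name, dict(ARG_RE.findall(arg_str))
```

The application loop would route the parsed name and arguments to the matching tool, then append the result as the next `Observation:` line.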
Single-Turn vs Multi-Turn Agents
| Aspect | Single-Turn Agent | Multi-Turn Agent |
|---|---|---|
| Conversation scope | Receives one task, executes it, returns | Maintains ongoing conversation with user |
| User interaction | No mid-task interaction | Can ask clarifying questions during execution |
| State management | Stateless or ephemeral | Stateful – maintains context across turns |
| Use case | Batch processing, automated tasks | Interactive assistants, complex workflows |
| Complexity | Lower | Higher (session management, state persistence) |
Stateful vs Stateless Agent Design
| Design | Characteristics | Trade-offs |
|---|---|---|
| Stateless | No persistent memory between invocations. Each request is independent. Context must be passed in every call. | Simple to scale horizontally. No session affinity needed. Higher token cost (re-send context). |
| Stateful | Maintains conversation history and working memory across requests. Thread-based execution model. | Lower per-request token cost. Requires session persistence. More complex to scale and recover from failures. |
For production systems, most agents use a hybrid approach: stateless compute with externalized state in a managed store (e.g., Azure Cosmos DB for conversation history, Azure AI Search for long-term memory).
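A sketch of that hybrid shape, with a plain dict standing in for the managed store (the function names and store layout here are assumptions for illustration, not an Azure SDK):

```python
# Stateless compute with externalized state: every turn loads, uses, and
# writes back the full thread state, so any replica can serve any turn
# with no session affinity.

STORE: dict[str, list[dict]] = {}   # thread_id -> message history (stand-in for Cosmos DB)

def load_history(thread_id: str) -> list[dict]:
    return STORE.get(thread_id, [])

def save_history(thread_id: str, history: list[dict]) -> None:
    STORE[thread_id] = history

def handle_turn(thread_id: str, user_message: str, llm) -> str:
    history = load_history(thread_id)
    history.append({"role": "user", "content": user_message})
    reply = llm(history)                       # llm is any callable: history -> text
    history.append({"role": "assistant", "content": reply})
    save_history(thread_id, history)
    return reply
```

Swapping the dict for a real store (Cosmos DB, Redis) changes only `load_history`/`save_history`; the handler itself stays stateless.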
6.3 Core Agent Capabilities
Planning
Planning is the capability that separates agents from simple tool-calling chatbots. An agent with planning can take a complex goal and decompose it into a sequence of actionable steps.
Task Decomposition
Given a complex request like "Research the top 3 Azure regions with the lowest latency for our users in Southeast Asia, then estimate monthly costs for running our AKS cluster in each region", a planning-capable agent will:
- Identify sub-tasks: region research, latency analysis, cost estimation
- Determine dependencies: cost estimation depends on region selection
- Order execution: research first, then estimate
- Execute each step using appropriate tools
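The dependency ordering above maps directly onto a topological sort. A sketch using Python's stdlib `graphlib` (the task names are illustrative):

```python
from graphlib import TopologicalSorter

# Each key is a sub-task; each value is the set of sub-tasks it depends on.
# "estimate_costs depends on region selection" becomes an edge in the graph.
plan = {
    "research_regions": set(),
    "analyze_latency": {"research_regions"},
    "estimate_costs": {"research_regions", "analyze_latency"},
}

# static_order() yields an execution order in which every task's
# dependencies come before it.
order = list(TopologicalSorter(plan).static_order())
```

An agent runtime would walk `order`, invoking the appropriate tool for each step; `graphlib` also raises `CycleError` if the generated plan is circular, a useful validation step for LLM-produced plans.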
Planning Strategies
| Strategy | How It Works | Best For | Trade-off |
|---|---|---|---|
| ReAct | Interleave reasoning and acting one step at a time. Decide the next step only after observing the result of the current step. | Dynamic tasks with uncertain steps | Slower – each step waits for the previous |
| Plan-then-Execute | Generate a complete plan upfront, then execute all steps sequentially. | Well-defined, predictable tasks | Plan may become invalid if early steps fail |
| Reflection | After execution, the agent reviews its output, identifies weaknesses, and iterates. | Quality-critical outputs (writing, analysis) | Higher token cost due to self-review loops |
| Plan-Reflect-Replan | Generate plan, execute, reflect on results, replan if needed. | Complex multi-step workflows | Most expensive but most robust |
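The Reflection strategy from the table can be sketched as a small generate-critique-revise loop, where `generate` and `critique` are stand-ins for LLM calls:

```python
def reflect_loop(task, generate, critique, max_rounds=3):
    """Generate an output, critique it, and revise until the critique
    passes (returns None) or the round budget is exhausted.

    The revision rounds are where the extra token cost noted in the
    table comes from: each round is at least two additional LLM calls.
    """
    draft = generate(task, feedback=None)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:          # critique found no weaknesses
            return draft
        draft = generate(task, feedback=feedback)   # revise using the critique
    return draft                      # best effort after budget runs out
```

Plan-Reflect-Replan wraps the same shape around a whole plan instead of a single output: execute, critique the results, and regenerate the remaining plan if needed.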
Memory
Memory gives agents the ability to retain and recall information. Without memory, every agent invocation starts from zero. Production agents typically employ multiple memory tiers.
Memory Types
| Memory Type | Scope | Persistence | Implementation | Analogy |
|---|---|---|---|---|
| Short-Term Memory | Current conversation | Session-lived | Message array in the API call (system + user + assistant messages) | Your notepad during a meeting |
| Working Memory | Current task | Task-lived | Scratchpad variable the agent updates during a multi-step plan | Your whiteboard while solving a problem |
| Long-Term Memory | Cross-session | Persistent | Vector database, relational database, or key-value store | Your filing cabinet of past projects |
| Episodic Memory | Past interactions | Persistent | Stored summaries of previous conversations and outcomes | Your memory of how past projects went |
Memory Architecture
In a typical architecture, short-term and working memory live in the prompt context itself, while long-term and episodic memory are externalized to persistent stores (such as a vector database) and retrieved back into context only when relevant.
Memory Management Strategies
| Strategy | Description | When to Use |
|---|---|---|
| Sliding Window | Keep only the last N messages in context | Simple chatbots, cost-sensitive workloads |
| Summarization | Periodically summarize old messages into a compact summary, discard originals | Long-running conversations that exceed context windows |
| Retrieval-Augmented | Store all messages in a vector DB, retrieve only relevant ones per turn | Agents that need to recall specific past interactions |
| Tiered Eviction | Recent messages in full, older messages summarized, oldest in vector store | Production agents balancing cost and recall quality |
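A sketch combining the sliding-window and summarization strategies from the table (with `summarize` standing in for an LLM summarization call):

```python
def build_context(messages, window=4, summarize=None):
    """Return the context to send on the next turn.

    - Sliding window: keep only the last `window` messages in full.
    - Summarization: if a summarizer is supplied, compress the evicted
      older messages into one system message instead of dropping them.
    """
    recent = messages[-window:]
    older = messages[:-window] if len(messages) > window else []
    if older and summarize is not None:
        summary = {"role": "system", "content": summarize(older)}
        return [summary] + recent
    return recent
```

Tiered eviction extends this by adding a third destination: messages evicted from the summary tier are written to a vector store for retrieval-augmented recall.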
Tool Use / Function Calling
Tool use is the capability that gives agents hands. The LLM reasons about what tool to call, generates the parameters, and the application code executes the tool and feeds the result back into the LLM.
How Function Calling Works
The critical insight is that the LLM never calls the tool directly. It generates a structured request (function name + parameters), and the application code is responsible for execution. This gives architects full control over security, validation, and error handling.
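A sketch of that control point: the model's structured request arrives as plain text, and application code allow-lists, validates, and then executes it (the registry shape below is illustrative, not any specific SDK's schema):

```python
import json

# Application-side tool registry: only tools listed here can ever run.
TOOLS = {
    "get_azure_price": {
        "fn": lambda service, tier: f"{service} {tier}: $200/month",  # stubbed tool
        "required": {"service", "tier"},
    },
}

def execute_tool_call(call: dict) -> str:
    """Validate and run a tool call of the form
    {"name": ..., "arguments": "<JSON string>"} emitted by the LLM."""
    name = call["name"]
    if name not in TOOLS:                      # security: allow-list only
        raise ValueError(f"Unknown tool: {name}")
    args = json.loads(call["arguments"])       # LLM output is just text -> parse it
    spec = TOOLS[name]
    missing = spec["required"] - args.keys()
    if missing:                                # validate before executing anything
        raise ValueError(f"Missing arguments: {missing}")
    return spec["fn"](**args)
```

Because execution happens here rather than inside the model, this is also where architects attach authentication, rate limiting, audit logging, and human-approval gates.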