
Module 3: Azure AI Foundry -- Microsoft's AI Platform Deep Dive

Duration: 60-90 minutes | Level: Platform | Audience: Cloud Architects, Platform Engineers, CSAs | Last Updated: March 2026


3.1 What is Azure AI Foundry?​

Azure AI Foundry is Microsoft's unified platform for building, evaluating, and deploying generative AI applications at enterprise scale. If you are an infrastructure or platform architect, think of it as the control plane for everything AI inside Azure -- model catalog, deployments, evaluation pipelines, prompt orchestration, fine-tuning, and integrated AI services all accessible through a single portal, SDK, and CLI.

The Evolution​

The branding journey matters because customers encounter all three names in documentation, blog posts, and portal URLs.

| Era | Name | What It Was | Key Limitation |
|---|---|---|---|
| 2016-2022 | Azure Machine Learning Studio | Drag-and-drop ML training and deployment | Focused on classical ML; poor LLM support |
| 2023-2024 | Azure AI Studio | Preview portal for generative AI projects | Separate from Azure ML; fragmented experience |
| Late 2024+ | Azure AI Foundry | Unified platform merging Azure ML + AI Studio | Current GA platform -- this module's focus |

Azure AI Foundry is not a separate resource type that replaces Azure ML. Under the hood, the Azure ML workspace resource (Microsoft.MachineLearningServices/workspaces) is still the ARM building block. Foundry is a unified experience layer that consolidates model catalog, prompt flow, evaluation, deployments, and AI services into a single portal and SDK.

Three Interfaces, One Platform​

| Interface | Best For | Example Use Case |
|---|---|---|
| Azure AI Foundry portal (ai.azure.com) | Exploration, visual prompt flow building, model comparison | A CSA demonstrating RAG to a customer |
| Azure AI Foundry SDK (Python) | Programmatic model deployment, evaluation pipelines, CI/CD | A platform team automating model rollouts |
| Azure CLI (az ml) | Infrastructure provisioning, DevOps integration | An IaC pipeline deploying Hubs and Projects |

Architect's Mental Model

Think of Azure AI Foundry as "Azure Resource Manager for AI workloads." Just as ARM gives you a control plane for VMs, networking, and storage, Foundry gives you a control plane for models, endpoints, evaluations, and AI services -- with the same RBAC, networking, and compliance story you already know.


3.2 Architecture & Resource Model​

This is the section that matters most to platform architects. Azure AI Foundry introduces a two-tier workspace hierarchy: Hubs and Projects.

Hub and Project Model​

Hub (Parent Resource)​

The AI Foundry Hub is the shared administrative boundary. It owns:

  • Connections -- credentials and endpoints for Azure OpenAI, Azure AI Search, Storage, Key Vault, and external services (e.g., a Snowflake database, a custom API)
  • Compute resources -- shared compute instances and clusters that Projects can use
  • Networking configuration -- public access, private endpoints, managed VNet
  • Security policies -- RBAC role assignments, managed identity, customer-managed keys
  • Container Registry -- shared ACR for custom model images

A Hub maps to the ARM resource type Microsoft.MachineLearningServices/workspaces with kind: hub.

Project (Child Resource)​

A Project is an isolated workspace scoped to a single AI application or workload. It inherits connections and compute from the parent Hub but maintains its own:

  • Model deployments and endpoints
  • Prompt flow definitions
  • Evaluation runs and datasets
  • Fine-tuning jobs
  • Artifacts and logs

A Project maps to the ARM resource type Microsoft.MachineLearningServices/workspaces with kind: project and a hubResourceId pointing to its parent Hub.
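
Both kinds can be expressed in a single ARM template. The fragment below is an illustrative sketch -- the resource names, apiVersion, and property set are simplified placeholders, not a complete template:

```json
{
  "resources": [
    {
      "type": "Microsoft.MachineLearningServices/workspaces",
      "apiVersion": "2024-04-01",
      "name": "ai-hub-platform",
      "location": "eastus2",
      "kind": "hub",
      "identity": { "type": "SystemAssigned" },
      "properties": { "friendlyName": "Platform AI Hub" }
    },
    {
      "type": "Microsoft.MachineLearningServices/workspaces",
      "apiVersion": "2024-04-01",
      "name": "ai-project-rag-app",
      "location": "eastus2",
      "kind": "project",
      "identity": { "type": "SystemAssigned" },
      "properties": {
        "friendlyName": "RAG Application",
        "hubResourceId": "[resourceId('Microsoft.MachineLearningServices/workspaces', 'ai-hub-platform')]"
      },
      "dependsOn": [
        "[resourceId('Microsoft.MachineLearningServices/workspaces', 'ai-hub-platform')]"
      ]
    }
  ]
}
```

The only structural difference between the two resources is the kind value and the hubResourceId back-reference on the Project.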

Resource Relationship Summary​

| Resource | Scope | Cardinality | Key Responsibility |
|---|---|---|---|
| Hub | Organization/team level | 1 per team or business unit | Shared config, connections, networking, policies |
| Project | Application level | Many per Hub | Isolated workspace for a specific AI app |
| Azure OpenAI | Connected resource | 1 or more per Hub | LLM API access (GPT-4o, GPT-4.1, o-series) |
| Azure AI Search | Connected resource | 0 or more per Hub | Vector search for RAG workloads |
| Storage Account | Connected resource | 1 per Hub | Data, artifacts, prompt flow files, logs |
| Key Vault | Connected resource | 1 per Hub | Secrets, connection strings, API keys |
| Container Registry | Connected resource | 0 or 1 per Hub | Custom model images, prompt flow images |

RBAC Model​

Azure AI Foundry uses Azure RBAC with purpose-built roles.

| Role | Scope | Permissions |
|---|---|---|
| Azure AI Developer | Project | Deploy models, run evaluations, create prompt flows, manage endpoints |
| Azure AI Inference Deployment Operator | Project | Deploy and manage inference endpoints only |
| Azure ML Data Scientist | Project | Full access to experiments, compute, data assets |
| Contributor | Hub or Project | Full resource management (create/delete projects, manage connections) |
| Reader | Hub or Project | View resources and configurations, no modifications |
| Owner | Hub | Full control including RBAC assignment |

RBAC Best Practice

Assign roles at the Project level, not the Hub level. This follows the principle of least privilege -- a developer working on the RAG application should not have access to the internal Copilot project's data and deployments. Use the Hub-level Contributor role only for platform administrators who manage shared infrastructure.

Networking Options​

| Mode | Description | Use Case |
|---|---|---|
| Public | Hub and Projects are accessible over the public internet with AAD authentication | Development, PoCs, non-sensitive workloads |
| Private Endpoints | Hub and connected resources (AOAI, Search, Storage, KV) exposed only via private endpoints in your VNet | Production enterprise workloads |
| Managed VNet | Foundry manages a VNet on your behalf; you control outbound rules. Compute runs inside this managed VNet | Simplified private networking without BYO-VNet complexity |
| Managed VNet + Data Exfiltration Protection | Managed VNet with outbound restricted to approved destinations only | Highly regulated industries (financial services, healthcare) |

3.3 Model Catalog​

The Model Catalog is the front door of Azure AI Foundry. It is a curated marketplace of 1,800+ models from Microsoft and the open-source ecosystem, ready to deploy with a few clicks or a single SDK call.

Model Providers​

| Provider | Example Models | License Type |
|---|---|---|
| Azure OpenAI (Microsoft) | GPT-4o, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, o3, o4-mini | Proprietary (Microsoft-hosted) |
| Microsoft Research | Phi-4, Phi-4-mini, Phi-4-multimodal, MAI-1 | Open-weight (MIT license) |
| Meta | Llama 4 Scout, Llama 4 Maverick, Llama 3.3 70B | Open-weight (Llama license) |
| Mistral | Mistral Large 2, Mistral Small, Codestral, Pixtral | Open-weight / Commercial |
| Cohere | Command R+, Command R, Embed v3 | Commercial |
| AI21 Labs | Jamba 1.5 Large, Jamba 1.5 Mini | Commercial |
| Hugging Face | Hundreds of community models (BERT, T5, Whisper variants) | Various open-source |
| NVIDIA | Nemotron, NV-Embed | Open-weight |

Two Deployment Paradigms​

The catalog offers two fundamentally different ways to run a model. Understanding this distinction is critical for cost planning and architecture.

| Dimension | Models as a Service (MaaS) | Managed Compute |
|---|---|---|
| What it is | Serverless API -- Microsoft hosts the model | You deploy the model onto dedicated VMs |
| Billing | Pay-per-token (input + output tokens) | Pay-per-hour for the VM SKU |
| Compute management | None -- fully managed | You choose VM size, instance count, scaling rules |
| Cold start | None (always warm) | Possible if scaled to zero |
| Customization | System prompts and parameters only | Full control -- custom containers, fine-tuned weights |
| Available models | Azure OpenAI models, select partner models (Llama, Mistral, Cohere) | Any model from the catalog or a custom model |
| Networking | Public endpoint with AAD auth; private endpoint available | Private endpoint, VNet integration, managed VNet |
| Best for | Quick prototyping, variable workloads, multi-model testing | Production workloads with predictable traffic, custom models, strict isolation |

Serverless API Endpoints (MaaS)​

Serverless API endpoints are the simplest path from model selection to production. You select a model from the catalog, accept the terms, and receive an API endpoint and key. No compute provisioning, no VM sizing, no scaling configuration.

Key characteristics:

  • Pay-per-token pricing -- you pay only for the tokens you consume (input + output)
  • No infrastructure to manage -- Microsoft handles scaling, availability, and hardware
  • Azure OpenAI-compatible API -- same SDK and API contract as Azure OpenAI deployments
  • Immediate availability -- endpoint is live within seconds of deployment
  • Regional availability matters -- not all models are available in all regions
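
To make the pay-per-token model concrete, here is a small cost-estimation sketch. The prices and workload figures are hypothetical placeholders; use the current Azure pricing sheet for real per-model rates:

```python
# Rough pay-per-token cost model for a serverless (MaaS) deployment.
# All rates below are HYPOTHETICAL placeholders for illustration.

def monthly_token_cost(requests_per_day: int,
                       avg_input_tokens: int,
                       avg_output_tokens: int,
                       price_per_1m_input: float,
                       price_per_1m_output: float,
                       days: int = 30) -> float:
    """Estimate monthly spend for a pay-per-token endpoint."""
    input_tokens = requests_per_day * avg_input_tokens * days
    output_tokens = requests_per_day * avg_output_tokens * days
    return (input_tokens / 1_000_000 * price_per_1m_input
            + output_tokens / 1_000_000 * price_per_1m_output)

# Example: 10k requests/day, 1,000 input + 250 output tokens each,
# at hypothetical rates of $2.50 / $10.00 per million tokens.
cost = monthly_token_cost(10_000, 1_000, 250, 2.50, 10.00)
print(f"${cost:,.2f} / month")  # $1,500.00 / month
```

Because billing is purely consumption-based, doubling traffic doubles cost -- there is no step function from adding VMs, which is why MaaS suits variable workloads.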

Managed Online Endpoints​

Managed Online Endpoints give you dedicated compute for model serving. You deploy a model (from the catalog or your own custom model) to a specific VM SKU and control scaling.

Key characteristics:

  • Dedicated VMs -- choose from CPU or GPU SKUs (e.g., Standard_NC24ads_A100_v4)
  • Autoscaling -- scale based on request count, CPU, or custom metrics
  • Blue-green deployments -- traffic splitting across multiple deployments for safe rollouts
  • Custom containers -- bring your own inference server (e.g., vLLM, TGI, Triton)
  • VNet integration -- deploy endpoints inside a managed VNet with private endpoint access

Comparing Models in the Catalog​

The portal provides built-in tools for model comparison:

  1. Benchmark scores -- view standardized benchmarks (MMLU, HumanEval, MT-Bench) side by side
  2. Model cards -- detailed descriptions of capabilities, limitations, and intended use cases
  3. Try it out -- interactive playground to test models with your own prompts before deploying
  4. Pricing calculator -- estimate costs based on expected token volume
  5. Region availability -- check which regions support which models

3.4 Model Deployments​

Deployment is where architecture decisions meet operational reality. Azure AI Foundry supports three deployment types, each with different performance, cost, and management characteristics.

Deployment Type 1: Azure OpenAI Deployments​

These are deployments of Microsoft's proprietary models (GPT-4o, GPT-4.1, o3, etc.) through the Azure OpenAI Service resource connected to your Hub.

| Variant | Description | Billing | SLA | Best For |
|---|---|---|---|---|
| Standard | Shared capacity in a single region | Pay-per-token | 99.9% | Development, moderate production workloads |
| Global Standard | Shared capacity across global regions (auto-routed) | Pay-per-token (same price) | 99.9% | Production workloads that benefit from global capacity and lower latency |
| Provisioned (PTU) | Reserved throughput units in a specific region | Pay-per-PTU-hour (reserved) | 99.9% | Predictable high-volume workloads, latency-sensitive apps |
| Global Provisioned (PTU) | Reserved throughput units across global regions | Pay-per-PTU-hour (reserved) | 99.9% | High-volume global workloads needing guaranteed throughput |
| Data Zone Standard | Shared capacity within a data boundary (e.g., EU) | Pay-per-token | 99.9% | Data residency requirements |
| Data Zone Provisioned | Reserved throughput within a data boundary | Pay-per-PTU-hour | 99.9% | High-volume workloads with data residency requirements |

Understanding PTUs

A Provisioned Throughput Unit (PTU) is a unit of reserved model processing capacity. One PTU does not equal one request -- the relationship depends on the model, prompt size, and generation length. Use the Azure OpenAI capacity calculator to estimate how many PTUs your workload needs. PTUs are committed in monthly or yearly reservations, with significant discounts for longer commitments.
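
A rough break-even comparison between Standard (pay-per-token) and Provisioned (PTU) billing can be sketched in a few lines. All rates below are hypothetical; the Azure OpenAI capacity calculator and current price sheet are authoritative:

```python
# Back-of-the-envelope break-even between pay-per-token (Standard) and
# reserved-capacity (PTU) billing. Prices are HYPOTHETICAL placeholders.

def standard_monthly_cost(tokens_per_month: int, price_per_1m: float) -> float:
    return tokens_per_month / 1_000_000 * price_per_1m

def ptu_monthly_cost(ptus: int, price_per_ptu_hour: float,
                     hours: int = 730) -> float:
    # 730 is the average number of hours in a month.
    return ptus * price_per_ptu_hour * hours

# Hypothetical workload: 2B blended tokens/month at $5 per 1M tokens,
# vs. 50 PTUs at $2/PTU-hour.
standard = standard_monthly_cost(2_000_000_000, 5.0)   # $10,000
provisioned = ptu_monthly_cost(50, 2.0)                # $73,000
print("PTU wins" if provisioned < standard else "Standard wins")
```

The pattern to notice: PTUs only pay off when utilization of the reserved capacity is consistently high, which is why the table above recommends them for predictable high-volume workloads.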

Deployment Type 2: Serverless API Deployments​

These are pay-per-token deployments for partner models (Llama, Mistral, Cohere) and some Microsoft models that use the Models as a Service (MaaS) infrastructure.

  • No compute to manage
  • You accept the model provider's terms of use
  • Charged per million input/output tokens
  • Endpoint is Azure OpenAI-compatible (same SDK, same API shape)

Deployment Type 3: Managed Online Endpoints​

These are dedicated-compute deployments for any model -- from the catalog or custom.

  • You select the VM SKU and instance count
  • Support for autoscaling (min/max replicas, scaling metric)
  • Blue-green deployment with traffic splitting
  • Custom inference containers supported
  • Full VNet integration and private endpoint access

Deployment Types Comparison​

| Dimension | Azure OpenAI (Standard) | Azure OpenAI (PTU) | Serverless API (MaaS) | Managed Online Endpoint |
|---|---|---|---|---|
| Models | GPT-4o, GPT-4.1, o3, etc. | GPT-4o, GPT-4.1, o3, etc. | Llama, Mistral, Cohere, etc. | Any (catalog or custom) |
| Billing | Per token | Per PTU-hour (reserved) | Per token | Per VM-hour |
| Throughput | Shared, rate-limited | Guaranteed (reserved) | Shared, rate-limited | Dedicated (VM-bound) |
| Latency | Variable (shared pool) | Predictable (reserved) | Variable (shared pool) | Predictable (dedicated) |
| Scaling | Automatic (within quota) | Fixed PTU allocation | Automatic (within quota) | Manual or autoscale |
| Networking | Public + Private Endpoint | Public + Private Endpoint | Public + Private Endpoint | Managed VNet + Private Endpoint |
| GPU management | None | None | None | You choose VM SKU |
| Customization | System prompt, parameters | System prompt, parameters | System prompt, parameters | Full (custom container, weights) |

Quotas and Rate Limits​

Every deployment type has quotas. Platform architects must plan for quota as a first-class infrastructure concern.

| Quota Type | Applies To | Unit | How to Increase |
|---|---|---|---|
| Tokens per Minute (TPM) | Azure OpenAI Standard | Tokens/min per deployment | Azure portal quota page or support ticket |
| Requests per Minute (RPM) | Azure OpenAI Standard | Requests/min per deployment | Derived from TPM (approximately TPM / 6) |
| PTU allocation | Azure OpenAI Provisioned | PTU count per subscription/region | Capacity reservation via portal or support |
| Endpoint count | Managed Online Endpoints | Endpoints per subscription/region | Support ticket |
| VM cores | Managed Online Endpoints | vCPU cores per subscription/region | Standard Azure quota increase |
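
The TPM-to-RPM relationship translates into a simple capacity check. A sketch, assuming the approximate TPM / 6 ratio noted above:

```python
# Quota sanity check: on Azure OpenAI Standard deployments the RPM ceiling
# is derived from TPM (approximately TPM / 6).

def fits_quota(requests_per_min: int, avg_tokens_per_request: int,
               tpm_quota: int) -> bool:
    """True if a workload fits within both TPM and the derived RPM limit."""
    rpm_limit = tpm_quota // 6
    tokens_per_min = requests_per_min * avg_tokens_per_request
    return tokens_per_min <= tpm_quota and requests_per_min <= rpm_limit

# A 100k-TPM deployment allows ~16,666 RPM, but a workload of
# 200 req/min x 600 tokens already exhausts the token budget.
print(fits_quota(200, 600, 100_000))  # False: 120,000 tokens/min > 100,000
```

Running this kind of check against projected traffic before go-live is the cheapest way to treat quota as the first-class infrastructure concern this section calls for.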

3.5 Prompt Flow​

Prompt Flow is Azure AI Foundry's visual orchestration tool for building LLM applications. If you are familiar with Azure Logic Apps or Power Automate, the mental model is similar -- but purpose-built for AI workflows.

What is Prompt Flow?​

Prompt Flow lets you build flows structured as DAGs (directed acyclic graphs), where each node performs a specific operation: call an LLM, execute Python code, process a prompt template, or invoke a tool. The output of one node feeds into the next, creating a composable pipeline.

Node Types​

| Node Type | Purpose | Example |
|---|---|---|
| LLM | Call a language model (Azure OpenAI, Serverless API) | Generate a response given context and query |
| Prompt | Define a prompt template with variable substitution | Build a system message with {{context}} and {{query}} placeholders |
| Python | Execute arbitrary Python code | Parse JSON, call an external API, transform data |
| Tool | Invoke a pre-built or custom tool | Azure AI Search retrieval, Bing Search, custom REST calls |
| LLM + Function Calling | Call an LLM with tool definitions for autonomous tool selection | Agent-style node that decides which tools to call |
| Conditional | Branch the flow based on a condition | Route to different LLMs based on query complexity |

Use Case: Building a RAG Pipeline in Prompt Flow​

Here is how you would build a Retrieval-Augmented Generation pipeline visually in Prompt Flow:

Step 1: Input Node -- Accept the user's query as a string input.

Step 2: Embedding Node (Python) -- Call the Azure OpenAI embedding model to convert the query into a vector.

Step 3: Search Node (Tool) -- Query Azure AI Search with the vector to retrieve the top-k most relevant document chunks.

Step 4: Prompt Node -- Construct an augmented prompt that injects the retrieved chunks as context, along with the user query.

Step 5: LLM Node -- Send the augmented prompt to GPT-4o for answer generation.

Step 6: Output Node -- Return the generated answer along with source citations.
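
The six steps can be sketched as plain Python. The embed, search, and generate functions below are stubs standing in for the Azure OpenAI embedding, Azure AI Search, and chat-completion calls -- they are not real SDK signatures:

```python
# Skeleton of the RAG flow above. embed(), search(), and generate() are
# STUBS -- real implementations call the respective Azure SDKs.

def embed(text: str) -> list[float]:
    # Stand-in: a real implementation calls an embedding model.
    return [float(ord(c)) for c in text[:8]]

def search(vector: list[float], top_k: int = 3) -> list[dict]:
    # Stand-in: a real implementation queries a vector index.
    corpus = [
        {"id": "doc1", "text": "Hubs own shared connections and networking."},
        {"id": "doc2", "text": "Projects inherit compute from their Hub."},
    ]
    return corpus[:top_k]

def generate(prompt: str) -> str:
    # Stand-in: a real implementation calls a chat completion endpoint.
    return "Answer based on: " + prompt[:40]

def rag_answer(query: str) -> dict:
    vector = embed(query)                        # Step 2: embed the query
    chunks = search(vector)                      # Step 3: retrieve top-k chunks
    context = "\n".join(c["text"] for c in chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {query}"  # Step 4: augment
    answer = generate(prompt)                    # Step 5: generate
    return {"answer": answer,                    # Step 6: answer + citations
            "citations": [c["id"] for c in chunks]}

print(rag_answer("What does a Hub own?")["citations"])
```

In Prompt Flow the same shape is expressed as nodes and edges rather than function calls, which is what makes the pipeline visually inspectable.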

The entire flow is defined as a YAML file (flow.dag.yaml) that can be version-controlled in Git, making it CI/CD-friendly.
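
A minimal sketch of what such a flow.dag.yaml might look like for the RAG flow above -- node names and file paths are illustrative, and the schema is simplified relative to the real Prompt Flow format:

```yaml
# Illustrative flow.dag.yaml sketch (simplified; consult the Prompt Flow
# documentation for the authoritative schema).
inputs:
  query:
    type: string
outputs:
  answer:
    type: string
    reference: ${generate_answer.output}
nodes:
- name: retrieve_chunks
  type: python
  source:
    type: code
    path: retrieve_chunks.py
  inputs:
    query: ${inputs.query}
- name: build_prompt
  type: prompt
  source:
    type: code
    path: rag_prompt.jinja2
  inputs:
    context: ${retrieve_chunks.output}
    query: ${inputs.query}
- name: generate_answer
  type: llm
  inputs:
    deployment_name: gpt-4o
    prompt: ${build_prompt.output}
```

Because the whole DAG is one text file, a prompt change shows up as a reviewable diff in a pull request.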

Evaluation Flows​

Prompt Flow supports a special type of flow called an evaluation flow. Instead of processing user queries, evaluation flows score the quality of outputs produced by your main flow.

An evaluation flow typically:

  1. Takes the main flow's output (answer), the ground-truth answer, and the original question as inputs
  2. Calls an LLM (or runs custom Python logic) to score the output on metrics like groundedness, relevance, and coherence
  3. Outputs numerical scores that can be aggregated across a test dataset

This enables automated quality gates in your CI/CD pipeline -- if the evaluation scores drop below a threshold, the deployment is blocked.
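
A minimal sketch of such a gate, assuming the evaluation flow emits one score dict per test example (metric names and thresholds are illustrative):

```python
# Minimal CI/CD evaluation gate: aggregate per-example scores from an
# evaluation flow and fail the pipeline if any metric falls below its
# baseline. Thresholds here are illustrative.

def gate(results: list[dict], baselines: dict[str, float]) -> tuple[bool, dict]:
    """Return (passed, mean_scores) for a batch of evaluation results."""
    means = {
        metric: sum(r[metric] for r in results) / len(results)
        for metric in baselines
    }
    passed = all(means[m] >= baselines[m] for m in baselines)
    return passed, means

results = [
    {"groundedness": 4.5, "relevance": 4.0},
    {"groundedness": 3.5, "relevance": 4.4},
]
passed, means = gate(results, {"groundedness": 4.0, "relevance": 4.0})
print(passed)  # True -- means are 4.0 and 4.2, both at or above baseline
```

In a real pipeline, `passed == False` would translate into a non-zero exit code that blocks the deployment stage.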

Deploying a Flow as an Endpoint​

Once you have built and tested a Prompt Flow:

  1. Build the flow into a Docker container (Foundry handles this automatically)
  2. Deploy the container to a Managed Online Endpoint
  3. Configure autoscaling, traffic splitting, and authentication
  4. Monitor with built-in metrics (latency, throughput, error rate, token consumption)

The deployed flow exposes a REST API endpoint that your application calls -- just like any other microservice.


3.6 Model Evaluation​

Evaluation is the most underinvested area in most AI projects and the most important area for production readiness. Azure AI Foundry provides built-in evaluation capabilities that let you systematically measure model quality before deployment.

Why Evaluation Matters​

| Without Evaluation | With Evaluation |
|---|---|
| "It seems to work okay in my testing" | Quantified quality scores across hundreds of test cases |
| Ship and hope | Ship with confidence backed by metrics |
| Catch problems from user complaints | Catch problems before users see them |
| No regression detection | Automated regression testing in CI/CD |
| Anecdotal quality assessment | Data-driven model selection and prompt optimization |

Built-in Evaluation Metrics​

Azure AI Foundry provides LLM-as-a-judge evaluation metrics that use a grader model (typically GPT-4o) to score your application's outputs.

| Metric | What It Measures | Scale | When It Matters |
|---|---|---|---|
| Groundedness | Is the answer supported by the provided context (not hallucinated)? | 1-5 | RAG applications -- critical for factual accuracy |
| Relevance | Does the answer address the user's actual question? | 1-5 | All applications -- ensures on-topic responses |
| Coherence | Is the answer logically structured and readable? | 1-5 | Long-form generation -- reports, summaries, explanations |
| Fluency | Is the language natural and grammatically correct? | 1-5 | Customer-facing applications |
| Similarity | How close is the answer to a known ground-truth answer? | 1-5 | Applications with deterministic expected outputs |
| F1 Score | Token-level overlap with ground truth | 0-1 | Extractive QA tasks |
| ROUGE | N-gram overlap with reference text | 0-1 | Summarization tasks |
| BLEU | Precision of n-gram overlap | 0-1 | Translation tasks |
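
Of these, token-level F1 is simple enough to implement directly. A standard formulation for extractive QA:

```python
# Token-level F1 for extractive QA: precision and recall computed over the
# multiset of tokens shared between prediction and ground truth.
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    pred = prediction.lower().split()
    gold = ground_truth.lower().split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# All four predicted tokens appear in the five-token reference:
# precision = 1.0, recall = 0.8, F1 about 0.889.
print(token_f1("the hub owns connections",
               "the hub owns shared connections"))
```

The LLM-as-a-judge metrics (groundedness, relevance, etc.) cannot be reduced to a formula like this -- they require a grader-model call per example, which is why they dominate evaluation cost.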

Custom Evaluation Metrics​

When built-in metrics are not sufficient, you can define custom evaluation metrics using:

  • Python functions -- Write a Python function that takes the model output and returns a score
  • LLM-as-a-judge prompts -- Write a custom prompt that instructs GPT-4o to score the output on your domain-specific criteria (e.g., "Does this medical summary include all required ICD-10 codes?")
  • Composite metrics -- Combine multiple metrics into a single quality score with weighted averages
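
A composite metric of the kind described in the last bullet is just a weighted average. A sketch with illustrative weights:

```python
# Composite metric: weighted average of individual metric scores.
# The weights below are illustrative and must sum to 1.

def composite_score(scores: dict[str, float],
                    weights: dict[str, float]) -> float:
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(scores[m] * w for m, w in weights.items())

score = composite_score(
    {"groundedness": 4.0, "relevance": 5.0, "fluency": 3.0},
    {"groundedness": 0.5, "relevance": 0.3, "fluency": 0.2},
)
print(round(score, 2))  # 4.1
```

Weighting groundedness most heavily, as here, reflects the earlier point that hallucination is the critical failure mode for RAG applications.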

Red-Teaming Evaluations​

Red-teaming tests whether your AI application can be manipulated into producing harmful, biased, or policy-violating outputs.

Azure AI Foundry supports red-teaming through:

  1. Automated adversarial testing -- Built-in adversarial datasets that probe for jailbreaks, prompt injections, and content policy violations
  2. Custom red-team datasets -- Define your own adversarial prompts tailored to your application's domain
  3. Azure AI Content Safety integration -- Automatically score outputs for hate speech, violence, self-harm, and sexual content severity levels
  4. Human-in-the-loop review -- Export flagged outputs for manual review by your safety team

Evaluation Datasets and Test Suites​

A robust evaluation requires a well-curated test dataset. Best practices:

| Component | Description | Recommended Size |
|---|---|---|
| Golden dataset | Curated question-answer pairs with verified ground truth | 100-500 examples |
| Edge case dataset | Unusual, ambiguous, or boundary-condition queries | 50-100 examples |
| Adversarial dataset | Prompt injection attempts, jailbreak probes, out-of-scope queries | 50-200 examples |
| Regression dataset | Previously failed cases that were fixed -- prevents regressions | Grows over time |

Evaluation as a CI/CD Gate

The highest-maturity AI teams treat evaluation as a deployment gate. Every PR that changes a prompt, updates a RAG pipeline, or swaps a model triggers an automated evaluation run. If scores drop below the baseline, the deployment is blocked. This is no different from blocking a deployment on failing unit tests -- the principle is identical.


3.7 Fine-Tuning​

Fine-tuning is the process of further training a pre-trained model on your domain-specific data to improve its performance on your specific use case. Azure AI Foundry supports fine-tuning for select models directly within the platform.

When to Fine-Tune (And When NOT To)​

Fine-Tuning Decision Matrix​

| Technique | Cost | Time to Implement | Data Required | Best For | Risk |
|---|---|---|---|---|---|
| Prompt Engineering | Free | Minutes to hours | 0 - a few examples | Formatting, behavior control, output structure | Low |
| RAG | Medium ($) | Hours to days | Document corpus | Factual grounding, dynamic/changing knowledge | Low-Medium |
| Fine-Tuning | High ($$) | Days to weeks | 50-10,000+ examples | Domain style, specialized vocabulary, consistent tone | Medium (catastrophic forgetting) |
| Pre-Training from scratch | Very High ($$$$$) | Weeks to months | Billions of tokens | Entirely new language or domain | Very High |

Supported Models for Fine-Tuning​

| Model | Minimum Examples | Recommended Examples | Max Training Tokens | Fine-Tuning Method |
|---|---|---|---|---|
| GPT-4o | 10 | 50-500 | Varies by tier | Supervised fine-tuning |
| GPT-4o-mini | 10 | 50-500 | Varies by tier | Supervised fine-tuning |
| GPT-4.1-mini | 10 | 50-500 | Varies by tier | Supervised fine-tuning |
| Phi-4 | 10 | 100-1,000 | Varies | LoRA / Full fine-tuning |
| Phi-4-mini | 10 | 100-1,000 | Varies | LoRA / Full fine-tuning |
| Llama 3.3 70B | 10 | 100-1,000 | Varies | LoRA |
| Mistral models | 10 | 100-1,000 | Varies | LoRA |

Fine-Tuning Workflow​

Data preparation format (JSONL):

{"messages": [{"role": "system", "content": "You are a medical coding specialist."}, {"role": "user", "content": "Patient presents with acute bronchitis and mild dehydration."}, {"role": "assistant", "content": "ICD-10 Codes:\n- J20.9 (Acute bronchitis, unspecified)\n- E86.0 (Dehydration)"}]}
{"messages": [{"role": "system", "content": "You are a medical coding specialist."}, {"role": "user", "content": "Follow-up for type 2 diabetes, well controlled."}, {"role": "assistant", "content": "ICD-10 Code:\n- E11.65 (Type 2 diabetes mellitus with hyperglycemia)"}]}

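A lightweight validator for this chat-format JSONL is a cheap pre-flight check before submitting a fine-tuning job. This sketch enforces only the basic shape (valid JSON, a messages list, known roles, assistant turn last):

```python
# Quick validator for the chat-style JSONL fine-tuning format shown above.
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_jsonl(lines: list[str]) -> list[str]:
    """Return a list of error strings (empty list means the data is valid)."""
    errors = []
    for i, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {i}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            errors.append(f"line {i}: missing 'messages' list")
            continue
        for msg in messages:
            if msg.get("role") not in VALID_ROLES or not msg.get("content"):
                errors.append(f"line {i}: bad message {msg}")
        if messages[-1].get("role") != "assistant":
            errors.append(f"line {i}: last message must be the assistant's")
    return errors

sample = ['{"messages": [{"role": "user", "content": "Hi"}, '
          '{"role": "assistant", "content": "Hello"}]}']
print(validate_jsonl(sample))  # []
```

Catching malformed lines locally is far cheaper than discovering them when a training job fails mid-run.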
LoRA and Parameter-Efficient Fine-Tuning​

LoRA (Low-Rank Adaptation) is a technique that fine-tunes only a small number of additional parameters (adapters) rather than updating all model weights. This has major implications for architects:

| Dimension | Full Fine-Tuning | LoRA Fine-Tuning |
|---|---|---|
| Parameters updated | All (billions) | Small adapter matrices (millions) |
| GPU memory required | Very high (40-80 GB+) | Much lower (often fits on a single GPU) |
| Training time | Hours to days | Minutes to hours |
| Storage per model | Full model copy (tens of GB) | Small adapter file (tens of MB) |
| Risk of catastrophic forgetting | Higher | Lower (base model unchanged) |
| Multiple specialties | Need a full copy per specialty | Swap adapters at inference time |
| Quality | Marginally better for large domain shifts | Excellent for most use cases |
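
The storage and memory rows follow directly from the parameter math: a rank-r LoRA adapter for a d x k weight matrix trains only d*r + r*k parameters (the two low-rank factor matrices), instead of all d*k weights:

```python
# Why LoRA adapters are small: for a d x k weight matrix, a rank-r adapter
# adds only d*r + r*k trainable parameters (the two low-rank factors).

def lora_params(d: int, k: int, r: int) -> int:
    return d * r + r * k

def full_params(d: int, k: int) -> int:
    return d * k

# One 4096 x 4096 attention projection with a rank-16 adapter:
full = full_params(4096, 4096)        # 16,777,216 weights
lora = lora_params(4096, 4096, 16)    # 131,072 weights
print(f"adapter is {lora / full:.2%} of the full matrix")  # 0.78%
```

Applied across every adapted layer, this sub-1% ratio is what makes "swap adapters at inference time" practical: each specialty costs megabytes, not a full model copy.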

Cost and Compute Requirements​

Fine-tuning costs are driven by three factors:

  1. Training compute -- GPU hours consumed during training (typically Standard_NC24ads_A100_v4 or similar)
  2. Hosting cost -- Fine-tuned Azure OpenAI models incur higher per-token costs than base models; custom models on Managed Online Endpoints cost per VM-hour
  3. Data preparation -- Human time to curate, clean, and validate training data (often the most expensive part)

3.8 Azure AI Services (Integrated)​

Azure AI Foundry integrates with the broader Azure AI Services family -- pre-built, task-specific AI capabilities that were previously known as Azure Cognitive Services. These services complement generative AI models by handling specialized tasks like speech recognition, document parsing, and content moderation.

Service Overview​

| Service | Capabilities | Common Use Cases | Integration with Foundry |
|---|---|---|---|
| Azure AI Speech | Speech-to-text (STT), text-to-speech (TTS), speech translation, speaker recognition | Voice-enabled copilots, call center analytics, accessibility | Prompt Flow speech nodes, real-time conversation APIs |
| Azure AI Vision | Image analysis, OCR, spatial analysis, face detection, custom image classification | Document digitization, visual search, accessibility | Multi-modal RAG (image + text), document processing pipelines |
| Azure AI Language | Named Entity Recognition (NER), sentiment analysis, key phrase extraction, summarization, PII detection | Customer feedback analysis, compliance scanning, content tagging | Pre-processing nodes in Prompt Flow, PII redaction before LLM calls |
| Azure AI Document Intelligence | Form extraction, invoice processing, receipt parsing, layout analysis, custom document models | Accounts payable automation, contract analysis, claims processing | Document ingestion for RAG pipelines, structured data extraction |
| Azure AI Content Safety | Text and image content moderation, prompt shields, groundedness detection, protected material detection | Guardrails for AI applications, user-generated content moderation | Built-in content filtering for Azure OpenAI deployments, evaluation metrics |
| Azure AI Translator | Text translation (100+ languages), document translation, custom terminology | Multi-language copilots, document localization | Pre/post-processing in Prompt Flow |

How These Integrate with Foundry Projects​

Key integration patterns:

  1. Document Intelligence as RAG Ingestion -- Use Document Intelligence to extract text, tables, and structure from PDFs and images, then chunk and embed the output for vector search
  2. Content Safety as a Guardrail -- Content Safety filters run automatically on Azure OpenAI deployments; you can also invoke them explicitly in Prompt Flow for custom models
  3. Speech as an I/O Layer -- Add voice input/output to any Prompt Flow by using Speech STT (input) and TTS (output) nodes
  4. Language for Pre-Processing -- Use PII detection to redact sensitive data before sending to an LLM; use NER to extract entities for structured queries
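
Pattern 4 (PII redaction before the LLM call) can be sketched as a pre-processing step. A production pipeline would call Azure AI Language's PII detection service; the regexes below are crude stand-ins for illustration only:

```python
# Sketch of PII redaction before a prompt reaches the LLM. Real pipelines
# should use Azure AI Language PII detection; these regexes are crude
# stand-ins covering only two obvious patterns.
import re

REDACTIONS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def redact(text: str) -> str:
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

prompt = "Contact jane.doe@contoso.com, SSN 123-45-6789, about her claim."
print(redact(prompt))
# Contact [EMAIL], SSN [SSN], about her claim.
```

Redacting before the LLM call means sensitive values never enter prompts, completions, or logs downstream.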

3.9 Infrastructure Considerations​

This section addresses the infrastructure decisions that platform architects must make when deploying Azure AI Foundry in production.

Compute Options​

| Compute Type | Used For | Management | GPU | Typical SKUs |
|---|---|---|---|---|
| Serverless (MaaS) | Azure OpenAI and partner model inference | Fully managed by Microsoft | N/A (abstracted) | N/A |
| Managed Compute Instance | Development, Prompt Flow authoring, notebooks | Managed VM (start/stop) | Optional | Standard_DS3_v2, Standard_NC6s_v3 |
| Managed Compute Cluster | Training, fine-tuning, batch inference | Managed cluster (auto-scaling) | Yes | Standard_NC24ads_A100_v4, Standard_ND96amsr_A100_v4 |
| Managed Online Endpoint | Production model serving | Managed deployment (auto-scaling) | Yes, for LLM serving | Standard_NC24ads_A100_v4, Standard_NC48ads_H100_v5 |
| Kubernetes (AKS) | Self-managed model serving via attached AKS | Customer-managed | Yes (GPU node pools) | Any AKS-supported GPU VM |

Networking Deep Dive​

Production deployments of Azure AI Foundry require careful networking design. The following table summarizes the network endpoints you need to plan for:

| Resource | Private Endpoint Required? | DNS Zone | Notes |
|---|---|---|---|
| AI Foundry Hub | Yes | privatelink.api.azureml.ms | Controls access to the workspace API |
| Azure OpenAI | Yes | privatelink.openai.azure.com | Must be in the same or peered VNet |
| Azure AI Search | Yes | privatelink.search.windows.net | Required for private RAG pipelines |
| Storage Account (blob) | Yes | privatelink.blob.core.windows.net | Data, artifacts, logs |
| Storage Account (file) | Yes | privatelink.file.core.windows.net | File shares for compute instances |
| Key Vault | Yes | privatelink.vaultcore.azure.net | Secrets and connection strings |
| Container Registry | Yes | privatelink.azurecr.io | Custom model images |
| Managed Online Endpoint | Automatic | Managed by Foundry | When using Managed VNet |

DNS Resolution

Private endpoints require proper DNS resolution. Use Azure Private DNS Zones linked to your VNet, or configure conditional forwarders in your on-premises DNS infrastructure. Missing or incorrect DNS resolution is the #1 cause of connectivity failures in private AI Foundry deployments.
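
A quick triage for this failure mode: resolve the private endpoint's FQDN from inside the VNet and confirm the answer is a private (RFC 1918) address. The helper below only classifies an already-resolved IP; the FQDN in the comment is a hypothetical example:

```python
# Triage for private-endpoint DNS problems: if resolving the endpoint FQDN
# from inside the VNet returns a PUBLIC address, the Private DNS Zone link
# (or conditional forwarder) is missing or misconfigured.
import ipaddress

def looks_private(resolved_ip: str) -> bool:
    return ipaddress.ip_address(resolved_ip).is_private

# e.g. socket.gethostbyname("myhub.api.azureml.ms") from a VNet-joined VM
print(looks_private("10.1.2.4"))    # True  -- private endpoint NIC
print(looks_private("52.168.1.1"))  # False -- public answer, DNS misconfig
```

Baking this check into a deployment smoke test catches the misconfiguration before application teams hit opaque connection timeouts.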

Data Residency and Compliance​

| Concern | How Azure AI Foundry Addresses It |
|---|---|
| Data residency | Choose the Hub region carefully. Data (prompts, completions, training data) stays in the Hub's region. Azure OpenAI processing region depends on deployment type (Standard = single region; Global Standard = Microsoft-routed). |
| Data processing | Prompts and completions are NOT used to train Microsoft models -- Azure OpenAI Service excludes customer data from model training by default, with no opt-out action required. |
| Compliance certifications | Azure OpenAI and AI Foundry inherit Azure's compliance portfolio (SOC 2, ISO 27001, HIPAA BAA, FedRAMP, etc.). Verify per-model availability in compliance-scoped regions. |
| Customer-Managed Keys (CMK) | Supported at the Hub level for encrypting data at rest with your own Key Vault key. |
| Managed Identity | Hub and Projects use system-assigned or user-assigned managed identities for authentication to connected resources -- no API keys in code. |

RBAC and Security Model (Expanded)​

Security best practices:

  1. Use Managed Identity for all service-to-service communication. Avoid storing API keys in Key Vault when managed identity is supported.
  2. Enable Managed VNet with data exfiltration protection for regulated workloads.
  3. Apply RBAC at the Project level -- not the Hub level -- to enforce least privilege.
  4. Use Conditional Access policies in Entra ID to enforce MFA and compliant device requirements for Foundry portal access.
  5. Enable diagnostic logging to send Hub and Project audit logs to a Log Analytics workspace or SIEM.

Cost Management​

| Cost Driver | How to Optimize |
|---|---|
| Azure OpenAI Standard (pay-per-token) | Monitor token usage per deployment; set TPM quotas to prevent runaway costs; use smaller models (GPT-4.1-mini, GPT-4.1-nano) for simpler tasks |
| Azure OpenAI Provisioned (PTU) | Right-size PTU allocation using the capacity calculator; commit to 1-year reservations for ~40% discount; consolidate workloads on shared PTUs |
| Managed Online Endpoints | Enable autoscaling with scale-to-zero for dev/test; use spot VMs for non-production fine-tuning; right-size GPU SKUs |
| Storage | Lifecycle management policies for training data and logs; delete unused evaluation datasets |
| Azure AI Search | Right-size the Search SKU; use the semantic ranker only when needed; partition indexes by workload |
| Compute Instances | Auto-shutdown schedules for dev instances; use small SKUs for Prompt Flow authoring |

3.10 Azure AI Foundry vs AWS Bedrock vs Google Vertex AI​

Platform architects in multi-cloud environments need to understand how Azure AI Foundry compares to its counterparts on AWS and Google Cloud.

Feature Comparison​

| Capability | Azure AI Foundry | AWS Bedrock | Google Vertex AI |
|---|---|---|---|
| Unified AI platform | Yes (portal, SDK, CLI) | Partial (Bedrock + SageMaker) | Yes (Vertex AI console, SDK, CLI) |
| Model catalog size | 1,800+ models | ~30 models (fewer providers) | 200+ models (Model Garden) |
| Proprietary frontier models | GPT-4.1, o3, o4-mini (Azure OpenAI exclusive) | Claude (Anthropic), Nova (Amazon) | Gemini 2.5 (Google-exclusive) |
| Open-weight models | Llama, Mistral, Phi, Cohere, AI21 | Llama, Mistral, Cohere | Llama, Mistral, Gemma |
| Serverless API (MaaS) | Yes (Azure OpenAI + partner models) | Yes (all Bedrock models) | Yes (Vertex AI endpoint) |
| Dedicated compute deployment | Managed Online Endpoints | SageMaker Endpoints | Vertex AI Endpoints |
| Fine-tuning | GPT-4o, Phi, Llama, Mistral (in-platform) | Claude, Llama, Titan (limited) | Gemini, Llama (in-platform) |
| Prompt orchestration | Prompt Flow (visual + YAML) | Bedrock Agents + Step Functions | Vertex AI Reasoning Engine |
| RAG built-in | Yes (AI Search integration) | Yes (Knowledge Bases) | Yes (Vertex AI Search) |
| Evaluation framework | Built-in (groundedness, relevance, etc.) | Limited (Bedrock Evaluation, preview) | Built-in (AutoSxS, Gen AI Evaluation) |
| Content safety | Azure AI Content Safety (integrated) | Bedrock Guardrails | Vertex AI Safety Filters |
| Enterprise networking | Private Endpoints, Managed VNet, data exfiltration protection | VPC endpoints, PrivateLink | VPC-SC, Private Service Connect |
| RBAC granularity | Hub + Project level RBAC | IAM policies (resource-level) | IAM + Vertex AI roles |
| Compliance portfolio | 100+ certifications | 90+ certifications | 100+ certifications |
| Agent framework | Azure AI Agent Service, Semantic Kernel, AutoGen | Bedrock Agents | Vertex AI Agent Builder |
| Workspace hierarchy | Hub/Project (two-tier) | Flat (per-account) | Flat (per-project) |

Where Azure AI Foundry Shines

  1. Azure OpenAI exclusivity -- GPT-4.1, o3, and o4-mini are available on Azure with enterprise SLAs, private networking, and content filtering. No other cloud offers these models as managed services.
  2. Enterprise security posture -- The Hub/Project model, Managed VNet with data exfiltration protection, customer-managed keys, and deep Entra ID integration provide a security story that is hard to match.
  3. Microsoft ecosystem integration -- Seamless connection to Microsoft 365 Copilot, Copilot Studio, Dynamics 365, Power Platform, and GitHub Copilot. If the customer is a Microsoft shop, the integration advantage is substantial.
  4. Model catalog breadth -- 1,800+ models (catalog + Hugging Face) give architects maximum flexibility without leaving the platform.
  5. Prompt Flow -- A mature, visual orchestration tool with YAML-based CI/CD support that neither Bedrock nor Vertex AI matches in capability.

Where Competitors Have Edges

| Area | Competitor Edge | Detail |
| --- | --- | --- |
| Anthropic Claude models | AWS Bedrock | Claude is available on Bedrock as a first-party integration with fine-tuning support. On Azure, Claude is available via the model catalog but with fewer features. |
| Google Gemini models | Google Vertex AI | Gemini 2.5 Pro/Flash are Vertex AI exclusives with massive context windows (up to 1M tokens) and native multi-modal capabilities. |
| SageMaker maturity | AWS SageMaker | For classical ML training and MLOps, SageMaker has a longer track record and deeper feature set than the Azure ML components within Foundry. |
| Grounding with search | Google Vertex AI | Google Search grounding in Vertex AI provides real-time web search context natively in API calls -- a unique capability. |
| Simplicity | AWS Bedrock | Bedrock's flat model (no Hub/Project hierarchy) is simpler for teams that do not need multi-project isolation. |

Multi-Cloud Considerations

For organizations running multi-cloud AI strategies:

  • Standardize on OpenAI-compatible API format -- Azure OpenAI, many Serverless API models, and several third-party providers all support the OpenAI chat completions API shape. Building your application against this API makes cloud portability easier.
  • Abstract the orchestration layer -- Use Semantic Kernel or LangChain as your orchestration framework rather than cloud-specific tools (Prompt Flow, Bedrock Agents). These frameworks support multiple model providers.
  • Separate model selection from infrastructure -- Design your architecture so that swapping a model provider (Azure OpenAI to Bedrock Claude) requires a configuration change, not a code rewrite.
  • Evaluate per workload -- Some workloads may be best served by Azure (GPT-4.1 for enterprise chat), others by AWS (Claude for long-context analysis), and others by Google (Gemini for multi-modal). Let the use case drive the platform decision.
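The first and third considerations above can be sketched in a few lines: keep everything provider-specific in a configuration registry and build the application against the shared OpenAI chat-completions API shape. The endpoint URLs, deployment names, and environment-variable names below are hypothetical -- only the common API shape is assumed.

```python
# Sketch: separate model selection from application code so that swapping
# providers is a configuration change, not a code rewrite. URLs, model
# names, and env-var names are HYPOTHETICAL placeholders.
import os

# Provider registry: everything cloud-specific lives here, not in app code.
PROVIDERS = {
    "azure": {
        "base_url": "https://my-resource.openai.azure.com/openai/v1/",  # hypothetical
        "api_key_env": "AZURE_OPENAI_API_KEY",
        "model": "gpt-4.1",
    },
    "openai": {
        "base_url": "https://api.openai.com/v1/",
        "api_key_env": "OPENAI_API_KEY",
        "model": "gpt-4.1",
    },
}

def resolve_provider(name: str) -> dict:
    """Look up the connection settings for a configured provider."""
    return PROVIDERS[name]

def chat(prompt: str, provider: str = "azure") -> str:
    """Send one chat turn through whichever provider is configured.
    Any endpoint that speaks the OpenAI chat-completions shape works here."""
    from openai import OpenAI  # pip install openai
    cfg = resolve_provider(provider)
    client = OpenAI(base_url=cfg["base_url"], api_key=os.environ[cfg["api_key_env"]])
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

With this shape, moving a workload from Azure OpenAI to another OpenAI-compatible endpoint means adding one registry entry and changing one configuration value; the `chat()` call site never changes.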

Key Takeaways

| # | Takeaway |
| --- | --- |
| 1 | Azure AI Foundry is the unified platform for building, evaluating, and deploying AI applications on Azure -- it consolidates what was previously Azure ML Studio and Azure AI Studio. |
| 2 | The Hub/Project model is your workspace hierarchy: Hub = shared infrastructure and connections, Project = isolated workspace per application. Design your Hub/Project topology like you design your subscription topology. |
| 3 | The Model Catalog offers 1,800+ models via two paths: Serverless API (pay-per-token, zero compute management) and Managed Online Endpoints (dedicated compute, full control). |
| 4 | Azure OpenAI deployments come in Standard (pay-per-token) and Provisioned (PTU, reserved throughput) variants. PTU planning is a first-class infrastructure concern for high-volume workloads. |
| 5 | Prompt Flow is a visual DAG-based orchestration tool for building LLM pipelines (RAG, agents, chatbots) that can be deployed as managed endpoints and version-controlled in Git. |
| 6 | Evaluation is not optional -- use built-in metrics (groundedness, relevance, coherence) and red-teaming evaluations as CI/CD gates before every production deployment. |
| 7 | Fine-tune only when prompt engineering and RAG are insufficient -- fine-tuning is for style, tone, and specialized vocabulary, not for injecting knowledge (use RAG for that). |
| 8 | Azure AI Services (Speech, Vision, Language, Document Intelligence, Content Safety) integrate natively with Foundry projects as pre/post-processing capabilities. |
| 9 | Networking, RBAC, and cost management are the same disciplines you apply to any Azure workload -- private endpoints, managed identity, least-privilege RBAC, right-sized compute, and autoscaling. |
| 10 | In a multi-cloud comparison, Azure AI Foundry's advantages are OpenAI model exclusivity, enterprise security depth, and Microsoft ecosystem integration. Competitors have edges in specific models (Claude on Bedrock, Gemini on Vertex AI) and simplicity. |

What is Next?

You now understand the platform where AI applications are built and deployed. In the next module, we explore the consumer-facing side of Microsoft's AI strategy -- the Copilot ecosystem that sits on top of this platform.

Next: Module 4: Microsoft Copilot Ecosystem -- M365 Copilot, Copilot Studio, Copilot Actions, GitHub Copilot, Copilot for Azure, and the extensibility model that connects them all.

Previous: Module 2: LLM Landscape