Module 11: Quick Reference Cards – AI Essentials at a Glance
Type: Reference | Level: All Levels | Usage: Bookmark this page – your daily AI quick reference | Last Updated: March 2026
How to Use This Page
This module is designed as a fast-lookup reference. Every card is self-contained. Use your browser's Ctrl+F / Cmd+F to jump to any term instantly. No narrative – just facts, tables, and decision shortcuts.
Card 1: LLM Generation Parameters Cheat Sheet
Every parameter you can tune when calling a large language model.
| Parameter | Range | Typical Default | What It Controls | Rule of Thumb |
|---|---|---|---|---|
| Temperature | 0.0 – 2.0 | 1.0 | Randomness of token sampling. Lower = deterministic, higher = creative. | 0.0–0.3 for factual / extraction. 0.7–1.0 for creative writing. Never exceed 1.5 in production. |
| Top-P (nucleus sampling) | 0.0 – 1.0 | 1.0 | Cumulative probability cutoff. Only tokens within the top-P mass are considered. | Use 0.9–0.95 for balanced output. Set to 1.0 and control via temperature, or vice versa – avoid tuning both simultaneously. |
| Top-K | 1 – vocabulary size | Model-dependent | Limits sampling to the K most probable next tokens. | 40–100 is a safe range. Not exposed in Azure OpenAI – available in open-source / Hugging Face models. |
| Frequency Penalty | -2.0 to 2.0 | 0.0 | Penalizes tokens proportionally to how often they already appeared. Reduces repetition. | 0.3–0.8 to reduce repetitive phrasing. Values above 1.0 can distort output. |
| Presence Penalty | -2.0 to 2.0 | 0.0 | Flat penalty applied once a token has appeared at all. Encourages topic diversity. | 0.3–0.6 for varied topic coverage. Combine lightly with frequency penalty – don't max both. |
| Max Tokens (max_completion_tokens) | 1 – model context limit | Model-dependent | Hard ceiling on response length in tokens. | Set explicitly to avoid runaway costs. Estimate: 1 paragraph ≈ 80–120 tokens, 1 page ≈ 600–800 tokens. |
| Stop Sequences | Up to 4 strings | None | Generation halts when any stop sequence is emitted. | Use ["\n\n"] for single-paragraph answers. Use ["```"] to stop after a code block. |
| Seed | Any integer | None | When set, the service attempts deterministic output (best-effort). | Use for reproducible evaluations and regression testing. Same seed + same prompt + same parameters = same output (mostly). |
| Response Format | text, json_object, json_schema | text | Forces structured output format. | Use json_schema for reliable structured extraction. Always include "respond in JSON" in the prompt when using json_object. |
| N | 1–128 | 1 | Number of completions to generate per request. | Use n > 1 only for ranking/voting strategies. Multiplies token cost linearly. |
| Logprobs | true/false, top_logprobs 0–20 | false | Returns log-probabilities for each output token. | Use for confidence scoring, calibration, and classification thresholds. |
| Logit Bias | Token ID → bias (-100 to 100) | {} | Directly adjusts probability of specific tokens. -100 = ban token. | Ban unwanted tokens (e.g., profanity token IDs). Use sparingly – hard to maintain. |
Parameter Interaction Quick Rules
| Scenario | Temperature | Top-P | Freq. Penalty | Presence Penalty |
|---|---|---|---|---|
| Deterministic extraction | 0.0 | 1.0 | 0.0 | 0.0 |
| Conversational chatbot | 0.7 | 0.95 | 0.3 | 0.3 |
| Creative writing | 1.0 | 0.95 | 0.5 | 0.6 |
| Code generation | 0.2 | 0.95 | 0.0 | 0.0 |
| Brainstorming / ideation | 1.2 | 1.0 | 0.8 | 0.8 |
| Summarization | 0.3 | 0.95 | 0.0 | 0.0 |
| Translation | 0.3 | 0.95 | 0.0 | 0.0 |
| Customer support bot | 0.5 | 0.9 | 0.4 | 0.2 |
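The scenario table above can be kept in code as a small preset lookup. A minimal sketch – the dictionary keys and helper name are my own choices, not part of any SDK:

```python
# Generation-parameter presets mirroring the scenario table above.
# Values come straight from the table; the dict is just a convenience
# for passing **sampling_kwargs(scenario) into an API call.
PRESETS = {
    "deterministic_extraction": {"temperature": 0.0, "top_p": 1.0,  "frequency_penalty": 0.0, "presence_penalty": 0.0},
    "chatbot":                  {"temperature": 0.7, "top_p": 0.95, "frequency_penalty": 0.3, "presence_penalty": 0.3},
    "creative_writing":         {"temperature": 1.0, "top_p": 0.95, "frequency_penalty": 0.5, "presence_penalty": 0.6},
    "code_generation":          {"temperature": 0.2, "top_p": 0.95, "frequency_penalty": 0.0, "presence_penalty": 0.0},
    "brainstorming":            {"temperature": 1.2, "top_p": 1.0,  "frequency_penalty": 0.8, "presence_penalty": 0.8},
    "summarization":            {"temperature": 0.3, "top_p": 0.95, "frequency_penalty": 0.0, "presence_penalty": 0.0},
}

def sampling_kwargs(scenario: str) -> dict:
    """Return a copy of the preset so callers can override single values."""
    return dict(PRESETS[scenario])
```

Usage: `client.chat.completions.create(model=..., messages=..., **sampling_kwargs("summarization"))`.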
Common API Call Patterns
Python (Azure OpenAI SDK) – minimal call:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com/",
    api_key="<key>",  # or use DefaultAzureCredential
    api_version="2025-03-01-preview",
)

response = client.chat.completions.create(
    model="<deployment-name>",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7,
    max_tokens=800,
)
print(response.choices[0].message.content)
```
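Building on the minimal call above, a sketch of a structured-output request using `response_format` with a JSON schema. The schema name and invoice fields are illustrative placeholders, not from the Azure docs; only the request shape is the point:

```python
# Request arguments for a structured-extraction call (sketch only --
# the schema name and fields below are illustrative placeholders).
extraction_request = {
    "model": "<deployment-name>",
    "messages": [
        {"role": "system", "content": "Extract the invoice fields as JSON."},
        {"role": "user", "content": "Invoice #123, total $49.99, due 2026-04-01."},
    ],
    "temperature": 0.0,  # deterministic extraction (see Card 1)
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "total": {"type": "number"},
                    "due_date": {"type": "string"},
                },
                "required": ["invoice_number", "total", "due_date"],
                "additionalProperties": False,
            },
        },
    },
}
# Pass to the client as: client.chat.completions.create(**extraction_request)
```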
Key API versions (Azure OpenAI):
| API Version | Status | Notes |
|---|---|---|
| 2025-03-01-preview | Latest preview | Newest features |
| 2024-12-01-preview | Preview | Structured outputs, reasoning |
| 2024-10-21 | GA (stable) | Production recommended |
| 2024-06-01 | GA | Broadly supported |
Card 2: Token Quick Reference
What Is a Token?
| Fact | Value |
|---|---|
| Average token length (English) | ~4 characters |
| Tokens per word (English avg.) | ~1.33 tokens per word (~0.75 words per token) |
| Tokens per word (code) | ~2–3 tokens per word (symbols split aggressively) |
| Tokens per character (non-Latin / CJK scripts) | ~2–4 tokens per character (varies by tokenizer) |
Token Estimation Formulas
English text: tokens ≈ word_count × 1.33
Code: tokens ≈ character_count ÷ 3
Mixed content: tokens ≈ character_count ÷ 4
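The three formulas translate into a tiny estimator. A rough sketch only – for exact counts use a real tokenizer such as tiktoken:

```python
def estimate_tokens(text: str, kind: str = "prose") -> int:
    """Rough token estimate using the rules of thumb above.

    kind: "prose" (words x 1.33), "code" (chars / 3), "mixed" (chars / 4).
    """
    if kind == "prose":
        return round(len(text.split()) * 1.33)
    if kind == "code":
        return round(len(text) / 3)
    if kind == "mixed":
        return round(len(text) / 4)
    raise ValueError(f"unknown kind: {kind!r}")
```

Example: a 100-word English paragraph estimates to ~133 tokens, matching the table above.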
Common Text Lengths in Tokens
| Content Type | Approximate Tokens |
|---|---|
| A short email (3–4 sentences) | ~100–200 |
| One A4 page of text | ~600–800 |
| A long blog post (2,000 words) | ~2,700 |
| A technical whitepaper (10 pages) | ~7,000–9,000 |
| A full novel (80,000 words) | ~107,000 |
| 1 hour of transcribed speech | ~8,000–10,000 |
| A typical Slack conversation (50 messages) | ~2,000–3,000 |
| JSON payload (1 KB) | ~300–400 |
| A complete React component file | ~500–1,500 |
Context Windows by Model (March 2026)
| Model | Provider | Context Window | Max Output Tokens |
|---|---|---|---|
| GPT-4.1 | Azure OpenAI | 1,047,576 (1M) | 32,768 |
| GPT-4.1 mini | Azure OpenAI | 1,047,576 (1M) | 32,768 |
| GPT-4.1 nano | Azure OpenAI | 1,047,576 (1M) | 32,768 |
| GPT-4o | Azure OpenAI | 128,000 | 16,384 |
| GPT-4o mini | Azure OpenAI | 128,000 | 16,384 |
| o3 | Azure OpenAI | 200,000 | 100,000 |
| o4-mini | Azure OpenAI | 200,000 | 100,000 |
| o3-mini | Azure OpenAI | 200,000 | 100,000 |
| o1 | Azure OpenAI | 200,000 | 100,000 |
| Claude Opus 4 | Anthropic | 200,000 | 32,000 |
| Claude Sonnet 4 | Anthropic | 200,000 | 64,000 |
| Gemini 2.5 Pro | Google | 1,048,576 (1M) | 65,536 |
| Gemini 2.5 Flash | Google | 1,048,576 (1M) | 65,536 |
| Llama 4 Maverick | Meta (via Azure) | 1,048,576 (1M) | 32,768 |
| DeepSeek-R1 | DeepSeek (via Azure) | 128,000 | 16,384 |
| Mistral Large | Mistral (via Azure) | 128,000 | 8,192 |
| Phi-4 | Microsoft | 16,384 | 4,096 |
| Phi-4-mini | Microsoft | 128,000 | 4,096 |
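A quick budget check against the table above – context limits as listed; the `LIMITS` dict (a small subset of the table) and function name are my own:

```python
# Context limits from the table above: model -> (context_window, max_output).
LIMITS = {
    "gpt-4.1": (1_047_576, 32_768),
    "gpt-4o":  (128_000, 16_384),
    "o3":      (200_000, 100_000),
    "phi-4":   (16_384, 4_096),
}

def fits(model: str, prompt_tokens: int, desired_output: int) -> bool:
    """True if prompt + requested output stay inside the model's limits."""
    context, max_out = LIMITS[model]
    return desired_output <= max_out and prompt_tokens + desired_output <= context
```

For example, a 125K-token prompt with an 8K-token answer overflows GPT-4o's 128K window even though each number is individually in range.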
Azure OpenAI Pricing (Pay-As-You-Go, per 1M Tokens)
Prices reflect Global Standard deployment where available. Check the Azure OpenAI pricing page for latest values.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4.1 | $2.00 | $8.00 |
| GPT-4.1 mini | $0.40 | $1.60 |
| GPT-4.1 nano | $0.10 | $0.40 |
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
| o3 | $10.00 | $40.00 |
| o4-mini | $1.10 | $4.40 |
| o3-mini | $1.10 | $4.40 |
| text-embedding-3-large | $0.13 | – |
| text-embedding-3-small | $0.02 | – |
| DALL-E 3 (Standard) | $0.040 / image | – |
| DALL-E 3 (HD) | $0.080 / image | – |
| Whisper | $0.36 / audio hour | – |
Global Batch Pricing (50% Discount)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4.1 | $1.00 | $4.00 |
| GPT-4.1 mini | $0.20 | $0.80 |
| GPT-4o | $1.25 | $5.00 |
| GPT-4o mini | $0.075 | $0.30 |
Cost rule of thumb: For a typical chatbot turn (~1,500 input + 500 output tokens), GPT-4.1 nano costs ~$0.0004. GPT-4o costs ~$0.009. That is a ~25x difference.
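The rule of thumb above can be checked with a one-line cost function (prices per 1M tokens, as in the tables):

```python
def turn_cost(input_tokens: int, output_tokens: int,
              input_price: float, output_price: float) -> float:
    """Cost in dollars for one request; prices are per 1M tokens."""
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price

# GPT-4.1 nano vs GPT-4o for a typical turn (1,500 in + 500 out),
# using the pay-as-you-go prices from the table above:
nano  = turn_cost(1500, 500, 0.10, 0.40)   # ~$0.00035
gpt4o = turn_cost(1500, 500, 2.50, 10.00)  # ~$0.00875
```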
Card 3: Model Selection Decision Tree
Use this table to pick the right model for your workload. Start from the need.
| Need | Recommended Model | Why | Fallback |
|---|---|---|---|
| Simple classification / routing | GPT-4.1 nano | Cheapest, fastest, sufficient for binary/multi-class | GPT-4o mini |
| Structured data extraction | GPT-4.1 mini | Great JSON mode, cost efficient | GPT-4.1 |
| General-purpose chatbot | GPT-4o | Strong general ability, broad knowledge | GPT-4.1 |
| Complex multi-step reasoning | o3 | Deep chain-of-thought, highest reasoning accuracy | o4-mini |
| Reasoning on a budget | o4-mini | 80% of o3 capability at ~10% cost | o3-mini |
| Code generation & review | GPT-4.1 | Optimized for code, instruction following | o4-mini |
| Long document analysis (>100K) | GPT-4.1 | 1M context window, strong recall | Gemini 2.5 Pro |
| Vision / image understanding | GPT-4o | Native multimodal, strong vision | GPT-4.1 (vision) |
| Embeddings | text-embedding-3-large | Best quality Azure embedding | text-embedding-3-small |
| On-device / edge | Phi-4-mini | Small footprint, strong for size | Phi-4 |
| Open-source self-hosted | Llama 4 Maverick | Strong open model, permissive license | DeepSeek-R1 |
| Batch processing (non-real-time) | GPT-4o (Global Batch) | 50% price discount for async | GPT-4.1 mini (Batch) |
| Audio transcription | Whisper | Purpose-built speech-to-text | Azure AI Speech |
| Text-to-speech | Azure AI Speech / GPT-4o Audio | High-quality neural voices | – |
| Image generation | DALL-E 3 / GPT Image Gen | Native Azure integration | – |
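For routing code, the decision table collapses to a lookup. A sketch covering a few rows – model names as in the table, dictionary keys chosen here for illustration:

```python
# First-choice and fallback models per need, mirroring the decision table.
MODEL_FOR_NEED = {
    "classification":   ("gpt-4.1-nano", "gpt-4o-mini"),
    "extraction":       ("gpt-4.1-mini", "gpt-4.1"),
    "chatbot":          ("gpt-4o", "gpt-4.1"),
    "deep_reasoning":   ("o3", "o4-mini"),
    "budget_reasoning": ("o4-mini", "o3-mini"),
    "code":             ("gpt-4.1", "o4-mini"),
    "long_documents":   ("gpt-4.1", "gemini-2.5-pro"),
}

def pick_model(need: str, fallback: bool = False) -> str:
    """Return the recommended model, or the fallback if requested."""
    primary, secondary = MODEL_FOR_NEED[need]
    return secondary if fallback else primary
```

Usage: `pick_model("chatbot")` picks the primary; pass `fallback=True` when the first choice is unavailable or over quota.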