NEW in v4.1: Advanced Prompt Safety Monitor & Token Usage Analytics - Now Available!

AI Gateway Built for
Production Scale

Wag-Tail is an enterprise AI gateway that fronts multiple LLM providers, adding advanced security, semantic caching, intelligent routing, and comprehensive rate-limit monitoring to every request. Built for production-scale AI applications.

100+ LLM Providers
99.9% Uptime
50ms Avg Latency
Request → Security → Cache → LLM → Response

What is Wag-Tail AI Gateway?

Wag-Tail AI Gateway is a comprehensive, enterprise-grade security and routing layer for Large Language Model (LLM) applications. It sits between your applications and LLM providers, adding advanced security filtering, intelligent routing, performance optimization, and deep observability.

Whether you're building customer-facing AI applications, internal tools, or enterprise AI platforms, Wag-Tail ensures your LLM interactions are secure, fast, and compliant while giving you complete control over costs, routing, and data governance.

Quick Start

Get started with Wag-Tail in under 5 minutes:

import requests

# Replace your direct OpenAI calls
response = requests.post(
    "https://your-wagtail-gateway.com/chat",
    headers={
        "X-API-Key": "your-api-key",
        "Content-Type": "application/json"
    },
    json={"prompt": "What is machine learning?"}
)

# Get secure, filtered, and optimized responses
result = response.json()
print(result["response"])  # AI response with security filtering applied
print(result["cache_hit"])  # True if served from semantic cache (49x faster)

That's it. No complex integrations, no infrastructure changes. Just point your existing LLM calls to Wag-Tail and get enterprise-grade security and performance immediately.
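If the security pipeline blocks a prompt, the JSON body carries a flag and reason instead of an AI response (field names follow the examples on this page; check your deployment's actual schema). A minimal client-side check:

```python
def handle_gateway_response(body: dict) -> str:
    """Return the AI response, or a notice if the prompt was blocked.

    Field names ("flag", "reason", "response") are taken from the
    examples on this page, not from a formal API reference.
    """
    if body.get("flag") == "blocked":
        return f"blocked: {body.get('reason', 'unknown')}"
    return body["response"]
```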

Why Wag-Tail?

Cost Optimization

Reduce your AI spending by up to 70% with intelligent cost controls:

  • Token Management - Track and limit usage per org/group
  • Semantic Caching - 49x faster responses, fewer API calls
  • Prompt Compression - Reduce token consumption
  • Smart Routing - Route to cost-effective providers
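The token-management idea reduces to a per-group counter checked before each request. A simplified sketch (class and method names are hypothetical, not the gateway's API):

```python
from collections import defaultdict

class TokenQuota:
    """Track token usage per org/group against a monthly limit."""

    def __init__(self, monthly_limit: int):
        self.monthly_limit = monthly_limit
        self.used = defaultdict(int)  # group name -> tokens consumed

    def allow(self, group: str, tokens: int) -> bool:
        """Record usage, refusing once the limit would be exceeded."""
        if self.used[group] + tokens > self.monthly_limit:
            return False
        self.used[group] += tokens
        return True
```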

Security Out of the Box

Enterprise-grade protection with zero configuration:

  • PII Detection & Masking - Auto-protect sensitive data
  • Prompt Injection Defense - AI-powered threat detection
  • Content Filtering - Block harmful inputs/outputs
  • Compliance Ready - GDPR, HIPAA, SOC2 support

End-to-End Visibility

Complete observability across your AI infrastructure:

  • Request Tracing - Full journey from request to response
  • Usage Analytics - Real-time dashboards per org/group
  • Audit Logging - Comprehensive trails for compliance
  • Langfuse Integration - Deep LLM observability

Enterprise Integrations

Seamless connectivity with industry-leading platforms:

  • F5 Distributed Cloud - AI Security guardrails
  • HashiCorp Vault - Secure credential management
  • Prometheus/Grafana - Metrics and monitoring
  • Webhook Guardrails - Custom security policies

Architecture Overview

Your Applications

HTTP/HTTPS requests

Wag-Tail AI Gateway

Security Pipeline (6 Layers)
  • API Key Authentication
  • Regex & Code Injection Filtering
  • PII Detection & Masking
  • AI Threat Classification (Advanced)
  • Rate Limiting & Quotas (Advanced)
  • Output Content Filtering
Performance Layer
  • Semantic Cache (Advanced) - 49x faster
  • Priority Queue (Advanced) - Enterprise SLA
  • Smart Routing & Failover
Observability Layer
  • Request/Response Logging
  • Usage Analytics & Billing
  • Langfuse Integration (Advanced)
  • Webhook Events

LLM Providers

OpenAI, Azure, Gemini, Claude, Ollama

Performance Benchmarks

Semantic Cache Performance

First Request 2,847ms
Cached Request 58ms
Improvement 49x faster
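Under the hood, a semantic cache answers a new prompt from a stored one when their embeddings are close enough. The lookup reduces to a cosine-similarity threshold; this sketch uses the 0.85 default that appears in the caching configuration later on this page, and stands in for the real embedding model and Redis store:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cache_lookup(query_vec: list, cache: list, threshold: float = 0.85):
    """Return the cached response whose embedding best matches, or None."""
    best, best_sim = None, threshold
    for vec, response in cache:
        sim = cosine(query_vec, vec)
        if sim >= best_sim:
            best, best_sim = response, sim
    return best
```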

Security Processing

Basic Pipeline 5-15ms
AI Classification 20-50ms
Total Overhead <100ms

Throughput Capacity

Single Instance 5,000+ req/s
Multi-Node 25,000+ req/s
Enterprise Cluster 50,000+ req/s

Complete Enterprise Platform

All the features you need to secure, optimize, and manage your AI infrastructure

Security

  • PII Detection & Masking
  • AI-Powered Threat Classification
  • Prompt Injection Defense
  • Content Filtering & Guardrails
  • F5 Distributed Cloud Integration
  • Compliance Ready (GDPR, HIPAA, SOC2)

Performance

  • Semantic Caching (49x faster)
  • Token Management & Compression
  • Multi-Provider Routing & Failover
  • Priority Queuing System
  • 100+ LLM Provider Support
  • Cost Optimization & Analytics

Enterprise

  • Complete Admin Portal
  • Real-time Monitoring Dashboards
  • Multi-Tenant Group Management
  • Vault-Managed Secrets
  • Langfuse & Prometheus Integration
  • SSO, Audit Logs & Custom SLAs

Real Stories, Real Results

See how teams like yours solve critical AI challenges with Wag-Tail

The $50K Surprise

Cost Control

What happened: A fintech startup's AI pilot went viral internally. Without usage limits, costs spiraled to $50K in one month.

With Wag-Tail: Token quotas, department-level budgets, and real-time alerts caught the spike at $5K. The team got predictable costs without killing innovation.

Result: 90% cost reduction

The Data Breach That Didn't Happen

Security & Compliance

What happened: An employee pasted customer SSNs into ChatGPT for data analysis. The audit team found hundreds of similar incidents.

With Wag-Tail: PII detection auto-masks sensitive data before it ever leaves your network. Complete audit trails prove compliance to regulators.

Result: Zero PII leakage

2AM on Black Friday

High Availability

What happened: OpenAI hit rate limits during peak shopping. Customer service chatbots went down, tickets piled up.

With Wag-Tail: Automatic failover to Azure OpenAI happened in milliseconds. Semantic caching served 40% of requests without hitting any API.

Result: 99.99% uptime

The 10-Second Wait

Performance

What happened: Users complained about slow AI responses. Average latency was 8-10 seconds, killing user adoption.

With Wag-Tail: Semantic caching recognized similar questions and delivered cached responses in under 100ms. Users never noticed a difference from fresh responses.

Result: 49x faster responses

Shadow AI

AI Governance

What happened: IT discovered 47 different AI tools across departments, each with separate contracts, no oversight, and no security review.

With Wag-Tail: Single gateway for all AI access. Full visibility into who's using what, how much they're spending, and what data is being processed.

Result: Complete visibility

The Multi-Tenant Nightmare

SaaS Providers

What happened: A SaaS company needed to offer AI features to 500+ customers, each with different usage tiers and data isolation requirements.

With Wag-Tail: Multi-tenant isolation with per-customer quotas, audit logs, and model routing. Customers get exactly what they paid for.

Result: Scalable multi-tenancy

Ready to solve your AI challenges?

Enterprise Security

Advanced PII protection, code detection, and comprehensive audit trails for enterprise compliance.

Semantic Caching

Redis-powered semantic caching reduces costs and improves response times for similar queries.

Intelligent Routing

YAML-driven routing with health-based failover across multiple LLM providers.

Observability

Comprehensive monitoring with Langfuse integration, metrics, and distributed tracing.

Rate Limiting

Group-based rate limiting and usage tracking with priority queuing capabilities.
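Group-based rate limiting is typically built on a token bucket: each request drains a token, tokens refill at the configured rate, and short bursts are absorbed up to the bucket's capacity. An illustrative sketch (not the gateway's actual limiter):

```python
import time

class TokenBucket:
    """Allow rate_per_min requests per minute with bursts up to burst."""

    def __init__(self, rate_per_min: int, burst: int):
        self.capacity = burst
        self.tokens = float(burst)
        self.rate = rate_per_min / 60.0  # tokens added per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```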

Enterprise Integrations

Seamless connectivity with F5, Vault, Langfuse, Prometheus, and other enterprise platforms.

LLM Provider Integration Framework

Adding new LLM models is as simple as editing a YAML file! No coding required for 90% of use cases.

No Coding Required

90% of new model additions need only YAML configuration

1-Minute Setup

New models from existing providers in 1 minute

Hot-Reload Support

Apply configuration changes without restarting the gateway

Automatic Fallback

Errors trigger fallback configurations automatically

Configuration-Driven Provider Support

We ship with 5 major providers and carefully selected models, but you can easily add unlimited models through simple YAML configuration.

90%
Config Only (1 minute)
New models from existing providers
8%
Simple Mapping (10 minutes)
OpenAI-compatible APIs
2%
Custom Integration
Completely new API formats

OpenAI (GPT Models)

Streaming, Multimodal

Supported Models: gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o

Use Cases: General purpose, code generation, content creation, complex reasoning

llm:
  provider: openai
  model: gpt-4
  api_key: ${OPENAI_API_KEY}
  api_url: https://api.openai.com/v1/chat/completions
  
  openai:
    temperature: 0.7
    max_tokens: 2048
    timeout: 30

Azure OpenAI (Enterprise GPT)

Enterprise, Streaming

Supported Models: gpt-4, gpt-35-turbo, text-embedding-ada-002

Enterprise Benefits: Data residency, private network, SOC 2/HIPAA compliance

llm:
  provider: azure
  model: gpt-4
  azure:
    api_key: ${AZURE_OPENAI_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    deployment_name: ${AZURE_DEPLOYMENT_NAME}
    api_version: "2023-12-01-preview"

Google Gemini (Multimodal)

Multimodal, Cost-Effective

Supported Models: gemini-pro, gemini-pro-vision, gemini-ultra

Strengths: Advanced reasoning, multimodal capabilities, built-in safety filters

llm:
  provider: gemini
  model: gemini-pro
  gemini:
    api_key: ${GOOGLE_API_KEY}
    endpoint: https://generativelanguage.googleapis.com/v1
    safety_settings:
      harassment: "block_medium_and_above"
      hate_speech: "block_medium_and_above"

Anthropic Claude (Safety-First)

Constitutional AI, Long Context

Supported Models: claude-3-opus, claude-3-sonnet, claude-3-haiku

Features: Up to 200K tokens context, extensive safety training, ethical reasoning

llm:
  provider: anthropic
  model: claude-3-sonnet
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    endpoint: https://api.anthropic.com/v1/messages
    max_tokens: 4000
    system_message: "You are a helpful AI assistant."

Ollama (Local Models)

Free, Privacy

Supported Models: mistral, llama2, codellama, neural-chat, starcoder

Benefits: Complete data privacy, no per-token charges, offline capability

llm:
  provider: ollama
  model: mistral
  ollama:
    api_url: http://localhost:11434/api/generate
    timeout: 60
    context_length: 4096

# Installation:
# brew install ollama
# ollama pull mistral
# ollama serve

Simple YAML Configuration = Unlimited Models

Ninety percent of model additions need nothing more than a YAML edit; code changes are reserved for the rare custom integration.

Which Approach Should I Use?

1
Config Only (90%)
1 minute

New model from existing provider

Examples: GPT-5, Claude-4, Gemini-Ultra-2
2
Simple Mapping (8%)
10 minutes

OpenAI-compatible API

Examples: Together.ai, Replicate, Perplexity
3
Custom Integration (2%)
Contact Us

Different API format

Examples: Cohere, AI21 Labs

Real-World Configuration Examples

Adding New GPT Model (1 minute)

When OpenAI releases GPT-5, just add it to your YAML:

# config/sys_config.yaml - Just add to existing list!
llm:
  provider: openai
  model: gpt-5                    # NEW - just change the model name!
  
  openai:
    api_key: ${OPENAI_API_KEY}
    models: 
      - gpt-4                     # Existing
      - gpt-4-turbo              # Existing  
      - gpt-3.5-turbo            # Existing
      - gpt-5                     # NEW - just add to list!
    timeout: 30
Adding Perplexity API (10 minutes)

Perplexity uses OpenAI-compatible format:

# config/sys_config.yaml - OpenAI-compatible provider
llm:
  provider: perplexity
  model: sonar-medium-online

perplexity:
  api_key: ${PERPLEXITY_API_KEY}
  endpoint: https://api.perplexity.ai    # Different endpoint
  provider_type: openai_compatible       # Maps to OpenAI implementation
  models:
    - sonar-medium-online
    - sonar-small-chat
Enterprise Custom Endpoint (5 minutes)

Your company's custom OpenAI deployment:

# config/sys_config.yaml - Custom enterprise endpoint
llm:
  provider: custom_enterprise
  model: custom-gpt-4-fine-tuned

custom_enterprise:
  api_key: ${ENTERPRISE_API_KEY}
  endpoint: https://llm.yourcompany.com/v1
  provider_type: openai_compatible
  models:
    - custom-gpt-4-fine-tuned
    - company-specific-model

Complete YAML Configuration Reference

Multi-provider configuration with failover chains:

# config/sys_config.yaml - Complete example with all providers
llm:
  provider: openai                # Default provider
  model: gpt-4                    # Default model
  
  # OpenAI Configuration
  openai:
    api_key: ${OPENAI_API_KEY}
    models: [gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o]
    timeout: 30
    
  # Anthropic Configuration  
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    models: [claude-3-opus, claude-3-sonnet, claude-3-haiku]
    timeout: 30
    
  # Google Gemini Configuration
  gemini:
    api_key: ${GOOGLE_API_KEY}
    models: [gemini-pro, gemini-pro-vision, gemini-ultra]
    timeout: 30
    
  # Azure OpenAI Configuration
  azure:
    api_key: ${AZURE_OPENAI_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    deployment_name: ${AZURE_DEPLOYMENT_NAME}
    api_version: "2023-12-01-preview"
    models: [gpt-4, gpt-35-turbo]
      
  # Ollama (Local) Configuration
  ollama:
    api_url: http://localhost:11434/api/generate
    models: [mistral, llama2, codellama, neural-chat]
      
  # Together.ai (OpenAI-compatible)
  together:
    api_key: ${TOGETHER_API_KEY}
    endpoint: https://api.together.xyz/inference
    provider_type: openai_compatible
    models: [meta-llama/Llama-2-70b-chat-hf, mistralai/Mistral-7B-Instruct-v0.1]

# Failover Configuration (Advanced Edition)
routing:
  fallback_chain:
    - provider: azure
      model: gpt-4
    - provider: openai  
      model: gpt-4
    - provider: gemini
      model: gemini-pro
    - provider: ollama
      model: mistral

Environment Variables Setup

# .env file - Set up your API keys
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key" 
export GOOGLE_API_KEY="your-google-api-key"
export AZURE_OPENAI_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_DEPLOYMENT_NAME="your-deployment-name"
export TOGETHER_API_KEY="your-together-key"
export PERPLEXITY_API_KEY="your-perplexity-key"

Testing Your Configuration

1. Validate YAML Syntax
python -c "import yaml; yaml.safe_load(open('config/sys_config.yaml'))"
2. Test Provider Connectivity
curl -X POST http://localhost:8000/chat \
  -H "X-API-Key: your-api-key" \
  -H "X-LLM-Provider: your-new-provider" \
  -H "X-LLM-Model: your-new-model" \
  -d '{"prompt": "Test message"}'
3. Hot-Reload Configuration (Advanced Edition)
curl -X POST http://localhost:8000/admin/reload_config \
  -H "X-Admin-API-Key: your-admin-key"

Extensible Framework Architecture

Core Design Principles

1. Provider Abstraction

All providers implement a common interface

from typing import Dict, List

class BaseLLMProvider:
    def generate(self, prompt: str, context: Dict) -> "LLMResponse": ...
    def is_available(self) -> bool: ...
    def get_models(self) -> List[str]: ...
    def estimate_cost(self, prompt: str, response: str) -> float: ...
2. Unified Response Format

Consistent response structure across all providers

from dataclasses import dataclass
from typing import Dict

@dataclass
class LLMResponse:
    content: str
    model: str
    provider: str
    usage: Dict[str, int]
    latency_ms: int
    success: bool
3. Modular Architecture

Providers are automatically discovered and registered

providers:
  openai:
    enabled: true
    api_key: ${OPENAI_API_KEY}
  azure:
    enabled: true
    endpoint: ${AZURE_ENDPOINT}
  anthropic:
    enabled: true
    api_key: ${ANTHROPIC_API_KEY}

Framework Benefits

Seamless Provider Switching: Change providers via configuration without code changes
Multi-Provider Deployments: Route different workloads to optimal providers
Intelligent Failover: Automatic failover chains ensure high availability
Cost Optimization: Intelligent routing based on cost and performance
Easy Extension: Add new providers with minimal code (30-50 lines)
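A new provider only has to satisfy the BaseLLMProvider interface shown above. The hypothetical provider below shows the shape of such an extension; a real one would call the upstream HTTP API inside generate:

```python
from typing import Dict, List

class EchoProvider:
    """Hypothetical provider; real ones call an HTTP API in generate()."""

    name = "echo"

    def generate(self, prompt: str, context: Dict) -> dict:
        # A real provider would POST to the upstream API here.
        return {"content": f"echo: {prompt}", "model": "echo-1",
                "provider": self.name, "success": True}

    def is_available(self) -> bool:
        return True

    def get_models(self) -> List[str]:
        return ["echo-1"]

    def estimate_cost(self, prompt: str, response: str) -> float:
        return 0.0  # local echo costs nothing
```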

Adding New Models - Configuration Over Code

Most users never need to write code! Here's how to add new LLM models using our configuration-driven approach:

Choose Your Approach Based on Your Needs

Approach 1: Config Only (90%)
Easy

When to use: Adding new models from existing providers (OpenAI, Anthropic, Google, Azure, Ollama)

Time: 1 minute
Example scenarios:
  • OpenAI releases GPT-5
  • Anthropic adds Claude-4
  • New Ollama model available
Approach 2: Simple Mapping (8%)
Medium

When to use: OpenAI-compatible APIs with different endpoints

Time: 10 minutes
Example providers:
  • Together.ai
  • Replicate
  • Perplexity
  • Your company's custom endpoint
Approach 3: Custom Integration (2%)
Contact Us

When to use: Completely different API formats requiring custom integration

Contact our team for support
Example providers:
  • Cohere
  • AI21 Labs
  • Custom proprietary APIs

Approach 1: Config Only (90% of users)

1
Edit YAML Configuration

Just add the new model to your existing provider configuration

# config/sys_config.yaml
llm:
  provider: openai
  model: gpt-5                    # NEW - just change model name!
  
  openai:
    models: 
      - gpt-4                     # Existing
      - gpt-4-turbo              # Existing
      - gpt-5                     # NEW - add to list!
    # ... rest of config unchanged
2
Test Immediately

No restart required with hot-reload support

curl -X POST http://localhost:8000/chat \
  -H "X-LLM-Provider: openai" \
  -H "X-LLM-Model: gpt-5" \
  -d '{"prompt": "Hello from new model!"}'

Approach 2: OpenAI-Compatible (8% of users)

1
Add Provider Configuration

Configure the new provider with OpenAI-compatible mapping

# config/sys_config.yaml
llm:
  provider: perplexity
  model: sonar-medium-online

perplexity:
  api_key: ${PERPLEXITY_API_KEY}
  endpoint: https://api.perplexity.ai    # Different endpoint
  provider_type: openai_compatible       # Key mapping
  models:
    - sonar-medium-online
    - sonar-small-chat
2
Enable Provider (One-line change)

Add to compatible providers list

# config/provider_mappings.py
OPENAI_COMPATIBLE_PROVIDERS = [
    'together', 
    'replicate', 
    'perplexity'    # Just add this line!
]

Approach 3: Custom Integration (2% of users)

Only needed for completely different API formats. Contact our team for custom integration support.

Need a custom integration? Our team can help you integrate any LLM provider with different API formats. Contact us for enterprise support.

Real-World Success Stories

Enterprise Success
5 minutes

Company: Fortune 500 Financial Services

Need: Private GPT-4 deployment behind corporate firewall

Solution: Added custom endpoint configuration - no coding required!

Startup Speed
2 minutes

Company: AI Startup

Need: Switch from OpenAI to Together.ai for cost savings

Solution: Simple provider mapping - saved 80% on API costs!

Research Lab
1 minute

Organization: University AI Research Lab

Need: Test latest Claude-3.5-Sonnet model

Solution: Added to models list - immediate access to new capabilities!

Performance Benchmarks & Cost Analysis

Latency Comparison (Average Response Time)

Provider      | Model           | Avg Latency | 95th Percentile | Use Case
Azure OpenAI  | gpt-4           | 1,200ms     | 2,100ms         | Enterprise
OpenAI        | gpt-4           | 1,500ms     | 2,800ms         | General
Anthropic     | claude-3-sonnet | 1,800ms     | 3,200ms         | Analysis
Google Gemini | gemini-pro      | 1,100ms     | 2,000ms         | Balanced
Ollama        | mistral         | 800ms       | 1,200ms         | Local

Cost Comparison (per 1M tokens)

Provider      | Model           | Input Cost | Output Cost | Total (1:1 ratio)
OpenAI        | gpt-3.5-turbo   | $0.50      | $1.50       | $1.00
OpenAI        | gpt-4           | $10.00     | $30.00      | $20.00
Azure OpenAI  | gpt-4           | $10.00     | $30.00      | $20.00
Anthropic     | claude-3-sonnet | $3.00      | $15.00      | $9.00
Google Gemini | gemini-pro      | $2.50      | $7.50       | $5.00
Ollama        | mistral         | $0.00      | $0.00       | $0.00
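The blended column in the cost table above is simply the average of the input and output rates at a 1:1 ratio; for other mixes, weight accordingly:

```python
# Rates from the cost table above, in $ per 1M tokens: (input, output).
RATES = {
    "gpt-3.5-turbo": (0.50, 1.50),
    "gpt-4": (10.00, 30.00),
    "claude-3-sonnet": (3.00, 15.00),
    "gemini-pro": (2.50, 7.50),
}

def blended_cost(model: str, input_share: float = 0.5) -> float:
    """Cost per 1M tokens given the fraction of tokens that are input."""
    inp, out = RATES[model]
    return inp * input_share + out * (1.0 - input_share)
```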

Provider Selection Guidelines

Development

Use Ollama for cost-effective testing and rapid iteration

Production

Use Azure OpenAI for enterprise reliability and SLA guarantees

High Volume

Mix of providers for load distribution and cost optimization

Cost Sensitive

Use Gemini Pro or local models for budget constraints

Complex Reasoning

Use Claude-3-opus or GPT-4 for analytical tasks

Speed Critical

Use GPT-3.5-turbo or Gemini Pro for low latency needs

Request Lifecycle Architecture

Every request flows through our secure, optimized pipeline

1

Authentication

API key validation and organization resolution

2

Security Filters

PII protection, code detection, content classification

3

Rate Limiting

Group-based limits and priority queue management

4

Semantic Cache

Redis-powered caching for similar queries

5

LLM Routing

Provider selection and failover handling

6

Response

Caching, metrics, and audit trail completion
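The six stages above can be sketched as a short-circuiting pipeline: each stage either passes the request along or returns a terminal result. Stage logic here is purely illustrative; the blocked-response shape mirrors the security-test example later on this page:

```python
def authenticate(req: dict):
    if not req.get("api_key"):
        raise PermissionError("missing API key")
    return req

def security_filter(req: dict):
    # Stand-in for the real regex/PII/AI classification layers.
    if "DROP TABLE" in req["prompt"].upper():
        return {"flag": "blocked", "reason": "SQL injection pattern detected"}
    return req

PIPELINE = [authenticate, security_filter]

def handle(req: dict) -> dict:
    for stage in PIPELINE:
        req = stage(req)
        if req.get("flag") == "blocked":  # short-circuit on a terminal result
            return req
    return {"flag": "ok", "prompt": req["prompt"]}
```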

Documentation

Everything you need to get started, configure, and deploy Wag-Tail at scale

Getting Started with Wag-Tail

Your journey to production-ready AI gateway deployment

5-Minute Quick Start

Perfect for development, prototyping, and getting started quickly

1
Clone Repository
# Contact support.ai.gateway@wag-tail.com for source code access
cd wag-tail-ai-gateway
2
Setup Environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Success!

Wag-Tail is now running with full PII protection and security filtering.

System Requirements
Python 3.9+ (3.11+ recommended)
RAM 2GB minimum (4GB+ recommended)
CPU 2 cores minimum (4+ recommended)
Storage 5GB minimum (10GB+ recommended)

Verification & Testing

Health Check
curl http://localhost:8000/admin/health \
  -H "X-Admin-API-Key: your-admin-key"
Expected: {"status": "healthy", "version": "3.4.0", "edition": "basic"}
Security Test
curl -X POST http://localhost:8000/chat \
  -H "X-API-Key: b6c91d9d2ff66624356f5e5cfd03dc784d80a2eedd6af0d94e908d7b19e25e85" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "SELECT * FROM users; DROP TABLE users;"}'
Should be blocked: {"flag": "blocked", "reason": "SQL injection pattern detected"}

System Configuration

Comprehensive configuration guide for both OSS and Enterprise editions

Wag-Tail uses a hierarchical configuration system that supports YAML-based files, environment variable overrides, runtime updates, and edition-specific features with automatic capability detection.

YAML Configuration

Structured settings with clear hierarchy

Environment Overrides

Flexible deployment configuration

Runtime Updates

Dynamic configuration changes

Edition-Specific

Automatic capability detection

Configuration File Structure

config/
 sys_config.yaml           # Main configuration file
 integrations.yaml         # Integration settings
 security_config.yaml      # Security policies
 llm_providers.yaml        # LLM provider configurations
 environments/
     development.yaml      # Development overrides
     staging.yaml          # Staging environment settings
     production.yaml       # Production configuration

Core Configuration Sections

Basic Application Configuration
# Basic sys_config.yaml
edition: "enterprise"  # or "oss"
version: "1.0.0"
environment: "production"

app:
  name: "Wag-Tail AI Gateway"
  host: "0.0.0.0"
  port: 8000
  debug: false
  workers: 4
  max_request_size_mb: 10
  request_timeout: 300

database:
  type: "postgresql"  # sqlite, postgresql, mysql
  postgresql:
    host: "${DB_HOST:localhost}"
    port: "${DB_PORT:5432}"
    database: "${DB_NAME:wagtail}"
    username: "${DB_USER:wagtail}"
    password: "${DB_PASSWORD}"
    pool_size: 10

logging:
  level: "${LOG_LEVEL:INFO}"
  format: "json"
  file:
    enabled: true
    path: "logs/wagtail.log"
    max_size_mb: 100
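The ${VAR:default} placeholders used throughout these files resolve against environment variables, falling back to the default after the colon. A small expansion helper (a sketch of the idea, not the gateway's actual loader):

```python
import os
import re

PLACEHOLDER = re.compile(r"\$\{([A-Z_][A-Z0-9_]*)(?::([^}]*))?\}")

def expand(value: str) -> str:
    """Replace ${VAR} / ${VAR:default} with the env value or the default."""
    def repl(match: re.Match) -> str:
        return os.environ.get(match.group(1), match.group(2) or "")
    return PLACEHOLDER.sub(repl, value)
```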
Security Configuration
security:
  # API Authentication
  api_keys:
    enabled: true
    header_name: "X-API-Key"
    allow_query_param: false  # Security: disable for production
    default_key: "${DEFAULT_API_KEY}"
    
  # Rate limiting
  rate_limiting:
    enabled: true
    per_minute: 100
    per_hour: 1000
    per_day: 10000
    burst_limit: 20
    
  # Content filtering
  content_filtering:
    enabled: true
    block_code_execution: true
    block_sql_injection: true
    block_xss_attempts: true
    
  # PII protection
  pii_protection:
    enabled: true
    detection_confidence: 0.8
    anonymization_method: "mask"  # mask, replace, redact
    entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "SSN", "CREDIT_CARD"]
    
  # TLS/SSL settings
  tls:
    enabled: true
    cert_file: "${TLS_CERT_FILE:certs/server.crt}"
    key_file: "${TLS_KEY_FILE:certs/server.key}"
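The "mask" anonymization method above replaces detected entities with placeholders. Real detection relies on an NER model; a deliberately simplified regex version for email addresses conveys the idea:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_emails(text: str) -> str:
    """Replace anything that looks like an email with a placeholder."""
    return EMAIL.sub("[EMAIL_MASKED]", text)
```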
LLM Provider Configuration
llm:
  default_provider: "openai"
  default_model: "gpt-3.5-turbo"
  
  providers:
    ollama:
      enabled: true
      api_url: "${OLLAMA_URL:http://localhost:11434/api/generate}"
      models: ["mistral", "llama2", "codellama"]
      timeout: 60
      max_retries: 3
      
    openai:
      enabled: true
      api_key: "${OPENAI_API_KEY}"
      api_url: "https://api.openai.com/v1"
      models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]
      timeout: 120
      max_tokens: 4000
      temperature: 0.7
      
    gemini:
      enabled: true
      api_key: "${GEMINI_API_KEY}"
      api_url: "https://generativelanguage.googleapis.com/v1"
      models: ["gemini-pro", "gemini-pro-vision"]
      timeout: 90
      
    azure:
      enabled: true
      api_key: "${AZURE_OPENAI_API_KEY}"
      api_url: "${AZURE_OPENAI_ENDPOINT}"
      api_version: "2023-12-01-preview"
      deployment_name: "${AZURE_DEPLOYMENT_NAME}"
Enterprise Features Configuration
# Redis configuration (Enterprise)
redis:
  enabled: true
  host: "${REDIS_HOST:localhost}"
  port: "${REDIS_PORT:6379}"
  password: "${REDIS_PASSWORD}"
  database: 0
  max_connections: 20

# Semantic caching (Enterprise)
caching:
  semantic:
    enabled: true
    provider: "redis"
    ttl: 3600  # seconds
    similarity_threshold: 0.85
    max_cache_size_mb: 1000
    
  response:
    enabled: true
    default_ttl: 300
    max_ttl: 86400

# Monitoring & observability
monitoring:
  metrics:
    enabled: true
    endpoint: "/metrics"
    format: "prometheus"
    
  tracing:
    enabled: true
    provider: "jaeger"
    endpoint: "${TRACING_ENDPOINT}"
    service_name: "wagtail-gateway"
    sample_rate: 0.1
    
  apm:
    enabled: true
    provider: "newrelic"
    license_key: "${APM_LICENSE_KEY}"

Environment Configuration

Development
# environments/development.yaml
app:
  debug: true
  reload: true
  workers: 1

logging:
  level: "DEBUG"
  console:
    colored: true

database:
  type: "sqlite"
  sqlite:
    path: "data/dev.db"

security:
  rate_limiting:
    enabled: false
  tls:
    enabled: false
Production
# environments/production.yaml
app:
  debug: false
  reload: false
  workers: 8
  
security:
  rate_limiting:
    enabled: true
    per_minute: 60
  tls:
    enabled: true
    verify_client: true
    
logging:
  level: "INFO"
  aggregation:
    enabled: true
    
monitoring:
  metrics:
    enabled: true
  tracing:
    enabled: true
  apm:
    enabled: true

Environment Variables

Application
WAGTAIL_ENVIRONMENT - Environment name
WAGTAIL_HOST - Bind host
WAGTAIL_PORT - Bind port
WAGTAIL_WORKERS - Worker processes
Database
DB_HOST - Database host
DB_PORT - Database port
DB_NAME - Database name
DB_USER - Database user
DB_PASSWORD - Database password
LLM APIs
OPENAI_API_KEY - OpenAI API key
GEMINI_API_KEY - Google Gemini key
AZURE_OPENAI_API_KEY - Azure OpenAI key
ANTHROPIC_API_KEY - Anthropic API key
Security
DEFAULT_API_KEY - Default API key
JWT_SECRET - JWT signing secret
TLS_CERT_FILE - TLS certificate
WEBHOOK_SECRET - Webhook secret

Configuration Loading Hierarchy

1
Default Values

Built-in defaults (lowest priority)

2
Base Config

sys_config.yaml file

3
Environment Files

environments/{env}.yaml

4
Integration Configs

Integration-specific settings

5
Environment Variables

Runtime overrides (highest priority)
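Conceptually, the hierarchy is a chain of deep merges where later sources win on conflicting keys. A sketch of that merge (not the gateway's actual loader):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Merge override into base; nested dicts merge, scalars are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```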

Configuration Best Practices

Security
  • Use environment variables for secrets
  • Never commit secrets to version control
  • Implement configuration validation
  • Rotate secrets regularly
  • Use secure file permissions (600/640)
Performance
  • Cache configuration in memory
  • Use lazy loading for large configs
  • Optimize configuration parsing
  • Monitor configuration load times
  • Minimize configuration file size
Operations
  • Version control configuration files
  • Test changes in staging first
  • Implement rollback procedures
  • Document all configuration options
  • Use configuration templates
Testing
  • Validate configuration syntax
  • Test in multiple environments
  • Implement configuration test suites
  • Use configuration smoke tests
  • Check for drift detection

Configuration Troubleshooting

Validation Commands
# Check file permissions
ls -la config/sys_config.yaml

# Validate YAML syntax
python -c "import yaml; yaml.safe_load(open('config/sys_config.yaml'))"

# Check environment variables
env | grep WAGTAIL

# Test database connectivity
python -c "import psycopg2; conn = psycopg2.connect(host='localhost', database='wagtail', user='wagtail', password='password'); print('Connected')"

# Debug configuration loading
python -c "from config_loader import load_configuration; print(load_configuration())"

Enterprise Reference Architecture

Wag-Tail AI Gateway is designed for flexible deployment across various infrastructure environments, from simple single-server deployments to complex multi-cloud, multi-region enterprise architectures.

Edge Layer

CDN, WAF, Load Balancer

API Gateway Layer

Nginx Plus, Kong, AWS API Gateway, Azure APIM

AI Gateway Layer

Wag-Tail AI Gateway, Token Management, Semantic Cache, LLM Routing

Security Layer

F5 Guardrails, PII Detection, Content Filtering, Threat Detection

Data Layer

PostgreSQL, Redis Cluster, Object Storage

LLM Layer

OpenAI, Anthropic, Azure OpenAI, 100+ Providers

Single Server

Perfect for development and small-scale deployments using Docker Compose

  • Docker containers
  • Nginx reverse proxy
  • Local PostgreSQL & Redis

Kubernetes

Enterprise-scale deployment with auto-scaling and high availability

  • Horizontal Pod Autoscaling
  • Service mesh integration
  • Cloud-native storage

Multi-Cloud

Global deployment across AWS, Azure, and GCP with API gateway integration

  • Regional deployments
  • Global load balancing
  • Cross-cloud replication

Single-Server Deployment

Ideal for development, testing, and small-scale production environments.

Architecture Components

Internet → Nginx → Wag-Tail → PostgreSQL/Redis

Docker Compose Configuration

# docker-compose.yml
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    container_name: wagtail_nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - wagtail
    restart: unless-stopped

  wagtail:
    image: wagtail/ai-gateway:latest
    container_name: wagtail_app
    environment:
      - WAGTAIL_ENVIRONMENT=production
      - DB_HOST=postgres
      - REDIS_HOST=redis
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./config:/app/config
      - ./logs:/app/logs
    depends_on:
      - postgres
      - redis
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  postgres:
    image: postgres:15-alpine
    container_name: wagtail_postgres
    environment:
      - POSTGRES_DB=wagtail
      - POSTGRES_USER=wagtail
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    container_name: wagtail_redis
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis_data:/data
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:

Nginx Configuration

# nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream wagtail_backend {
        server wagtail:8000;
    }

    server {
        listen 80;
        server_name your-domain.com;
        return 301 https://$server_name$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name your-domain.com;

        ssl_certificate /etc/nginx/ssl/server.crt;
        ssl_certificate_key /etc/nginx/ssl/server.key;

        location / {
            proxy_pass http://wagtail_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        location /health {
            proxy_pass http://wagtail_backend/health;
            access_log off;
        }
    }
}

Quick Start Commands

docker-compose up -d                              # Start all services
docker-compose logs -f wagtail                    # View application logs
docker-compose exec wagtail /app/healthcheck.sh   # Check application health
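
After starting the stack, you can wait for the gateway to come up before routing traffic. A minimal sketch, assuming Nginx is serving /health on localhost port 80 as configured above:

```python
import time
import urllib.request
import urllib.error

def wait_healthy(url="http://localhost/health", timeout=60, interval=2):
    """Poll the gateway health endpoint until it returns HTTP 200
    or the deadline passes; returns True on success, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not ready yet; retry after a short pause
        time.sleep(interval)
    return False
```

Useful in CI pipelines as a gate between `docker-compose up -d` and the first smoke test.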

Kubernetes Deployment

Enterprise-scale deployment with auto-scaling, high availability, and cloud-native features.

Kubernetes Architecture

Ingress Layer
  • Nginx Ingress
  • Cert Manager
  • TLS Termination
Application Layer
  • Deployment
  • Service
  • HPA
  • ConfigMap
  • Secrets
Data Layer
  • PostgreSQL Cluster
  • Redis Cluster
  • Persistent Volumes
Monitoring Layer
  • Prometheus
  • Grafana
  • Jaeger
  • AlertManager

Core Kubernetes Manifests

Deployment Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wagtail-gateway
  namespace: wagtail
  labels:
    app: wagtail-gateway
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: wagtail-gateway
  template:
    metadata:
      labels:
        app: wagtail-gateway
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: wagtail-service-account
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
      - name: wagtail
        image: wagtail/ai-gateway:v1.0.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8000
          protocol: TCP
        env:
        - name: WAGTAIL_ENVIRONMENT
          value: "production"
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: wagtail-secrets
              key: db-password
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: wagtail-secrets
              key: redis-password
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
          readOnly: true
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "2000m"
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: config-volume
        configMap:
          name: wagtail-config
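
The Deployment above pulls db-password and redis-password from a wagtail-secrets Secret and mounts a wagtail-config ConfigMap, both of which must exist in the wagtail namespace before the pods can start. A placeholder Secret manifest (values are base64 of "change-me"; in production, source these from a secret manager such as Vault or external-secrets):

```yaml
# wagtail-secrets.yaml — referenced by the Deployment's secretKeyRef entries
apiVersion: v1
kind: Secret
metadata:
  name: wagtail-secrets
  namespace: wagtail
type: Opaque
data:
  db-password: Y2hhbmdlLW1l      # echo -n 'change-me' | base64
  redis-password: Y2hhbmdlLW1l
```
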
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: wagtail-hpa
  namespace: wagtail
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: wagtail-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
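
The scaling decision itself follows the standard HPA formula: desiredReplicas = ceil(currentReplicas × currentMetric / target), clamped to the configured bounds. An illustrative sketch of that calculation (not controller code), using the CPU target and replica limits from the manifest above:

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=3, max_replicas=20):
    """Core HPA formula: scale in proportion to the ratio of observed
    to target utilization, then clamp to the min/max replica bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# e.g. 3 pods running at 140% CPU against the 70% target -> 6 pods
print(desired_replicas(3, 140, 70))  # 6
```

The scaleUp/scaleDown policies above then rate-limit how fast the controller may move toward that desired count.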
Ingress Configuration
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wagtail-ingress
  namespace: wagtail
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rpm: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
  tls:
  - hosts:
    - api.wagtail.ai
    secretName: wagtail-tls
  rules:
  - host: api.wagtail.ai
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: wagtail-service
            port:
              number: 80

Deployment Commands

kubectl apply -f k8s/                                        # Deploy all manifests
kubectl get pods -n wagtail                                  # Check pod status
kubectl logs -f deployment/wagtail-gateway -n wagtail        # View application logs
kubectl port-forward svc/wagtail-service 8080:80 -n wagtail  # Local port forwarding
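
With the port-forward running, a single /chat request verifies the deployment end to end. A minimal smoke test, assuming an API key is already provisioned in the gateway (the payload shape matches the Quick Start example):

```python
import json
import urllib.request

BASE = "http://localhost:8080"   # matches the kubectl port-forward above
API_KEY = "your-api-key"         # placeholder; use a provisioned key

def build_chat_request(prompt, base=BASE, api_key=API_KEY):
    """Build the POST /chat request the gateway expects."""
    return urllib.request.Request(
        f"{base}/chat",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

def smoke_test(prompt="ping"):
    """Send one request through the forwarded service and parse the reply."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=30) as resp:
        return json.loads(resp.read())
```

A non-200 response here usually points at missing secrets, a failing readiness probe, or an unprovisioned API key.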

Multi-Cloud Deployment

Global deployment across AWS, Azure, and GCP with regional failover and API gateway integration.

Global Architecture

AWS US-East
  • EKS Cluster
  • RDS PostgreSQL
  • ElastiCache Redis
  • API Gateway
Azure EU-West
  • AKS Cluster
  • Azure Database
  • Azure Cache
  • API Management
GCP Asia-Pacific
  • GKE Cluster
  • Cloud SQL
  • Memorystore
  • Apigee
Global Services
  • CloudFlare DNS
  • HashiCorp Vault
  • Global Monitoring
  • Cross-Region Backup

Terraform Infrastructure

# EKS Cluster
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  
  cluster_name    = "wagtail-cluster"
  cluster_version = "1.28"
  
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets
  
  eks_managed_node_groups = {
    wagtail_nodes = {
      desired_size = 3
      max_size     = 10
      min_size     = 3
      
      instance_types = ["t3.large"]
      
      labels = {
        Environment = "production"
        Application = "wagtail"
      }
    }
  }
}

# RDS PostgreSQL
resource "aws_db_instance" "wagtail_db" {
  identifier = "wagtail-postgres"
  
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.r6g.large"
  
  allocated_storage     = 100
  max_allocated_storage = 1000
  storage_encrypted     = true
  
  db_name  = "wagtail"
  username = "wagtail"
  password = var.db_password
  
  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.wagtail.name
  
  backup_retention_period = 7
  backup_window          = "03:00-04:00"
  maintenance_window     = "sun:04:00-sun:05:00"
  
  skip_final_snapshot = false
  final_snapshot_identifier = "wagtail-final-snapshot"
}

Kong API Gateway Integration

_format_version: "3.0"

services:
  - name: wagtail-gateway
    url: http://wagtail-service.wagtail.svc.cluster.local:80
    retries: 3
    connect_timeout: 10000
    read_timeout: 60000
    write_timeout: 60000

routes:
  - name: wagtail-chat
    service: wagtail-gateway
    paths:
      - /chat
    methods:
      - POST
    strip_path: false

plugins:
  # Rate limiting
  - name: rate-limiting
    service: wagtail-gateway
    config:
      minute: 100
      hour: 1000
      day: 10000
      policy: redis
      redis_host: redis-service.wagtail.svc.cluster.local

  # Authentication
  - name: key-auth
    service: wagtail-gateway
    config:
      key_names:
        - X-API-Key
      hide_credentials: true

  # CORS
  - name: cors
    service: wagtail-gateway
    config:
      origins:
        - "https://app.yourcompany.com"
      methods:
        - GET
        - POST
        - OPTIONS
      credentials: true
      max_age: 3600
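
The rate-limiting plugin above counts requests per consumer in fixed windows (minute/hour/day), storing the counters in Redis when policy: redis is set. An illustrative Python sketch of that windowed counting (not Kong code):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed-window rate limiting: one counter per (consumer, window);
    a request is rejected if any window has hit its limit."""

    def __init__(self, limits=None):
        # window length in seconds -> max requests per window
        self.limits = limits or {60: 100, 3600: 1000, 86400: 10000}
        self.counters = defaultdict(int)

    def allow(self, api_key, now=None):
        now = time.time() if now is None else now
        keys = [(api_key, window, int(now // window)) for window in self.limits]
        # reject if any window is already at its limit
        if any(self.counters[k] >= self.limits[k[1]] for k in keys):
            return False
        for k in keys:
            self.counters[k] += 1
        return True
```

Fixed windows are cheap but allow short bursts at window boundaries; Kong also offers sliding-window algorithms if that matters for your traffic.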

Monitoring & Observability

Comprehensive monitoring, logging, and tracing for production deployments.

Monitoring Architecture

Metrics Collection
  • Prometheus
  • Node Exporter
  • cAdvisor
  • Custom Metrics
Logging Pipeline
  • Fluentd
  • Elasticsearch
  • Logstash
  • Kibana
Distributed Tracing
  • Jaeger
  • Zipkin
  • OpenTelemetry Collector
Visualization & Alerting
  • Grafana
  • AlertManager
  • PagerDuty

Prometheus Configuration

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "wagtail-rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  - job_name: 'wagtail-gateway'
    static_configs:
      - targets: ['wagtail-service:8000']
    metrics_path: /metrics
    scrape_interval: 10s

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - wagtail
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

  - job_name: 'postgres-exporter'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'redis-exporter'
    static_configs:
      - targets: ['redis-exporter:9121']

Key Metrics Dashboard

Request Rate
  rate(wagtail_requests_total[5m])
Response Time (p95)
  histogram_quantile(0.95, rate(wagtail_request_duration_seconds_bucket[5m]))
Error Rate
  rate(wagtail_requests_total{status=~"4..|5.."}[5m])
LLM Response Times
  wagtail_llm_request_duration_seconds
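
The p95 query relies on histogram_quantile, which finds the cumulative bucket containing the target rank and interpolates linearly inside it. An illustrative re-implementation of that interpolation (not Prometheus code; `buckets` maps upper bound in seconds to cumulative count):

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets,
    mirroring PromQL's linear interpolation within the target bucket."""
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]      # the +Inf bucket holds the total count
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound in bounds:
        count = buckets[bound]
        if count >= rank:
            if bound == float("inf"):
                return prev_bound    # quantile lies in the unbounded bucket
            if count == prev_count:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
```

This is also why bucket boundaries matter: the estimate can never be more precise than the bucket the quantile falls into.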

Alerting Rules

High Error Rate

Error rate > 5% for 5 minutes

High Response Time

95th percentile > 1s for 5 minutes

Pod Crash Loop

Pod restart count > 3 in 10 minutes

Database Connection Issues

Database connection pool exhausted
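
The Prometheus configuration above loads these alerts from wagtail-rules.yml. A sketch of the first three rules, using the metric names from the dashboard queries (thresholds match the descriptions; the crash-loop rule assumes kube-state-metrics is installed):

```yaml
# wagtail-rules.yml — example alerting rules; tune thresholds to your SLOs
groups:
  - name: wagtail-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          rate(wagtail_requests_total{status=~"4..|5.."}[5m])
            / rate(wagtail_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% for 5 minutes"
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(wagtail_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p95 latency above 1s for 5 minutes"
      - alert: PodCrashLoop
        expr: increase(kube_pod_container_status_restarts_total{namespace="wagtail"}[10m]) > 3
        labels:
          severity: critical
        annotations:
          summary: "Pod restarted more than 3 times in 10 minutes"
```
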

Security Architecture

Zero-trust security model with comprehensive protection layers.

Security Architecture Layers

Perimeter Security
  • CloudFlare DDoS Protection
  • Web Application Firewall
  • Rate Limiting
Identity & Access
  • OAuth 2.0 / OIDC
  • Multi-Factor Authentication
  • Role-Based Access Control
Network Security
  • Virtual Private Cloud
  • Private Subnets
  • Security Groups
Data Security
  • Encryption at Rest
  • Encryption in Transit
  • Secret Management

Istio Security Policies

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: wagtail-security-policy
  namespace: wagtail
spec:
  selector:
    matchLabels:
      app: wagtail-gateway
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/wagtail/sa/wagtail-service-account"]
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/chat", "/health", "/metrics"]
    when:
    - key: source.ip
      values: ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: wagtail-mtls
  namespace: wagtail
spec:
  selector:
    matchLabels:
      app: wagtail-gateway
  mtls:
    mode: STRICT

Disaster Recovery & Backup

Daily Backup

Incremental PostgreSQL & Redis backups to S3

Weekly Full Backup

Complete system backup with configuration

Long-term Archive

Monthly backups archived to Glacier

Cross-Region DR

Standby environment in secondary region

Security Checklist