AI Gateway Built for
Production Scale
Wag-Tail is an enterprise AI gateway that fronts multiple LLM providers with advanced security, semantic caching, intelligent routing, and comprehensive rate limiting and monitoring. Built for production-scale AI applications.
What is Wag-Tail AI Gateway?
Wag-Tail AI Gateway is a comprehensive, enterprise-grade security and routing layer for Large Language Model (LLM) applications. It sits between your applications and LLM providers, adding advanced security filtering, intelligent routing, performance optimization, and deep observability.
Whether you're building customer-facing AI applications, internal tools, or enterprise AI platforms, Wag-Tail ensures your LLM interactions are secure, fast, and compliant while giving you complete control over costs, routing, and data governance.
Quick Start
Get started with Wag-Tail in under 5 minutes:
import requests
# Replace your direct OpenAI calls
response = requests.post(
    "https://your-wagtail-gateway.com/chat",
    headers={
        "X-API-Key": "your-api-key",
        "Content-Type": "application/json"
    },
    json={"prompt": "What is machine learning?"}
)
# Get secure, filtered, and optimized responses
result = response.json()
print(result["response"]) # AI response with security filtering applied
print(result["cache_hit"]) # True if served from semantic cache (30x faster)
That's it. No complex integrations, no infrastructure changes. Just point your existing LLM calls to Wag-Tail and get enterprise-grade security and performance immediately.
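When a prompt trips a security filter, the gateway returns a flag and reason instead of a completion (see the Security Test example later on this page), so production callers should branch on the result. A minimal sketch using those same response fields:

import requests

resp = requests.post(
    "https://your-wagtail-gateway.com/chat",
    headers={"X-API-Key": "your-api-key", "Content-Type": "application/json"},
    json={"prompt": "Summarize our Q3 results"},
    timeout=30,
)
resp.raise_for_status()
result = resp.json()

if result.get("flag") == "blocked":
    # Request was stopped by a security filter (e.g. SQL injection, PII)
    print(f"Blocked by gateway: {result.get('reason')}")
else:
    source = "semantic cache" if result.get("cache_hit") else "LLM provider"
    print(f"[{source}] {result['response']}")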
Why Wag-Tail?
Cost Optimization
Reduce your AI spending by up to 70% with intelligent cost controls:
- Token Management - Track and limit usage per org/group
- Semantic Caching - 49x faster responses, fewer API calls
- Prompt Compression - Reduce token consumption
- Smart Routing - Route to cost-effective providers
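As a concrete illustration, per-group controls might be wired up like the following. The rate_limiting keys mirror the Security Configuration section later on this page; the groups/token_quota block is a hypothetical sketch of per-group token budgeting, not a documented schema:

security:
  rate_limiting:
    enabled: true
    per_minute: 100
    per_hour: 1000
    per_day: 10000

# Hypothetical per-group token budget (illustrative only)
groups:
  data-science:
    token_quota:
      monthly_tokens: 5000000
      alert_threshold: 0.8   # notify at 80% of the budget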
Security Out of the Box
Enterprise-grade protection with zero configuration:
- PII Detection & Masking - Auto-protect sensitive data
- Prompt Injection Defense - AI-powered threat detection
- Content Filtering - Block harmful inputs/outputs
- Compliance Ready - GDPR, HIPAA, SOC2 support
End-to-End Visibility
Complete observability across your AI infrastructure:
- Request Tracing - Full journey from request to response
- Usage Analytics - Real-time dashboards per org/group
- Audit Logging - Comprehensive trails for compliance
- Langfuse Integration - Deep LLM observability
Enterprise Integrations
Seamless connectivity with industry-leading platforms:
- F5 Distributed Cloud - AI Security guardrails
- HashiCorp Vault - Secure credential management
- Prometheus/Grafana - Metrics and monitoring
- Webhook Guardrails - Custom security policies
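To give a feel for the webhook guardrail hook, here is a sketch of a custom policy service the gateway could call out to; the /guardrail path and the request/response schemas are illustrative assumptions, not Wag-Tail's documented webhook contract:

# Hypothetical webhook guardrail service (payload/response schemas are assumptions)
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GuardrailRequest(BaseModel):
    prompt: str
    org_id: Optional[str] = None

class GuardrailDecision(BaseModel):
    allow: bool
    reason: Optional[str] = None

BANNED_TERMS = {"project-nightingale", "internal-codename"}  # example deny list

@app.post("/guardrail", response_model=GuardrailDecision)
def evaluate(req: GuardrailRequest) -> GuardrailDecision:
    # Block prompts that mention terms on the custom deny list
    if any(term in req.prompt.lower() for term in BANNED_TERMS):
        return GuardrailDecision(allow=False, reason="custom policy: banned term")
    return GuardrailDecision(allow=True)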
Architecture Overview
Your Applications
HTTP/HTTPS requests
Wag-Tail AI Gateway
- API Key Authentication
- Regex & Code Injection Filtering
- PII Detection & Masking
- AI Threat Classification (Advanced)
- Rate Limiting & Quotas (Advanced)
- Output Content Filtering
- Semantic Cache (Advanced) - up to 49x faster
- Priority Queue (Advanced) - Enterprise SLA
- Smart Routing & Failover
- Request/Response Logging
- Usage Analytics & Billing
- Langfuse Integration (Advanced)
- Webhook Events
LLM Providers
OpenAI, Azure, Gemini, Claude, Ollama
Performance Benchmarks
[Benchmark charts: semantic cache performance, security processing, and throughput capacity]
Complete Enterprise Platform
All the features you need to secure, optimize, and manage your AI infrastructure
Security
- PII Detection & Masking
- AI-Powered Threat Classification
- Prompt Injection Defense
- Content Filtering & Guardrails
- F5 Distributed Cloud Integration
- Compliance Ready (GDPR, HIPAA, SOC2)
Performance
- Semantic Caching (49x faster)
- Token Management & Compression
- Multi-Provider Routing & Failover
- Priority Queuing System
- 100+ LLM Provider Support
- Cost Optimization & Analytics
Enterprise
- Complete Admin Portal
- Real-time Monitoring Dashboards
- Multi-Tenant Group Management
- Vault-Managed Secrets
- Langfuse & Prometheus Integration
- SSO, Audit Logs & Custom SLAs
Real Stories, Real Results
See how teams like yours solve critical AI challenges with Wag-Tail
The $50K Surprise
Cost Control
What happened: A fintech startup's AI pilot went viral internally. Without usage limits, costs spiraled to $50K in one month.
With Wag-Tail: Token quotas, department-level budgets, and real-time alerts caught the spike at $5K. The team got predictable costs without killing innovation.
The Data Breach That Didn't Happen
Security & ComplianceWhat happened: An employee pasted customer SSNs into ChatGPT for data analysis. The audit team found hundreds of similar incidents.
With Wag-Tail: PII detection auto-masks sensitive data before it ever leaves your network. Complete audit trails prove compliance to regulators.
2AM on Black Friday
High Availability
What happened: OpenAI hit rate limits during peak shopping. Customer service chatbots went down, tickets piled up.
With Wag-Tail: Automatic failover to Azure OpenAI happened in milliseconds. Semantic caching served 40% of requests without hitting any API.
The 10-Second Wait
Performance
What happened: Users complained about slow AI responses. Average latency was 8-10 seconds, killing user adoption.
With Wag-Tail: Semantic caching recognized similar questions and delivered cached responses in under 100ms. Users never noticed a difference from fresh responses.
Shadow AI
AI Governance
What happened: IT discovered 47 different AI tools across departments, each with separate contracts, no oversight, and no security review.
With Wag-Tail: Single gateway for all AI access. Full visibility into who's using what, how much they're spending, and what data is being processed.
The Multi-Tenant Nightmare
SaaS Providers
What happened: A SaaS company needed to offer AI features to 500+ customers, each with different usage tiers and data isolation requirements.
With Wag-Tail: Multi-tenant isolation with per-customer quotas, audit logs, and model routing. Customers get exactly what they paid for.
Ready to solve your AI challenges?
Enterprise Security
Advanced PII protection, code detection, and comprehensive audit trails for enterprise compliance.
Semantic Caching
Redis-powered semantic caching reduces costs and improves response times for similar queries.
Intelligent Routing
YAML-driven routing with health-based failover across multiple LLM providers.
Observability
Comprehensive monitoring with Langfuse integration, metrics, and distributed tracing.
Rate Limiting
Group-based rate limiting and usage tracking with priority queuing capabilities.
Enterprise Integrations
Seamless connectivity with F5, Vault, Langfuse, Prometheus, and other enterprise platforms.
LLM Provider Integration Framework
Adding new LLM models is as simple as editing a YAML file! No coding required for 90% of use cases.
No Coding Required
90% of new model additions need only YAML configuration
1-Minute Setup
New models from existing providers in 1 minute
Hot-Reload Support
Configuration changes without restart needed
Automatic Fallback
Automatic error handling with fallback configurations
Configuration-Driven Provider Support
We ship with 5 major providers and carefully selected models, but you can easily add unlimited models through simple YAML configuration.
OpenAI (GPT Models)
Supported Models: gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o
Use Cases: General purpose, code generation, content creation, complex reasoning
llm:
  provider: openai
  model: gpt-4
  api_key: ${OPENAI_API_KEY}
  api_url: https://api.openai.com/v1/chat/completions
  openai:
    temperature: 0.7
    max_tokens: 2048
    timeout: 30
Azure OpenAI (Enterprise GPT)
Supported Models: gpt-4, gpt-35-turbo, text-embedding-ada-002
Enterprise Benefits: Data residency, private network, SOC 2/HIPAA compliance
llm:
  provider: azure
  model: gpt-4
  azure:
    api_key: ${AZURE_OPENAI_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    deployment_name: ${AZURE_DEPLOYMENT_NAME}
    api_version: "2023-12-01-preview"
Google Gemini (Multimodal)
Supported Models: gemini-pro, gemini-pro-vision, gemini-ultra
Strengths: Advanced reasoning, multimodal capabilities, built-in safety filters
llm:
  provider: gemini
  model: gemini-pro
  gemini:
    api_key: ${GOOGLE_API_KEY}
    endpoint: https://generativelanguage.googleapis.com/v1
    safety_settings:
      harassment: "block_medium_and_above"
      hate_speech: "block_medium_and_above"
Anthropic Claude (Safety-First)
Supported Models: claude-3-opus, claude-3-sonnet, claude-3-haiku
Features: Up to 200K tokens context, extensive safety training, ethical reasoning
llm:
  provider: anthropic
  model: claude-3-sonnet
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    endpoint: https://api.anthropic.com/v1/messages
    max_tokens: 4000
    system_message: "You are a helpful AI assistant."
Ollama (Local Models)
Supported Models: mistral, llama2, codellama, neural-chat, starcoder
Benefits: Complete data privacy, no per-token charges, offline capability
llm:
  provider: ollama
  model: mistral
  ollama:
    api_url: http://localhost:11434/api/generate
    timeout: 60
    context_length: 4096

# Installation:
#   brew install ollama
#   ollama pull mistral
#   ollama serve
Simple YAML Configuration = Unlimited Models
Which Approach Should I Use?
Config Only (90%)
New model from existing provider
Simple Mapping (8%)
OpenAI-compatible API
Custom Integration (2%)
Different API format
Real-World Configuration Examples
Adding New GPT Model (1 minute)
When OpenAI releases GPT-5, just add it to your YAML:
# config/sys_config.yaml - Just add to existing list!
llm:
  provider: openai
  model: gpt-5            # NEW - just change the model name!
  openai:
    api_key: ${OPENAI_API_KEY}
    models:
      - gpt-4             # Existing
      - gpt-4-turbo       # Existing
      - gpt-3.5-turbo     # Existing
      - gpt-5             # NEW - just add to list!
    timeout: 30
Adding Perplexity API (10 minutes)
Perplexity uses OpenAI-compatible format:
# config/sys_config.yaml - OpenAI-compatible provider
llm:
  provider: perplexity
  model: sonar-medium-online
  perplexity:
    api_key: ${PERPLEXITY_API_KEY}
    endpoint: https://api.perplexity.ai     # Different endpoint
    provider_type: openai_compatible        # Maps to OpenAI implementation
    models:
      - sonar-medium-online
      - sonar-small-chat
Enterprise Custom Endpoint (5 minutes)
Your company's custom OpenAI deployment:
# config/sys_config.yaml - Custom enterprise endpoint
llm:
  provider: custom_enterprise
  model: custom-gpt-4-fine-tuned
  custom_enterprise:
    api_key: ${ENTERPRISE_API_KEY}
    endpoint: https://llm.yourcompany.com/v1
    provider_type: openai_compatible
    models:
      - custom-gpt-4-fine-tuned
      - company-specific-model
Complete YAML Configuration Reference
Multi-provider configuration with failover chains:
# config/sys_config.yaml - Complete example with all providers
llm:
  provider: openai          # Default provider
  model: gpt-4              # Default model

  # OpenAI Configuration
  openai:
    api_key: ${OPENAI_API_KEY}
    models: [gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o]
    timeout: 30

  # Anthropic Configuration
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    models: [claude-3-opus, claude-3-sonnet, claude-3-haiku]
    timeout: 30

  # Google Gemini Configuration
  gemini:
    api_key: ${GOOGLE_API_KEY}
    models: [gemini-pro, gemini-pro-vision, gemini-ultra]
    timeout: 30

  # Azure OpenAI Configuration
  azure:
    api_key: ${AZURE_OPENAI_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    deployment_name: ${AZURE_DEPLOYMENT_NAME}
    api_version: "2023-12-01-preview"
    models: [gpt-4, gpt-35-turbo]

  # Ollama (Local) Configuration
  ollama:
    api_url: http://localhost:11434/api/generate
    models: [mistral, llama2, codellama, neural-chat]

  # Together.ai (OpenAI-compatible)
  together:
    api_key: ${TOGETHER_API_KEY}
    endpoint: https://api.together.xyz/inference
    provider_type: openai_compatible
    models: [meta-llama/Llama-2-70b-chat-hf, mistralai/Mistral-7B-Instruct-v0.1]

# Failover Configuration (Advanced Edition)
routing:
  fallback_chain:
    - provider: azure
      model: gpt-4
    - provider: openai
      model: gpt-4
    - provider: gemini
      model: gemini-pro
    - provider: ollama
      model: mistral
Environment Variables Setup
# .env file - Set up your API keys
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"
export GOOGLE_API_KEY="your-google-api-key"
export AZURE_OPENAI_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_DEPLOYMENT_NAME="your-deployment-name"
export TOGETHER_API_KEY="your-together-key"
export PERPLEXITY_API_KEY="your-perplexity-key"
Testing Your Configuration
1. Validate YAML Syntax
python -c "import yaml; yaml.safe_load(open('config/sys_config.yaml'))"
2. Test Provider Connectivity
curl -X POST http://localhost:8000/chat \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -H "X-LLM-Provider: your-new-provider" \
  -H "X-LLM-Model: your-new-model" \
  -d '{"prompt": "Test message"}'
3. Hot-Reload Configuration (Advanced Edition)
curl -X POST http://localhost:8000/admin/reload_config \
  -H "X-Admin-API-Key: your-admin-key"
Extensible Framework Architecture
Core Design Principles
1. Provider Abstraction
All providers implement a common interface
from typing import Dict, List

class BaseLLMProvider:
    def generate(self, prompt: str, context: Dict) -> "LLMResponse": ...
    def is_available(self) -> bool: ...
    def get_models(self) -> List[str]: ...
    def estimate_cost(self, prompt: str, response: str) -> float: ...
2. Unified Response Format
Consistent response structure across all providers
from dataclasses import dataclass
from typing import Dict

@dataclass
class LLMResponse:
    content: str
    model: str
    provider: str
    usage: Dict[str, int]
    latency_ms: int
    success: bool
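To see the interface in action, here is a toy provider written against the two definitions above; the echo provider is purely illustrative and not part of the shipped codebase:

import time
from typing import Dict, List

class EchoProvider(BaseLLMProvider):
    """Illustrative provider that echoes the prompt back; a real
    integration would call the vendor's API in generate()."""

    def generate(self, prompt: str, context: Dict) -> LLMResponse:
        start = time.monotonic()
        content = f"echo: {prompt}"
        return LLMResponse(
            content=content,
            model="echo-1",
            provider="echo",
            usage={"prompt_tokens": len(prompt.split()),
                   "completion_tokens": len(content.split())},
            latency_ms=int((time.monotonic() - start) * 1000),
            success=True,
        )

    def is_available(self) -> bool:
        return True

    def get_models(self) -> List[str]:
        return ["echo-1"]

    def estimate_cost(self, prompt: str, response: str) -> float:
        return 0.0  # a local echo costs nothing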
3. Modular Architecture
Providers are automatically discovered and registered
providers:
  openai:
    enabled: true
    api_key: ${OPENAI_API_KEY}
  azure:
    enabled: true
    endpoint: ${AZURE_ENDPOINT}
  anthropic:
    enabled: true
    api_key: ${ANTHROPIC_API_KEY}
Framework Benefits
Adding New Models - Configuration Over Code
Most users never need to write code! Here's how to add new LLM models using our configuration-driven approach:
Choose Your Approach Based on Your Needs
Approach 1: Config Only (90%)
Easy
When to use: Adding new models from existing providers (OpenAI, Anthropic, Google, Azure, Ollama)
- OpenAI releases GPT-5
- Anthropic adds Claude-4
- New Ollama model available
Approach 2: Simple Mapping (8%)
Medium
When to use: OpenAI-compatible APIs with different endpoints
- Together.ai
- Replicate
- Perplexity
- Your company's custom endpoint
Approach 3: Custom Integration (2%)
Contact Us
When to use: Completely different API formats requiring custom integration
- Cohere
- AI21 Labs
- Custom proprietary APIs
Approach 1: Config Only (90% of users)
Edit YAML Configuration
Just add the new model to your existing provider configuration
# config/sys_config.yaml
llm:
  provider: openai
  model: gpt-5            # NEW - just change model name!
  openai:
    models:
      - gpt-4             # Existing
      - gpt-4-turbo       # Existing
      - gpt-5             # NEW - add to list!
    # ... rest of config unchanged
Test Immediately
No restart required with hot-reload support
curl -X POST http://localhost:8000/chat \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -H "X-LLM-Provider: openai" \
  -H "X-LLM-Model: gpt-5" \
  -d '{"prompt": "Hello from new model!"}'
Approach 2: OpenAI-Compatible (8% of users)
Add Provider Configuration
Configure the new provider with OpenAI-compatible mapping
# config/sys_config.yaml
llm:
  provider: perplexity
  model: sonar-medium-online
  perplexity:
    api_key: ${PERPLEXITY_API_KEY}
    endpoint: https://api.perplexity.ai    # Different endpoint
    provider_type: openai_compatible       # Key mapping
    models:
      - sonar-medium-online
      - sonar-small-chat
Enable Provider (One-line change)
Add to compatible providers list
# config/provider_mappings.py
OPENAI_COMPATIBLE_PROVIDERS = [
    'together',
    'replicate',
    'perplexity',  # Just add this line!
]
Approach 3: Custom Integration (2% of users)
Only needed for completely different API formats. Contact our team for custom integration support.
Need a custom integration? Our team can help you integrate any LLM provider with different API formats. Contact us for enterprise support.
Real-World Success Stories
Enterprise Success
5 minutes
Company: Fortune 500 Financial Services
Need: Private GPT-4 deployment behind corporate firewall
Solution: Added custom endpoint configuration - no coding required!
Startup Speed
2 minutes
Company: AI Startup
Need: Switch from OpenAI to Together.ai for cost savings
Solution: Simple provider mapping - saved 80% on API costs!
Research Lab
1 minute
Organization: University AI Research Lab
Need: Test latest Claude-3.5-Sonnet model
Solution: Added to models list - immediate access to new capabilities!
Performance Benchmarks & Cost Analysis
[Charts: latency comparison (average response time) and cost comparison (per 1M tokens) across providers]
Provider Selection Guidelines
Development
Use Ollama for cost-effective testing and rapid iteration
Production
Use Azure OpenAI for enterprise reliability and SLA guarantees
High Volume
Mix of providers for load distribution and cost optimization
Cost Sensitive
Use Gemini Pro or local models for budget constraints
Complex Reasoning
Use Claude-3-opus or GPT-4 for analytical tasks
Speed Critical
Use GPT-3.5-turbo or Gemini Pro for low latency needs
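These guidelines map directly onto the routing.fallback_chain shown earlier. For example, a cost-sensitive deployment might order cheaper providers first; the ordering below is illustrative:

# Cost-sensitive routing: cheaper providers first (illustrative ordering)
routing:
  fallback_chain:
    - provider: gemini
      model: gemini-pro
    - provider: ollama
      model: mistral
    - provider: openai
      model: gpt-3.5-turbo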
Request Lifecycle Architecture
Every request flows through our secure, optimized pipeline
Authentication
API key validation and organization resolution
Security Filters
PII protection, code detection, content classification
Rate Limiting
Group-based limits and priority queue management
Semantic Cache
Redis-powered caching for similar queries
LLM Routing
Provider selection and failover handling
Response
Caching, metrics, and audit trail completion
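A conceptual sketch of those six stages in Python follows; every helper here is an illustrative stub, not the gateway's internal API:

# Conceptual sketch of the pipeline stages above (all stubs are illustrative)
from typing import Dict, Optional

_cache: Dict[str, str] = {}

def authenticate(api_key: str) -> str:
    if not api_key:
        raise PermissionError("missing API key")
    return "org-demo"                      # 1. Resolve the org from the key

def apply_security_filters(prompt: str) -> str:
    if "drop table" in prompt.lower():     # 2. Block an obvious injection
        raise ValueError("blocked: SQL injection pattern")
    return prompt

def enforce_rate_limits(org: str) -> None:
    pass                                   # 3. Group quotas / priority queue

def semantic_cache_lookup(prompt: str) -> Optional[str]:
    return _cache.get(prompt)              # 4. Exact match stands in for similarity

def route_to_provider(prompt: str) -> str:
    return f"LLM answer to: {prompt}"      # 5. Provider selection + failover

def handle_chat(api_key: str, prompt: str) -> Dict[str, object]:
    org = authenticate(api_key)
    prompt = apply_security_filters(prompt)
    enforce_rate_limits(org)
    cached = semantic_cache_lookup(prompt)
    if cached is not None:
        return {"response": cached, "cache_hit": True}   # 6. Metrics + audit trail
    answer = route_to_provider(prompt)
    _cache[prompt] = answer
    return {"response": answer, "cache_hit": False}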
Documentation
Everything you need to get started, configure, and deploy Wag-Tail at scale
Getting Started with Wag-Tail
Your journey to production-ready AI gateway deployment
5-Minute Quick Start
Perfect for development, prototyping, and getting started quickly
Clone Repository
# Contact support.ai.gateway@wag-tail.com for source code access
cd wag-tail-ai-gateway
Setup Environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
System Requirements
Verification & Testing
Health Check
curl http://localhost:8000/admin/health \
-H "X-Admin-API-Key: your-admin-key"
{"status": "healthy", "version": "3.4.0", "edition": "basic"}
Security Test
curl -X POST http://localhost:8000/chat \
-H "X-API-Key: b6c91d9d2ff66624356f5e5cfd03dc784d80a2eedd6af0d94e908d7b19e25e85" \
-H "Content-Type: application/json" \
-d '{"prompt": "SELECT * FROM users; DROP TABLE users;"}'
{"flag": "blocked", "reason": "SQL injection pattern detected"}
System Configuration
Comprehensive configuration guide for both OSS and Enterprise editions
Wag-Tail uses a hierarchical configuration system that supports YAML-based files, environment variable overrides, runtime updates, and edition-specific features with automatic capability detection.
YAML Configuration
Structured settings with clear hierarchy
Environment Overrides
Flexible deployment configuration
Runtime Updates
Dynamic configuration changes
Edition-Specific
Automatic capability detection
Configuration File Structure
config/
  sys_config.yaml          # Main configuration file
  integrations.yaml        # Integration settings
  security_config.yaml     # Security policies
  llm_providers.yaml       # LLM provider configurations
  environments/
    development.yaml       # Development overrides
    staging.yaml           # Staging environment settings
    production.yaml        # Production configuration
Core Configuration Sections
Basic Application Configuration
# Basic sys_config.yaml
edition: "enterprise"        # or "oss"
version: "1.0.0"
environment: "production"

app:
  name: "Wag-Tail AI Gateway"
  host: "0.0.0.0"
  port: 8000
  debug: false
  workers: 4
  max_request_size_mb: 10
  request_timeout: 300

database:
  type: "postgresql"         # sqlite, postgresql, mysql
  postgresql:
    host: "${DB_HOST:localhost}"
    port: "${DB_PORT:5432}"
    database: "${DB_NAME:wagtail}"
    username: "${DB_USER:wagtail}"
    password: "${DB_PASSWORD}"
    pool_size: 10

logging:
  level: "${LOG_LEVEL:INFO}"
  format: "json"
  file:
    enabled: true
    path: "logs/wagtail.log"
    max_size_mb: 100
Security Configuration
security:
  # API Authentication
  api_keys:
    enabled: true
    header_name: "X-API-Key"
    allow_query_param: false   # Security: disable for production
    default_key: "${DEFAULT_API_KEY}"

  # Rate limiting
  rate_limiting:
    enabled: true
    per_minute: 100
    per_hour: 1000
    per_day: 10000
    burst_limit: 20

  # Content filtering
  content_filtering:
    enabled: true
    block_code_execution: true
    block_sql_injection: true
    block_xss_attempts: true

  # PII protection
  pii_protection:
    enabled: true
    detection_confidence: 0.8
    anonymization_method: "mask"   # mask, replace, redact
    entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "SSN", "CREDIT_CARD"]

  # TLS/SSL settings
  tls:
    enabled: true
    cert_file: "${TLS_CERT_FILE:certs/server.crt}"
    key_file: "${TLS_KEY_FILE:certs/server.key}"
LLM Provider Configuration
llm:
  default_provider: "openai"
  default_model: "gpt-3.5-turbo"

  providers:
    ollama:
      enabled: true
      api_url: "${OLLAMA_URL:http://localhost:11434/api/generate}"
      models: ["mistral", "llama2", "codellama"]
      timeout: 60
      max_retries: 3

    openai:
      enabled: true
      api_key: "${OPENAI_API_KEY}"
      api_url: "https://api.openai.com/v1"
      models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]
      timeout: 120
      max_tokens: 4000
      temperature: 0.7

    gemini:
      enabled: true
      api_key: "${GEMINI_API_KEY}"
      api_url: "https://generativelanguage.googleapis.com/v1"
      models: ["gemini-pro", "gemini-pro-vision"]
      timeout: 90

    azure:
      enabled: true
      api_key: "${AZURE_OPENAI_API_KEY}"
      api_url: "${AZURE_OPENAI_ENDPOINT}"
      api_version: "2023-12-01-preview"
      deployment_name: "${AZURE_DEPLOYMENT_NAME}"
Enterprise Features Configuration
# Redis configuration (Enterprise)
redis:
  enabled: true
  host: "${REDIS_HOST:localhost}"
  port: "${REDIS_PORT:6379}"
  password: "${REDIS_PASSWORD}"
  database: 0
  max_connections: 20

# Semantic caching (Enterprise)
caching:
  semantic:
    enabled: true
    provider: "redis"
    ttl: 3600   # seconds
    similarity_threshold: 0.85
    max_cache_size_mb: 1000
  response:
    enabled: true
    default_ttl: 300
    max_ttl: 86400

# Monitoring & observability
monitoring:
  metrics:
    enabled: true
    endpoint: "/metrics"
    format: "prometheus"
  tracing:
    enabled: true
    provider: "jaeger"
    endpoint: "${TRACING_ENDPOINT}"
    service_name: "wagtail-gateway"
    sample_rate: 0.1
  apm:
    enabled: true
    provider: "newrelic"
    license_key: "${APM_LICENSE_KEY}"
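To make similarity_threshold concrete: two prompts share a cache entry when their embedding cosine similarity clears the threshold. An illustrative check (the sentence-transformers model here is an assumption, not the gateway's documented internals):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model
SIMILARITY_THRESHOLD = 0.85   # matches caching.semantic.similarity_threshold

cached_prompt = "What is machine learning?"
new_prompt = "Can you explain machine learning?"

emb_cached, emb_new = model.encode([cached_prompt, new_prompt])
score = util.cos_sim(emb_cached, emb_new).item()

if score >= SIMILARITY_THRESHOLD:
    print(f"cache hit (similarity={score:.2f})")    # serve the cached response
else:
    print(f"cache miss (similarity={score:.2f})")   # call the LLM provider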
Environment Configuration
Development
# environments/development.yaml
app:
  debug: true
  reload: true
  workers: 1

logging:
  level: "DEBUG"
  console:
    colored: true

database:
  type: "sqlite"
  sqlite:
    path: "data/dev.db"

security:
  rate_limiting:
    enabled: false
  tls:
    enabled: false
Production
# environments/production.yaml
app:
  debug: false
  reload: false
  workers: 8

security:
  rate_limiting:
    enabled: true
    per_minute: 60
  tls:
    enabled: true
    verify_client: true

logging:
  level: "INFO"
  aggregation:
    enabled: true

monitoring:
  metrics:
    enabled: true
  tracing:
    enabled: true
  apm:
    enabled: true
Environment Variables
Application
- WAGTAIL_ENVIRONMENT - Environment name
- WAGTAIL_HOST - Bind host
- WAGTAIL_PORT - Bind port
- WAGTAIL_WORKERS - Worker processes
Database
- DB_HOST - Database host
- DB_PORT - Database port
- DB_NAME - Database name
- DB_USER - Database user
- DB_PASSWORD - Database password
LLM APIs
- OPENAI_API_KEY - OpenAI API key
- GEMINI_API_KEY - Google Gemini key
- AZURE_OPENAI_API_KEY - Azure OpenAI key
- ANTHROPIC_API_KEY - Anthropic API key
Security
- DEFAULT_API_KEY - Default API key
- JWT_SECRET - JWT signing secret
- TLS_CERT_FILE - TLS certificate
- WEBHOOK_SECRET - Webhook secret
Configuration Loading Hierarchy
1. Default Values - built-in defaults (lowest priority)
2. Base Config - sys_config.yaml file
3. Environment Files - environments/{env}.yaml
4. Integration Configs - integration-specific settings
5. Environment Variables - runtime overrides (highest priority)
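A minimal sketch of how this precedence might be implemented, assuming a simple recursive dict merge; the real loader may differ in details:

import os
from typing import Any, Dict

import yaml

def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value            # later sources win
    return merged

def load_configuration(env: str = "production") -> Dict[str, Any]:
    config: Dict[str, Any] = {"app": {"port": 8000, "workers": 4}}   # built-in defaults
    for path in ("config/sys_config.yaml", f"config/environments/{env}.yaml"):
        if os.path.exists(path):
            with open(path) as fh:
                config = deep_merge(config, yaml.safe_load(fh) or {})
    if "WAGTAIL_PORT" in os.environ:       # environment variables win last
        config["app"]["port"] = int(os.environ["WAGTAIL_PORT"])
    return config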
Configuration Best Practices
Security
- Use environment variables for secrets
- Never commit secrets to version control
- Implement configuration validation
- Rotate secrets regularly
- Use secure file permissions (600/640)
Performance
- Cache configuration in memory
- Use lazy loading for large configs
- Optimize configuration parsing
- Monitor configuration load times
- Minimize configuration file size
Operations
- Version control configuration files
- Test changes in staging first
- Implement rollback procedures
- Document all configuration options
- Use configuration templates
Testing
- Validate configuration syntax
- Test in multiple environments
- Implement configuration test suites
- Use configuration smoke tests
- Check for drift detection
Configuration Troubleshooting
Validation Commands
# Check file permissions
ls -la config/sys_config.yaml
# Validate YAML syntax
python -c "import yaml; yaml.safe_load(open('config/sys_config.yaml'))"
# Check environment variables
env | grep WAGTAIL
# Test database connectivity
python -c "import psycopg2; conn = psycopg2.connect(host='localhost', database='wagtail', user='wagtail', password='password'); print('Connected')"
# Debug configuration loading
python -c "from config_loader import load_configuration; print(load_configuration())"
Enterprise Reference Architecture
Wag-Tail AI Gateway is designed for flexible deployment across various infrastructure environments, from simple single-server deployments to complex multi-cloud, multi-region enterprise architectures.
Edge Layer
API Gateway Layer
AI Gateway Layer
Security Layer
Data Layer
LLM Layer
Single Server
Perfect for development and small-scale deployments using Docker Compose
- Docker containers
- Nginx reverse proxy
- Local PostgreSQL & Redis
Kubernetes
Enterprise-scale deployment with auto-scaling and high availability
- Horizontal Pod Autoscaling
- Service mesh integration
- Cloud-native storage
Multi-Cloud
Global deployment across AWS, Azure, and GCP with API gateway integration
- Regional deployments
- Global load balancing
- Cross-cloud replication
Single-Server Deployment
Ideal for development, testing, and small-scale production environments.
Architecture Components
Docker Compose Configuration
# docker-compose.yml
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    container_name: wagtail_nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - wagtail
    restart: unless-stopped

  wagtail:
    image: wagtail/ai-gateway:latest
    container_name: wagtail_app
    environment:
      - WAGTAIL_ENVIRONMENT=production
      - DB_HOST=postgres
      - REDIS_HOST=redis
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./config:/app/config
      - ./logs:/app/logs
    depends_on:
      - postgres
      - redis
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  postgres:
    image: postgres:15-alpine
    container_name: wagtail_postgres
    environment:
      - POSTGRES_DB=wagtail
      - POSTGRES_USER=wagtail
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    container_name: wagtail_redis
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis_data:/data
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:
Nginx Configuration
# nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream wagtail_backend {
        server wagtail:8000;
    }

    server {
        listen 80;
        server_name your-domain.com;
        return 301 https://$server_name$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name your-domain.com;

        ssl_certificate /etc/nginx/ssl/server.crt;
        ssl_certificate_key /etc/nginx/ssl/server.key;

        location / {
            proxy_pass http://wagtail_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        location /health {
            proxy_pass http://wagtail_backend/health;
            access_log off;
        }
    }
}
Quick Start Commands
docker-compose up -d
Start all services
docker-compose logs -f wagtail
View application logs
docker-compose exec wagtail /app/healthcheck.sh
Check application health
Kubernetes Deployment
Enterprise-scale deployment with auto-scaling, high availability, and cloud-native features.
Kubernetes Architecture
Ingress Layer
- Nginx Ingress, Cert Manager, TLS Termination
Application Layer
- Deployment, Service, HPA, ConfigMap, Secrets
Data Layer
- PostgreSQL Cluster, Redis Cluster, Persistent Volumes
Monitoring Layer
- Prometheus, Grafana, Jaeger, AlertManager
Core Kubernetes Manifests
Deployment Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wagtail-gateway
  namespace: wagtail
  labels:
    app: wagtail-gateway
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: wagtail-gateway
  template:
    metadata:
      labels:
        app: wagtail-gateway
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: wagtail-service-account
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: wagtail
          image: wagtail/ai-gateway:v1.0.0
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP
          env:
            - name: WAGTAIL_ENVIRONMENT
              value: "production"
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: wagtail-secrets
                  key: db-password
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: wagtail-secrets
                  key: redis-password
          volumeMounts:
            - name: config-volume
              mountPath: /app/config
              readOnly: true
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config-volume
          configMap:
            name: wagtail-config
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: wagtail-hpa
  namespace: wagtail
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: wagtail-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
Ingress Configuration
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wagtail-ingress
  namespace: wagtail
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
  tls:
    - hosts:
        - api.wagtail.ai
      secretName: wagtail-tls
  rules:
    - host: api.wagtail.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: wagtail-service
                port:
                  number: 80
Deployment Commands
kubectl apply -f k8s/
Deploy all manifests
kubectl get pods -n wagtail
Check pod status
kubectl logs -f deployment/wagtail-gateway -n wagtail
View application logs
kubectl port-forward svc/wagtail-service 8080:80 -n wagtail
Local port forwarding
Multi-Cloud Deployment
Global deployment across AWS, Azure, and GCP with regional failover and API gateway integration.
Global Architecture
AWS US-East
- EKS Cluster
- RDS PostgreSQL
- ElastiCache Redis
- API Gateway
Azure EU-West
- AKS Cluster
- Azure Database
- Azure Cache
- API Management
GCP Asia-Pacific
- GKE Cluster
- Cloud SQL
- Memorystore
- Apigee
Global Services
Terraform Infrastructure
# EKS Cluster
module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_name    = "wagtail-cluster"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    wagtail_nodes = {
      desired_size   = 3
      max_size       = 10
      min_size       = 3
      instance_types = ["t3.large"]

      k8s_labels = {
        Environment = "production"
        Application = "wagtail"
      }
    }
  }
}

# RDS PostgreSQL
resource "aws_db_instance" "wagtail_db" {
  identifier            = "wagtail-postgres"
  engine                = "postgres"
  engine_version        = "15.4"
  instance_class        = "db.r6g.large"
  allocated_storage     = 100
  max_allocated_storage = 1000
  storage_encrypted     = true

  db_name  = "wagtail"
  username = "wagtail"
  password = var.db_password

  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.wagtail.name

  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"

  skip_final_snapshot       = false
  final_snapshot_identifier = "wagtail-final-snapshot"
}
Kong API Gateway Integration
_format_version: "3.0"
services:
- name: wagtail-gateway
url: http://wagtail-service.wagtail.svc.cluster.local:80
retries: 3
connect_timeout: 10000
read_timeout: 60000
write_timeout: 60000
routes:
- name: wagtail-chat
service: wagtail-gateway
paths:
- /chat
methods:
- POST
strip_path: false
plugins:
# Rate limiting
- name: rate-limiting
service: wagtail-gateway
config:
minute: 100
hour: 1000
day: 10000
policy: redis
redis_host: redis-service.wagtail.svc.cluster.local
# Authentication
- name: key-auth
service: wagtail-gateway
config:
key_names:
- X-API-Key
hide_credentials: true
# CORS
- name: cors
service: wagtail-gateway
config:
origins:
- "https://app.yourcompany.com"
methods:
- GET
- POST
- OPTIONS
credentials: true
max_age: 3600
Monitoring & Observability
Comprehensive monitoring, logging, and tracing for production deployments.
Monitoring Architecture
Metrics Collection
- Prometheus, Node Exporter, cAdvisor, Custom Metrics
Logging Pipeline
- Fluentd, Elasticsearch, Logstash, Kibana
Distributed Tracing
- Jaeger, Zipkin, OpenTelemetry Collector
Visualization & Alerting
- Grafana, AlertManager, PagerDuty
Prometheus Configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "wagtail-rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'wagtail-gateway'
    static_configs:
      - targets: ['wagtail-service:8000']
    metrics_path: /metrics
    scrape_interval: 10s

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - wagtail
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

  - job_name: 'postgres-exporter'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'redis-exporter'
    static_configs:
      - targets: ['redis-exporter:9121']
Key Metrics Dashboard
Request Rate
rate(wagtail_requests_total[5m])
Response Time
histogram_quantile(0.95, rate(wagtail_request_duration_seconds_bucket[5m]))
Error Rate
rate(wagtail_requests_total{status=~"4..|5.."}[5m])
LLM Response Times
wagtail_llm_request_duration_seconds
Alerting Rules
- Error rate > 5% for 5 minutes
- 95th percentile latency > 1s for 5 minutes
- Pod restart count > 3 in 10 minutes
- Database connection pool exhausted
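The first two rules above could live in the wagtail-rules.yml file referenced by the Prometheus config. The expressions below reuse the dashboard metric names and are a sketch, not shipped rules:

# wagtail-rules.yml - illustrative alert rules for the thresholds above
groups:
  - name: wagtail-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(wagtail_requests_total{status=~"4..|5.."}[5m]))
            / sum(rate(wagtail_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% for 5 minutes"
      - alert: SlowResponses
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(wagtail_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "95th percentile latency above 1s for 5 minutes"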
Security Architecture
Zero-trust security model with comprehensive protection layers.
Security Architecture Layers
Perimeter Security
- CloudFlare DDoS Protection
- Web Application Firewall
- Rate Limiting
Identity & Access
- OAuth 2.0 / OIDC
- Multi-Factor Authentication
- Role-Based Access Control
Network Security
- Virtual Private Cloud
- Private Subnets
- Security Groups
Data Security
- Encryption at Rest
- Encryption in Transit
- Secret Management
Istio Security Policies
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: wagtail-security-policy
  namespace: wagtail
spec:
  selector:
    matchLabels:
      app: wagtail-gateway
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/wagtail/sa/wagtail-service-account"]
    - to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/chat", "/health", "/metrics"]
    - when:
        - key: source.ip
          values: ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: wagtail-mtls
  namespace: wagtail
spec:
  selector:
    matchLabels:
      app: wagtail-gateway
  mtls:
    mode: STRICT
Disaster Recovery & Backup
Daily Backup
Incremental PostgreSQL & Redis backups to S3
Weekly Full Backup
Complete system backup with configuration
Long-term Archive
Monthly backups archived to Glacier
Cross-Region DR
Standby environment in secondary region