AI Gateway Built for
Production Scale

Wag-Tail is a FastAPI gateway that fronts multiple LLM providers with advanced security, semantic caching, intelligent routing, and comprehensive rate limit monitoring. Built for enterprise-grade AI applications.

100+ LLM Providers · 99.9% Uptime · 50ms Avg Latency

Request → Security → Cache → LLM → Response

What is Wag-Tail AI Gateway?

Wag-Tail AI Gateway is a comprehensive, enterprise-grade security and routing layer for Large Language Model (LLM) applications. It sits between your applications and LLM providers, providing advanced security filtering, intelligent routing, performance optimization, and enterprise-grade observability.

Whether you're building customer-facing AI applications, internal tools, or enterprise AI platforms, Wag-Tail ensures your LLM interactions are secure, fast, and compliant while giving you complete control over costs, routing, and data governance.

Quick Start

Get started with Wag-Tail in under 5 minutes:

import requests

# Replace your direct OpenAI calls
response = requests.post(
    "https://your-wagtail-gateway.com/chat",
    headers={
        "X-API-Key": "your-api-key",
        "Content-Type": "application/json"
    },
    json={"prompt": "What is machine learning?"}
)

# Get secure, filtered, and optimized responses
result = response.json()
print(result["response"])  # AI response with security filtering applied
print(result["cache_hit"])  # True if served from semantic cache (30x faster)

That's it. No complex integrations, no infrastructure changes. Just point your existing LLM calls to Wag-Tail and get enterprise-grade security and performance immediately.

Core Problems Wag-Tail Solves

Security & Compliance Challenges

The Problem: Direct LLM API calls expose your applications to:

  • Prompt injection attacks
  • Data leakage and PII exposure
  • Malicious content generation
  • Compliance violations (GDPR, HIPAA, SOX)

Wag-Tail Solution: Multi-layer security pipeline with:

  • AI-powered threat detection using DistilBERT classification
  • PII detection and masking with custom recognizers
  • SQL injection and XSS protection with pattern-based filtering
  • Output sanitization preventing harmful content generation
  • Real-time security monitoring with webhook integrations
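
As a concrete illustration, here is a minimal client-side sketch of handling a blocked request. It assumes the gateway URL and X-API-Key header from the Quick Start above, and the flag/reason fields shown in the security verification example later on this page:

import requests

# Placeholder gateway URL and API key for illustration
GATEWAY_URL = "https://your-wagtail-gateway.com/chat"
API_KEY = "your-api-key"

def ask(prompt: str) -> str:
    """Send a prompt through the gateway and surface security blocks explicitly."""
    resp = requests.post(
        GATEWAY_URL,
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        json={"prompt": prompt},
        timeout=30,
    )
    body = resp.json()
    if body.get("flag") == "blocked":
        # Blocked-request format as shown in the security test example below
        raise ValueError(f"Blocked by gateway: {body.get('reason')}")
    return body["response"]

print(ask("What is machine learning?"))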

Performance & Cost Optimization

The Problem: LLM calls are:

  • Expensive ($0.01-$0.30+ per request)
  • Slow (2-10 second response times)
  • Inconsistent (provider outages, rate limits)

Wag-Tail Solution: Intelligent optimization with:

  • Semantic caching delivering 30x+ performance improvements
  • Multi-provider routing with automatic failover
  • Cost-optimized model selection based on prompt complexity
  • Request prioritization for enterprise customers

Enterprise Requirements

The Problem: Production LLM deployments need:

  • Centralized governance and control
  • Detailed usage analytics and billing
  • Multi-tenant isolation
  • Audit trails and compliance reporting

Wag-Tail Solution: Enterprise-grade platform with:

  • Multi-organization isolation with per-tenant quotas
  • Comprehensive audit logging for compliance teams
  • Real-time usage analytics for cost management
  • Vault integration for secure credential management
  • Role-based access control with group-level permissions

Architecture Overview

Your Applications

HTTP/HTTPS requests

Wag-Tail AI Gateway

Security Pipeline (6 Layers)
  • API Key Authentication
  • Regex & Code Injection Filtering
  • PII Detection & Masking
  • AI Threat Classification (Advanced)
  • Rate Limiting & Quotas (Advanced)
  • Output Content Filtering
Performance Layer
  • Semantic Cache (Advanced) - 30x faster
  • Priority Queue (Advanced) - Enterprise SLA
  • Smart Routing & Failover
Observability Layer
  • Request/Response Logging
  • Usage Analytics & Billing
  • Langfuse Integration (Advanced)
  • Webhook Events

LLM Providers

OpenAI, Azure, Gemini, Claude, Ollama

Performance Benchmarks

Semantic Cache Performance

First Request 2,847ms
Cached Request 58ms
Improvement 49x faster

Security Processing

Basic Pipeline 5-15ms
AI Classification 20-50ms
Total Overhead <100ms

Throughput Capacity

Basic Edition 1,000+ req/s
Advanced Edition 5,000+ req/s
Enterprise Cluster 50,000+ req/s

Three Editions, Unlimited Possibilities

Choose from the Basic (open source), Advanced (licensed), or Enterprise edition based on your needs

Basic Edition

OPEN SOURCE
  • PII Protection & Filtering
  • Code Detection & Security
  • Multi-Provider Support (11+)
  • Basic Authentication
  • Plugin Architecture
  • Production Ready

Advanced Edition

LICENSED
  • Everything in Basic +
  • AI Classification & Routing
  • Priority Queuing System
  • Semantic Caching (Redis)
  • Multi-Provider Failover
  • Webhook Guardrail Integration
  • Langfuse Integration
  • Vault-Managed Secrets
  • Group Management
  • Admin API Access

Enterprise Edition

ENTERPRISE
  • Everything in Advanced +
  • Complete Admin Portal
  • Cost Management & Analytics
  • Enterprise SSO (SAML/OIDC)
  • Custom Branding
  • Compliance Reporting
  • Real-time Dashboards
  • White-label Options
  • 24/7 Dedicated Support
  • Custom SLAs
  • Enterprise Admin Portal

Enterprise Security

Advanced PII protection, code detection, and comprehensive audit trails for enterprise compliance.

Semantic Caching

Redis-powered semantic caching reduces costs and improves response times for similar queries.

Intelligent Routing

YAML-driven routing with health-based failover across multiple LLM providers.

Observability

Comprehensive monitoring with Langfuse integration, metrics, and distributed tracing.

Rate Limiting

Group-based rate limiting and usage tracking with priority queuing capabilities.

Plugin Pipeline

Extensible plugin architecture for custom security, processing, and integration needs.

LLM Provider Integration Framework

Adding new LLM models is as simple as editing a YAML file! No coding required for 90% of use cases.

No Coding Required

90% of new model additions need only YAML configuration

1-Minute Setup

New models from existing providers in 1 minute

Hot-Reload Support

Configuration changes without restart needed

Automatic Fallback

Automatic error handling with fallback configurations

Configuration-Driven Provider Support

We ship with 5 major providers and carefully selected models, but you can easily add unlimited models through simple YAML configuration.

  • 90% - Config Only (1 minute): New models from existing providers
  • 8% - Simple Mapping (10 minutes): OpenAI-compatible APIs
  • 2% - Full Plugin (2+ hours): Completely new API formats

OpenAI (GPT Models)

Streaming Multimodal

Supported Models: gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o

Use Cases: General purpose, code generation, content creation, complex reasoning

llm:
  provider: openai
  model: gpt-4
  api_key: ${OPENAI_API_KEY}
  api_url: https://api.openai.com/v1/chat/completions
  
  openai:
    temperature: 0.7
    max_tokens: 2048
    timeout: 30

Azure OpenAI (Enterprise GPT)

Capabilities: Enterprise, Streaming

Supported Models: gpt-4, gpt-35-turbo, text-embedding-ada-002

Enterprise Benefits: Data residency, private network, SOC 2/HIPAA compliance

llm:
  provider: azure
  model: gpt-4
  azure:
    api_key: ${AZURE_OPENAI_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    deployment_name: ${AZURE_DEPLOYMENT_NAME}
    api_version: "2023-12-01-preview"

Google Gemini (Multimodal)

Capabilities: Multimodal, Cost-Effective

Supported Models: gemini-pro, gemini-pro-vision, gemini-ultra

Strengths: Advanced reasoning, multimodal capabilities, built-in safety filters

llm:
  provider: gemini
  model: gemini-pro
  gemini:
    api_key: ${GOOGLE_API_KEY}
    endpoint: https://generativelanguage.googleapis.com/v1
    safety_settings:
      harassment: "block_medium_and_above"
      hate_speech: "block_medium_and_above"

Anthropic Claude (Safety-First)

Capabilities: Constitutional AI, Long Context

Supported Models: claude-3-opus, claude-3-sonnet, claude-3-haiku

Features: Up to 200K tokens context, extensive safety training, ethical reasoning

llm:
  provider: anthropic
  model: claude-3-sonnet
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    endpoint: https://api.anthropic.com/v1/messages
    max_tokens: 4000
    system_message: "You are a helpful AI assistant."

Ollama (Local Models)

Capabilities: Free, Privacy

Supported Models: mistral, llama2, codellama, neural-chat, starcoder

Benefits: Complete data privacy, no per-token charges, offline capability

llm:
  provider: ollama
  model: mistral
  ollama:
    api_url: http://localhost:11434/api/generate
    timeout: 60
    context_length: 4096

# Installation:
# brew install ollama
# ollama pull mistral
# ollama serve

Advanced Provider Features

Seamless Provider Switching

Switch providers without changing application code

# Development
llm:
  provider: ollama
  model: mistral

# Production  
llm:
  provider: azure
  model: gpt-4

Multi-Provider Routing

Route different request types to optimal providers

routing_rules:
  - condition: "request_type == 'code'"
    provider: "anthropic"
    model: "claude-3-opus"
  - condition: "cost_sensitive == true"
    provider: "ollama"
    model: "mistral"

Intelligent Failover

Automatic failover ensures high availability

fallback_chain:
  - provider: "azure"     # Primary
  - provider: "openai"    # First fallback
  - provider: "gemini"    # Second fallback
  - provider: "ollama"    # Last resort

Header-Based Selection

Override provider and model via HTTP headers

curl -X POST /chat \
  -H "X-LLM-Provider: openai" \
  -H "X-LLM-Model: gpt-4" \
  -d '{"prompt": "Hello!"}'
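
The same per-request override works from any HTTP client. A short Python sketch (the gateway URL and API key are placeholders; the X-LLM-Provider and X-LLM-Model headers are the ones shown above):

import requests

response = requests.post(
    "https://your-wagtail-gateway.com/chat",
    headers={
        "X-API-Key": "your-api-key",
        "X-LLM-Provider": "openai",   # override the default provider for this request
        "X-LLM-Model": "gpt-4",       # override the default model for this request
    },
    json={"prompt": "Hello!"},
    timeout=30,
)
print(response.json()["response"])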

Simple YAML Configuration = Unlimited Models

Adding new LLM models is as simple as editing a YAML file! No coding required for 90% of use cases.

Which Approach Should I Use?

1. Config Only (90%) - 1 minute
   New model from existing provider
   Examples: GPT-5, Claude-4, Gemini-Ultra-2

2. Simple Mapping (8%) - 10 minutes
   OpenAI-compatible API
   Examples: Together.ai, Replicate, Perplexity

3. Full Plugin (2%) - 2+ hours
   Different API format
   Examples: Cohere, AI21 Labs

Real-World Configuration Examples

Adding New GPT Model (1 minute)

When OpenAI releases GPT-5, just add it to your YAML:

# config/sys_config.yaml - Just add to existing list!
llm:
  provider: openai
  model: gpt-5                    # NEW - just change the model name!
  
  openai:
    api_key: ${OPENAI_API_KEY}
    models: 
      - gpt-4                     # Existing
      - gpt-4-turbo              # Existing  
      - gpt-3.5-turbo            # Existing
      - gpt-5                     # NEW - just add to list!
    timeout: 30
Adding Perplexity API (10 minutes)

Perplexity uses OpenAI-compatible format:

# config/sys_config.yaml - OpenAI-compatible provider
llm:
  provider: perplexity
  model: sonar-medium-online

perplexity:
  api_key: ${PERPLEXITY_API_KEY}
  endpoint: https://api.perplexity.ai    # Different endpoint
  provider_type: openai_compatible       # Maps to OpenAI implementation
  models:
    - sonar-medium-online
    - sonar-small-chat
Enterprise Custom Endpoint (5 minutes)

Your company's custom OpenAI deployment:

# config/sys_config.yaml - Custom enterprise endpoint
llm:
  provider: custom_enterprise
  model: custom-gpt-4-fine-tuned

custom_enterprise:
  api_key: ${ENTERPRISE_API_KEY}
  endpoint: https://llm.yourcompany.com/v1
  provider_type: openai_compatible
  models:
    - custom-gpt-4-fine-tuned
    - company-specific-model

Complete YAML Configuration Reference

Multi-provider configuration with failover chains:

# config/sys_config.yaml - Complete example with all providers
llm:
  provider: openai                # Default provider
  model: gpt-4                    # Default model
  
  # OpenAI Configuration
  openai:
    api_key: ${OPENAI_API_KEY}
    models: [gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o]
    timeout: 30
    
  # Anthropic Configuration  
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    models: [claude-3-opus, claude-3-sonnet, claude-3-haiku]
    timeout: 30
    
  # Google Gemini Configuration
  gemini:
    api_key: ${GOOGLE_API_KEY}
    models: [gemini-pro, gemini-pro-vision, gemini-ultra]
    timeout: 30
    
  # Azure OpenAI Configuration
  azure:
    api_key: ${AZURE_OPENAI_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    deployment_name: ${AZURE_DEPLOYMENT_NAME}
    api_version: "2023-12-01-preview"
    models: [gpt-4, gpt-35-turbo]
      
  # Ollama (Local) Configuration
  ollama:
    api_url: http://localhost:11434/api/generate
    models: [mistral, llama2, codellama, neural-chat]
      
  # Together.ai (OpenAI-compatible)
  together:
    api_key: ${TOGETHER_API_KEY}
    endpoint: https://api.together.xyz/inference
    provider_type: openai_compatible
    models: [meta-llama/Llama-2-70b-chat-hf, mistralai/Mistral-7B-Instruct-v0.1]

# Failover Configuration (Advanced Edition)
routing:
  fallback_chain:
    - provider: azure
      model: gpt-4
    - provider: openai  
      model: gpt-4
    - provider: gemini
      model: gemini-pro
    - provider: ollama
      model: mistral

Environment Variables Setup

# .env file - Set up your API keys
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key" 
export GOOGLE_API_KEY="your-google-api-key"
export AZURE_OPENAI_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_DEPLOYMENT_NAME="your-deployment-name"
export TOGETHER_API_KEY="your-together-key"
export PERPLEXITY_API_KEY="your-perplexity-key"

Testing Your Configuration

1. Validate YAML Syntax
python -c "import yaml; yaml.safe_load(open('config/sys_config.yaml'))"
2. Test Provider Connectivity
curl -X POST http://localhost:8000/chat \
  -H "X-API-Key: your-api-key" \
  -H "X-LLM-Provider: your-new-provider" \
  -H "X-LLM-Model: your-new-model" \
  -d '{"prompt": "Test message"}'
3. Hot-Reload Configuration (Advanced Edition)
curl -X POST http://localhost:8000/admin/reload_config \
  -H "X-Admin-API-Key: your-admin-key"

Extensible Framework Architecture

Core Design Principles

1. Provider Abstraction

All providers implement a common interface

from typing import Dict, List

class BaseLLMProvider:
    def generate(self, prompt: str, context: Dict) -> "LLMResponse": ...
    def is_available(self) -> bool: ...
    def get_models(self) -> List[str]: ...
    def estimate_cost(self, prompt: str, response: str) -> float: ...
2. Unified Response Format

Consistent response structure across all providers

from dataclasses import dataclass
from typing import Dict

@dataclass
class LLMResponse:
    content: str
    model: str
    provider: str
    usage: Dict[str, int]
    latency_ms: int
    success: bool
3. Plugin-Based Architecture

Providers are automatically discovered and registered

entry_points={
    'wag_tail_llm_providers': [
        'openai = wag_tail_llm_openai:OpenAIProvider',
        'azure = wag_tail_llm_azure:AzureProvider',
        'custom = wag_tail_llm_custom:CustomProvider',
    ],
}

Framework Benefits

Seamless Provider Switching: Change providers via configuration without code changes
Multi-Provider Deployments: Route different workloads to optimal providers
Intelligent Failover: Automatic failover chains ensure high availability
Cost Optimization: Intelligent routing based on cost and performance
Easy Extension: Add new providers with minimal code (30-50 lines)
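
To make the "minimal code (30-50 lines)" claim concrete, here is a hedged sketch of a custom provider built against the interface above. The HTTP call, endpoint shape, and response field names are illustrative assumptions, not a shipped provider:

import time
from typing import Dict, List

import requests

# BaseLLMProvider and LLMResponse as defined in the sections above

class MyHTTPProvider(BaseLLMProvider):
    """Illustrative provider for a hypothetical HTTP completion API."""

    def __init__(self, api_url: str, api_key: str):
        self.api_url = api_url
        self.api_key = api_key

    def generate(self, prompt: str, context: Dict) -> LLMResponse:
        start = time.time()
        resp = requests.post(
            self.api_url,
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"prompt": prompt},
            timeout=30,
        )
        data = resp.json()
        return LLMResponse(
            content=data.get("text", ""),
            model=context.get("model", "my-model"),
            provider="my_http_provider",
            usage=data.get("usage", {}),
            latency_ms=int((time.time() - start) * 1000),
            success=resp.ok,
        )

    def is_available(self) -> bool:
        return bool(self.api_key)

    def get_models(self) -> List[str]:
        return ["my-model"]

    def estimate_cost(self, prompt: str, response: str) -> float:
        return 0.0  # placeholder: apply your provider's pricing here

Once packaged behind the wag_tail_llm_providers entry point shown above, a provider like this becomes selectable through configuration alone.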

Adding New Models - Configuration Over Code

Most users never need to write code! Here's how to add new LLM models using our configuration-driven approach:

Choose Your Approach Based on Your Needs

Approach 1: Config Only (90%) - Easy

When to use: Adding new models from existing providers (OpenAI, Anthropic, Google, Azure, Ollama)

Time: 1 minute
Example scenarios:
  • OpenAI releases GPT-5
  • Anthropic adds Claude-4
  • New Ollama model available

Approach 2: Simple Mapping (8%) - Medium

When to use: OpenAI-compatible APIs with different endpoints

Time: 10 minutes
Example providers:
  • Together.ai
  • Replicate
  • Perplexity
  • Your company's custom endpoint

Approach 3: Full Plugin (2%) - Advanced

When to use: Completely different API formats requiring custom code

Time: 2+ hours
Example providers:
  • Cohere
  • AI21 Labs
  • Custom proprietary APIs

Approach 1: Config Only (90% of users)

1. Edit YAML Configuration

Just add the new model to your existing provider configuration

# config/sys_config.yaml
llm:
  provider: openai
  model: gpt-5                    # NEW - just change model name!
  
  openai:
    models: 
      - gpt-4                     # Existing
      - gpt-4-turbo              # Existing
      - gpt-5                     # NEW - add to list!
    # ... rest of config unchanged

2. Test Immediately

No restart required with hot-reload support

curl -X POST http://localhost:8000/chat \
  -H "X-LLM-Provider: openai" \
  -H "X-LLM-Model: gpt-5" \
  -d '{"prompt": "Hello from new model!"}'

Approach 2: OpenAI-Compatible (8% of users)

1. Add Provider Configuration

Configure the new provider with OpenAI-compatible mapping

# config/sys_config.yaml
llm:
  provider: perplexity
  model: sonar-medium-online

perplexity:
  api_key: ${PERPLEXITY_API_KEY}
  endpoint: https://api.perplexity.ai    # Different endpoint
  provider_type: openai_compatible       # Key mapping
  models:
    - sonar-medium-online
    - sonar-small-chat

2. Enable Provider (One-line change)

Add to compatible providers list

# config/provider_mappings.py
OPENAI_COMPATIBLE_PROVIDERS = [
    'together', 
    'replicate', 
    'perplexity'    # Just add this line!
]

Approach 3: Full Plugin Development (2% of users)

Only needed for completely different API formats. See our Plugin Development Guide for detailed instructions.

1. Create Provider Class: Extend BaseLLMProvider
2. Package Setup: Create Python package with entry points
3. Configuration Validation: Add Pydantic validation
4. Testing & Integration: Unit tests and installation

Tip: Before building a full plugin, check if your provider has an OpenAI-compatible API. Many modern LLM providers now offer OpenAI compatibility!

Real-World Success Stories

Enterprise Success
5 minutes

Company: Fortune 500 Financial Services

Need: Private GPT-4 deployment behind corporate firewall

Solution: Added custom endpoint configuration - no coding required!

Startup Speed
2 minutes

Company: AI Startup

Need: Switch from OpenAI to Together.ai for cost savings

Solution: Simple provider mapping - saved 80% on API costs!

Research Lab
1 minute

Organization: University AI Research Lab

Need: Test latest Claude-3.5-Sonnet model

Solution: Added to models list - immediate access to new capabilities!

Performance Benchmarks & Cost Analysis

Latency Comparison (Average Response Time)

Provider        | Model           | Avg Latency | 95th Percentile | Use Case
Azure OpenAI    | gpt-4           | 1,200ms     | 2,100ms         | Enterprise
OpenAI          | gpt-4           | 1,500ms     | 2,800ms         | General
Anthropic       | claude-3-sonnet | 1,800ms     | 3,200ms         | Analysis
Google Gemini   | gemini-pro      | 1,100ms     | 2,000ms         | Balanced
Ollama          | mistral         | 800ms       | 1,200ms         | Local

Cost Comparison (per 1M tokens)

Provider        | Model           | Input Cost | Output Cost | Blended (1:1 ratio)
OpenAI          | gpt-3.5-turbo   | $0.50      | $1.50       | $1.00
OpenAI          | gpt-4           | $10.00     | $30.00      | $20.00
Azure OpenAI    | gpt-4           | $10.00     | $30.00      | $20.00
Anthropic       | claude-3-sonnet | $3.00      | $15.00      | $9.00
Google Gemini   | gemini-pro      | $2.50      | $7.50       | $5.00
Ollama          | mistral         | $0.00      | $0.00       | $0.00

Provider Selection Guidelines

Development

Use Ollama for cost-effective testing and rapid iteration

Production

Use Azure OpenAI for enterprise reliability and SLA guarantees

High Volume

Mix of providers for load distribution and cost optimization

Cost Sensitive

Use Gemini Pro or local models for budget constraints

Complex Reasoning

Use Claude-3-opus or GPT-4 for analytical tasks

Speed Critical

Use GPT-3.5-turbo or Gemini Pro for low latency needs

Request Lifecycle Architecture

Every request flows through our secure, optimized pipeline

1. Authentication - API key validation and organization resolution
2. Security Filters - PII protection, code detection, content classification
3. Rate Limiting - Group-based limits and priority queue management
4. Semantic Cache - Redis-powered caching for similar queries
5. LLM Routing - Provider selection and failover handling
6. Response - Caching, metrics, and audit trail completion

StarToken Plugin Framework

Build custom plugins for the Wag-Tail AI Gateway with our powerful, extensible framework

The StarToken Plugin Framework is Wag-Tail's modular architecture that allows you to extend the AI Gateway with custom functionality. Whether you need specialized security filters, custom authentication, unique analytics, or integrations with third-party services, our plugin system provides the foundation to build exactly what you need.

Built-in Plugins (Included)

The Wag-Tail AI Gateway ships with a comprehensive set of production-ready plugins:

Security & Authentication

Key Authentication (Edition: Both)
  • Database + Redis cached API key validation
  • Org/Group ID resolution
  • Fast lookup with fallback

Basic Guard (Edition: Both)
  • SQL injection detection
  • Code pattern filtering
  • XSS protection
  • System command blocking

PII Guard (Edition: Both)
  • Personally Identifiable Information detection
  • Phone number, email, SSN masking
  • GDPR compliance support
  • Custom Hong Kong ID recognition

AI Classifier (Edition: Advanced)
  • DistilBERT-powered intent classification
  • Attack/jailbreak detection
  • Offensive content filtering
  • Custom threat model training

Output Guard (Edition: Advanced)
  • LLM response filtering
  • Sensitive information blocking
  • Policy-based response modification
  • Multi-layer output validation

Performance & Routing

Semantic Cache (Edition: Advanced)
  • ChromaDB-powered similarity matching
  • 30x+ performance improvement
  • Intelligent cache invalidation
  • Embedding-based retrieval

LLM Routing (Edition: Advanced)
  • Multi-provider load balancing
  • Automatic failover chains
  • Cost optimization routing
  • Latency-based selection

Priority Queue (Edition: Advanced)
  • Weighted fair queuing
  • Enterprise customer prioritization
  • Anti-starvation algorithms
  • SLA-based scheduling

Enterprise & Operations

Group Rate Limit (Edition: Advanced)
  • Per-organization quota management
  • Hierarchical rate limiting
  • Real-time usage tracking
  • Monthly/daily/hourly limits

Vault Integration (Edition: Advanced)
  • HashiCorp Vault credential management
  • Dynamic secret rotation
  • Secure API key storage
  • Enterprise key lifecycle

Langfuse Telemetry (Edition: Advanced)
  • Comprehensive observability
  • Request/response tracing
  • Performance analytics
  • Custom metric collection

Webhook GuardRail (Edition: Advanced)
  • External security system integration
  • Real-time event streaming
  • HMAC signature verification
  • Configurable event filtering

Plugin Benefits by Use Case

Financial Services

  • PII Guard: Automatic detection and masking of sensitive financial data
  • AI Classifier: Advanced threat detection for financial prompt attacks
  • Vault Integration: Secure credential management for regulatory compliance
  • Priority Queue: VIP customer request prioritization

Healthcare

  • PII Guard: HIPAA-compliant PHI detection and redaction
  • Output Guard: Medical advice filtering and liability protection
  • Semantic Cache: Fast retrieval while maintaining privacy
  • Webhook GuardRail: Integration with healthcare monitoring systems

E-commerce

  • Priority Queue: Premium customer service levels
  • Group Rate Limit: Tiered API access based on subscription
  • LLM Routing: Cost-optimized model selection
  • Semantic Cache: Fast product recommendation responses

Enterprise SaaS

  • Key Authentication: Multi-tenant API key management
  • Langfuse Telemetry: Customer usage analytics and billing
  • Vault Integration: Secure multi-environment deployments
  • Group Rate Limit: Fair usage across customer tiers

Plugin Architecture

Plugin Lifecycle

1. on_request(request, context)
  • Authentication & authorization
  • Request validation & filtering
  • Rate limiting & quota checks
  • Pre-processing transformations

2. LLM Provider Call
  • Semantic cache lookup (if available)
  • LLM routing & provider selection
  • Actual LLM API request

3. on_response(request, context, llm_output)
  • Response filtering & validation
  • Output guard & safety checks
  • Telemetry & analytics collection
  • Post-processing transformations
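
Conceptually, the gateway drives these hooks in order. The sketch below illustrates that flow; it is not the actual pipeline code:

def handle_request(request, context, plugins, call_llm):
    # 1. on_request: any plugin may short-circuit (auth failure, rate limit, blocked prompt)
    for plugin in plugins:
        early_response = plugin.on_request(request, context)
        if early_response is not None:
            return early_response

    # 2. LLM provider call (semantic cache lookup and routing happen around this step)
    llm_output = call_llm(request, context)

    # 3. on_response: plugins may filter or enrich the output
    for plugin in plugins:
        llm_output, was_modified = plugin.on_response(request, context, llm_output)

    return llm_output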

Building Custom Plugins

1. Plugin Structure

Create a standard Python package structure with setup.py, plugin class, and configuration.

my_custom_plugin/
├── setup.py                    # Package configuration
├── my_custom_plugin/
│   ├── __init__.py             # Plugin module
│   ├── plugin.py               # Main plugin class
│   ├── config.py               # Configuration loader
│   └── utils.py                # Helper functions
└── README.md                   # Plugin documentation

2. Plugin Implementation

Extend the PluginBase class and implement the required methods:

from plugins.base import PluginBase
from fastapi.responses import JSONResponse

class MyCustomPlugin(PluginBase):
    name = "my_custom_plugin"
    
    def on_request(self, request, context):
        # Validate requests before LLM processing
        return None  # Continue or return JSONResponse to block
    
    def on_response(self, request, context, llm_output):
        # Process responses after LLM
        return llm_output, False  # (response, modified)

3. Entry Point Registration

Register your plugin with the framework using Python entry points:

entry_points={
    "wag_tail.plugins": [
        "my_custom_plugin = my_package.module:PluginClass"
    ]
}

4. Installation & Configuration

Install and configure your plugin in the gateway environment:

# Install plugin
pip install my_custom_plugin

# Configure in sys_config.yaml
plugins:
  enabled:
    - my_custom_plugin

Plugin Examples

Custom Rate Limiter

IP-based rate limiting with sliding window

class CustomRateLimiterPlugin(PluginBase):
    def on_request(self, request, context):
        client_ip = context.get("client_ip")
        if self.is_rate_limited(client_ip):
            return JSONResponse(
                {"error": "Rate limit exceeded"},
                status_code=429
            )
        return None

Content Enrichment

Add metadata to responses

import time

class ContentEnrichmentPlugin(PluginBase):
    def on_response(self, request, context, llm_output):
        enriched_output = llm_output.copy()
        enriched_output["metadata"] = {
            "processed_at": time.time(),
            "plugin_version": "1.0.0"
        }
        return enriched_output, True

Audit Logging

Log all requests to database for compliance

class AuditLogPlugin(PluginBase):
    def on_request(self, request, context):
        self.log_to_database(
            context.get("request_id"),
            context.get("org_id"),
            request.json().get("prompt")
        )
        return None

Getting Started

Ready to build your first plugin? Follow these steps:

1. Plan Your Plugin
   Define the specific functionality you need, identify which plugin hooks to implement, and plan your configuration and dependencies.

2. Set Up Development Environment
   Create plugin directory structure and install Wag-Tail development dependencies.

3. Implement Core Functionality
   Start with basic plugin structure, implement required methods, add configuration loading, and include comprehensive logging.

4. Package and Deploy
   Build wheel package, install in gateway environment, configure plugin in sys_config.yaml, and restart gateway.

Best Practices

Performance

  • Keep processing time under 50ms for on_request hooks
  • Set reasonable timeouts for external service calls
  • Monitor plugin execution time and success rates

Security

  • Use environment variables for secrets
  • Validate input to prevent injection attacks
  • Sanitize data before external API calls

Reliability

  • Always handle exceptions and provide fallback behavior
  • Include relevant context in all log messages
  • Write comprehensive unit tests
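
The sketch below applies several of these practices to a single on_request hook: a tight timeout on an external call, exception handling with a fail-open fallback, and contextual logging. The moderation service URL, its response format, and the assumption that the prompt is available in the plugin context are all illustrative, not part of the shipped plugins:

import logging

import requests
from fastapi.responses import JSONResponse
from plugins.base import PluginBase

logger = logging.getLogger("external_moderation")

class ExternalModerationPlugin(PluginBase):
    name = "external_moderation"

    def on_request(self, request, context):
        try:
            # Tight timeout keeps the hook close to the 50ms budget
            result = requests.post(
                "https://moderation.example.com/check",     # hypothetical external service
                json={"prompt": context.get("prompt", "")}, # assumes prompt is in context
                timeout=0.05,
            )
            if result.ok and result.json().get("blocked"):
                return JSONResponse(
                    {"flag": "blocked", "reason": "external moderation"},
                    status_code=400,
                )
        except Exception as exc:
            # Fail open: log with request context and let the request continue
            logger.warning(
                "moderation check failed (request_id=%s): %s",
                context.get("request_id"), exc,
            )
        return None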

Documentation

Everything you need to get started, configure, and deploy Wag-Tail at scale

Getting Started with Wag-Tail

Your journey to production-ready AI gateway deployment

5-Minute Quick Start

Perfect for development, prototyping, and getting started quickly

1. Clone Repository

# Contact support@wag-tail.com for source code access
cd wag-tail-ai-gateway

2. Setup Environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Success!

Your Basic Edition is now running with full PII protection and security filtering.

System Requirements
  • Python: 3.9+ (3.11+ recommended)
  • RAM: 2GB minimum (4GB+ recommended)
  • CPU: 2 cores minimum (4+ recommended)
  • Storage: 5GB minimum (10GB+ recommended)

Verification & Testing

Health Check
curl http://localhost:8000/admin/health \
  -H "X-Admin-API-Key: your-admin-key"
Expected: {"status": "healthy", "version": "3.4.0", "edition": "basic"}
Security Test
curl -X POST http://localhost:8000/chat \
  -H "X-API-Key: b6c91d9d2ff66624356f5e5cfd03dc784d80a2eedd6af0d94e908d7b19e25e85" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "SELECT * FROM users; DROP TABLE users;"}'
Should be blocked: {"flag": "blocked", "reason": "SQL injection pattern detected"}

Admin API - Advanced Edition Only

Comprehensive monitoring and management capabilities for Wag-Tail AI Gateway Advanced Edition

The Admin API provides comprehensive monitoring and management capabilities for Wag-Tail AI Gateway Advanced Edition. All admin endpoints require an Advanced Edition license and valid admin API key.

What you can do with the Admin API

Monitor System Health

Get real-time status of all critical services including Redis, Vault, Langfuse, and core application components

Analyze Usage Patterns

Access detailed analytics across organizations, endpoints, and time periods

Manage Cache Performance

Monitor semantic cache effectiveness and clear cache when needed

Automate Operations

Integrate with monitoring systems, CI/CD pipelines, and alerting platforms

Who can use the Admin API

Advanced Edition Users

The Admin API is exclusively available to Advanced Edition customers. With your Advanced license, you get full access to all administrative endpoints with enterprise-grade capabilities.

Basic Edition Users

Basic Edition (OSS) users do not have access to any Admin API functionality. All admin endpoints will return 403 Forbidden for Basic Edition deployments.

Alternative approaches for Basic Edition:
  • Application logs and standard logging frameworks
  • External monitoring tools (Prometheus, Grafana, DataDog)
  • Health check endpoints in your applications
  • Standard observability and APM solutions

Getting Started

Prerequisites

Before using the Admin API, ensure you have:

Advanced Edition License Required for any admin endpoint access
Admin API Key Configured in your gateway settings
Network Access Secure connectivity to your gateway deployment

Authentication

All admin endpoints require the x-admin-api-key header:

curl -H "x-admin-api-key: YOUR_ADMIN_API_KEY" http://localhost:8000/admin/endpoint

The admin API key is configured in your system configuration:

admin:
  api_key: "your_admin_api_key_here"
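
The same call from Python, for scripts and automation (the base URL is a placeholder; /admin/health is documented below):

import requests

ADMIN_BASE = "http://localhost:8000"                    # placeholder deployment URL
HEADERS = {"x-admin-api-key": "YOUR_ADMIN_API_KEY"}

health = requests.get(f"{ADMIN_BASE}/admin/health", headers=HEADERS, timeout=10)
health.raise_for_status()
print(health.json())   # e.g. {"status": "healthy", "services": {...}, ...}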

Rate Limit Monitoring Endpoints (NEW)

Get Organization Rate Limit Status
GET /admin/rate-limit/status/{org_id}

Monitor comprehensive rate limit status for any organization with real-time usage tracking.

Example Request:
curl -X GET "http://localhost:8000/admin/rate-limit/status/Enterprise%20Customer" \
     -H "x-admin-api-key: YOUR_API_KEY"
Response:
{
  "org_id": "Enterprise Customer",
  "edition": "advanced",
  "usage_stats": {
    "requests_today": 150,
    "requests_this_month": 2500,
    "monthly_limit": 100000,
    "remaining_requests": 97500,
    "usage_percentage": 2.5,
    "reset_date": "2025-09-01"
  },
  "groups": [
    {
      "group_id": "production",
      "requests_today": 100,
      "requests_this_month": 1800,
      "monthly_limit": 60000,
      "remaining_requests": 58200,
      "usage_percentage": 3.0
    }
  ],
  "status": "healthy",
  "warnings": []
}
Status Values:
  • healthy: Usage below 90% of limits
  • warning: Usage at 90-99% of limits
  • over_limit: Usage at or above 100% of limits
Get All Organizations Status
GET /admin/rate-limit/status

Get rate limit status for all organizations in your system.

System Status Overview
GET /admin/system/status

Get comprehensive system statistics and health metrics.

Response:
{
  "version": "4.0.0",
  "edition": "advanced",
  "uptime_seconds": 3600,
  "total_organizations": 5,
  "active_organizations": 3,
  "total_requests_today": 1250,
  "plugin_count": 12
}

System Health Endpoints

Health Check
GET /admin/health

Comprehensive system health check with dependency monitoring.

Response:
{
  "status": "healthy",
  "services": {
    "fastapi": "ok",
    "redis": "ok",
    "vault": "ok",
    "langfuse": "ok"
  },
  "edition": "advanced",
  "uptime_seconds": 3600
}
License Information
GET /admin/license

Get current license information and validity.

Hot-reload License
POST /admin/reload_license

Hot-reload license without server restart.

Analytics & Usage Endpoints

Usage Statistics
GET /admin/usage

Detailed usage statistics across all organizations.

Cache Statistics
GET /admin/cache_stats

Semantic cache performance metrics and analytics.

Response:
{
  "total_entries": 150,
  "hit_rate": 85.0,
  "total_hits": 1500,
  "total_requests": 1765,
  "memory_usage_mb": 15.0
}

Operations & Maintenance

Clear Cache
POST /admin/clear_cache

Clear all semantic cache entries.

Reset Usage Counters
POST /admin/reset_usage

Reset all usage counters and statistics.

Usage Testing (Development)
POST /admin/usage/increment/{org_id}?group_id={group_id}

Manually increment usage counters for testing purposes.

Monthly Limits (NEW)

Advanced Edition includes 100,000 monthly request limits with:

  • Organization-level tracking
  • Group-level sub-limits
  • Automatic monthly resets
  • Real-time usage monitoring
  • Warning thresholds at 90%
  • Over-limit protection
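
A small monitoring sketch built on the rate limit status endpoint above; it flags organizations that cross the 90% warning threshold. The base URL and organization name are placeholders:

from urllib.parse import quote

import requests

ADMIN_BASE = "http://localhost:8000"                    # placeholder deployment URL
HEADERS = {"x-admin-api-key": "YOUR_ADMIN_API_KEY"}

def check_quota(org_id: str) -> None:
    """Print a warning when an organization crosses 90% of its monthly limit."""
    url = f"{ADMIN_BASE}/admin/rate-limit/status/{quote(org_id, safe='')}"
    resp = requests.get(url, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    stats = resp.json()["usage_stats"]
    if stats["usage_percentage"] >= 90:
        print(f"WARNING: {org_id} at {stats['usage_percentage']}% of its monthly limit "
              f"({stats['remaining_requests']} requests remaining)")
    else:
        print(f"{org_id}: {stats['usage_percentage']}% of monthly limit used")

check_quota("Enterprise Customer")   # example organization from the response above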

Database & Storage (NEW)

SQLite Database

Advanced Edition automatically creates and maintains a SQLite database for persistent usage tracking:

  • Location: data/wag_tail.db
  • Tables: api_keys, org_usage, group_usage
  • Automatic Backup: Recommended for production

Monitoring Examples

Check High Usage Organizations
curl -X GET "http://localhost:8000/admin/rate-limit/status" \
     -H "x-admin-api-key: YOUR_KEY" | \
     jq '.[] | select(.status != "healthy")'
Monitor Specific Group Usage
curl -X GET "http://localhost:8000/admin/rate-limit/status/Enterprise%20Customer" \
     -H "x-admin-api-key: YOUR_KEY" | \
     jq '.groups[] | select(.usage_percentage > 80)'

Error Handling

  • 403 Forbidden: Invalid admin API key or Basic Edition license
  • 404 Not Found: Organization not found
  • 500 Internal Server Error: Database or system error

Security & Access

  • Admin API keys should be rotated regularly
  • All admin operations are logged for audit trails
  • Advanced Edition license required for all endpoints
  • Rate limit data stored securely in local SQLite database

Getting Started

1. Ensure Advanced Edition license is installed
2. Configure admin API key in system settings
3. Start monitoring with /admin/health endpoint
4. Set up group limits in group_config.yaml
5. Monitor usage with rate limit endpoints

Ready to monitor your AI Gateway like a pro? Start with the health check endpoint and explore the comprehensive monitoring capabilities!

System Configuration

Comprehensive configuration guide for both OSS and Enterprise editions

Wag-Tail uses a hierarchical configuration system that supports YAML-based files, environment variable overrides, runtime updates, and edition-specific features with automatic capability detection.

YAML Configuration

Structured settings with clear hierarchy

Environment Overrides

Flexible deployment configuration

Runtime Updates

Dynamic configuration changes

Edition-Specific

Automatic capability detection

Configuration File Structure

config/
├── sys_config.yaml           # Main configuration file
├── plugins_config.yaml       # Plugin-specific settings
├── security_config.yaml      # Security policies
├── llm_providers.yaml        # LLM provider configurations
└── environments/
    ├── development.yaml      # Development overrides
    ├── staging.yaml          # Staging environment settings
    └── production.yaml       # Production configuration

Core Configuration Sections

Basic Application Configuration
# Basic sys_config.yaml
edition: "enterprise"  # or "oss"
version: "1.0.0"
environment: "production"

app:
  name: "Wag-Tail AI Gateway"
  host: "0.0.0.0"
  port: 8000
  debug: false
  workers: 4
  max_request_size_mb: 10
  request_timeout: 300

database:
  type: "postgresql"  # sqlite, postgresql, mysql
  postgresql:
    host: "${DB_HOST:localhost}"
    port: "${DB_PORT:5432}"
    database: "${DB_NAME:wagtail}"
    username: "${DB_USER:wagtail}"
    password: "${DB_PASSWORD}"
    pool_size: 10

logging:
  level: "${LOG_LEVEL:INFO}"
  format: "json"
  file:
    enabled: true
    path: "logs/wagtail.log"
    max_size_mb: 100
Security Configuration
security:
  # API Authentication
  api_keys:
    enabled: true
    header_name: "X-API-Key"
    allow_query_param: false  # Security: disable for production
    default_key: "${DEFAULT_API_KEY}"
    
  # Rate limiting
  rate_limiting:
    enabled: true
    per_minute: 100
    per_hour: 1000
    per_day: 10000
    burst_limit: 20
    
  # Content filtering
  content_filtering:
    enabled: true
    block_code_execution: true
    block_sql_injection: true
    block_xss_attempts: true
    
  # PII protection
  pii_protection:
    enabled: true
    detection_confidence: 0.8
    anonymization_method: "mask"  # mask, replace, redact
    entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "SSN", "CREDIT_CARD"]
    
  # TLS/SSL settings
  tls:
    enabled: true
    cert_file: "${TLS_CERT_FILE:certs/server.crt}"
    key_file: "${TLS_KEY_FILE:certs/server.key}"
LLM Provider Configuration
llm:
  default_provider: "openai"
  default_model: "gpt-3.5-turbo"
  
  providers:
    ollama:
      enabled: true
      api_url: "${OLLAMA_URL:http://localhost:11434/api/generate}"
      models: ["mistral", "llama2", "codellama"]
      timeout: 60
      max_retries: 3
      
    openai:
      enabled: true
      api_key: "${OPENAI_API_KEY}"
      api_url: "https://api.openai.com/v1"
      models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]
      timeout: 120
      max_tokens: 4000
      temperature: 0.7
      
    gemini:
      enabled: true
      api_key: "${GEMINI_API_KEY}"
      api_url: "https://generativelanguage.googleapis.com/v1"
      models: ["gemini-pro", "gemini-pro-vision"]
      timeout: 90
      
    azure:
      enabled: true
      api_key: "${AZURE_OPENAI_API_KEY}"
      api_url: "${AZURE_OPENAI_ENDPOINT}"
      api_version: "2023-12-01-preview"
      deployment_name: "${AZURE_DEPLOYMENT_NAME}"
Enterprise Features Configuration
# Redis configuration (Enterprise)
redis:
  enabled: true
  host: "${REDIS_HOST:localhost}"
  port: "${REDIS_PORT:6379}"
  password: "${REDIS_PASSWORD}"
  database: 0
  max_connections: 20

# Semantic caching (Enterprise)
caching:
  semantic:
    enabled: true
    provider: "redis"
    ttl: 3600  # seconds
    similarity_threshold: 0.85
    max_cache_size_mb: 1000
    
  response:
    enabled: true
    default_ttl: 300
    max_ttl: 86400

# Monitoring & observability
monitoring:
  metrics:
    enabled: true
    endpoint: "/metrics"
    format: "prometheus"
    
  tracing:
    enabled: true
    provider: "jaeger"
    endpoint: "${TRACING_ENDPOINT}"
    service_name: "wagtail-gateway"
    sample_rate: 0.1
    
  apm:
    enabled: true
    provider: "newrelic"
    license_key: "${APM_LICENSE_KEY}"

Environment Configuration

Development
# environments/development.yaml
app:
  debug: true
  reload: true
  workers: 1

logging:
  level: "DEBUG"
  console:
    colored: true

database:
  type: "sqlite"
  sqlite:
    path: "data/dev.db"

security:
  rate_limiting:
    enabled: false
  tls:
    enabled: false
Production
# environments/production.yaml
app:
  debug: false
  reload: false
  workers: 8
  
security:
  rate_limiting:
    enabled: true
    per_minute: 60
  tls:
    enabled: true
    verify_client: true
    
logging:
  level: "INFO"
  aggregation:
    enabled: true
    
monitoring:
  metrics:
    enabled: true
  tracing:
    enabled: true
  apm:
    enabled: true

Environment Variables

Application
WAGTAIL_ENVIRONMENT - Environment name
WAGTAIL_HOST - Bind host
WAGTAIL_PORT - Bind port
WAGTAIL_WORKERS - Worker processes
Database
DB_HOST - Database host
DB_PORT - Database port
DB_NAME - Database name
DB_USER - Database user
DB_PASSWORD - Database password
LLM APIs
OPENAI_API_KEY - OpenAI API key
GEMINI_API_KEY - Google Gemini key
AZURE_OPENAI_API_KEY - Azure OpenAI key
ANTHROPIC_API_KEY - Anthropic API key
Security
DEFAULT_API_KEY - Default API key
JWT_SECRET - JWT signing secret
TLS_CERT_FILE - TLS certificate
WEBHOOK_SECRET - Webhook secret

Configuration Loading Hierarchy

1. Default Values - Built-in defaults (lowest priority)
2. Base Config - sys_config.yaml file
3. Environment Files - environments/{env}.yaml
4. Plugin Configs - Plugin-specific settings
5. Environment Variables - Runtime overrides (highest priority)
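
To illustrate how the layers combine, here is a hedged sketch of a loader that merges the layers in the order above and expands the ${VAR:default} placeholders used in the examples; it is not the actual config_loader implementation:

import os
import re
from typing import Any, Dict

import yaml

_ENV_PATTERN = re.compile(r"\$\{([A-Z0-9_]+)(?::([^}]*))?\}")

def _expand_env(value: Any) -> Any:
    # Expand ${VAR} / ${VAR:default} placeholders (step 5: env vars win last)
    if isinstance(value, str):
        return _ENV_PATTERN.sub(
            lambda m: os.environ.get(m.group(1), m.group(2) or ""), value
        )
    if isinstance(value, dict):
        return {k: _expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_expand_env(v) for v in value]
    return value

def _deep_merge(base: Dict, override: Dict) -> Dict:
    # Later layers override earlier ones key by key
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def load_layered_config(env: str = "production") -> Dict:
    config: Dict = {}                                      # 1. built-in defaults would seed this
    layers = [
        "config/sys_config.yaml",                          # 2. base config
        f"config/environments/{env}.yaml",                 # 3. environment file
        # 4. plugin-specific config files would be merged the same way
    ]
    for path in layers:
        try:
            with open(path) as fh:
                config = _deep_merge(config, yaml.safe_load(fh) or {})
        except FileNotFoundError:
            continue
    return _expand_env(config)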

Configuration Best Practices

Security
  • Use environment variables for secrets
  • Never commit secrets to version control
  • Implement configuration validation
  • Rotate secrets regularly
  • Use secure file permissions (600/640)
Performance
  • Cache configuration in memory
  • Use lazy loading for large configs
  • Optimize configuration parsing
  • Monitor configuration load times
  • Minimize configuration file size
Operations
  • Version control configuration files
  • Test changes in staging first
  • Implement rollback procedures
  • Document all configuration options
  • Use configuration templates
Testing
  • Validate configuration syntax
  • Test in multiple environments
  • Implement configuration test suites
  • Use configuration smoke tests
  • Check for drift detection

Configuration Troubleshooting

Validation Commands
# Check file permissions
ls -la config/sys_config.yaml

# Validate YAML syntax
python -c "import yaml; yaml.safe_load(open('config/sys_config.yaml'))"

# Check environment variables
env | grep WAGTAIL

# Test database connectivity
python -c "import psycopg2; conn = psycopg2.connect(host='localhost', database='wagtail', user='wagtail', password='password'); print('Connected')"

# Debug configuration loading
python -c "from config_loader import load_configuration; print(load_configuration())"

Enterprise Reference Architecture

Wag-Tail AI Gateway is designed for flexible deployment across various infrastructure environments, from simple single-server deployments to complex multi-cloud, multi-region enterprise architectures.

Edge Layer

CDN WAF Load Balancer

API Gateway Layer

API Gateway Authentication Rate Limiting

Application Layer

Wag-Tail Pods Service Mesh Auto Scaling

Data Layer

PostgreSQL Redis Cluster Object Storage

Single Server

Perfect for development and small-scale deployments using Docker Compose

  • Docker containers
  • Nginx reverse proxy
  • Local PostgreSQL & Redis

Kubernetes

Enterprise-scale deployment with auto-scaling and high availability

  • Horizontal Pod Autoscaling
  • Service mesh integration
  • Cloud-native storage

Multi-Cloud

Global deployment across AWS, Azure, and GCP with API gateway integration

  • Regional deployments
  • Global load balancing
  • Cross-cloud replication

Single-Server Deployment

Ideal for development, testing, and small-scale production environments.

Architecture Components

Internet
Nginx
Wag-Tail
PostgreSQL/Redis

Docker Compose Configuration

# docker-compose.yml
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    container_name: wagtail_nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - wagtail
    restart: unless-stopped

  wagtail:
    image: wagtail/ai-gateway:latest
    container_name: wagtail_app
    environment:
      - WAGTAIL_ENVIRONMENT=production
      - DB_HOST=postgres
      - REDIS_HOST=redis
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./config:/app/config
      - ./logs:/app/logs
    depends_on:
      - postgres
      - redis
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  postgres:
    image: postgres:15-alpine
    container_name: wagtail_postgres
    environment:
      - POSTGRES_DB=wagtail
      - POSTGRES_USER=wagtail
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    container_name: wagtail_redis
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis_data:/data
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:

Nginx Configuration

# nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream wagtail_backend {
        server wagtail:8000;
    }

    server {
        listen 80;
        server_name your-domain.com;
        return 301 https://$server_name$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name your-domain.com;

        ssl_certificate /etc/nginx/ssl/server.crt;
        ssl_certificate_key /etc/nginx/ssl/server.key;

        location / {
            proxy_pass http://wagtail_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        location /health {
            proxy_pass http://wagtail_backend/health;
            access_log off;
        }
    }
}

Quick Start Commands

docker-compose up -d                               # Start all services
docker-compose logs -f wagtail                     # View application logs
docker-compose exec wagtail /app/healthcheck.sh    # Check application health

Kubernetes Deployment

Enterprise-scale deployment with auto-scaling, high availability, and cloud-native features.

Kubernetes Architecture

Ingress Layer
Nginx Ingress Cert Manager TLS Termination
Application Layer
Deployment Service HPA ConfigMap Secrets
Data Layer
PostgreSQL Cluster Redis Cluster Persistent Volumes
Monitoring Layer
Prometheus Grafana Jaeger AlertManager

Core Kubernetes Manifests

Deployment Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wagtail-gateway
  namespace: wagtail
  labels:
    app: wagtail-gateway
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: wagtail-gateway
  template:
    metadata:
      labels:
        app: wagtail-gateway
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: wagtail-service-account
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
      - name: wagtail
        image: wagtail/ai-gateway:v1.0.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8000
          protocol: TCP
        env:
        - name: WAGTAIL_ENVIRONMENT
          value: "production"
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: wagtail-secrets
              key: db-password
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: wagtail-secrets
              key: redis-password
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
          readOnly: true
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "2000m"
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: config-volume
        configMap:
          name: wagtail-config
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: wagtail-hpa
  namespace: wagtail
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: wagtail-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
Ingress Configuration
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wagtail-ingress
  namespace: wagtail
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
  tls:
  - hosts:
    - api.wagtail.ai
    secretName: wagtail-tls
  rules:
  - host: api.wagtail.ai
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: wagtail-service
            port:
              number: 80

Deployment Commands

kubectl apply -f k8s/                                          # Deploy all manifests
kubectl get pods -n wagtail                                    # Check pod status
kubectl logs -f deployment/wagtail-gateway -n wagtail          # View application logs
kubectl port-forward svc/wagtail-service 8080:80 -n wagtail    # Local port forwarding

Multi-Cloud Deployment

Global deployment across AWS, Azure, and GCP with regional failover and API gateway integration.

Global Architecture

AWS US-East
  • EKS Cluster
  • RDS PostgreSQL
  • ElastiCache Redis
  • API Gateway
Azure EU-West
  • AKS Cluster
  • Azure Database
  • Azure Cache
  • API Management
GCP Asia-Pacific
  • GKE Cluster
  • Cloud SQL
  • Memorystore
  • Apigee
Global Services
CloudFlare DNS HashiCorp Vault Global Monitoring Cross-Region Backup

Terraform Infrastructure

# EKS Cluster
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  
  cluster_name    = "wagtail-cluster"
  cluster_version = "1.28"
  
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets
  
  eks_managed_node_groups = {
    wagtail_nodes = {
      desired_size = 3
      max_size     = 10
      min_size     = 3
      
      instance_types = ["t3.large"]
      
      k8s_labels = {
        Environment = "production"
        Application = "wagtail"
      }
    }
  }
}

# RDS PostgreSQL
resource "aws_db_instance" "wagtail_db" {
  identifier = "wagtail-postgres"
  
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.r6g.large"
  
  allocated_storage     = 100
  max_allocated_storage = 1000
  storage_encrypted     = true
  
  db_name  = "wagtail"
  username = "wagtail"
  password = var.db_password
  
  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.wagtail.name
  
  backup_retention_period = 7
  backup_window          = "03:00-04:00"
  maintenance_window     = "sun:04:00-sun:05:00"
  
  skip_final_snapshot = false
  final_snapshot_identifier = "wagtail-final-snapshot"
}

Kong API Gateway Integration

_format_version: "3.0"

services:
  - name: wagtail-gateway
    url: http://wagtail-service.wagtail.svc.cluster.local:80
    retries: 3
    connect_timeout: 10000
    read_timeout: 60000
    write_timeout: 60000

routes:
  - name: wagtail-chat
    service: wagtail-gateway
    paths:
      - /chat
    methods:
      - POST
    strip_path: false

plugins:
  # Rate limiting
  - name: rate-limiting
    service: wagtail-gateway
    config:
      minute: 100
      hour: 1000
      day: 10000
      policy: redis
      redis_host: redis-service.wagtail.svc.cluster.local

  # Authentication
  - name: key-auth
    service: wagtail-gateway
    config:
      key_names:
        - X-API-Key
      hide_credentials: true

  # CORS
  - name: cors
    service: wagtail-gateway
    config:
      origins:
        - "https://app.yourcompany.com"
      methods:
        - GET
        - POST
        - OPTIONS
      credentials: true
      max_age: 3600

Monitoring & Observability

Comprehensive monitoring, logging, and tracing for production deployments.

Monitoring Architecture

Metrics Collection
Prometheus Node Exporter cAdvisor Custom Metrics
Logging Pipeline
Fluentd Elasticsearch Logstash Kibana
Distributed Tracing
Jaeger Zipkin OpenTelemetry Collector
Visualization & Alerting
Grafana AlertManager PagerDuty

Prometheus Configuration

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "wagtail-rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  - job_name: 'wagtail-gateway'
    static_configs:
      - targets: ['wagtail-service:8000']
    metrics_path: /metrics
    scrape_interval: 10s

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - wagtail
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

  - job_name: 'postgres-exporter'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'redis-exporter'
    static_configs:
      - targets: ['redis-exporter:9121']

Key Metrics Dashboard

  • Request Rate: rate(wagtail_requests_total[5m])
  • Response Time: histogram_quantile(0.95, rate(wagtail_request_duration_seconds_bucket[5m]))
  • Error Rate: rate(wagtail_requests_total{status=~"4..|5.."}[5m])
  • LLM Response Times: wagtail_llm_request_duration_seconds

Alerting Rules

High Error Rate

Error rate > 5% for 5 minutes

High Response Time

95th percentile > 1s for 5 minutes

Pod Crash Loop

Pod restart count > 3 in 10 minutes

Database Connection Issues

Database connection pool exhausted

Security Architecture

Zero-trust security model with comprehensive protection layers.

Security Architecture Layers

Perimeter Security
  • CloudFlare DDoS Protection
  • Web Application Firewall
  • Rate Limiting
Identity & Access
  • OAuth 2.0 / OIDC
  • Multi-Factor Authentication
  • Role-Based Access Control
Network Security
  • Virtual Private Cloud
  • Private Subnets
  • Security Groups
Data Security
  • Encryption at Rest
  • Encryption in Transit
  • Secret Management

Istio Security Policies

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: wagtail-security-policy
  namespace: wagtail
spec:
  selector:
    matchLabels:
      app: wagtail-gateway
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/wagtail/sa/wagtail-service-account"]
  - to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/chat", "/health", "/metrics"]
  - when:
    - key: source.ip
      values: ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: wagtail-mtls
  namespace: wagtail
spec:
  selector:
    matchLabels:
      app: wagtail-gateway
  mtls:
    mode: STRICT

Disaster Recovery & Backup

Daily Backup

Incremental PostgreSQL & Redis backups to S3

Weekly Full Backup

Complete system backup with configuration

Long-term Archive

Monthly backups archived to Glacier

Cross-Region DR

Standby environment in secondary region

Security Checklist