AI Gateway Built for
Production Scale

Wag-Tail is a FastAPI gateway that fronts multiple LLM providers with advanced security, semantic caching, intelligent routing, and comprehensive rate limit monitoring. Built for enterprise-grade AI applications.

100+ LLM Providers

99.9% Uptime

50ms Avg Latency

Get Started View Architecture

Request

Security

Cache

LLM

Response

What is Wag-Tail AI Gateway?

Wag-Tail AI Gateway is a comprehensive, enterprise-grade security and routing layer for Large Language Model (LLM) applications. It sits between your applications and LLM providers, providing advanced security filtering, intelligent routing, performance optimization, and enterprise-grade observability.

Whether you're building customer-facing AI applications, internal tools, or enterprise AI platforms, Wag-Tail ensures your LLM interactions are secure, fast, and compliant while giving you complete control over costs, routing, and data governance.

Quick Start

Get started with Wag-Tail in under 5 minutes:

import requests

# Replace your direct OpenAI calls
response = requests.post(
    "https://your-wagtail-gateway.com/chat",
    headers={
        "X-API-Key": "your-api-key",
        "Content-Type": "application/json"
    },
    json={"prompt": "What is machine learning?"}
)

# Get secure, filtered, and optimized responses
result = response.json()
print(result["response"])  # AI response with security filtering applied
print(result["cache_hit"])  # True if served from semantic cache (30x faster)

That's it. No complex integrations, no infrastructure changes. Just point your existing LLM calls to Wag-Tail and get enterprise-grade security and performance immediately.

Core Problems Wag-Tail Solves

Security & Compliance Challenges

The Problem: Direct LLM API calls expose your applications to:

Prompt injection attacks
Data leakage and PII exposure
Malicious content generation
Compliance violations (GDPR, HIPAA, SOX)

Wag-Tail Solution: Multi-layer security pipeline with:

AI-powered threat detection using DistilBERT classification
PII detection and masking with custom recognizers
SQL injection and XSS protection with pattern-based filtering
Output sanitization preventing harmful content generation
Real-time security monitoring with webhook integrations

Performance & Cost Optimization

The Problem: LLM calls are:

Expensive ($0.01-$0.30+ per request)
Slow (2-10 second response times)
Inconsistent (provider outages, rate limits)

Wag-Tail Solution: Intelligent optimization with:

Semantic caching delivering 30x+ performance improvements
Multi-provider routing with automatic failover
Cost-optimized model selection based on prompt complexity
Request prioritization for enterprise customers

Enterprise Requirements

The Problem: Production LLM deployments need:

Centralized governance and control
Detailed usage analytics and billing
Multi-tenant isolation
Audit trails and compliance reporting

Wag-Tail Solution: Enterprise-grade platform with:

Multi-organization isolation with per-tenant quotas
Comprehensive audit logging for compliance teams
Real-time usage analytics for cost management
Vault integration for secure credential management
Role-based access control with group-level permissions

Architecture Overview

Your Applications

HTTP/HTTPS requests

Wag-Tail AI Gateway

Security Pipeline (6 Layers)

API Key Authentication
Regex & Code Injection Filtering
PII Detection & Masking
AI Threat Classification (Advanced)
Rate Limiting & Quotas (Advanced)
Output Content Filtering

Performance Layer

Semantic Cache (Advanced) - 30x faster
Priority Queue (Advanced) - Enterprise SLA
Smart Routing & Failover

Observability Layer

Request/Response Logging
Usage Analytics & Billing
Langfuse Integration (Advanced)
Webhook Events

LLM Providers

OpenAI, Azure, Gemini, Claude, Ollama

Performance Benchmarks

Semantic Cache Performance

First Request 2,847ms

Cached Request 58ms

                                        Improvement
                                        49x faster
                                    

Security Processing

Basic Pipeline 5-15ms

AI Classification 20-50ms

Total Overhead <100ms

Throughput Capacity

Basic Edition 1,000+ req/s

Advanced Edition 5,000+ req/s

Enterprise Cluster 50,000+ req/s

Three Editions, Unlimited Possibilities

Choose between Basic OSS, Advanced Licensed, or Enterprise editions based on your needs

Basic Edition

OPEN SOURCE

PII Protection & Filtering
Code Detection & Security
Multi-Provider Support (11+)
Basic Authentication
Plugin Architecture
Production Ready

Advanced Edition

LICENSED

Everything in Basic +
AI Classification & Routing
Priority Queuing System
Semantic Caching (Redis)
Multi-Provider Failover
Webhook Guardrail Integration
Langfuse Integration
Vault-Managed Secrets
Group Management
Admin API Access

Enterprise Edition

ENTERPRISE

Everything in Advanced +
Complete Admin Portal
Cost Management & Analytics
Enterprise SSO (SAML/OIDC)
Custom Branding
Compliance Reporting
Real-time Dashboards
White-label Options
24/7 Dedicated Support
Custom SLAs
Enterprise Admin Portal

Real Stories, Real Results

See how teams like yours solve critical AI challenges with Wag-Tail

The $50K Surprise

Cost Control

What happened: A fintech startup's AI pilot went viral internally. Without usage limits, costs spiraled to $50K in one month.

With Wag-Tail: Token quotas, department-level budgets, and real-time alerts caught the spike at $5K. The team got predictable costs without killing innovation.

Result: 90% cost reduction

The Data Breach That Didn't Happen

Security & Compliance

What happened: An employee pasted customer SSNs into ChatGPT for data analysis. The audit team found hundreds of similar incidents.

With Wag-Tail: PII detection auto-masks sensitive data before it ever leaves your network. Complete audit trails prove compliance to regulators.

Result: Zero PII leakage

2AM on Black Friday

High Availability

What happened: OpenAI hit rate limits during peak shopping. Customer service chatbots went down, tickets piled up.

With Wag-Tail: Automatic failover to Azure OpenAI happened in milliseconds. Semantic caching served 40% of requests without hitting any API.

Result: 99.99% uptime

The 10-Second Wait

Performance

What happened: Users complained about slow AI responses. Average latency was 8-10 seconds, killing user adoption.

With Wag-Tail: Semantic caching recognized similar questions and delivered cached responses in under 100ms. Users never noticed a difference from fresh responses.

Result: 49x faster responses

Shadow AI

AI Governance

What happened: IT discovered 47 different AI tools across departments, each with separate contracts, no oversight, and no security review.

With Wag-Tail: Single gateway for all AI access. Full visibility into who's using what, how much they're spending, and what data is being processed.

Result: Complete visibility

The Multi-Tenant Nightmare

SaaS Providers

What happened: A SaaS company needed to offer AI features to 500+ customers, each with different usage tiers and data isolation requirements.

With Wag-Tail: Multi-tenant isolation with per-customer quotas, audit logs, and model routing. Customers get exactly what they paid for.

Result: Scalable multi-tenancy

Ready to solve your AI challenges?

Talk to Us View Pricing

Enterprise Security

Advanced PII protection, code detection, and comprehensive audit trails for enterprise compliance.

Semantic Caching

Redis-powered semantic caching reduces costs and improves response times for similar queries.

Intelligent Routing

YAML-driven routing with health-based failover across multiple LLM providers.

Observability

Comprehensive monitoring with Langfuse integration, metrics, and distributed tracing.

Rate Limiting

Group-based rate limiting and usage tracking with priority queuing capabilities.

Plugin Pipeline

Extensible plugin architecture for custom security, processing, and integration needs.

LLM Provider Integration Framework

Adding new LLM models is as simple as editing a YAML file! No coding required for 90% of use cases.

No Coding Required

90% of new model additions need only YAML configuration

1-Minute Setup

New models from existing providers in 1 minute

Hot-Reload Support

Configuration changes without restart needed

Automatic Fallback

Automatic error handling with fallback configurations

Configuration-Driven Provider Support

We ship with 5 major providers and carefully selected models, but you can easily add unlimited models through simple YAML configuration.

90%

Config Only (1 minute)

New models from existing providers

Simple Mapping (10 minutes)

OpenAI-compatible APIs

Full Plugin (2+ hours)

Completely new API formats

OpenAI (GPT Models)

Streaming Multimodal

Supported Models: gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o

Use Cases: General purpose, code generation, content creation, complex reasoning

llm:
  provider: openai
  model: gpt-4
  api_key: ${OPENAI_API_KEY}
  api_url: https://api.openai.com/v1/chat/completions
  
  openai:
    temperature: 0.7
    max_tokens: 2048
    timeout: 30

Azure OpenAI (Enterprise GPT)

Enterprise Streaming

Supported Models: gpt-4, gpt-35-turbo, text-embedding-ada-002

Enterprise Benefits: Data residency, private network, SOC 2/HIPAA compliance

llm:
  provider: azure
  model: gpt-4
  azure:
    api_key: ${AZURE_OPENAI_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    deployment_name: ${AZURE_DEPLOYMENT_NAME}
    api_version: "2023-12-01-preview"

Google Gemini (Multimodal)

Multimodal Cost-Effective

Supported Models: gemini-pro, gemini-pro-vision, gemini-ultra

Strengths: Advanced reasoning, multimodal capabilities, built-in safety filters

llm:
  provider: gemini
  model: gemini-pro
  gemini:
    api_key: ${GOOGLE_API_KEY}
    endpoint: https://generativelanguage.googleapis.com/v1
    safety_settings:
      harassment: "block_medium_and_above"
      hate_speech: "block_medium_and_above"

Anthropic Claude (Safety-First)

Constitutional AI Long Context

Supported Models: claude-3-opus, claude-3-sonnet, claude-3-haiku

Features: Up to 200K tokens context, extensive safety training, ethical reasoning

llm:
  provider: anthropic
  model: claude-3-sonnet
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    endpoint: https://api.anthropic.com/v1/messages
    max_tokens: 4000
    system_message: "You are a helpful AI assistant."

Ollama (Local Models)

Free Privacy

Supported Models: mistral, llama2, codellama, neural-chat, starcode

Benefits: Complete data privacy, no per-token charges, offline capability

llm:
  provider: ollama
  model: mistral
  ollama:
    api_url: http://localhost:11434/api/generate
    timeout: 60
    context_length: 4096

# Installation:
# brew install ollama
# ollama pull mistral
# ollama serve

Advanced Provider Features

Seamless Provider Switching

Switch providers without changing application code

# Development
llm:
  provider: ollama
  model: mistral

# Production  
llm:
  provider: azure
  model: gpt-4

Multi-Provider Routing

Route different request types to optimal providers

routing_rules:
  - condition: "request_type == 'code'"
    provider: "anthropic"
    model: "claude-3-opus"
  - condition: "cost_sensitive == true"
    provider: "ollama"
    model: "mistral"

Intelligent Failover

Automatic failover ensures high availability

fallback_chain:
  - provider: "azure"     # Primary
  - provider: "openai"    # Backup
  - provider: "gemini"    # Secondary
  - provider: "ollama"    # Last resort

Header-Based Selection

Override provider and model via HTTP headers

curl -X POST /chat \
  -H "X-LLM-Provider: openai" \
  -H "X-LLM-Model: gpt-4" \
  -d '{"prompt": "Hello!"}'

Simple YAML Configuration = Unlimited Models

Adding new LLM models is as simple as editing a YAML file! No coding required for 90% of use cases.

Which Approach Should I Use?

Config Only (90%)

1 minute

New model from existing provider

Examples: GPT-5, Claude-4, Gemini-Ultra-2

Simple Mapping (8%)

10 minutes

OpenAI-compatible API

Examples: Together.ai, Replicate, Perplexity

Full Plugin (2%)

2+ hours

Different API format

Examples: Cohere, AI21 Labs

Real-World Configuration Examples

Adding New GPT Model (1 minute)

When OpenAI releases GPT-5, just add it to your YAML:

# config/sys_config.yaml - Just add to existing list!
llm:
  provider: openai
  model: gpt-5                    # NEW - just change the model name!
  
  openai:
    api_key: ${OPENAI_API_KEY}
    models: 
      - gpt-4                     # Existing
      - gpt-4-turbo              # Existing  
      - gpt-3.5-turbo            # Existing
      - gpt-5                     # NEW - just add to list!
    timeout: 30

Adding Perplexity API (10 minutes)

Perplexity uses OpenAI-compatible format:

# config/sys_config.yaml - OpenAI-compatible provider
llm:
  provider: perplexity
  model: sonar-medium-online

perplexity:
  api_key: ${PERPLEXITY_API_KEY}
  endpoint: https://api.perplexity.ai    # Different endpoint
  provider_type: openai_compatible       # Maps to OpenAI implementation
  models:
    - sonar-medium-online
    - sonar-small-chat

Enterprise Custom Endpoint (5 minutes)

Your company's custom OpenAI deployment:

# config/sys_config.yaml - Custom enterprise endpoint
llm:
  provider: custom_enterprise
  model: custom-gpt-4-fine-tuned

custom_enterprise:
  api_key: ${ENTERPRISE_API_KEY}
  endpoint: https://llm.yourcompany.com/v1
  provider_type: openai_compatible
  models:
    - custom-gpt-4-fine-tuned
    - company-specific-model

Complete YAML Configuration Reference

Multi-provider configuration with failover chains:

# config/sys_config.yaml - Complete example with all providers
llm:
  provider: openai                # Default provider
  model: gpt-4                    # Default model
  
  # OpenAI Configuration
  openai:
    api_key: ${OPENAI_API_KEY}
    models: [gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o]
    timeout: 30
    
  # Anthropic Configuration  
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    models: [claude-3-opus, claude-3-sonnet, claude-3-haiku]
    timeout: 30
    
  # Google Gemini Configuration
  gemini:
    api_key: ${GOOGLE_API_KEY}
    models: [gemini-pro, gemini-pro-vision, gemini-ultra]
    timeout: 30
    
  # Azure OpenAI Configuration
  azure:
    api_key: ${AZURE_OPENAI_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    deployment_name: ${AZURE_DEPLOYMENT_NAME}
    api_version: "2023-12-01-preview"
    models: [gpt-4, gpt-35-turbo]
      
  # Ollama (Local) Configuration
  ollama:
    api_url: http://localhost:11434/api/generate
    models: [mistral, llama2, codellama, neural-chat]
      
  # Together.ai (OpenAI-compatible)
  together:
    api_key: ${TOGETHER_API_KEY}
    endpoint: https://api.together.xyz/inference
    provider_type: openai_compatible
    models: [meta-llama/Llama-2-70b-chat-hf, mistralai/Mistral-7B-Instruct-v0.1]

# Failover Configuration (Advanced Edition)
routing:
  fallback_chain:
    - provider: azure
      model: gpt-4
    - provider: openai  
      model: gpt-4
    - provider: gemini
      model: gemini-pro
    - provider: ollama
      model: mistral

Environment Variables Setup

# .env file - Set up your API keys
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key" 
export GOOGLE_API_KEY="your-google-api-key"
export AZURE_OPENAI_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_DEPLOYMENT_NAME="your-deployment-name"
export TOGETHER_API_KEY="your-together-key"
export PERPLEXITY_API_KEY="your-perplexity-key"

Testing Your Configuration

1. Validate YAML Syntax

python -c "import yaml; yaml.safe_load(open('config/sys_config.yaml'))"

2. Test Provider Connectivity

curl -X POST http://localhost:8000/chat \
  -H "X-API-Key: your-api-key" \
  -H "X-LLM-Provider: your-new-provider" \
  -H "X-LLM-Model: your-new-model" \
  -d '{"prompt": "Test message"}'

3. Hot-Reload Configuration (Advanced Edition)

curl -X POST http://localhost:8000/admin/reload_config \
  -H "X-Admin-API-Key: your-admin-key"

Extensible Framework Architecture

Core Design Principles

1. Provider Abstraction

All providers implement a common interface

class BaseLLMProvider:
    def generate(self, prompt: str, context: Dict) -> LLMResponse
    def is_available(self) -> bool
    def get_models(self) -> List[str]
    def estimate_cost(self, prompt: str, response: str) -> float

2. Unified Response Format

Consistent response structure across all providers

@dataclass
class LLMResponse:
    content: str
    model: str
    provider: str
    usage: Dict[str, int]
    latency_ms: int
    success: bool

3. Plugin-Based Architecture

Providers are automatically discovered and registered

entry_points={
    'wag_tail_llm_providers': [
        'openai = wag_tail_llm_openai:OpenAIProvider',
        'azure = wag_tail_llm_azure:AzureProvider',
        'custom = wag_tail_llm_custom:CustomProvider',
    ],
}

Framework Benefits

Seamless Provider Switching: Change providers via configuration without code changes

Multi-Provider Deployments: Route different workloads to optimal providers

Intelligent Failover: Automatic failover chains ensure high availability

Cost Optimization: Intelligent routing based on cost and performance

Easy Extension: Add new providers with minimal code (30-50 lines)

Adding New Models - Configuration Over Code

Most users never need to write code! Here's how to add new LLM models using our configuration-driven approach:

Choose Your Approach Based on Your Needs

Approach 1: Config Only (90%)

Easy

When to use: Adding new models from existing providers (OpenAI, Anthropic, Google, Azure, Ollama)

Time: 1 minute

Example scenarios:

OpenAI releases GPT-5
Anthropic adds Claude-4
New Ollama model available

Approach 2: Simple Mapping (8%)

Medium

When to use: OpenAI-compatible APIs with different endpoints

Time: 10 minutes

Example providers:

Together.ai
Replicate
Perplexity
Your company's custom endpoint

Approach 3: Full Plugin (2%)

Advanced

When to use: Completely different API formats requiring custom code

Time: 2+ hours

Example providers:

Cohere
AI21 Labs
Custom proprietary APIs

Approach 1: Config Only (90% of users)

Edit YAML Configuration

Just add the new model to your existing provider configuration

# config/sys_config.yaml
llm:
  provider: openai
  model: gpt-5                    # NEW - just change model name!
  
  openai:
    models: 
      - gpt-4                     # Existing
      - gpt-4-turbo              # Existing
      - gpt-5                     # NEW - add to list!
    # ... rest of config unchanged

Test Immediately

No restart required with hot-reload support

curl -X POST http://localhost:8000/chat \
  -H "X-LLM-Provider: openai" \
  -H "X-LLM-Model: gpt-5" \
  -d '{"prompt": "Hello from new model!"}'

Approach 2: OpenAI-Compatible (8% of users)

Add Provider Configuration

Configure the new provider with OpenAI-compatible mapping

# config/sys_config.yaml
llm:
  provider: perplexity
  model: sonar-medium-online

perplexity:
  api_key: ${PERPLEXITY_API_KEY}
  endpoint: https://api.perplexity.ai    # Different endpoint
  provider_type: openai_compatible       # Key mapping
  models:
    - sonar-medium-online
    - sonar-small-chat

Enable Provider (One-line change)

Add to compatible providers list

# config/provider_mappings.py
OPENAI_COMPATIBLE_PROVIDERS = [
    'together', 
    'replicate', 
    'perplexity'    # Just add this line!
]

Approach 3: Full Plugin Development (2% of users)

Only needed for completely different API formats. See our Plugin Development Guide for detailed instructions.

1. Create Provider Class: Extend BaseLLMProvider

2. Package Setup: Create Python package with entry points

3. Configuration Validation: Add Pydantic validation

4. Testing & Integration: Unit tests and installation

Tip: Before building a full plugin, check if your provider has an OpenAI-compatible API. Many modern LLM providers now offer OpenAI compatibility!

Real-World Success Stories

Enterprise Success

5 minutes

Company: Fortune 500 Financial Services

Need: Private GPT-4 deployment behind corporate firewall

Solution: Added custom endpoint configuration - no coding required!

Startup Speed

2 minutes

Company: AI Startup

Need: Switch from OpenAI to Together.ai for cost savings

Solution: Simple provider mapping - saved 80% on API costs!

Research Lab

1 minute

Organization: University AI Research Lab

Need: Test latest Claude-3.5-Sonnet model

Solution: Added to models list - immediate access to new capabilities!

Performance Benchmarks & Cost Analysis

Latency Comparison (Average Response Time)

Provider Model Avg Latency 95th Percentile Use Case

Azure OpenAI gpt-4 1,200ms 2,100ms Enterprise

OpenAI gpt-4 1,500ms 2,800ms General

Anthropic claude-3-sonnet 1,800ms 3,200ms Analysis

Google Gemini gemini-pro 1,100ms 2,000ms Balanced

Ollama mistral 800ms 1,200ms Local

Cost Comparison (per 1M tokens)

Provider Model Input Cost Output Cost Total (1:1 ratio)

OpenAI gpt-3.5-turbo $0.50 $1.50 $1.00

OpenAI gpt-4 $10.00 $30.00 $20.00

Azure OpenAI gpt-4 $10.00 $30.00 $20.00

Anthropic claude-3-sonnet $3.00 $15.00 $9.00

Google Gemini gemini-pro $2.50 $7.50 $5.00

Ollama mistral $0.00 $0.00 $0.00

Provider Selection Guidelines

Development

Use Ollama for cost-effective testing and rapid iteration

Production

Use Azure OpenAI for enterprise reliability and SLA guarantees

High Volume

Mix of providers for load distribution and cost optimization

Cost Sensitive

Use Gemini Pro or local models for budget constraints

Complex Reasoning

Use Claude-3-opus or GPT-4 for analytical tasks

Speed Critical

Use GPT-3.5-turbo or Gemini Pro for low latency needs

Request Lifecycle Architecture

Every request flows through our secure, optimized pipeline

Authentication

API key validation and organization resolution

Security Filters

PII protection, code detection, content classification

Rate Limiting

Group-based limits and priority queue management

Semantic Cache

Redis-powered caching for similar queries

LLM Routing

Provider selection and failover handling

Response

Caching, metrics, and audit trail completion

StarToken Plugin Framework

Build custom plugins for the Wag-Tail AI Gateway with our powerful, extensible framework

The StarToken Plugin Framework is Wag-Tail's modular architecture that allows you to extend the AI Gateway with custom functionality. Whether you need specialized security filters, custom authentication, unique analytics, or integrations with third-party services, our plugin system provides the foundation to build exactly what you need.

Built-in Plugins (Included)

The Wag-Tail AI Gateway ships with a comprehensive set of production-ready plugins:

Security & Authentication

Key Authentication

Both

Database + Redis cached API key validation
Org/Group ID resolution
Fast lookup with fallback

Basic Guard

Both

SQL injection detection
Code pattern filtering
XSS protection
System command blocking

PII Guard

Both

Personally Identifiable Information detection
Phone number, email, SSN masking
GDPR compliance support
Custom Hong Kong ID recognition

AI Classifier

Advanced

DistilBERT-powered intent classification
Attack/jailbreak detection
Offensive content filtering
Custom threat model training

Output Guard

Advanced

LLM response filtering
Sensitive information blocking
Policy-based response modification
Multi-layer output validation

Performance & Routing

Semantic Cache

Advanced

ChromaDB-powered similarity matching
30x+ performance improvement
Intelligent cache invalidation
Embedding-based retrieval

LLM Routing

Advanced

Multi-provider load balancing
Automatic failover chains
Cost optimization routing
Latency-based selection

Priority Queue

Advanced

Weighted fair queuing
Enterprise customer prioritization
Anti-starvation algorithms
SLA-based scheduling

Enterprise & Operations

Group Rate Limit

Advanced

Per-organization quota management
Hierarchical rate limiting
Real-time usage tracking
Monthly/daily/hourly limits

Vault Integration

Advanced

HashiCorp Vault credential management
Dynamic secret rotation
Secure API key storage
Enterprise key lifecycle

Langfuse Telemetry

Advanced

Comprehensive observability
Request/response tracing
Performance analytics
Custom metric collection

Webhook GuardRail

Advanced

External security system integration
Real-time event streaming
HMAC signature verification
Configurable event filtering

Plugin Benefits by Use Case

Financial Services

PII Guard: Automatic detection and masking of sensitive financial data
AI Classifier: Advanced threat detection for financial prompt attacks
Vault Integration: Secure credential management for regulatory compliance
Priority Queue: VIP customer request prioritization

Healthcare

PII Guard: HIPAA-compliant PHI detection and redaction
Output Guard: Medical advice filtering and liability protection
Semantic Cache: Fast retrieval while maintaining privacy
Webhook GuardRail: Integration with healthcare monitoring systems

E-commerce

Priority Queue: Premium customer service levels
Group Rate Limit: Tiered API access based on subscription
LLM Routing: Cost-optimized model selection
Semantic Cache: Fast product recommendation responses

Enterprise SaaS

Key Authentication: Multi-tenant API key management
Langfuse Telemetry: Customer usage analytics and billing
Vault Integration: Secure multi-environment deployments
Group Rate Limit: Fair usage across customer tiers

Plugin Architecture

Plugin Lifecycle

on_request(request, context)

Authentication & authorization
Request validation & filtering
Rate limiting & quota checks
Pre-processing transformations

LLM Provider Call

Semantic cache lookup (if available)
LLM routing & provider selection
Actual LLM API request

on_response(request, context, llm_output)

Response filtering & validation
Output guard & safety checks
Telemetry & analytics collection
Post-processing transformations

Building Custom Plugins

1. Plugin Structure

Create a standard Python package structure with setup.py, plugin class, and configuration.

my_custom_plugin/
 setup.py                    # Package configuration
 my_custom_plugin/
    __init__.py             # Plugin module
    plugin.py               # Main plugin class
    config.py               # Configuration loader
    utils.py                # Helper functions
 README.md                   # Plugin documentation

2. Plugin Implementation

Extend the PluginBase class and implement the required methods:

from plugins.base import PluginBase
from fastapi.responses import JSONResponse

class MyCustomPlugin(PluginBase):
    name = "my_custom_plugin"
    
    def on_request(self, request, context):
        # Validate requests before LLM processing
        return None  # Continue or return JSONResponse to block
    
    def on_response(self, request, context, llm_output):
        # Process responses after LLM
        return llm_output, False  # (response, modified)

3. Entry Point Registration

entry_points={
    "wag_tail.plugins": [
        "my_custom_plugin = my_package.module:PluginClass"
    ]
}

4. Installation & Configuration

Install and configure your plugin in the gateway environment:

# Install plugin
pip install my_custom_plugin

# Configure in sys_config.yaml
plugins:
  enabled:
    - my_custom_plugin

Plugin Examples

Custom Rate Limiter

IP-based rate limiting with sliding window

class CustomRateLimiterPlugin(PluginBase):
    def on_request(self, request, context):
        client_ip = context.get("client_ip")
        if self.is_rate_limited(client_ip):
            return JSONResponse(
                {"error": "Rate limit exceeded"},
                status_code=429
            )
        return None

Content Enrichment

Add metadata to responses

class ContentEnrichmentPlugin(PluginBase):
    def on_response(self, request, context, llm_output):
        enriched_output = llm_output.copy()
        enriched_output["metadata"] = {
            "processed_at": time.time(),
            "plugin_version": "1.0.0"
        }
        return enriched_output, True

Audit Logging

Log all requests to database for compliance

class AuditLogPlugin(PluginBase):
    def on_request(self, request, context):
        self.log_to_database(
            context.get("request_id"),
            context.get("org_id"),
            request.json().get("prompt")
        )
        return None

Getting Started

Ready to build your first plugin? Follow these steps:

Plan Your Plugin

Define the specific functionality you need, identify which plugin hooks to implement, and plan your configuration and dependencies.

Set Up Development Environment

Create plugin directory structure and install Wag-Tail development dependencies.

Implement Core Functionality

Start with basic plugin structure, implement required methods, add configuration loading, and include comprehensive logging.

Package and Deploy

Build wheel package, install in gateway environment, configure plugin in sys_config.yaml, and restart gateway.

Best Practices

Performance

Keep processing time under 50ms for on_request hooks
Set reasonable timeouts for external service calls
Monitor plugin execution time and success rates

Security

Use environment variables for secrets
Validate input to prevent injection attacks
Sanitize data before external API calls

Reliability

Always handle exceptions and provide fallback behavior
Include relevant context in all log messages
Write comprehensive unit tests

Documentation

Everything you need to get started, configure, and deploy Wag-Tail at scale

Getting Started with Wag-Tail

Your journey to production-ready AI gateway deployment

5-Minute Quick Start

Perfect for development, prototyping, and getting started quickly

Clone Repository

# Contact support@wag-tail.com for source code access
cd wag-tail-ai-gateway

Setup Environment

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

System Requirements

Python 3.9+ (3.11+ recommended)

RAM 2GB minimum (4GB+ recommended)

CPU 2 cores minimum (4+ recommended)

Storage 5GB minimum (10GB+ recommended)

Verification & Testing

Health Check

curl http://localhost:8000/admin/health \
  -H "X-Admin-API-Key: your-admin-key"

Expected: {"status": "healthy", "version": "3.4.0", "edition": "basic"}

Security Test

curl -X POST http://localhost:8000/chat \
  -H "X-API-Key: b6c91d9d2ff66624356f5e5cfd03dc784d80a2eedd6af0d94e908d7b19e25e85" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "SELECT * FROM users; DROP TABLE users;"}'

Should be blocked: {"flag": "blocked", "reason": "SQL injection pattern detected"}

Admin API - Advanced Edition Only

Comprehensive monitoring and management capabilities for Wag-Tail AI Gateway Advanced Edition

The Admin API provides comprehensive monitoring and management capabilities for Wag-Tail AI Gateway Advanced Edition. All admin endpoints require an Advanced Edition license and valid admin API key.

What you can do with the Admin API

Monitor System Health

Get real-time status of all critical services including Redis, Vault, Langfuse, and core application components

Analyze Usage Patterns

Access detailed analytics across organizations, endpoints, and time periods

Manage Cache Performance

Monitor semantic cache effectiveness and clear cache when needed

Automate Operations

Integrate with monitoring systems, CI/CD pipelines, and alerting platforms

Who can use the Admin API

Advanced Edition Users

The Admin API is exclusively available to Advanced Edition customers. With your Advanced license, you get full access to all administrative endpoints with enterprise-grade capabilities.

Basic Edition Users

Basic Edition (OSS) users do not have access to any Admin API functionality. All admin endpoints will return 403 Forbidden for Basic Edition deployments.

Alternative approaches for Basic Edition:

Application logs and standard logging frameworks
External monitoring tools (Prometheus, Grafana, DataDog)
Health check endpoints in your applications
Standard observability and APM solutions

Getting Started

Prerequisites

Before using the Admin API, ensure you have:

Advanced Edition License Required for any admin endpoint access

Admin API Key Configured in your gateway settings

Network Access Secure connectivity to your gateway deployment

Authentication

All admin endpoints require the x-admin-api-key header:

curl -H "x-admin-api-key: YOUR_ADMIN_API_KEY" http://localhost:8000/admin/endpoint

The admin API key is configured in your system configuration:

admin:
  api_key: "your_admin_api_key_here"

Rate Limit Monitoring Endpoints (NEW)

Get Organization Rate Limit Status

GET /admin/rate-limit/status/{org_id}

Monitor comprehensive rate limit status for any organization with real-time usage tracking.

Example Request:

curl -X GET "http://localhost:8000/admin/rate-limit/status/Enterprise%20Customer" \
     -H "x-admin-api-key: YOUR_API_KEY"

Response:

{
  "org_id": "Enterprise Customer",
  "edition": "advanced",
  "usage_stats": {
    "requests_today": 150,
    "requests_this_month": 2500,
    "monthly_limit": 100000,
    "remaining_requests": 97500,
    "usage_percentage": 2.5,
    "reset_date": "2025-09-01"
  },
  "groups": [
    {
      "group_id": "production",
      "requests_today": 100,
      "requests_this_month": 1800,
      "monthly_limit": 60000,
      "remaining_requests": 58200,
      "usage_percentage": 3.0
    }
  ],
  "status": "healthy",
  "warnings": []
}

Status Values:

healthy: Usage below 90% of limits
warning: Usage at 90-99% of limits
over_limit: Usage at or above 100% of limits

Get All Organizations Status

GET /admin/rate-limit/status

Get rate limit status for all organizations in your system.

System Status Overview

GET /admin/system/status

Get comprehensive system statistics and health metrics.

Response:

{
  "version": "4.0.0",
  "edition": "advanced",
  "uptime_seconds": 3600,
  "total_organizations": 5,
  "active_organizations": 3,
  "total_requests_today": 1250,
  "plugin_count": 12
}

System Health Endpoints

Health Check

GET /admin/health

Comprehensive system health check with dependency monitoring.

Response:

{
  "status": "healthy",
  "services": {
    "fastapi": "ok",
    "redis": "ok",
    "vault": "ok",
    "langfuse": "ok"
  },
  "edition": "advanced",
  "uptime_seconds": 3600
}

License Information

GET /admin/license

Get current license information and validity.

Hot-reload License

POST /admin/reload_license

Hot-reload license without server restart.

Analytics & Usage Endpoints

Usage Statistics

GET /admin/usage

Detailed usage statistics across all organizations.

Cache Statistics

GET /admin/cache_stats

Semantic cache performance metrics and analytics.

Response:

{
  "total_entries": 150,
  "hit_rate": 85.0,
  "total_hits": 1500,
  "total_requests": 1765,
  "memory_usage_mb": 15.0
}

Operations & Maintenance

Clear Cache

POST /admin/clear_cache

Clear all semantic cache entries.

Reset Usage Counters

POST /admin/reset_usage

Reset all usage counters and statistics.

Usage Testing (Development)

POST /admin/usage/increment/{org_id}?group_id={group_id}

Manually increment usage counters for testing purposes.

Monthly Limits (NEW)

Advanced Edition includes 100,000 monthly request limits with:

Organization-level tracking
Group-level sub-limits
Automatic monthly resets
Real-time usage monitoring
Warning thresholds at 90%
Over-limit protection

Database & Storage (NEW)

SQLite Database

Advanced Edition automatically creates and maintains a SQLite database for persistent usage tracking:

Location: data/wag_tail.db
Tables: api_keys, org_usage, group_usage
Automatic Backup: Recommended for production

Monitoring Examples

Check High Usage Organizations

curl -X GET "http://localhost:8000/admin/rate-limit/status" \
     -H "x-admin-api-key: YOUR_KEY" | \
     jq '.[] | select(.status != "healthy")'

Monitor Specific Group Usage

curl -X GET "http://localhost:8000/admin/rate-limit/status/Enterprise%20Customer" \
     -H "x-admin-api-key: YOUR_KEY" | \
     jq '.groups[] | select(.usage_percentage > 80)'

Error Handling

403 Forbidden: Invalid admin API key or Basic Edition license
404 Not Found: Organization not found
500 Internal Server Error: Database or system error

Security & Access

Admin API keys should be rotated regularly
All admin operations are logged for audit trails
Advanced Edition license required for all endpoints
Rate limit data stored securely in local SQLite database

Getting Started

1 Ensure Advanced Edition license is installed

2 Configure admin API key in system settings

3 Start monitoring with /admin/health endpoint

4 Set up group limits in group_config.yaml

5 Monitor usage with rate limit endpoints

Get Advanced Edition View Documentation

Ready to monitor your AI Gateway like a pro? Start with the health check endpoint and explore the comprehensive monitoring capabilities!

System Configuration

Comprehensive configuration guide for both OSS and Enterprise editions

Wag-Tail uses a hierarchical configuration system that supports YAML-based files, environment variable overrides, runtime updates, and edition-specific features with automatic capability detection.

YAML Configuration

Structured settings with clear hierarchy

Environment Overrides

Flexible deployment configuration

Runtime Updates

Dynamic configuration changes

Edition-Specific

Automatic capability detection

Configuration File Structure

config/
 sys_config.yaml           # Main configuration file
 plugins_config.yaml       # Plugin-specific settings
 security_config.yaml      # Security policies
 llm_providers.yaml        # LLM provider configurations
 environments/
     development.yaml      # Development overrides
     staging.yaml          # Staging environment settings
     production.yaml       # Production configuration

Core Configuration Sections

Basic Application Configuration

# Basic sys_config.yaml
edition: "enterprise"  # or "oss"
version: "1.0.0"
environment: "production"

app:
  name: "Wag-Tail AI Gateway"
  host: "0.0.0.0"
  port: 8000
  debug: false
  workers: 4
  max_request_size_mb: 10
  request_timeout: 300

database:
  type: "postgresql"  # sqlite, postgresql, mysql
  postgresql:
    host: "${DB_HOST:localhost}"
    port: "${DB_PORT:5432}"
    database: "${DB_NAME:wagtail}"
    username: "${DB_USER:wagtail}"
    password: "${DB_PASSWORD}"
    pool_size: 10

logging:
  level: "${LOG_LEVEL:INFO}"
  format: "json"
  file:
    enabled: true
    path: "logs/wagtail.log"
    max_size_mb: 100

Security Configuration

security:
  # API Authentication
  api_keys:
    enabled: true
    header_name: "X-API-Key"
    allow_query_param: false  # Security: disable for production
    default_key: "${DEFAULT_API_KEY}"
    
  # Rate limiting
  rate_limiting:
    enabled: true
    per_minute: 100
    per_hour: 1000
    per_day: 10000
    burst_limit: 20
    
  # Content filtering
  content_filtering:
    enabled: true
    block_code_execution: true
    block_sql_injection: true
    block_xss_attempts: true
    
  # PII protection
  pii_protection:
    enabled: true
    detection_confidence: 0.8
    anonymization_method: "mask"  # mask, replace, redact
    entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "SSN", "CREDIT_CARD"]
    
  # TLS/SSL settings
  tls:
    enabled: true
    cert_file: "${TLS_CERT_FILE:certs/server.crt}"
    key_file: "${TLS_KEY_FILE:certs/server.key}"

LLM Provider Configuration

llm:
  default_provider: "openai"
  default_model: "gpt-3.5-turbo"
  
  providers:
    ollama:
      enabled: true
      api_url: "${OLLAMA_URL:http://localhost:11434/api/generate}"
      models: ["mistral", "llama2", "codellama"]
      timeout: 60
      max_retries: 3
      
    openai:
      enabled: true
      api_key: "${OPENAI_API_KEY}"
      api_url: "https://api.openai.com/v1"
      models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]
      timeout: 120
      max_tokens: 4000
      temperature: 0.7
      
    gemini:
      enabled: true
      api_key: "${GEMINI_API_KEY}"
      api_url: "https://generativelanguage.googleapis.com/v1"
      models: ["gemini-pro", "gemini-pro-vision"]
      timeout: 90
      
    azure:
      enabled: true
      api_key: "${AZURE_OPENAI_API_KEY}"
      api_url: "${AZURE_OPENAI_ENDPOINT}"
      api_version: "2023-12-01-preview"
      deployment_name: "${AZURE_DEPLOYMENT_NAME}"

Enterprise Features Configuration

# Redis configuration (Enterprise)
redis:
  enabled: true
  host: "${REDIS_HOST:localhost}"
  port: "${REDIS_PORT:6379}"
  password: "${REDIS_PASSWORD}"
  database: 0
  max_connections: 20

# Semantic caching (Enterprise)
caching:
  semantic:
    enabled: true
    provider: "redis"
    ttl: 3600  # seconds
    similarity_threshold: 0.85
    max_cache_size_mb: 1000
    
  response:
    enabled: true
    default_ttl: 300
    max_ttl: 86400

# Monitoring & observability
monitoring:
  metrics:
    enabled: true
    endpoint: "/metrics"
    format: "prometheus"
    
  tracing:
    enabled: true
    provider: "jaeger"
    endpoint: "${TRACING_ENDPOINT}"
    service_name: "wagtail-gateway"
    sample_rate: 0.1
    
  apm:
    enabled: true
    provider: "newrelic"
    license_key: "${APM_LICENSE_KEY}"

Environment Configuration

Development

# environments/development.yaml
app:
  debug: true
  reload: true
  workers: 1

logging:
  level: "DEBUG"
  console:
    colored: true

database:
  type: "sqlite"
  sqlite:
    path: "data/dev.db"

security:
  rate_limiting:
    enabled: false
  tls:
    enabled: false

Production

# environments/production.yaml
app:
  debug: false
  reload: false
  workers: 8
  
security:
  rate_limiting:
    enabled: true
    per_minute: 60
  tls:
    enabled: true
    verify_client: true
    
logging:
  level: "INFO"
  aggregation:
    enabled: true
    
monitoring:
  metrics:
    enabled: true
  tracing:
    enabled: true
  apm:
    enabled: true

Environment Variables

Application

WAGTAIL_ENVIRONMENT - Environment name

WAGTAIL_HOST - Bind host

WAGTAIL_PORT - Bind port

WAGTAIL_WORKERS - Worker processes

Database

DB_HOST - Database host

DB_PORT - Database port

DB_NAME - Database name

DB_USER - Database user

DB_PASSWORD - Database password

LLM APIs

OPENAI_API_KEY - OpenAI API key

GEMINI_API_KEY - Google Gemini key

AZURE_OPENAI_API_KEY - Azure OpenAI key

ANTHROPIC_API_KEY - Anthropic API key

Security

DEFAULT_API_KEY - Default API key

JWT_SECRET - JWT signing secret

TLS_CERT_FILE - TLS certificate

WEBHOOK_SECRET - Webhook secret

Configuration Loading Hierarchy

Default Values

Built-in defaults (lowest priority)

Base Config

sys_config.yaml file

Environment Files

environments/{env}.yaml

Plugin Configs

Plugin-specific settings

Environment Variables

Runtime overrides (highest priority)

Configuration Best Practices

Security

Use environment variables for secrets
Never commit secrets to version control
Implement configuration validation
Rotate secrets regularly
Use secure file permissions (600/640)

Performance

Cache configuration in memory
Use lazy loading for large configs
Optimize configuration parsing
Monitor configuration load times
Minimize configuration file size

Operations

Version control configuration files
Test changes in staging first
Implement rollback procedures
Document all configuration options
Use configuration templates

Testing

Validate configuration syntax
Test in multiple environments
Implement configuration test suites
Use configuration smoke tests
Check for drift detection

Configuration Troubleshooting

Validation Commands

# Check file permissions
ls -la config/sys_config.yaml

# Validate YAML syntax
python -c "import yaml; yaml.safe_load(open('config/sys_config.yaml'))"

# Check environment variables
env | grep WAGTAIL

# Test database connectivity
python -c "import psycopg2; conn = psycopg2.connect(host='localhost', database='wagtail', user='wagtail', password='password'); print('Connected')"

# Debug configuration loading
python -c "from config_loader import load_configuration; print(load_configuration())"

Enterprise Reference Architecture

Wag-Tail AI Gateway is designed for flexible deployment across various infrastructure environments, from simple single-server deployments to complex multi-cloud, multi-region enterprise architectures.

Edge Layer

CDN WAF Load Balancer

API Gateway Layer

API Gateway Authentication Rate Limiting

Application Layer

Wag-Tail Pods Service Mesh Auto Scaling

Data Layer

PostgreSQL Redis Cluster Object Storage

Single Server

Perfect for development and small-scale deployments using Docker Compose

Docker containers
Nginx reverse proxy
Local PostgreSQL & Redis

Kubernetes

Enterprise-scale deployment with auto-scaling and high availability

Horizontal Pod Autoscaling
Service mesh integration
Cloud-native storage

Multi-Cloud

Global deployment across AWS, Azure, and GCP with API gateway integration

Regional deployments
Global load balancing
Cross-cloud replication

Single-Server Deployment

Ideal for development, testing, and small-scale production environments.

Architecture Components

Internet

Nginx

Wag-Tail

PostgreSQL/Redis

Docker Compose Configuration

# docker-compose.yml
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    container_name: wagtail_nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - wagtail
    restart: unless-stopped

  wagtail:
    image: wagtail/ai-gateway:latest
    container_name: wagtail_app
    environment:
      - WAGTAIL_ENVIRONMENT=production
      - DB_HOST=postgres
      - REDIS_HOST=redis
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./config:/app/config
      - ./logs:/app/logs
    depends_on:
      - postgres
      - redis
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  postgres:
    image: postgres:15-alpine
    container_name: wagtail_postgres
    environment:
      - POSTGRES_DB=wagtail
      - POSTGRES_USER=wagtail
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    container_name: wagtail_redis
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis_data:/data
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:

Nginx Configuration

# nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream wagtail_backend {
        server wagtail:8000;
    }

    server {
        listen 80;
        server_name your-domain.com;
        return 301 https://$server_name$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name your-domain.com;

        ssl_certificate /etc/nginx/ssl/server.crt;
        ssl_certificate_key /etc/nginx/ssl/server.key;

        location / {
            proxy_pass http://wagtail_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        location /health {
            proxy_pass http://wagtail_backend/health;
            access_log off;
        }
    }
}

Quick Start Commands

docker-compose up -d Start all services

docker-compose logs -f wagtail View application logs

docker-compose exec wagtail /app/healthcheck.sh Check application health

Kubernetes Deployment

Enterprise-scale deployment with auto-scaling, high availability, and cloud-native features.

Kubernetes Architecture

Ingress Layer

Nginx Ingress Cert Manager TLS Termination

Application Layer

Deployment Service HPA ConfigMap Secrets

Data Layer

PostgreSQL Cluster Redis Cluster Persistent Volumes

Monitoring Layer

Prometheus Grafana Jaeger AlertManager

Core Kubernetes Manifests

Deployment Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wagtail-gateway
  namespace: wagtail
  labels:
    app: wagtail-gateway
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: wagtail-gateway
  template:
    metadata:
      labels:
        app: wagtail-gateway
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: wagtail-service-account
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
      - name: wagtail
        image: wagtail/ai-gateway:v1.0.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8000
          protocol: TCP
        env:
        - name: WAGTAIL_ENVIRONMENT
          value: "production"
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: wagtail-secrets
              key: db-password
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: wagtail-secrets
              key: redis-password
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
          readOnly: true
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "2000m"
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: config-volume
        configMap:
          name: wagtail-config

Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: wagtail-hpa
  namespace: wagtail
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: wagtail-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max

Ingress Configuration

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wagtail-ingress
  namespace: wagtail
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
  tls:
  - hosts:
    - api.wagtail.ai
    secretName: wagtail-tls
  rules:
  - host: api.wagtail.ai
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: wagtail-service
            port:
              number: 80

Deployment Commands

kubectl apply -f k8s/ Deploy all manifests

kubectl get pods -n wagtail Check pod status

kubectl logs -f deployment/wagtail-gateway -n wagtail View application logs

kubectl port-forward svc/wagtail-service 8080:80 -n wagtail Local port forwarding

Multi-Cloud Deployment

Global deployment across AWS, Azure, and GCP with regional failover and API gateway integration.

Global Architecture

AWS US-East

EKS Cluster
RDS PostgreSQL
ElastiCache Redis
API Gateway

Azure EU-West

AKS Cluster
Azure Database
Azure Cache
API Management

GCP Asia-Pacific

GKE Cluster
Cloud SQL
Memorystore
Apigee

Global Services

CloudFlare DNS HashiCorp Vault Global Monitoring Cross-Region Backup

Terraform Infrastructure

# EKS Cluster
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  
  cluster_name    = "wagtail-cluster"
  cluster_version = "1.28"
  
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets
  
  eks_managed_node_groups = {
    wagtail_nodes = {
      desired_size = 3
      max_size     = 10
      min_size     = 3
      
      instance_types = ["t3.large"]
      
      k8s_labels = {
        Environment = "production"
        Application = "wagtail"
      }
    }
  }
}

# RDS PostgreSQL
resource "aws_db_instance" "wagtail_db" {
  identifier = "wagtail-postgres"
  
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.r6g.large"
  
  allocated_storage     = 100
  max_allocated_storage = 1000
  storage_encrypted     = true
  
  db_name  = "wagtail"
  username = "wagtail"
  password = var.db_password
  
  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.wagtail.name
  
  backup_retention_period = 7
  backup_window          = "03:00-04:00"
  maintenance_window     = "sun:04:00-sun:05:00"
  
  skip_final_snapshot = false
  final_snapshot_identifier = "wagtail-final-snapshot"
}

Kong API Gateway Integration

_format_version: "3.0"

services:
  - name: wagtail-gateway
    url: http://wagtail-service.wagtail.svc.cluster.local:80
    retries: 3
    connect_timeout: 10000
    read_timeout: 60000
    write_timeout: 60000

routes:
  - name: wagtail-chat
    service: wagtail-gateway
    paths:
      - /chat
    methods:
      - POST
    strip_path: false

plugins:
  # Rate limiting
  - name: rate-limiting
    service: wagtail-gateway
    config:
      minute: 100
      hour: 1000
      day: 10000
      policy: redis
      redis_host: redis-service.wagtail.svc.cluster.local

  # Authentication
  - name: key-auth
    service: wagtail-gateway
    config:
      key_names:
        - X-API-Key
      hide_credentials: true

  # CORS
  - name: cors
    service: wagtail-gateway
    config:
      origins:
        - "https://app.yourcompany.com"
      methods:
        - GET
        - POST
        - OPTIONS
      credentials: true
      max_age: 3600

Monitoring & Observability

Comprehensive monitoring, logging, and tracing for production deployments.

Monitoring Architecture

Metrics Collection

Prometheus Node Exporter cAdvisor Custom Metrics

Logging Pipeline

Fluentd Elasticsearch Logstash Kibana

Distributed Tracing

Jaeger Zipkin OpenTelemetry Collector

Visualization & Alerting

Grafana AlertManager PagerDuty

Prometheus Configuration

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "wagtail-rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  - job_name: 'wagtail-gateway'
    static_configs:
      - targets: ['wagtail-service:8000']
    metrics_path: /metrics
    scrape_interval: 10s

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - wagtail
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

  - job_name: 'postgres-exporter'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'redis-exporter'
    static_configs:
      - targets: ['redis-exporter:9121']

Key Metrics Dashboard

Request Rate

rate(wagtail_requests_total[5m])

Response Time

histogram_quantile(0.95, rate(wagtail_request_duration_seconds_bucket[5m]))

Error Rate

rate(wagtail_requests_total{status=~"4..|5.."}[5m])

LLM Response Times

wagtail_llm_request_duration_seconds

Alerting Rules

High Error Rate

Error rate > 5% for 5 minutes

High Response Time

95th percentile > 1s for 5 minutes

Pod Crash Loop

Pod restart count > 3 in 10 minutes

Database Connection Issues

Database connection pool exhausted

Security Architecture

Zero-trust security model with comprehensive protection layers.

Security Architecture Layers

Perimeter Security

CloudFlare DDoS Protection
Web Application Firewall
Rate Limiting

Identity & Access

OAuth 2.0 / OIDC
Multi-Factor Authentication
Role-Based Access Control

Network Security

Virtual Private Cloud
Private Subnets
Security Groups

Data Security

Encryption at Rest
Encryption in Transit
Secret Management

Istio Security Policies

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: wagtail-security-policy
  namespace: wagtail
spec:
  selector:
    matchLabels:
      app: wagtail-gateway
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/wagtail/sa/wagtail-service-account"]
  - to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/chat", "/health", "/metrics"]
  - when:
    - key: source.ip
      values: ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: wagtail-mtls
  namespace: wagtail
spec:
  selector:
    matchLabels:
      app: wagtail-gateway
  mtls:
    mode: STRICT

Disaster Recovery & Backup

Daily Backup

Incremental PostgreSQL & Redis backups to S3

Weekly Full Backup

Complete system backup with configuration

Long-term Archive

Monthly backups archived to Glacier

Cross-Region DR

Standby environment in secondary region

Security Checklist

Enable TLS/SSL encryption for all endpoints

Configure API rate limiting and throttling

Set up secret management with HashiCorp Vault

Enable database encryption at rest

Configure network security policies

Set up audit logging and monitoring

Implement PII detection and masking

Configure automated backup and recovery

AI Gateway Built for Production Scale

What is Wag-Tail AI Gateway?

Quick Start

Core Problems Wag-Tail Solves

Security & Compliance Challenges

Performance & Cost Optimization

Enterprise Requirements

Architecture Overview

Your Applications

Wag-Tail AI Gateway

LLM Providers

Performance Benchmarks

Semantic Cache Performance

Security Processing

Throughput Capacity

Three Editions, Unlimited Possibilities

Basic Edition

Advanced Edition

Enterprise Edition

Real Stories, Real Results

The $50K Surprise

The Data Breach That Didn't Happen

2AM on Black Friday

The 10-Second Wait

Shadow AI

The Multi-Tenant Nightmare

Enterprise Security

Semantic Caching

Intelligent Routing

Observability

Rate Limiting

Plugin Pipeline

LLM Provider Integration Framework

No Coding Required

1-Minute Setup

Hot-Reload Support

Automatic Fallback

Configuration-Driven Provider Support

OpenAI (GPT Models)

Azure OpenAI (Enterprise GPT)

Google Gemini (Multimodal)

Anthropic Claude (Safety-First)

Ollama (Local Models)

Advanced Provider Features

Seamless Provider Switching

Multi-Provider Routing

Intelligent Failover

Header-Based Selection

Simple YAML Configuration = Unlimited Models

Which Approach Should I Use?

Config Only (90%)

Simple Mapping (8%)

Full Plugin (2%)

Real-World Configuration Examples

Adding New GPT Model (1 minute)

Adding Perplexity API (10 minutes)

Enterprise Custom Endpoint (5 minutes)

Complete YAML Configuration Reference

Environment Variables Setup

Testing Your Configuration

1. Validate YAML Syntax

2. Test Provider Connectivity

3. Hot-Reload Configuration (Advanced Edition)

Extensible Framework Architecture

Core Design Principles

1. Provider Abstraction

2. Unified Response Format

3. Plugin-Based Architecture

Framework Benefits

Adding New Models - Configuration Over Code

Choose Your Approach Based on Your Needs

Approach 1: Config Only (90%)

Approach 2: Simple Mapping (8%)

Approach 3: Full Plugin (2%)

Approach 1: Config Only (90% of users)

Edit YAML Configuration

Test Immediately

Approach 2: OpenAI-Compatible (8% of users)

Add Provider Configuration

Enable Provider (One-line change)

AI Gateway Built for
Production Scale