AI Gateway Built for
Production Scale
Wag-Tail is a FastAPI gateway that fronts multiple LLM providers with advanced security, semantic caching, intelligent routing, and comprehensive rate limit monitoring. Built for enterprise-grade AI applications.
What is Wag-Tail AI Gateway?
Wag-Tail AI Gateway is a comprehensive, enterprise-grade security and routing layer for Large Language Model (LLM) applications. It sits between your applications and LLM providers, providing advanced security filtering, intelligent routing, performance optimization, and enterprise-grade observability.
Whether you're building customer-facing AI applications, internal tools, or enterprise AI platforms, Wag-Tail ensures your LLM interactions are secure, fast, and compliant while giving you complete control over costs, routing, and data governance.
Quick Start
Get started with Wag-Tail in under 5 minutes:
import requests
# Replace your direct OpenAI calls
response = requests.post(
"https://your-wagtail-gateway.com/chat",
headers={
"X-API-Key": "your-api-key",
"Content-Type": "application/json"
},
json={"prompt": "What is machine learning?"}
)
# Get secure, filtered, and optimized responses
result = response.json()
print(result["response"]) # AI response with security filtering applied
print(result["cache_hit"]) # True if served from semantic cache (30x faster)
That's it. No complex integrations, no infrastructure changes. Just point your existing LLM calls to Wag-Tail and get enterprise-grade security and performance immediately.
Core Problems Wag-Tail Solves
Security & Compliance Challenges
The Problem: Direct LLM API calls expose your applications to:
- Prompt injection attacks
- Data leakage and PII exposure
- Malicious content generation
- Compliance violations (GDPR, HIPAA, SOX)
Wag-Tail Solution: Multi-layer security pipeline with:
- AI-powered threat detection using DistilBERT classification
- PII detection and masking with custom recognizers
- SQL injection and XSS protection with pattern-based filtering
- Output sanitization preventing harmful content generation
- Real-time security monitoring with webhook integrations
Performance & Cost Optimization
The Problem: LLM calls are:
- Expensive ($0.01-$0.30+ per request)
- Slow (2-10 second response times)
- Inconsistent (provider outages, rate limits)
Wag-Tail Solution: Intelligent optimization with:
- Semantic caching delivering 30x+ performance improvements
- Multi-provider routing with automatic failover
- Cost-optimized model selection based on prompt complexity
- Request prioritization for enterprise customers
Enterprise Requirements
The Problem: Production LLM deployments need:
- Centralized governance and control
- Detailed usage analytics and billing
- Multi-tenant isolation
- Audit trails and compliance reporting
Wag-Tail Solution: Enterprise-grade platform with:
- Multi-organization isolation with per-tenant quotas
- Comprehensive audit logging for compliance teams
- Real-time usage analytics for cost management
- Vault integration for secure credential management
- Role-based access control with group-level permissions
Architecture Overview
Your Applications
HTTP/HTTPS requests
Wag-Tail AI Gateway
- API Key Authentication
- Regex & Code Injection Filtering
- PII Detection & Masking
- AI Threat Classification (Advanced)
- Rate Limiting & Quotas (Advanced)
- Output Content Filtering
- Semantic Cache (Advanced) - 30x faster
- Priority Queue (Advanced) - Enterprise SLA
- Smart Routing & Failover
- Request/Response Logging
- Usage Analytics & Billing
- Langfuse Integration (Advanced)
- Webhook Events
LLM Providers
OpenAI, Azure, Gemini, Claude, Ollama
Performance Benchmarks
Benchmark charts cover semantic cache performance, security processing overhead, and throughput capacity.
Three Editions, Unlimited Possibilities
Choose from Basic OSS, Advanced Licensed, or Enterprise editions based on your needs
Basic Edition
OPEN SOURCE
- PII Protection & Filtering
- Code Detection & Security
- Multi-Provider Support (11+)
- Basic Authentication
- Plugin Architecture
- Production Ready
Advanced Edition
LICENSED
- Everything in Basic +
- AI Classification & Routing
- Priority Queuing System
- Semantic Caching (Redis)
- Multi-Provider Failover
- Webhook Guardrail Integration
- Langfuse Integration
- Vault-Managed Secrets
- Group Management
- Admin API Access
Enterprise Edition
ENTERPRISE
- Everything in Advanced +
- Complete Admin Portal
- Cost Management & Analytics
- Enterprise SSO (SAML/OIDC)
- Custom Branding
- Compliance Reporting
- Real-time Dashboards
- White-label Options
- 24/7 Dedicated Support
- Custom SLAs
- Enterprise Admin Portal
Enterprise Security
Advanced PII protection, code detection, and comprehensive audit trails for enterprise compliance.
Semantic Caching
Redis-powered semantic caching reduces costs and improves response times for similar queries.
Intelligent Routing
YAML-driven routing with health-based failover across multiple LLM providers.
Observability
Comprehensive monitoring with Langfuse integration, metrics, and distributed tracing.
Rate Limiting
Group-based rate limiting and usage tracking with priority queuing capabilities.
Plugin Pipeline
Extensible plugin architecture for custom security, processing, and integration needs.
LLM Provider Integration Framework
Adding new LLM models is as simple as editing a YAML file! No coding required for 90% of use cases.
No Coding Required
90% of new model additions need only YAML configuration
1-Minute Setup
New models from existing providers in 1 minute
Hot-Reload Support
Configuration changes without restart needed
Automatic Fallback
Automatic error handling with fallback configurations
Configuration-Driven Provider Support
We ship with 5 major providers and carefully selected models, but you can easily add unlimited models through simple YAML configuration.
OpenAI (GPT Models)
Supported Models: gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o
Use Cases: General purpose, code generation, content creation, complex reasoning
llm:
  provider: openai
  model: gpt-4
  api_key: ${OPENAI_API_KEY}
  api_url: https://api.openai.com/v1/chat/completions
  openai:
    temperature: 0.7
    max_tokens: 2048
    timeout: 30
Azure OpenAI (Enterprise GPT)
Supported Models: gpt-4, gpt-35-turbo, text-embedding-ada-002
Enterprise Benefits: Data residency, private network, SOC 2/HIPAA compliance
llm:
  provider: azure
  model: gpt-4
  azure:
    api_key: ${AZURE_OPENAI_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    deployment_name: ${AZURE_DEPLOYMENT_NAME}
    api_version: "2023-12-01-preview"
Google Gemini (Multimodal)
Supported Models: gemini-pro, gemini-pro-vision, gemini-ultra
Strengths: Advanced reasoning, multimodal capabilities, built-in safety filters
llm:
  provider: gemini
  model: gemini-pro
  gemini:
    api_key: ${GOOGLE_API_KEY}
    endpoint: https://generativelanguage.googleapis.com/v1
    safety_settings:
      harassment: "block_medium_and_above"
      hate_speech: "block_medium_and_above"
Anthropic Claude (Safety-First)
Supported Models: claude-3-opus, claude-3-sonnet, claude-3-haiku
Features: Up to 200K tokens context, extensive safety training, ethical reasoning
llm:
  provider: anthropic
  model: claude-3-sonnet
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    endpoint: https://api.anthropic.com/v1/messages
    max_tokens: 4000
    system_message: "You are a helpful AI assistant."
Ollama (Local Models)
Supported Models: mistral, llama2, codellama, neural-chat, starcode
Benefits: Complete data privacy, no per-token charges, offline capability
llm:
  provider: ollama
  model: mistral
  ollama:
    api_url: http://localhost:11434/api/generate
    timeout: 60
    context_length: 4096

# Installation:
#   brew install ollama
#   ollama pull mistral
#   ollama serve
Advanced Provider Features
Seamless Provider Switching
Switch providers without changing application code
# Development
llm:
  provider: ollama
  model: mistral

# Production
llm:
  provider: azure
  model: gpt-4
Multi-Provider Routing
Route different request types to optimal providers
routing_rules:
  - condition: "request_type == 'code'"
    provider: "anthropic"
    model: "claude-3-opus"
  - condition: "cost_sensitive == true"
    provider: "ollama"
    model: "mistral"
Intelligent Failover
Automatic failover ensures high availability
fallback_chain:
  - provider: "azure"    # Primary
  - provider: "openai"   # Backup
  - provider: "gemini"   # Secondary
  - provider: "ollama"   # Last resort
Header-Based Selection
Override provider and model via HTTP headers
curl -X POST /chat \
-H "X-LLM-Provider: openai" \
-H "X-LLM-Model: gpt-4" \
-d '{"prompt": "Hello!"}'
Simple YAML Configuration = Unlimited Models
Adding new LLM models is as simple as editing a YAML file! No coding required for 90% of use cases.
Which Approach Should I Use?
Config Only (90%)
New model from existing provider
Simple Mapping (8%)
OpenAI-compatible API
Full Plugin (2%)
Different API format
Real-World Configuration Examples
Adding New GPT Model (1 minute)
When OpenAI releases GPT-5, just add it to your YAML:
# config/sys_config.yaml - just add to the existing list!
llm:
  provider: openai
  model: gpt-5            # NEW - just change the model name!
  openai:
    api_key: ${OPENAI_API_KEY}
    models:
      - gpt-4             # Existing
      - gpt-4-turbo       # Existing
      - gpt-3.5-turbo     # Existing
      - gpt-5             # NEW - just add to list!
    timeout: 30
Adding Perplexity API (10 minutes)
Perplexity uses OpenAI-compatible format:
# config/sys_config.yaml - OpenAI-compatible provider
llm:
  provider: perplexity
  model: sonar-medium-online
  perplexity:
    api_key: ${PERPLEXITY_API_KEY}
    endpoint: https://api.perplexity.ai    # Different endpoint
    provider_type: openai_compatible       # Maps to OpenAI implementation
    models:
      - sonar-medium-online
      - sonar-small-chat
Enterprise Custom Endpoint (5 minutes)
Your company's custom OpenAI deployment:
# config/sys_config.yaml - Custom enterprise endpoint
llm:
  provider: custom_enterprise
  model: custom-gpt-4-fine-tuned
  custom_enterprise:
    api_key: ${ENTERPRISE_API_KEY}
    endpoint: https://llm.yourcompany.com/v1
    provider_type: openai_compatible
    models:
      - custom-gpt-4-fine-tuned
      - company-specific-model
Complete YAML Configuration Reference
Multi-provider configuration with failover chains:
# config/sys_config.yaml - Complete example with all providers
llm:
  provider: openai        # Default provider
  model: gpt-4            # Default model

  # OpenAI Configuration
  openai:
    api_key: ${OPENAI_API_KEY}
    models: [gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o]
    timeout: 30

  # Anthropic Configuration
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    models: [claude-3-opus, claude-3-sonnet, claude-3-haiku]
    timeout: 30

  # Google Gemini Configuration
  gemini:
    api_key: ${GOOGLE_API_KEY}
    models: [gemini-pro, gemini-pro-vision, gemini-ultra]
    timeout: 30

  # Azure OpenAI Configuration
  azure:
    api_key: ${AZURE_OPENAI_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    deployment_name: ${AZURE_DEPLOYMENT_NAME}
    api_version: "2023-12-01-preview"
    models: [gpt-4, gpt-35-turbo]

  # Ollama (Local) Configuration
  ollama:
    api_url: http://localhost:11434/api/generate
    models: [mistral, llama2, codellama, neural-chat]

  # Together.ai (OpenAI-compatible)
  together:
    api_key: ${TOGETHER_API_KEY}
    endpoint: https://api.together.xyz/inference
    provider_type: openai_compatible
    models: [meta-llama/Llama-2-70b-chat-hf, mistralai/Mistral-7B-Instruct-v0.1]

# Failover Configuration (Advanced Edition)
routing:
  fallback_chain:
    - provider: azure
      model: gpt-4
    - provider: openai
      model: gpt-4
    - provider: gemini
      model: gemini-pro
    - provider: ollama
      model: mistral
Environment Variables Setup
# .env file - Set up your API keys
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"
export GOOGLE_API_KEY="your-google-api-key"
export AZURE_OPENAI_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_DEPLOYMENT_NAME="your-deployment-name"
export TOGETHER_API_KEY="your-together-key"
export PERPLEXITY_API_KEY="your-perplexity-key"
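Before starting the gateway it can help to confirm these keys are actually exported; a minimal sketch in Python (the variable list is illustrative and should be trimmed to the providers you enable):

import os
import sys

# Only the providers you actually enable need their keys present
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_DEPLOYMENT_NAME",
]

missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
    sys.exit(1)
print("All required provider credentials are set.")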
Testing Your Configuration
1. Validate YAML Syntax
python -c "import yaml; yaml.safe_load(open('config/sys_config.yaml'))"
2. Test Provider Connectivity
curl -X POST http://localhost:8000/chat \
  -H "X-API-Key: your-api-key" \
  -H "X-LLM-Provider: your-new-provider" \
  -H "X-LLM-Model: your-new-model" \
  -d '{"prompt": "Test message"}'
3. Hot-Reload Configuration (Advanced Edition)
curl -X POST http://localhost:8000/admin/reload_config \
  -H "X-Admin-API-Key: your-admin-key"
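For a broader smoke test, a short script can loop over the provider/model pairs you have configured and exercise /chat with the override headers; a minimal sketch (the gateway URL, API key, and pairs below are placeholders to adjust for your deployment):

import requests

GATEWAY = "http://localhost:8000/chat"
API_KEY = "your-api-key"

# Provider/model pairs to verify -- adjust to match your sys_config.yaml
PAIRS = [
    ("openai", "gpt-4"),
    ("azure", "gpt-4"),
    ("ollama", "mistral"),
]

for provider, model in PAIRS:
    try:
        resp = requests.post(
            GATEWAY,
            headers={
                "X-API-Key": API_KEY,
                "X-LLM-Provider": provider,
                "X-LLM-Model": model,
            },
            json={"prompt": "Connectivity test"},
            timeout=60,
        )
        print(f"{provider}/{model}: HTTP {resp.status_code}")
    except requests.RequestException as exc:
        print(f"{provider}/{model}: FAILED ({exc})")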
Extensible Framework Architecture
Core Design Principles
1. Provider Abstraction
All providers implement a common interface
class BaseLLMProvider:
    def generate(self, prompt: str, context: Dict) -> LLMResponse: ...
    def is_available(self) -> bool: ...
    def get_models(self) -> List[str]: ...
    def estimate_cost(self, prompt: str, response: str) -> float: ...
2. Unified Response Format
Consistent response structure across all providers
@dataclass
class LLMResponse:
content: str
model: str
provider: str
usage: Dict[str, int]
latency_ms: int
success: bool
3. Plugin-Based Architecture
Providers are automatically discovered and registered
entry_points={
'wag_tail_llm_providers': [
'openai = wag_tail_llm_openai:OpenAIProvider',
'azure = wag_tail_llm_azure:AzureProvider',
'custom = wag_tail_llm_custom:CustomProvider',
],
}
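Putting these three principles together, a new provider is a class that implements the common interface, returns the unified LLMResponse, and is registered via an entry point. A minimal sketch, assuming the base classes are importable from the framework (the import path, class name, and echo behaviour here are illustrative, not the shipped implementation):

import time
from typing import Dict, List

# Illustrative assumption: the base class and response dataclass are
# importable from the framework; this module path is a placeholder.
from wag_tail.providers import BaseLLMProvider, LLMResponse


class EchoProvider(BaseLLMProvider):
    """Toy provider that echoes the prompt back -- useful for wiring tests."""

    def generate(self, prompt: str, context: Dict) -> LLMResponse:
        start = time.time()
        return LLMResponse(
            content=f"echo: {prompt}",
            model="echo-1",
            provider="echo",
            usage={"prompt_tokens": len(prompt.split()), "completion_tokens": 0},
            latency_ms=int((time.time() - start) * 1000),
            success=True,
        )

    def is_available(self) -> bool:
        return True

    def get_models(self) -> List[str]:
        return ["echo-1"]

    def estimate_cost(self, prompt: str, response: str) -> float:
        return 0.0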
Framework Benefits
Adding New Models - Configuration Over Code
Most users never need to write code! Here's how to add new LLM models using our configuration-driven approach:
Choose Your Approach Based on Your Needs
Approach 1: Config Only (90%)
Easy. When to use: Adding new models from existing providers (OpenAI, Anthropic, Google, Azure, Ollama)
- OpenAI releases GPT-5
- Anthropic adds Claude-4
- New Ollama model available
Approach 2: Simple Mapping (8%)
Medium. When to use: OpenAI-compatible APIs with different endpoints
- Together.ai
- Replicate
- Perplexity
- Your company's custom endpoint
Approach 3: Full Plugin (2%)
Advanced. When to use: Completely different API formats requiring custom code
- Cohere
- AI21 Labs
- Custom proprietary APIs
Approach 1: Config Only (90% of users)
Edit YAML Configuration
Just add the new model to your existing provider configuration
# config/sys_config.yaml
llm:
  provider: openai
  model: gpt-5            # NEW - just change model name!
  openai:
    models:
      - gpt-4             # Existing
      - gpt-4-turbo       # Existing
      - gpt-5             # NEW - add to list!
    # ... rest of config unchanged
Test Immediately
No restart required with hot-reload support
curl -X POST http://localhost:8000/chat \
  -H "X-LLM-Provider: openai" \
  -H "X-LLM-Model: gpt-5" \
  -d '{"prompt": "Hello from new model!"}'
Approach 2: OpenAI-Compatible (8% of users)
Add Provider Configuration
Configure the new provider with OpenAI-compatible mapping
# config/sys_config.yaml
llm:
  provider: perplexity
  model: sonar-medium-online
  perplexity:
    api_key: ${PERPLEXITY_API_KEY}
    endpoint: https://api.perplexity.ai    # Different endpoint
    provider_type: openai_compatible       # Key mapping
    models:
      - sonar-medium-online
      - sonar-small-chat
Enable Provider (One-line change)
Add to compatible providers list
# config/provider_mappings.py
OPENAI_COMPATIBLE_PROVIDERS = [
    'together',
    'replicate',
    'perplexity',  # Just add this line!
]
Approach 3: Full Plugin Development (2% of users)
Only needed for completely different API formats. See our Plugin Development Guide for detailed instructions.
Tip: Before building a full plugin, check if your provider has an OpenAI-compatible API. Many modern LLM providers now offer OpenAI compatibility!
Real-World Success Stories
Enterprise Success
Time: 5 minutes
Company: Fortune 500 Financial Services
Need: Private GPT-4 deployment behind corporate firewall
Solution: Added custom endpoint configuration - no coding required!
Startup Speed
Time: 2 minutes
Company: AI Startup
Need: Switch from OpenAI to Together.ai for cost savings
Solution: Simple provider mapping - saved 80% on API costs!
Research Lab
Time: 1 minute
Organization: University AI Research Lab
Need: Test latest Claude-3.5-Sonnet model
Solution: Added to models list - immediate access to new capabilities!
Performance Benchmarks & Cost Analysis
Benchmark charts compare average response latency and cost per 1M tokens across providers.
Provider Selection Guidelines
Development
Use Ollama for cost-effective testing and rapid iteration
Production
Use Azure OpenAI for enterprise reliability and SLA guarantees
High Volume
Mix of providers for load distribution and cost optimization
Cost Sensitive
Use Gemini Pro or local models for budget constraints
Complex Reasoning
Use Claude-3-opus or GPT-4 for analytical tasks
Speed Critical
Use GPT-3.5-turbo or Gemini Pro for low latency needs
Request Lifecycle Architecture
Every request flows through our secure, optimized pipeline
Authentication
API key validation and organization resolution
Security Filters
PII protection, code detection, content classification
Rate Limiting
Group-based limits and priority queue management
Semantic Cache
Redis-powered caching for similar queries
LLM Routing
Provider selection and failover handling
Response
Caching, metrics, and audit trail completion
StarToken Plugin Framework
Build custom plugins for the Wag-Tail AI Gateway with our powerful, extensible framework
The StarToken Plugin Framework is Wag-Tail's modular architecture that allows you to extend the AI Gateway with custom functionality. Whether you need specialized security filters, custom authentication, unique analytics, or integrations with third-party services, our plugin system provides the foundation to build exactly what you need.
Built-in Plugins (Included)
The Wag-Tail AI Gateway ships with a comprehensive set of production-ready plugins:
Security & Authentication
- Database + Redis cached API key validation
- Org/Group ID resolution
- Fast lookup with fallback
- SQL injection detection
- Code pattern filtering
- XSS protection
- System command blocking
- Personally Identifiable Information detection
- Phone number, email, SSN masking
- GDPR compliance support
- Custom Hong Kong ID recognition
- DistilBERT-powered intent classification
- Attack/jailbreak detection
- Offensive content filtering
- Custom threat model training
- LLM response filtering
- Sensitive information blocking
- Policy-based response modification
- Multi-layer output validation
Performance & Routing
- ChromaDB-powered similarity matching
- 30x+ performance improvement
- Intelligent cache invalidation
- Embedding-based retrieval
- Multi-provider load balancing
- Automatic failover chains
- Cost optimization routing
- Latency-based selection
- Weighted fair queuing
- Enterprise customer prioritization
- Anti-starvation algorithms
- SLA-based scheduling
Enterprise & Operations
- Per-organization quota management
- Hierarchical rate limiting
- Real-time usage tracking
- Monthly/daily/hourly limits
- HashiCorp Vault credential management
- Dynamic secret rotation
- Secure API key storage
- Enterprise key lifecycle
- Comprehensive observability
- Request/response tracing
- Performance analytics
- Custom metric collection
- External security system integration
- Real-time event streaming
- HMAC signature verification
- Configurable event filtering
Plugin Benefits by Use Case
Financial Services
- PII Guard: Automatic detection and masking of sensitive financial data
- AI Classifier: Advanced threat detection for financial prompt attacks
- Vault Integration: Secure credential management for regulatory compliance
- Priority Queue: VIP customer request prioritization
Healthcare
- PII Guard: HIPAA-compliant PHI detection and redaction
- Output Guard: Medical advice filtering and liability protection
- Semantic Cache: Fast retrieval while maintaining privacy
- Webhook GuardRail: Integration with healthcare monitoring systems
E-commerce
- Priority Queue: Premium customer service levels
- Group Rate Limit: Tiered API access based on subscription
- LLM Routing: Cost-optimized model selection
- Semantic Cache: Fast product recommendation responses
Enterprise SaaS
- Key Authentication: Multi-tenant API key management
- Langfuse Telemetry: Customer usage analytics and billing
- Vault Integration: Secure multi-environment deployments
- Group Rate Limit: Fair usage across customer tiers
Plugin Architecture
Plugin Lifecycle
on_request(request, context)
- Authentication & authorization
- Request validation & filtering
- Rate limiting & quota checks
- Pre-processing transformations
LLM Provider Call
- Semantic cache lookup (if available)
- LLM routing & provider selection
- Actual LLM API request
on_response(request, context, llm_output)
- Response filtering & validation
- Output guard & safety checks
- Telemetry & analytics collection
- Post-processing transformations
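Conceptually, the gateway walks the enabled plugins twice per request: once before the provider call, where any hook may short-circuit the request, and once after. A simplified sketch of that dispatch flow (function and variable names are illustrative, not the gateway's internal code):

from fastapi.responses import JSONResponse

def run_pipeline(request, context, plugins, call_llm):
    # Pre-processing: any plugin may short-circuit by returning a response
    for plugin in plugins:
        blocked = plugin.on_request(request, context)
        if blocked is not None:          # e.g. auth failure, rate limit, PII block
            return blocked

    # Cache lookup, routing, and the actual provider call happen here
    llm_output = call_llm(request, context)

    # Post-processing: plugins may filter or enrich the response
    for plugin in plugins:
        llm_output, modified = plugin.on_response(request, context, llm_output)
        if modified:
            context.setdefault("modified_by", []).append(plugin.name)

    return JSONResponse(llm_output)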
Building Custom Plugins
1. Plugin Structure
Create a standard Python package structure with setup.py, plugin class, and configuration.
my_custom_plugin/
setup.py # Package configuration
my_custom_plugin/
__init__.py # Plugin module
plugin.py # Main plugin class
config.py # Configuration loader
utils.py # Helper functions
README.md # Plugin documentation
2. Plugin Implementation
Extend the PluginBase class and implement the required methods:
from plugins.base import PluginBase
from fastapi.responses import JSONResponse
class MyCustomPlugin(PluginBase):
name = "my_custom_plugin"
def on_request(self, request, context):
# Validate requests before LLM processing
return None # Continue or return JSONResponse to block
def on_response(self, request, context, llm_output):
# Process responses after LLM
return llm_output, False # (response, modified)
3. Entry Point Registration
Register your plugin with the framework using Python entry points:
entry_points={
"wag_tail.plugins": [
"my_custom_plugin = my_package.module:PluginClass"
]
}
4. Installation & Configuration
Install and configure your plugin in the gateway environment:
# Install plugin
pip install my_custom_plugin
# Configure in sys_config.yaml
plugins:
enabled:
- my_custom_plugin
Plugin Examples
Custom Rate Limiter
IP-based rate limiting with sliding window
class CustomRateLimiterPlugin(PluginBase):
def on_request(self, request, context):
client_ip = context.get("client_ip")
if self.is_rate_limited(client_ip):
return JSONResponse(
{"error": "Rate limit exceeded"},
status_code=429
)
return None
Content Enrichment
Add metadata to responses
import time

class ContentEnrichmentPlugin(PluginBase):
    def on_response(self, request, context, llm_output):
        enriched_output = llm_output.copy()
        enriched_output["metadata"] = {
            "processed_at": time.time(),
            "plugin_version": "1.0.0",
        }
        return enriched_output, True
Audit Logging
Log all requests to database for compliance
class AuditLogPlugin(PluginBase):
def on_request(self, request, context):
self.log_to_database(
context.get("request_id"),
context.get("org_id"),
request.json().get("prompt")
)
return None
Getting Started
Ready to build your first plugin? Follow these steps:
Plan Your Plugin
Define the specific functionality you need, identify which plugin hooks to implement, and plan your configuration and dependencies.
Set Up Development Environment
Create plugin directory structure and install Wag-Tail development dependencies.
Implement Core Functionality
Start with basic plugin structure, implement required methods, add configuration loading, and include comprehensive logging.
Package and Deploy
Build wheel package, install in gateway environment, configure plugin in sys_config.yaml, and restart gateway.
Best Practices
Performance
- Keep processing time under 50ms for on_request hooks
- Set reasonable timeouts for external service calls
- Monitor plugin execution time and success rates
Security
- Use environment variables for secrets
- Validate input to prevent injection attacks
- Sanitize data before external API calls
Reliability
- Always handle exceptions and provide fallback behavior
- Include relevant context in all log messages
- Write comprehensive unit tests
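These guidelines are mostly mechanical to apply; a hedged sketch of an on_request hook that times itself, fails open on unexpected errors, and logs with request context (the 50ms threshold follows the performance guideline above; the logger and class names are illustrative):

import logging
import time

from plugins.base import PluginBase

logger = logging.getLogger("my_custom_plugin")


class DefensivePlugin(PluginBase):
    name = "defensive_plugin"

    def on_request(self, request, context):
        start = time.perf_counter()
        try:
            # ... actual validation / filtering logic goes here ...
            return None  # continue the pipeline
        except Exception:
            # Fail open: log with request context and let the request continue
            logger.exception(
                "plugin failed", extra={"request_id": context.get("request_id")}
            )
            return None
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            if elapsed_ms > 50:  # keep on_request hooks under ~50ms
                logger.warning("slow on_request hook: %.1fms", elapsed_ms)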
Documentation
Everything you need to get started, configure, and deploy Wag-Tail at scale
Getting Started with Wag-Tail
Your journey to production-ready AI gateway deployment
5-Minute Quick Start
Perfect for development, prototyping, and getting started quickly
Clone Repository
# Contact support@wag-tail.com for source code access
cd wag-tail-ai-gateway
Setup Environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
System Requirements
Verification & Testing
Health Check
curl http://localhost:8000/admin/health \
-H "X-Admin-API-Key: your-admin-key"
{"status": "healthy", "version": "3.4.0", "edition": "basic"}
Security Test
curl -X POST http://localhost:8000/chat \
-H "X-API-Key: b6c91d9d2ff66624356f5e5cfd03dc784d80a2eedd6af0d94e908d7b19e25e85" \
-H "Content-Type: application/json" \
-d '{"prompt": "SELECT * FROM users; DROP TABLE users;"}'
{"flag": "blocked", "reason": "SQL injection pattern detected"}
Admin API - Advanced Edition Only
Comprehensive monitoring and management capabilities for Wag-Tail AI Gateway Advanced Edition
The Admin API provides comprehensive monitoring and management capabilities for Wag-Tail AI Gateway Advanced Edition. All admin endpoints require an Advanced Edition license and valid admin API key.
What you can do with the Admin API
Monitor System Health
Get real-time status of all critical services including Redis, Vault, Langfuse, and core application components
Analyze Usage Patterns
Access detailed analytics across organizations, endpoints, and time periods
Manage Cache Performance
Monitor semantic cache effectiveness and clear cache when needed
Automate Operations
Integrate with monitoring systems, CI/CD pipelines, and alerting platforms
Who can use the Admin API
Advanced Edition Users
The Admin API is exclusively available to Advanced Edition customers. With your Advanced license, you get full access to all administrative endpoints with enterprise-grade capabilities.
Basic Edition Users
Basic Edition (OSS) users do not have access to any Admin API functionality; all admin endpoints return 403 Forbidden for Basic Edition deployments. For monitoring, Basic Edition users can instead rely on:
- Application logs and standard logging frameworks
- External monitoring tools (Prometheus, Grafana, DataDog)
- Health check endpoints in your applications
- Standard observability and APM solutions
Getting Started
Prerequisites
Before using the Admin API, ensure you have an Advanced Edition license and a valid admin API key.
Authentication
All admin endpoints require the x-admin-api-key header:
curl -H "x-admin-api-key: YOUR_ADMIN_API_KEY" http://localhost:8000/admin/endpoint
The admin API key is configured in your system configuration:
admin:
api_key: "your_admin_api_key_here"
Rate Limit Monitoring Endpoints (NEW)
Get Organization Rate Limit Status
Monitor comprehensive rate limit status for any organization with real-time usage tracking.
curl -X GET "http://localhost:8000/admin/rate-limit/status/Enterprise%20Customer" \
-H "x-admin-api-key: YOUR_API_KEY"
{
"org_id": "Enterprise Customer",
"edition": "advanced",
"usage_stats": {
"requests_today": 150,
"requests_this_month": 2500,
"monthly_limit": 100000,
"remaining_requests": 97500,
"usage_percentage": 2.5,
"reset_date": "2025-09-01"
},
"groups": [
{
"group_id": "production",
"requests_today": 100,
"requests_this_month": 1800,
"monthly_limit": 60000,
"remaining_requests": 58200,
"usage_percentage": 3.0
}
],
"status": "healthy",
"warnings": []
}
Status values:
- healthy: usage below 90% of limits
- warning: usage at 90-99% of limits
- over_limit: usage at or above 100% of limits
Get All Organizations Status
Get rate limit status for all organizations in your system.
System Status Overview
Get comprehensive system statistics and health metrics.
{
"version": "4.0.0",
"edition": "advanced",
"uptime_seconds": 3600,
"total_organizations": 5,
"active_organizations": 3,
"total_requests_today": 1250,
"plugin_count": 12
}
System Health Endpoints
Health Check
Comprehensive system health check with dependency monitoring.
{
"status": "healthy",
"services": {
"fastapi": "ok",
"redis": "ok",
"vault": "ok",
"langfuse": "ok"
},
"edition": "advanced",
"uptime_seconds": 3600
}
License Information
Get current license information and validity.
Hot-reload License
Hot-reload license without server restart.
Analytics & Usage Endpoints
Usage Statistics
Detailed usage statistics across all organizations.
Cache Statistics
Semantic cache performance metrics and analytics.
{
"total_entries": 150,
"hit_rate": 85.0,
"total_hits": 1500,
"total_requests": 1765,
"memory_usage_mb": 15.0
}
Operations & Maintenance
Clear Cache
Clear all semantic cache entries.
Reset Usage Counters
Reset all usage counters and statistics.
Usage Testing (Development)
Manually increment usage counters for testing purposes.
Monthly Limits (NEW)
Advanced Edition includes a 100,000-request monthly limit with:
- Organization-level tracking
- Group-level sub-limits
- Automatic monthly resets
- Real-time usage monitoring
- Warning thresholds at 90%
- Over-limit protection
Database & Storage (NEW)
SQLite Database
Advanced Edition automatically creates and maintains a SQLite database for persistent usage tracking:
- Location: data/wag_tail.db
- Tables: api_keys, org_usage, group_usage
- Automatic Backup: Recommended for production
Monitoring Examples
Check High Usage Organizations
curl -X GET "http://localhost:8000/admin/rate-limit/status" \
-H "x-admin-api-key: YOUR_KEY" | \
jq '.[] | select(.status != "healthy")'
Monitor Specific Group Usage
curl -X GET "http://localhost:8000/admin/rate-limit/status/Enterprise%20Customer" \
-H "x-admin-api-key: YOUR_KEY" | \
jq '.groups[] | select(.usage_percentage > 80)'
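The same checks can be scripted in Python for integration with alerting pipelines; a hedged sketch (the gateway URL, admin key, and threshold are placeholders, and the all-organizations response is assumed to be a list shaped like the single-organization payload shown earlier):

import requests

GATEWAY = "http://localhost:8000"
ADMIN_KEY = "YOUR_ADMIN_API_KEY"
WARN_AT = 90.0  # percent of the monthly limit

resp = requests.get(
    f"{GATEWAY}/admin/rate-limit/status",
    headers={"x-admin-api-key": ADMIN_KEY},
    timeout=10,
)
resp.raise_for_status()

for org in resp.json():
    usage = org["usage_stats"]["usage_percentage"]
    if org["status"] != "healthy" or usage >= WARN_AT:
        print(f"[ALERT] {org['org_id']}: {usage:.1f}% of monthly limit ({org['status']})")
    for group in org.get("groups", []):
        if group["usage_percentage"] >= WARN_AT:
            print(f"  [ALERT] group {group['group_id']}: {group['usage_percentage']:.1f}%")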
Error Handling
- 403 Forbidden: Invalid admin API key or Basic Edition license
- 404 Not Found: Organization not found
- 500 Internal Server Error: Database or system error
Security & Access
- Admin API keys should be rotated regularly
- All admin operations are logged for audit trails
- Advanced Edition license required for all endpoints
- Rate limit data stored securely in local SQLite database
Getting Started
Ready to monitor your AI Gateway like a pro? Start with the /admin/health endpoint, review group_config.yaml for group-level settings, and explore the comprehensive monitoring capabilities!
System Configuration
Comprehensive configuration guide for both OSS and Enterprise editions
Wag-Tail uses a hierarchical configuration system that supports YAML-based files, environment variable overrides, runtime updates, and edition-specific features with automatic capability detection.
YAML Configuration
Structured settings with clear hierarchy
Environment Overrides
Flexible deployment configuration
Runtime Updates
Dynamic configuration changes
Edition-Specific
Automatic capability detection
Configuration File Structure
config/
sys_config.yaml # Main configuration file
plugins_config.yaml # Plugin-specific settings
security_config.yaml # Security policies
llm_providers.yaml # LLM provider configurations
environments/
development.yaml # Development overrides
staging.yaml # Staging environment settings
production.yaml # Production configuration
Core Configuration Sections
Basic Application Configuration
# Basic sys_config.yaml
edition: "enterprise" # or "oss"
version: "1.0.0"
environment: "production"
app:
name: "Wag-Tail AI Gateway"
host: "0.0.0.0"
port: 8000
debug: false
workers: 4
max_request_size_mb: 10
request_timeout: 300
database:
type: "postgresql" # sqlite, postgresql, mysql
postgresql:
host: "${DB_HOST:localhost}"
port: "${DB_PORT:5432}"
database: "${DB_NAME:wagtail}"
username: "${DB_USER:wagtail}"
password: "${DB_PASSWORD}"
pool_size: 10
logging:
level: "${LOG_LEVEL:INFO}"
format: "json"
file:
enabled: true
path: "logs/wagtail.log"
max_size_mb: 100
Security Configuration
security:
# API Authentication
api_keys:
enabled: true
header_name: "X-API-Key"
allow_query_param: false # Security: disable for production
default_key: "${DEFAULT_API_KEY}"
# Rate limiting
rate_limiting:
enabled: true
per_minute: 100
per_hour: 1000
per_day: 10000
burst_limit: 20
# Content filtering
content_filtering:
enabled: true
block_code_execution: true
block_sql_injection: true
block_xss_attempts: true
# PII protection
pii_protection:
enabled: true
detection_confidence: 0.8
anonymization_method: "mask" # mask, replace, redact
entities: ["EMAIL_ADDRESS", "PHONE_NUMBER", "SSN", "CREDIT_CARD"]
# TLS/SSL settings
tls:
enabled: true
cert_file: "${TLS_CERT_FILE:certs/server.crt}"
key_file: "${TLS_KEY_FILE:certs/server.key}"
LLM Provider Configuration
llm:
default_provider: "openai"
default_model: "gpt-3.5-turbo"
providers:
ollama:
enabled: true
api_url: "${OLLAMA_URL:http://localhost:11434/api/generate}"
models: ["mistral", "llama2", "codellama"]
timeout: 60
max_retries: 3
openai:
enabled: true
api_key: "${OPENAI_API_KEY}"
api_url: "https://api.openai.com/v1"
models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]
timeout: 120
max_tokens: 4000
temperature: 0.7
gemini:
enabled: true
api_key: "${GEMINI_API_KEY}"
api_url: "https://generativelanguage.googleapis.com/v1"
models: ["gemini-pro", "gemini-pro-vision"]
timeout: 90
azure:
enabled: true
api_key: "${AZURE_OPENAI_API_KEY}"
api_url: "${AZURE_OPENAI_ENDPOINT}"
api_version: "2023-12-01-preview"
deployment_name: "${AZURE_DEPLOYMENT_NAME}"
Enterprise Features Configuration
# Redis configuration (Enterprise)
redis:
enabled: true
host: "${REDIS_HOST:localhost}"
port: "${REDIS_PORT:6379}"
password: "${REDIS_PASSWORD}"
database: 0
max_connections: 20
# Semantic caching (Enterprise)
caching:
semantic:
enabled: true
provider: "redis"
ttl: 3600 # seconds
similarity_threshold: 0.85
max_cache_size_mb: 1000
response:
enabled: true
default_ttl: 300
max_ttl: 86400
# Monitoring & observability
monitoring:
metrics:
enabled: true
endpoint: "/metrics"
format: "prometheus"
tracing:
enabled: true
provider: "jaeger"
endpoint: "${TRACING_ENDPOINT}"
service_name: "wagtail-gateway"
sample_rate: 0.1
apm:
enabled: true
provider: "newrelic"
license_key: "${APM_LICENSE_KEY}"
Environment Configuration
Development
# environments/development.yaml
app:
debug: true
reload: true
workers: 1
logging:
level: "DEBUG"
console:
colored: true
database:
type: "sqlite"
sqlite:
path: "data/dev.db"
security:
rate_limiting:
enabled: false
tls:
enabled: false
Production
# environments/production.yaml
app:
debug: false
reload: false
workers: 8
security:
rate_limiting:
enabled: true
per_minute: 60
tls:
enabled: true
verify_client: true
logging:
level: "INFO"
aggregation:
enabled: true
monitoring:
metrics:
enabled: true
tracing:
enabled: true
apm:
enabled: true
Environment Variables
Application
- WAGTAIL_ENVIRONMENT - Environment name
- WAGTAIL_HOST - Bind host
- WAGTAIL_PORT - Bind port
- WAGTAIL_WORKERS - Worker processes
Database
- DB_HOST - Database host
- DB_PORT - Database port
- DB_NAME - Database name
- DB_USER - Database user
- DB_PASSWORD - Database password
LLM APIs
- OPENAI_API_KEY - OpenAI API key
- GEMINI_API_KEY - Google Gemini key
- AZURE_OPENAI_API_KEY - Azure OpenAI key
- ANTHROPIC_API_KEY - Anthropic API key
Security
- DEFAULT_API_KEY - Default API key
- JWT_SECRET - JWT signing secret
- TLS_CERT_FILE - TLS certificate
- WEBHOOK_SECRET - Webhook secret
Configuration Loading Hierarchy
Default Values
Built-in defaults (lowest priority)
Base Config
sys_config.yaml file
Environment Files
environments/{env}.yaml
Plugin Configs
Plugin-specific settings
Environment Variables
Runtime overrides (highest priority)
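In practice this hierarchy is a layered dictionary merge with environment variables applied last; a simplified sketch, assuming the file layout shown earlier (the merge helper and variable handling are illustrative, not the gateway's loader):

import os
import yaml


def deep_merge(base: dict, override: dict) -> dict:
    # Later layers win; nested dicts are merged key by key
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_configuration(env: str = "production") -> dict:
    config: dict = {}  # built-in defaults would seed this
    for path in ("config/sys_config.yaml", f"config/environments/{env}.yaml"):
        if os.path.exists(path):
            with open(path) as fh:
                config = deep_merge(config, yaml.safe_load(fh) or {})
    # Environment variables override file values (highest priority)
    if os.environ.get("WAGTAIL_PORT"):
        config.setdefault("app", {})["port"] = int(os.environ["WAGTAIL_PORT"])
    return config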
Configuration Best Practices
Security
- Use environment variables for secrets
- Never commit secrets to version control
- Implement configuration validation
- Rotate secrets regularly
- Use secure file permissions (600/640)
Performance
- Cache configuration in memory
- Use lazy loading for large configs
- Optimize configuration parsing
- Monitor configuration load times
- Minimize configuration file size
Operations
- Version control configuration files
- Test changes in staging first
- Implement rollback procedures
- Document all configuration options
- Use configuration templates
Testing
- Validate configuration syntax
- Test in multiple environments
- Implement configuration test suites
- Use configuration smoke tests
- Check for drift detection
Configuration Troubleshooting
Validation Commands
# Check file permissions
ls -la config/sys_config.yaml
# Validate YAML syntax
python -c "import yaml; yaml.safe_load(open('config/sys_config.yaml'))"
# Check environment variables
env | grep WAGTAIL
# Test database connectivity
python -c "import psycopg2; conn = psycopg2.connect(host='localhost', database='wagtail', user='wagtail', password='password'); print('Connected')"
# Debug configuration loading
python -c "from config_loader import load_configuration; print(load_configuration())"
Enterprise Reference Architecture
Wag-Tail AI Gateway is designed for flexible deployment across various infrastructure environments, from simple single-server deployments to complex multi-cloud, multi-region enterprise architectures.
Edge Layer
API Gateway Layer
Application Layer
Data Layer
Single Server
Perfect for development and small-scale deployments using Docker Compose
- Docker containers
- Nginx reverse proxy
- Local PostgreSQL & Redis
Kubernetes
Enterprise-scale deployment with auto-scaling and high availability
- Horizontal Pod Autoscaling
- Service mesh integration
- Cloud-native storage
Multi-Cloud
Global deployment across AWS, Azure, and GCP with API gateway integration
- Regional deployments
- Global load balancing
- Cross-cloud replication
Single-Server Deployment
Ideal for development, testing, and small-scale production environments.
Architecture Components
Docker Compose Configuration
# docker-compose.yml
version: '3.8'
services:
nginx:
image: nginx:alpine
container_name: wagtail_nginx
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf
- ./ssl:/etc/nginx/ssl
depends_on:
- wagtail
restart: unless-stopped
wagtail:
image: wagtail/ai-gateway:latest
container_name: wagtail_app
environment:
- WAGTAIL_ENVIRONMENT=production
- DB_HOST=postgres
- REDIS_HOST=redis
- OPENAI_API_KEY=${OPENAI_API_KEY}
volumes:
- ./config:/app/config
- ./logs:/app/logs
depends_on:
- postgres
- redis
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
postgres:
image: postgres:15-alpine
container_name: wagtail_postgres
environment:
- POSTGRES_DB=wagtail
- POSTGRES_USER=wagtail
- POSTGRES_PASSWORD=${DB_PASSWORD}
volumes:
- postgres_data:/var/lib/postgresql/data
restart: unless-stopped
redis:
image: redis:7-alpine
container_name: wagtail_redis
command: redis-server --requirepass ${REDIS_PASSWORD}
volumes:
- redis_data:/data
restart: unless-stopped
volumes:
postgres_data:
redis_data:
Nginx Configuration
# nginx.conf
events {
worker_connections 1024;
}
http {
upstream wagtail_backend {
server wagtail:8000;
}
server {
listen 80;
server_name your-domain.com;
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name your-domain.com;
ssl_certificate /etc/nginx/ssl/server.crt;
ssl_certificate_key /etc/nginx/ssl/server.key;
location / {
proxy_pass http://wagtail_backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
location /health {
proxy_pass http://wagtail_backend/health;
access_log off;
}
}
}
Quick Start Commands
docker-compose up -d
Start all services
docker-compose logs -f wagtail
View application logs
docker-compose exec wagtail /app/healthcheck.sh
Check application health
Kubernetes Deployment
Enterprise-scale deployment with auto-scaling, high availability, and cloud-native features.
Kubernetes Architecture
Ingress Layer: Nginx Ingress, Cert Manager, TLS Termination
Application Layer: Deployment, Service, HPA, ConfigMap, Secrets
Data Layer: PostgreSQL Cluster, Redis Cluster, Persistent Volumes
Monitoring Layer: Prometheus, Grafana, Jaeger, AlertManager
Core Kubernetes Manifests
Deployment Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: wagtail-gateway
namespace: wagtail
labels:
app: wagtail-gateway
version: v1.0.0
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: wagtail-gateway
template:
metadata:
labels:
app: wagtail-gateway
version: v1.0.0
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8000"
prometheus.io/path: "/metrics"
spec:
serviceAccountName: wagtail-service-account
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
containers:
- name: wagtail
image: wagtail/ai-gateway:v1.0.0
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 8000
protocol: TCP
env:
- name: WAGTAIL_ENVIRONMENT
value: "production"
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: wagtail-secrets
key: db-password
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: wagtail-secrets
key: redis-password
volumeMounts:
- name: config-volume
mountPath: /app/config
readOnly: true
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 5
periodSeconds: 5
volumes:
- name: config-volume
configMap:
name: wagtail-config
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: wagtail-hpa
namespace: wagtail
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: wagtail-gateway
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 4
periodSeconds: 15
selectPolicy: Max
Ingress Configuration
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: wagtail-ingress
namespace: wagtail
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/rate-limit: "100"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
tls:
- hosts:
- api.wagtail.ai
secretName: wagtail-tls
rules:
- host: api.wagtail.ai
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: wagtail-service
port:
number: 80
Deployment Commands
kubectl apply -f k8s/
Deploy all manifests
kubectl get pods -n wagtail
Check pod status
kubectl logs -f deployment/wagtail-gateway -n wagtail
View application logs
kubectl port-forward svc/wagtail-service 8080:80 -n wagtail
Local port forwarding
Multi-Cloud Deployment
Global deployment across AWS, Azure, and GCP with regional failover and API gateway integration.
Global Architecture
AWS US-East
- EKS Cluster
- RDS PostgreSQL
- ElastiCache Redis
- API Gateway
Azure EU-West
- AKS Cluster
- Azure Database
- Azure Cache
- API Management
GCP Asia-Pacific
- GKE Cluster
- Cloud SQL
- Memorystore
- Apigee
Global Services
Terraform Infrastructure
# EKS Cluster
module "eks" {
source = "terraform-aws-modules/eks/aws"
cluster_name = "wagtail-cluster"
cluster_version = "1.28"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
eks_managed_node_groups = {
wagtail_nodes = {
desired_size = 3
max_size = 10
min_size = 3
instance_types = ["t3.large"]
k8s_labels = {
Environment = "production"
Application = "wagtail"
}
}
}
}
# RDS PostgreSQL
resource "aws_db_instance" "wagtail_db" {
identifier = "wagtail-postgres"
engine = "postgres"
engine_version = "15.4"
instance_class = "db.r6g.large"
allocated_storage = 100
max_allocated_storage = 1000
storage_encrypted = true
db_name = "wagtail"
username = "wagtail"
password = var.db_password
vpc_security_group_ids = [aws_security_group.rds.id]
db_subnet_group_name = aws_db_subnet_group.wagtail.name
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "sun:04:00-sun:05:00"
skip_final_snapshot = false
final_snapshot_identifier = "wagtail-final-snapshot"
}
Kong API Gateway Integration
_format_version: "3.0"
services:
- name: wagtail-gateway
url: http://wagtail-service.wagtail.svc.cluster.local:80
retries: 3
connect_timeout: 10000
read_timeout: 60000
write_timeout: 60000
routes:
- name: wagtail-chat
service: wagtail-gateway
paths:
- /chat
methods:
- POST
strip_path: false
plugins:
# Rate limiting
- name: rate-limiting
service: wagtail-gateway
config:
minute: 100
hour: 1000
day: 10000
policy: redis
redis_host: redis-service.wagtail.svc.cluster.local
# Authentication
- name: key-auth
service: wagtail-gateway
config:
key_names:
- X-API-Key
hide_credentials: true
# CORS
- name: cors
service: wagtail-gateway
config:
origins:
- "https://app.yourcompany.com"
methods:
- GET
- POST
- OPTIONS
credentials: true
max_age: 3600
Monitoring & Observability
Comprehensive monitoring, logging, and tracing for production deployments.
Monitoring Architecture
Metrics Collection: Prometheus, Node Exporter, cAdvisor, Custom Metrics
Logging Pipeline: Fluentd, Elasticsearch, Logstash, Kibana
Distributed Tracing: Jaeger, Zipkin, OpenTelemetry Collector
Visualization & Alerting: Grafana, AlertManager, PagerDuty
Prometheus Configuration
global:
scrape_interval: 15s
evaluation_interval: 15s
rule_files:
- "wagtail-rules.yml"
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
scrape_configs:
- job_name: 'wagtail-gateway'
static_configs:
- targets: ['wagtail-service:8000']
metrics_path: /metrics
scrape_interval: 10s
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- wagtail
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- job_name: 'postgres-exporter'
static_configs:
- targets: ['postgres-exporter:9187']
- job_name: 'redis-exporter'
static_configs:
- targets: ['redis-exporter:9121']
Key Metrics Dashboard
Request Rate
rate(wagtail_requests_total[5m])
Response Time
histogram_quantile(0.95, rate(wagtail_request_duration_seconds_bucket[5m]))
Error Rate
rate(wagtail_requests_total{status=~"4..|5.."}[5m])
LLM Response Times
wagtail_llm_request_duration_seconds
Alerting Rules
Error rate > 5% for 5 minutes
95th percentile > 1s for 5 minutes
Pod restart count > 3 in 10 minutes
Database connection pool exhausted
Security Architecture
Zero-trust security model with comprehensive protection layers.
Security Architecture Layers
Perimeter Security
- CloudFlare DDoS Protection
- Web Application Firewall
- Rate Limiting
Identity & Access
- OAuth 2.0 / OIDC
- Multi-Factor Authentication
- Role-Based Access Control
Network Security
- Virtual Private Cloud
- Private Subnets
- Security Groups
Data Security
- Encryption at Rest
- Encryption in Transit
- Secret Management
Istio Security Policies
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: wagtail-security-policy
namespace: wagtail
spec:
selector:
matchLabels:
app: wagtail-gateway
rules:
- from:
- source:
principals: ["cluster.local/ns/wagtail/sa/wagtail-service-account"]
- to:
- operation:
methods: ["GET", "POST"]
paths: ["/chat", "/health", "/metrics"]
- when:
- key: source.ip
values: ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: wagtail-mtls
namespace: wagtail
spec:
selector:
matchLabels:
app: wagtail-gateway
mtls:
mode: STRICT
Disaster Recovery & Backup
Daily Backup
Incremental PostgreSQL & Redis backups to S3
Weekly Full Backup
Complete system backup with configuration
Long-term Archive
Monthly backups archived to Glacier
Cross-Region DR
Standby environment in secondary region