AI-Switch
Enterprise AI Resource Governance Platform
Unified access to local models (GLM, Minimax, Qwen, DeepSeek) and cloud models (OpenAI, Claude, Gemini, Grok), compatible with Claude Code, Codex, OpenCode, Cursor, and other programming tools.
Core Capabilities
One platform to manage enterprise AI resource access, permissions, usage, and security
Multi-Provider Unified Access
Supports local models (GLM, Minimax, Qwen, DeepSeek) and cloud models (OpenAI, Claude, Gemini, Grok). Request and response formats are converted automatically, so programming tools can connect directly.
Organizational Permissions
4-tier department tree + 4-level role system (root / company admin / department admin / member), all queries automatically filtered by permission scope.
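To make the permission-scope filtering concrete, here is a minimal Go sketch. The source names the four roles but not their exact scopes, so the rules below are assumptions: root sees everything, a company admin sees their company, a department admin sees their department subtree, and a member sees only their own records. The `User`, `Record`, `CanSee`, and `FilterByScope` names and the slash-separated department path are all hypothetical.

```go
package main

import "strings"

// Role mirrors the four roles named above; the scoping rules are assumptions.
type Role int

const (
	Member Role = iota
	DeptAdmin
	CompanyAdmin
	Root
)

// User is a hypothetical caller. DeptPath is a materialized path in the
// department tree, for example "acme/eng/platform".
type User struct {
	ID       string
	Role     Role
	DeptPath string
}

// Record is a hypothetical row (such as a usage log entry) owned by a user.
type Record struct {
	OwnerID  string
	DeptPath string
}

// inSubtree reports whether path equals root or lies beneath it.
func inSubtree(path, root string) bool {
	return path == root || strings.HasPrefix(path, root+"/")
}

// CanSee applies the assumed scope rule for the caller's role.
func CanSee(u User, r Record) bool {
	switch u.Role {
	case Root:
		return true
	case CompanyAdmin:
		// The first path segment is treated as the company.
		company, _, _ := strings.Cut(u.DeptPath, "/")
		return inSubtree(r.DeptPath, company)
	case DeptAdmin:
		return inSubtree(r.DeptPath, u.DeptPath)
	default:
		return r.OwnerID == u.ID
	}
}

// FilterByScope keeps only the records the caller is allowed to see.
func FilterByScope(u User, records []Record) []Record {
	var out []Record
	for _, r := range records {
		if CanSee(u, r) {
			out = append(out, r)
		}
	}
	return out
}
```

In a real deployment the same rule would more likely be pushed into the database query as a WHERE clause than applied in memory. The in-memory form is used here only to keep the sketch self-contained.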
Three-Tier Quota System
Personal quota → Project quota → Supplementary quota, with customizable consumption order. Pre-deduction/settlement mechanism ensures idempotency, with 4 token types independently metered.
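The pre-deduction and settlement flow can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the `Ledger` type, its method names, and the choice to refund in reverse consumption order are all hypothetical. It also uses a single token counter rather than the four independently metered token types, and it is not safe for concurrent use.

```go
package main

import "errors"

// Tier names follow the three quota tiers described above.
type Tier string

const (
	Personal      Tier = "personal"
	Project       Tier = "project"
	Supplementary Tier = "supplementary"
)

var (
	ErrInsufficient   = errors.New("insufficient quota")
	ErrUnknownRequest = errors.New("unknown request id")
)

// hold records how much one request reserved from each tier.
type hold struct {
	taken   map[Tier]int64
	settled bool
}

// Ledger is a hypothetical in-memory quota ledger.
type Ledger struct {
	Balance map[Tier]int64
	Order   []Tier // customizable consumption order
	holds   map[string]*hold
}

func NewLedger(balance map[Tier]int64, order []Tier) *Ledger {
	return &Ledger{Balance: balance, Order: order, holds: map[string]*hold{}}
}

// PreDeduct reserves estimate tokens across tiers in Order.
// Repeating a request ID is a no-op, which makes the call idempotent.
func (l *Ledger) PreDeduct(reqID string, estimate int64) error {
	if _, ok := l.holds[reqID]; ok {
		return nil
	}
	var available int64
	for _, t := range l.Order {
		available += l.Balance[t]
	}
	if available < estimate {
		return ErrInsufficient
	}
	h := &hold{taken: map[Tier]int64{}}
	remaining := estimate
	for _, t := range l.Order {
		take := l.Balance[t]
		if take > remaining {
			take = remaining
		}
		l.Balance[t] -= take
		h.taken[t] = take
		remaining -= take
	}
	l.holds[reqID] = h
	return nil
}

// Settle reconciles the reservation with actual usage. Unused tokens are
// refunded in reverse consumption order; overruns are charged in Order.
// It returns any overrun that no tier could cover. Settling twice is a no-op.
func (l *Ledger) Settle(reqID string, actual int64) (int64, error) {
	h, ok := l.holds[reqID]
	if !ok {
		return 0, ErrUnknownRequest
	}
	if h.settled {
		return 0, nil
	}
	h.settled = true
	var reserved int64
	for _, v := range h.taken {
		reserved += v
	}
	if actual <= reserved {
		refund := reserved - actual
		for i := len(l.Order) - 1; i >= 0 && refund > 0; i-- {
			t := l.Order[i]
			give := h.taken[t]
			if give > refund {
				give = refund
			}
			l.Balance[t] += give
			refund -= give
		}
		return 0, nil
	}
	extra := actual - reserved
	for _, t := range l.Order {
		take := l.Balance[t]
		if take > extra {
			take = extra
		}
		l.Balance[t] -= take
		extra -= take
	}
	return extra, nil
}
```

Keying both operations on the request ID is what lets a retried request avoid being charged twice. A production version would presumably persist the holds and run each operation inside a database transaction.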
Streaming Response & Smart Cache
SSE real-time streaming output and a three-tier response cache (in-memory LRU + SQLite + semantic vector). Cache hits skip quota pre-deduction, which reduces cost.
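A tiered cache lookup of this kind might look like the sketch below. It is a simplified illustration: the `Store` interface, `LRU`, `MapStore`, and `Lookup` names are hypothetical, a plain map stands in for the SQLite tier, and the semantic vector tier is omitted entirely. Copying hits into faster tiers is an assumption, not something the source states.

```go
package main

import "container/list"

// Store is a hypothetical interface that every cache tier implements.
type Store interface {
	Get(key string) (string, bool)
	Put(key, value string)
}

type entry struct{ key, value string }

// LRU is a small in-memory least-recently-used cache. Not concurrency-safe.
type LRU struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

func (c *LRU) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(entry).value, true
}

func (c *LRU) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value = entry{key, value}
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(entry{key, value})
	if c.order.Len() > c.capacity {
		// Evict the least recently used entry.
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(entry).key)
	}
}

// MapStore stands in for a slower persistent tier such as SQLite.
type MapStore map[string]string

func (m MapStore) Get(k string) (string, bool) { v, ok := m[k]; return v, ok }
func (m MapStore) Put(k, v string)             { m[k] = v }

// Lookup checks tiers from fastest to slowest and copies a hit into
// every faster tier so the next lookup is served earlier.
func Lookup(tiers []Store, key string) (string, bool) {
	for i, t := range tiers {
		if v, ok := t.Get(key); ok {
			for _, faster := range tiers[:i] {
				faster.Put(key, v)
			}
			return v, true
		}
	}
	return "", false
}
```

A caller would check `Lookup` before quota pre-deduction and go on to reserve quota and forward the request only on a miss.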
Security Protection
API key SHA-256 hashing, AES-256-GCM key storage, command interception (5 matching modes), brute-force protection, CIDR whitelist.
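Two of these controls, SHA-256 hashing of API keys and a CIDR whitelist, can be sketched with the Go standard library alone. The function names are hypothetical, and storing the hash hex-encoded is an assumption. AES-256-GCM storage, command interception, and brute-force protection are not shown.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"net/netip"
)

// HashAPIKey returns the hex-encoded SHA-256 digest of key, so the
// plaintext key never has to be stored.
func HashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// VerifyAPIKey hashes the presented key and compares it with the stored
// digest in constant time.
func VerifyAPIKey(presented, storedHex string) bool {
	got := HashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHex)) == 1
}

// ParseWhitelist parses CIDR strings such as "10.0.0.0/8".
func ParseWhitelist(cidrs []string) ([]netip.Prefix, error) {
	out := make([]netip.Prefix, 0, len(cidrs))
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, err
		}
		out = append(out, p.Masked())
	}
	return out, nil
}

// Allowed reports whether ip falls inside any whitelisted prefix.
// Unparseable addresses are rejected.
func Allowed(whitelist []netip.Prefix, ip string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	for _, p := range whitelist {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}
```

SHA-256 is a one-way hash rather than encryption, so a stored digest can be checked against a presented key but cannot be turned back into the key.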
Observability
Prometheus metrics endpoint, Grafana Dashboard, Alertmanager alert rules. Full coverage of key metrics including request latency, active connections, and quota pre-deduction.
Technical Architecture
Modular design, full-chain control from request entry to model forwarding
Request Entry
Gin router + middleware chain: Request ID → Latency logging → Auth (with brute-force protection) → Permission isolation → Rate limiting (Enterprise)
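The ordering of that middleware chain can be illustrated with a framework-free Go sketch. It deliberately avoids Gin so that it has no dependencies. The `Context`, `Handler`, and `Middleware` types are hypothetical stand-ins, and only the first three stages (request ID, latency, auth) are modeled.

```go
package main

import (
	"fmt"
	"time"
)

// Context is a hypothetical, framework-free stand-in for a request context.
type Context struct {
	APIKey    string
	RequestID string
	Status    int
	Elapsed   time.Duration
	Trace     []string // names of the stages that ran, in order
}

type Handler func(*Context)
type Middleware func(Handler) Handler

// Chain wraps h so that mws[0] runs first (outermost).
func Chain(h Handler, mws ...Middleware) Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// RequestID assigns a sequential ID. Not safe for concurrent use.
func RequestID() Middleware {
	var seq int
	return func(next Handler) Handler {
		return func(c *Context) {
			seq++
			c.RequestID = fmt.Sprintf("req-%d", seq)
			c.Trace = append(c.Trace, "request_id")
			next(c)
		}
	}
}

// Latency measures how long the rest of the chain takes.
func Latency() Middleware {
	return func(next Handler) Handler {
		return func(c *Context) {
			start := time.Now()
			c.Trace = append(c.Trace, "latency")
			next(c)
			c.Elapsed = time.Since(start)
		}
	}
}

// Auth stops the chain with status 401 when the API key is unknown.
func Auth(validKeys map[string]bool) Middleware {
	return func(next Handler) Handler {
		return func(c *Context) {
			c.Trace = append(c.Trace, "auth")
			if !validKeys[c.APIKey] {
				c.Status = 401
				return // later stages never run
			}
			next(c)
		}
	}
}
```

Because the latency stage wraps auth, rejected requests are timed and logged as well, which is one plausible reason for placing it ahead of authentication.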
Governance Layer
Command interception → Response cache lookup → Quota pre-deduction (idempotency guarantee) → Request queuing / backpressure control
Model Forwarding
Load balancing selects resource → Adapter auto-converts format → Non-streaming auto-retry → Streaming SSE forwarding
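The selection and retry steps could be sketched as below. The source does not say which load-balancing strategy is used, so round-robin is an assumption, and the `Upstream`, `Balancer`, and `Forward` names are hypothetical. The `Call` field stands in for the real HTTP request to a model resource.

```go
package main

import (
	"errors"
	"fmt"
)

// Upstream is a hypothetical model resource.
type Upstream struct {
	Name string
	Call func(prompt string) (string, error)
}

// Balancer selects upstreams round-robin. Not safe for concurrent use.
type Balancer struct {
	upstreams []Upstream
	next      int
}

func NewBalancer(ups ...Upstream) *Balancer {
	return &Balancer{upstreams: ups}
}

// Pick returns the next upstream in rotation.
// It must not be called on an empty balancer.
func (b *Balancer) Pick() Upstream {
	u := b.upstreams[b.next%len(b.upstreams)]
	b.next++
	return u
}

var ErrNoUpstreams = errors.New("no upstreams configured")

// Forward sends a non-streaming request, moving on to the next upstream
// after each failure, for at most maxAttempts tries (minimum one).
func Forward(b *Balancer, prompt string, maxAttempts int) (string, error) {
	if len(b.upstreams) == 0 {
		return "", ErrNoUpstreams
	}
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		u := b.Pick()
		out, err := u.Call(prompt)
		if err == nil {
			return out, nil
		}
		// Keep the most recent failure, wrapped with attempt details.
		lastErr = fmt.Errorf("attempt %d via %s: %w", attempt, u.Name, err)
	}
	return "", lastErr
}
```

Restricting automatic retry to non-streaming requests is consistent with the fact that a streaming response may already have delivered partial output to the client, so replaying it cannot be done transparently.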
Settlement & Audit
Quota settlement (refund/charge difference) → Access log async batch write → Audit event reliable delivery (Kafka) → Response cache write
Choose Your Edition
Same codebase, two deployment modes. From small teams to thousand-person enterprises, choose the edition that fits.
Small Teams
Out-of-the-box, zero-config startup, ideal for 10-100 person teams
- ✓ Org permissions & three-tier quota
- ✓ Multi-provider model access
- ✓ Streaming response & three-tier cache
- ✓ Web management console
- ✓ SQLite default, PostgreSQL supported
- ✓ Prometheus observability
Mid-to-Large Enterprises
Distributed architecture, high-availability deployment, ideal for 100-1000+ person enterprises
- ✓ All Small Teams edition features
- ★ Distributed rate limiting (Redis)
- ★ Async audit events (Kafka)
- ★ Reliable audit delivery (DurablePublisher)
- ★ Model session query & enterprise audit API
- ★ K8s HPA auto-scaling