AI-Switch

Enterprise AI Resource Governance Platform

Unified access to local models (GLM / MiniMax / Qwen / DeepSeek) and cloud models (OpenAI / Claude / Gemini / Grok), compatible with Claude Code, Codex, OpenCode, Cursor, and other programming tools.

4-Tier
Org Permissions
3-Tier
Quota System
3-Tier
Response Cache
1000+
Concurrent Users

Core Capabilities

One platform to manage enterprise AI resource access, permissions, usage, and security

Multi-Provider Unified Access

Local models (GLM / MiniMax / Qwen / DeepSeek) and cloud models (OpenAI / Claude / Gemini / Grok) behind one endpoint, with automatic request/response format conversion so programming tools integrate directly.
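The core of the adapter idea can be sketched in Go. This is an illustrative sketch, not the platform's actual adapter code: it remaps an OpenAI-style chat request into an Anthropic-style body, whose main structural difference is that the system message moves to a top-level `system` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OpenAI-style chat request (subset of fields).
type openAIReq struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Anthropic-style request: system prompt is a top-level field.
type anthropicReq struct {
	Model     string    `json:"model"`
	System    string    `json:"system,omitempty"`
	Messages  []message `json:"messages"`
	MaxTokens int       `json:"max_tokens"`
}

// toAnthropic hoists any system message into the `system` field and
// copies the rest of the conversation unchanged.
func toAnthropic(in openAIReq, maxTokens int) anthropicReq {
	out := anthropicReq{Model: in.Model, MaxTokens: maxTokens}
	for _, m := range in.Messages {
		if m.Role == "system" {
			out.System = m.Content
			continue
		}
		out.Messages = append(out.Messages, m)
	}
	return out
}

func main() {
	in := openAIReq{Model: "claude-x", Messages: []message{
		{Role: "system", Content: "be terse"},
		{Role: "user", Content: "hi"},
	}}
	b, _ := json.Marshal(toAnthropic(in, 1024))
	fmt.Println(string(b))
}
```

Real adapters also translate responses and streaming chunks in the opposite direction; this only shows the request leg.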

👥

Organizational Permissions

4-tier department tree + 4-level role system (root / company admin / department admin / member), all queries automatically filtered by permission scope.
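The scope filtering can be sketched in plain Go (types and names here are illustrative, not the platform's actual schema): a department admin's visible set is the subtree of departments under them, and every query result is filtered against that set.

```go
package main

import "fmt"

// Dept is one node of the department tree.
type Dept struct {
	ID       int
	Children []*Dept
}

// subtreeIDs collects the department IDs visible to an admin rooted at d.
func subtreeIDs(d *Dept) map[int]bool {
	ids := map[int]bool{d.ID: true}
	for _, c := range d.Children {
		for id := range subtreeIDs(c) {
			ids[id] = true
		}
	}
	return ids
}

// Record stands in for any row that carries a department ID.
type Record struct {
	DeptID int
	Body   string
}

// filterByScope keeps only records inside the caller's scope, mimicking
// the WHERE clause the platform adds to every query automatically.
func filterByScope(recs []Record, scope map[int]bool) []Record {
	var out []Record
	for _, r := range recs {
		if scope[r.DeptID] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// company (1) -> engineering (2) -> backend (3)
	tree := &Dept{ID: 1, Children: []*Dept{{ID: 2, Children: []*Dept{{ID: 3}}}}}
	scope := subtreeIDs(tree.Children[0]) // engineering admin sees depts 2 and 3
	recs := []Record{{1, "hr"}, {2, "eng"}, {3, "backend"}}
	fmt.Println(len(filterByScope(recs, scope))) // prints 2
}
```

In production this would be a GORM scope appended to every query rather than an in-memory filter.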

💳

Three-Tier Quota System

Personal quota → Project quota → Supplementary quota, with a customizable consumption order. A pre-deduction/settlement mechanism guarantees idempotency, and four token types are metered independently.
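A minimal sketch of the two mechanisms together (types and method names are illustrative, not the real API): reservation drains tiers in the default personal → project → supplementary order, a repeated request ID is a no-op, and settlement refunds any amount reserved beyond actual usage.

```go
package main

import "fmt"

// Quotas holds the three tiers plus the set of in-flight reservations.
type Quotas struct {
	Personal, Project, Supplementary int64
	reserved                         map[string]int64 // requestID -> reserved amount
}

func NewQuotas(p, pr, s int64) *Quotas {
	return &Quotas{Personal: p, Project: pr, Supplementary: s, reserved: map[string]int64{}}
}

// Reserve pre-deducts `amount` in the order personal -> project -> supplementary.
// Calling it twice with the same requestID deducts nothing the second time.
func (q *Quotas) Reserve(requestID string, amount int64) bool {
	if _, done := q.reserved[requestID]; done {
		return true // duplicate delivery: idempotent no-op
	}
	if q.Personal+q.Project+q.Supplementary < amount {
		return false // insufficient quota across all tiers
	}
	remaining := amount
	for _, tier := range []*int64{&q.Personal, &q.Project, &q.Supplementary} {
		take := remaining
		if *tier < take {
			take = *tier
		}
		*tier -= take
		remaining -= take
	}
	q.reserved[requestID] = amount
	return true
}

// Settle refunds the difference when actual usage came in under the reservation.
func (q *Quotas) Settle(requestID string, actual int64) {
	reserved, ok := q.reserved[requestID]
	if !ok {
		return
	}
	if refund := reserved - actual; refund > 0 {
		q.Supplementary += refund // sketch: refund into one tier for simplicity
	}
	delete(q.reserved, requestID)
}

func main() {
	q := NewQuotas(100, 50, 50)
	q.Reserve("req-1", 120) // drains personal (100) then project (20)
	q.Reserve("req-1", 120) // duplicate: no further deduction
	fmt.Println(q.Personal, q.Project) // prints 0 30
}
```

The real settlement also charges the difference when usage exceeds the reservation and refunds to the original tiers; this sketch shows only the refund path.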

🚀

Streaming Response & Smart Cache

SSE real-time streaming output plus a three-tier response cache (in-memory LRU + SQLite + semantic vector); cache hits skip quota pre-deduction, significantly reducing costs.
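The lookup pattern behind a tiered cache is simple to sketch. This is an illustration only: both layers here are plain maps standing in for the in-memory LRU and SQLite layers, and a hit in a slower layer is promoted to the faster one.

```go
package main

import "fmt"

// TieredCache checks the fast layer first and falls back to the slow one.
type TieredCache struct {
	l1, l2 map[string]string // stand-ins for memory LRU and SQLite
}

func (c *TieredCache) Get(key string) (string, bool) {
	if v, ok := c.l1[key]; ok {
		return v, true // fastest path: this is where quota pre-deduction is skipped
	}
	if v, ok := c.l2[key]; ok {
		c.l1[key] = v // promote to L1 for the next request
		return v, true
	}
	return "", false
}

// Put writes through to every layer.
func (c *TieredCache) Put(key, val string) {
	c.l1[key] = val
	c.l2[key] = val
}

func main() {
	c := &TieredCache{l1: map[string]string{}, l2: map[string]string{"q": "cached answer"}}
	v, hit := c.Get("q")
	fmt.Println(hit, v) // prints: true cached answer
	_, promoted := c.l1["q"]
	fmt.Println(promoted) // prints: true
}
```

The semantic-vector tier adds a third fallback that matches similar rather than identical prompts; that lookup is by nearest-neighbor distance, not by exact key, so it has no equivalent in this sketch.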

🛡️

Security Protection

API keys stored as SHA-256 hashes (never plaintext), AES-256-GCM encrypted key storage, command interception (5 matching modes), brute-force protection, CIDR whitelist.
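Two of these protections are easy to sketch with the standard library (function names are illustrative): keys are compared by SHA-256 digest so the database never holds the plaintext, and a client IP is checked against a CIDR allowlist.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net"
)

// hashKey returns the hex SHA-256 digest of an API key; only this
// digest is stored, and presented keys are hashed before comparison.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// ipAllowed reports whether ipStr falls inside any allowlisted CIDR block.
func ipAllowed(ipStr string, cidrs []string) bool {
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return false
	}
	for _, c := range cidrs {
		if _, network, err := net.ParseCIDR(c); err == nil && network.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	stored := hashKey("sk-demo") // what the database keeps
	fmt.Println(stored == hashKey("sk-demo"))                      // prints: true
	fmt.Println(ipAllowed("10.0.3.7", []string{"10.0.0.0/16"}))    // prints: true
	fmt.Println(ipAllowed("192.168.1.1", []string{"10.0.0.0/16"})) // prints: false
}
```

A production comparison would also be constant-time (e.g. `crypto/subtle`) to avoid timing side channels.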

📊

Observability

Prometheus metrics endpoint, Grafana Dashboard, Alertmanager alert rules. Full coverage of key metrics including request latency, active connections, and quota pre-deduction.

Technical Architecture

Modular design, full-chain control from request entry to model forwarding

1

Request Entry

Gin router + middleware chain: Request ID → Latency logging → Auth (with brute-force protection) → Permission isolation → Rate limiting (Enterprise)

2

Governance Layer

Command interception → Response cache lookup → Quota pre-deduction (idempotency guarantee) → Request queuing / backpressure control

3

Model Forwarding

Load balancing selects resource → Adapter auto-converts format → Non-streaming auto-retry → Streaming SSE forwarding

4

Settlement & Audit

Quota settlement (refund/charge difference) → Access log async batch write → Audit event reliable delivery (Kafka) → Response cache write

Go 1.24 · Gin · GORM · SQLite / PostgreSQL · Redis · Kafka · Prometheus · Docker / K8s
Client (Claude Code / Codex / Cursor / SDK) → /v1/chat/completions → Auth → Intercept → Cache? → Quota Reserve → Queue → Load Balance → Select Resource → Local or Cloud Models → Quota Finalize → Log / Audit → Response (SSE Stream / JSON)

Choose Your Edition

Same codebase, two deployment modes. From small teams to thousand-person enterprises, choose the edition that fits your scale

SMB Edition

Small Teams

Out-of-the-box with zero-config startup, ideal for teams of 10-100

  • Org permissions & three-tier quota
  • Multi-provider model access
  • Streaming response & three-tier cache
  • Web management console
  • SQLite default, PostgreSQL supported
  • Prometheus observability