AI-Switch

Enterprise AI Resource Governance Platform

Unified access to local models (GLM / MiniMax / Qwen / DeepSeek) and cloud models (OpenAI / Claude / Gemini / Grok), compatible with Claude Code, Codex, OpenCode, Cursor, and other programming tools.

4-Tier
Org Permissions
3-Tier
Quota System
3-Tier
Response Cache
1000+
Concurrent Users

Core Capabilities

One platform to manage enterprise AI resource access, permissions, usage, and security

Multi-Provider Unified Access

Local models (GLM / MiniMax / Qwen / DeepSeek) and cloud models (OpenAI / Claude / Gemini / Grok) behind one endpoint, with automatic request/response format conversion so programming tools integrate directly.
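The core of the adapter idea can be sketched in Go. This is an illustrative sketch, not the platform's actual adapter code: it remaps an OpenAI-style chat request into an Anthropic-style body, whose main structural difference is that the system message moves to a top-level `system` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OpenAI-style chat request (subset of fields).
type openAIReq struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Anthropic-style request: system prompt is a top-level field.
type anthropicReq struct {
	Model     string    `json:"model"`
	System    string    `json:"system,omitempty"`
	Messages  []message `json:"messages"`
	MaxTokens int       `json:"max_tokens"`
}

// toAnthropic hoists any system message into the `system` field and
// copies the rest of the conversation unchanged.
func toAnthropic(in openAIReq, maxTokens int) anthropicReq {
	out := anthropicReq{Model: in.Model, MaxTokens: maxTokens}
	for _, m := range in.Messages {
		if m.Role == "system" {
			out.System = m.Content
			continue
		}
		out.Messages = append(out.Messages, m)
	}
	return out
}

func main() {
	in := openAIReq{Model: "claude-x", Messages: []message{
		{Role: "system", Content: "be terse"},
		{Role: "user", Content: "hi"},
	}}
	b, _ := json.Marshal(toAnthropic(in, 1024))
	fmt.Println(string(b))
}
```

Real adapters also translate responses and streaming chunks in the opposite direction; this only shows the request leg.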

👥

Organizational Permissions

4-tier department tree + 4-level role system (root / company admin / department admin / member), all queries automatically filtered by permission scope.
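The scope filtering can be sketched in plain Go (types and names here are illustrative, not the platform's actual schema): a department admin's visible set is the subtree of departments under them, and every query result is filtered against that set.

```go
package main

import "fmt"

// Dept is one node of the department tree.
type Dept struct {
	ID       int
	Children []*Dept
}

// subtreeIDs collects the department IDs visible to an admin rooted at d.
func subtreeIDs(d *Dept) map[int]bool {
	ids := map[int]bool{d.ID: true}
	for _, c := range d.Children {
		for id := range subtreeIDs(c) {
			ids[id] = true
		}
	}
	return ids
}

// Record stands in for any row that carries a department ID.
type Record struct {
	DeptID int
	Body   string
}

// filterByScope keeps only records inside the caller's scope, mimicking
// the WHERE clause the platform adds to every query automatically.
func filterByScope(recs []Record, scope map[int]bool) []Record {
	var out []Record
	for _, r := range recs {
		if scope[r.DeptID] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// company (1) -> engineering (2) -> backend (3)
	tree := &Dept{ID: 1, Children: []*Dept{{ID: 2, Children: []*Dept{{ID: 3}}}}}
	scope := subtreeIDs(tree.Children[0]) // engineering admin sees depts 2 and 3
	recs := []Record{{1, "hr"}, {2, "eng"}, {3, "backend"}}
	fmt.Println(len(filterByScope(recs, scope))) // prints 2
}
```

In production this would be a GORM scope appended to every query rather than an in-memory filter.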

💳

Three-Tier Quota System

Personal quota → Project quota → Supplementary quota, with a customizable consumption order. A pre-deduction/settlement mechanism guarantees idempotency, and four token types are metered independently.
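A minimal sketch of the two mechanisms together (types and method names are illustrative, not the real API): reservation drains tiers in the default personal → project → supplementary order, a repeated request ID is a no-op, and settlement refunds any amount reserved beyond actual usage.

```go
package main

import "fmt"

// Quotas holds the three tiers plus the set of in-flight reservations.
type Quotas struct {
	Personal, Project, Supplementary int64
	reserved                         map[string]int64 // requestID -> reserved amount
}

func NewQuotas(p, pr, s int64) *Quotas {
	return &Quotas{Personal: p, Project: pr, Supplementary: s, reserved: map[string]int64{}}
}

// Reserve pre-deducts `amount` in the order personal -> project -> supplementary.
// Calling it twice with the same requestID deducts nothing the second time.
func (q *Quotas) Reserve(requestID string, amount int64) bool {
	if _, done := q.reserved[requestID]; done {
		return true // duplicate delivery: idempotent no-op
	}
	if q.Personal+q.Project+q.Supplementary < amount {
		return false // insufficient quota across all tiers
	}
	remaining := amount
	for _, tier := range []*int64{&q.Personal, &q.Project, &q.Supplementary} {
		take := remaining
		if *tier < take {
			take = *tier
		}
		*tier -= take
		remaining -= take
	}
	q.reserved[requestID] = amount
	return true
}

// Settle refunds the difference when actual usage came in under the reservation.
func (q *Quotas) Settle(requestID string, actual int64) {
	reserved, ok := q.reserved[requestID]
	if !ok {
		return
	}
	if refund := reserved - actual; refund > 0 {
		q.Supplementary += refund // sketch: refund into one tier for simplicity
	}
	delete(q.reserved, requestID)
}

func main() {
	q := NewQuotas(100, 50, 50)
	q.Reserve("req-1", 120) // drains personal (100) then project (20)
	q.Reserve("req-1", 120) // duplicate: no further deduction
	fmt.Println(q.Personal, q.Project) // prints 0 30
}
```

The real settlement also charges the difference when usage exceeds the reservation and refunds to the original tiers; this sketch shows only the refund path.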

🚀

Streaming Response & Smart Cache

SSE real-time streaming output plus a three-tier response cache (in-memory LRU + SQLite + semantic vector); cache hits skip quota pre-deduction, significantly reducing costs.
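The lookup pattern behind a tiered cache is simple to sketch. This is an illustration only: both layers here are plain maps standing in for the in-memory LRU and SQLite layers, and a hit in a slower layer is promoted to the faster one.

```go
package main

import "fmt"

// TieredCache checks the fast layer first and falls back to the slow one.
type TieredCache struct {
	l1, l2 map[string]string // stand-ins for memory LRU and SQLite
}

func (c *TieredCache) Get(key string) (string, bool) {
	if v, ok := c.l1[key]; ok {
		return v, true // fastest path: this is where quota pre-deduction is skipped
	}
	if v, ok := c.l2[key]; ok {
		c.l1[key] = v // promote to L1 for the next request
		return v, true
	}
	return "", false
}

// Put writes through to every layer.
func (c *TieredCache) Put(key, val string) {
	c.l1[key] = val
	c.l2[key] = val
}

func main() {
	c := &TieredCache{l1: map[string]string{}, l2: map[string]string{"q": "cached answer"}}
	v, hit := c.Get("q")
	fmt.Println(hit, v) // prints: true cached answer
	_, promoted := c.l1["q"]
	fmt.Println(promoted) // prints: true
}
```

The semantic-vector tier adds a third fallback that matches similar rather than identical prompts; that lookup is by nearest-neighbor distance, not by exact key, so it has no equivalent in this sketch.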

🛡️

Security Protection

API keys stored as SHA-256 hashes (never plaintext), AES-256-GCM encrypted key storage, command interception (5 matching modes), brute-force protection, CIDR whitelist.
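Two of these protections are easy to sketch with the standard library (function names are illustrative): keys are compared by SHA-256 digest so the database never holds the plaintext, and a client IP is checked against a CIDR allowlist.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net"
)

// hashKey returns the hex SHA-256 digest of an API key; only this
// digest is stored, and presented keys are hashed before comparison.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// ipAllowed reports whether ipStr falls inside any allowlisted CIDR block.
func ipAllowed(ipStr string, cidrs []string) bool {
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return false
	}
	for _, c := range cidrs {
		if _, network, err := net.ParseCIDR(c); err == nil && network.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	stored := hashKey("sk-demo") // what the database keeps
	fmt.Println(stored == hashKey("sk-demo"))                      // prints: true
	fmt.Println(ipAllowed("10.0.3.7", []string{"10.0.0.0/16"}))    // prints: true
	fmt.Println(ipAllowed("192.168.1.1", []string{"10.0.0.0/16"})) // prints: false
}
```

A production comparison would also be constant-time (e.g. `crypto/subtle`) to avoid timing side channels.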

📊

Observability

Prometheus metrics endpoint, Grafana Dashboard, Alertmanager alert rules. Full coverage of key metrics including request latency, active connections, and quota pre-deduction.

Technical Architecture

Modular design, full-chain control from request entry to model forwarding

1

Request Entry

Gin router + middleware chain: Request ID → Latency logging → Auth (with brute-force protection) → Permission isolation → Rate limiting (Enterprise)

2

Governance Layer

Command interception → Response cache lookup → Quota pre-deduction (idempotency guarantee) → Request queuing / backpressure control

3

Model Forwarding

Load balancing selects resource → Adapter auto-converts format → Non-streaming auto-retry → Streaming SSE forwarding

4

Settlement & Audit

Quota settlement (refund/charge difference) → Access log async batch write → Audit event reliable delivery (Kafka) → Response cache write

Go 1.24 · Gin · GORM · SQLite / PostgreSQL · Redis · Kafka · Prometheus · Docker / K8s
Client (Claude Code / Codex / Cursor / SDK) → /v1/chat/completions → Auth → Intercept → Cache? → Quota Reserve → Queue → Load Balance → Select Resource → Local or Cloud Models → Quota Finalize → Log / Audit → Response (SSE Stream / JSON)

Choose Your Edition

Same codebase, two deployment modes. From small teams to thousand-person enterprises, choose the edition that fits your scale

SMB Edition

Small Teams

Out-of-the-box with zero-config startup, ideal for teams of 10-100

  • Org permissions & three-tier quota
  • Multi-provider model access
  • Streaming response & three-tier cache
  • Web management console
  • SQLite default, PostgreSQL supported
  • Prometheus observability