System Design
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│                      Client Interface                       │
│             (CLI, REST API, WebSocket, Library)             │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│                         API Gateway                         │
│           (Auth, Rate Limiting, Request Routing)            │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│                         Core Engine                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Prompt    │  │    Tool     │  │    Conversation     │  │
│  │   Manager   │  │  Registry   │  │       Manager       │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└────────────────────┬────────────────────────────────────────┘
                     │
        ┌────────────┼────────────┐
        │            │            │
┌───────▼───┐   ┌────▼────┐ ┌─────▼──────┐
│    LLM    │   │  Code   │ │   State    │
│ Adapters  │   │ Executor│ │   Store    │
└───────────┘   └────┬────┘ └────────────┘
                     │
         ┌───────────┴───────────┐
         │                       │
 ┌───────▼────────┐       ┌──────▼───────┐
 │   Container    │       │     WASM     │
 │    Runtime     │       │   Sandbox    │
 │ (Docker/gVisor)│       │  (Wasmtime)  │
 └────────────────┘       └──────────────┘
Component Breakdown
1. Client Interface
Multiple entry points for different use cases:
- CLI: Interactive terminal interface
- REST API: HTTP-based programmatic access
- WebSocket: Real-time streaming for interactive applications
- Rust Library: Direct embedding in Rust applications
2. API Gateway
Central entry point handling cross-cutting concerns:
- Authentication (API keys, JWT, mTLS)
- Rate limiting and quota management
- Request validation and routing
- Load balancing
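As an illustration of the rate-limiting concern, here is a minimal token-bucket sketch. The document does not say which algorithm the gateway uses, so the `TokenBucket` type, its fields, and the explicit `refill` call are assumptions chosen to keep the example deterministic.

```rust
/// Hypothetical token-bucket limiter; names and parameters are illustrative,
/// not taken from the actual gateway.
#[derive(Debug)]
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
}

impl TokenBucket {
    /// Starts with a full bucket of `capacity` tokens.
    pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
        Self {
            capacity: capacity as f64,
            tokens: capacity as f64,
            refill_per_sec,
        }
    }

    /// Adds tokens for `elapsed_secs` of elapsed time, capped at capacity.
    pub fn refill(&mut self, elapsed_secs: f64) {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
    }

    /// Consumes one token if available; returns whether the request may proceed.
    pub fn try_acquire(&mut self) -> bool {
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A real gateway would likely keep one bucket per API key and derive `elapsed_secs` from a monotonic clock rather than passing it in by hand.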
3. Core Engine
The heart of the system, made up of three managers:
Prompt Manager
- Template management and variable substitution
- Prompt versioning and A/B testing support
- Multi-turn conversation handling
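Variable substitution could look roughly like the sketch below. The `{{name}}` placeholder syntax and the `render` function are assumptions for illustration; the document does not specify the template format.

```rust
use std::collections::HashMap;

/// Replaces each `{{ key }}` placeholder with its value in a single pass.
/// Unknown or unterminated placeholders are left in the output unchanged.
pub fn render(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let key = after[..end].trim();
                match vars.get(key) {
                    Some(value) => out.push_str(value),
                    // Unknown variable: keep the placeholder visible.
                    None => out.push_str(&rest[start..start + 2 + end + 2]),
                }
                rest = &after[end + 2..];
            }
            None => {
                // Unterminated placeholder: emit the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Scanning once, rather than calling `replace` per variable, means substituted values are never re-expanded even if they happen to contain `{{`.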
Tool Registry
- Dynamic tool registration and discovery
- Schema validation (JSON Schema for function calling)
- Tool permission enforcement
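The three registry duties above can be sketched together as follows. This is a simplified stand-in: a required-argument check replaces full JSON Schema validation, and `ToolSpec`, `ToolError`, and the permission strings are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tool description; a real registry would hold a full JSON Schema.
pub struct ToolSpec {
    pub required_args: Vec<String>,
    pub permission: String,
}

#[derive(Debug, PartialEq)]
pub enum ToolError {
    UnknownTool,
    MissingArg(String),
    PermissionDenied,
}

#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, ToolSpec>,
}

impl ToolRegistry {
    /// Registers (or replaces) a tool under `name`.
    pub fn register(&mut self, name: &str, spec: ToolSpec) {
        self.tools.insert(name.to_string(), spec);
    }

    /// Checks that the tool exists, then the caller's permission,
    /// then that every required argument is present.
    pub fn validate(
        &self,
        name: &str,
        args: &HashMap<String, String>,
        granted: &HashSet<String>,
    ) -> Result<(), ToolError> {
        let spec = self.tools.get(name).ok_or(ToolError::UnknownTool)?;
        if !granted.contains(&spec.permission) {
            return Err(ToolError::PermissionDenied);
        }
        for arg in &spec.required_args {
            if !args.contains_key(arg) {
                return Err(ToolError::MissingArg(arg.clone()));
            }
        }
        Ok(())
    }
}
```

Checking permission before arguments avoids leaking details of a tool's schema to callers who are not allowed to use it.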
Conversation Manager
- Session state management
- Context window optimization
- Conversation persistence
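One simple form of context window optimization is to drop the oldest turns once a token budget is exceeded. The sketch below assumes that strategy. The `Message` type and the whitespace-based `estimate_tokens` are placeholders for a real message model and tokenizer.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Crude stand-in for a tokenizer: counts whitespace-separated words.
fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Keeps the newest contiguous run of messages whose estimated
/// token total fits within `budget`, preserving their original order.
pub fn trim_to_budget(history: &[Message], budget: usize) -> Vec<Message> {
    let mut kept = Vec::new();
    let mut used = 0;
    // Walk from newest to oldest and stop at the first message that does not fit.
    for msg in history.iter().rev() {
        let cost = estimate_tokens(&msg.content);
        if used + cost > budget {
            break;
        }
        used += cost;
        kept.push(msg.clone());
    }
    kept.reverse();
    kept
}
```

Other strategies, such as summarizing older turns or pinning the system prompt, would fit behind the same function signature.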
4. LLM Adapters
Pluggable backends for different providers:
- OpenAI GPT-4/GPT-3.5
- Anthropic Claude
- Local models (via llama.cpp, Ollama)
- Custom self-hosted models
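A pluggable backend is commonly modeled as a trait with one implementation per provider. The sketch below assumes that approach. `LlmAdapter`, `AdapterRegistry`, and `EchoAdapter` are hypothetical names, and the synchronous `complete` method is a simplification of what a real provider client would need.

```rust
use std::collections::HashMap;

/// Hypothetical adapter interface; real providers would need async I/O,
/// streaming, and richer request and response types.
pub trait LlmAdapter {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Test double standing in for a real provider.
pub struct EchoAdapter;

impl LlmAdapter for EchoAdapter {
    fn name(&self) -> &str {
        "echo"
    }

    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {}", prompt))
    }
}

#[derive(Default)]
pub struct AdapterRegistry {
    adapters: HashMap<String, Box<dyn LlmAdapter>>,
}

impl AdapterRegistry {
    /// Stores the adapter under the name it reports.
    pub fn register(&mut self, adapter: Box<dyn LlmAdapter>) {
        self.adapters.insert(adapter.name().to_string(), adapter);
    }

    /// Routes the prompt to the named backend, or fails if it is not registered.
    pub fn complete(&self, backend: &str, prompt: &str) -> Result<String, String> {
        self.adapters
            .get(backend)
            .ok_or_else(|| format!("unknown backend: {}", backend))?
            .complete(prompt)
    }
}
```

With this shape, adding a provider means implementing the trait and registering it, with no change to the Core Engine's call sites.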
5. Code Executor
The security-critical component. Each tool call passes through four steps in order:
- Receives code or tool calls from the LLM
- Validates them against the allowed operations
- Executes them in a sandboxed environment
- Returns the results to the conversation
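The validate-then-execute order can be sketched as below. The `Sandbox` trait is a hypothetical boundary behind which a container or WASM runtime would sit, and `FakeSandbox` exists only so the example runs without one.

```rust
use std::collections::HashSet;

/// Hypothetical sandbox boundary; a real backend would wrap a container
/// or WASM runtime rather than run anything in-process.
pub trait Sandbox {
    fn run(&self, operation: &str, input: &str) -> Result<String, String>;
}

#[derive(Debug, PartialEq)]
pub enum ExecError {
    NotAllowed(String),
    Failed(String),
}

pub struct Executor<S: Sandbox> {
    allowed: HashSet<String>,
    sandbox: S,
}

impl<S: Sandbox> Executor<S> {
    pub fn new(allowed: &[&str], sandbox: S) -> Self {
        Self {
            allowed: allowed.iter().map(|s| s.to_string()).collect(),
            sandbox,
        }
    }

    /// Rejects operations outside the allowlist before anything reaches the sandbox.
    pub fn execute(&self, operation: &str, input: &str) -> Result<String, ExecError> {
        if !self.allowed.contains(operation) {
            return Err(ExecError::NotAllowed(operation.to_string()));
        }
        self.sandbox.run(operation, input).map_err(ExecError::Failed)
    }
}

/// Stand-in sandbox that only upper-cases its input.
pub struct FakeSandbox;

impl Sandbox for FakeSandbox {
    fn run(&self, _operation: &str, input: &str) -> Result<String, String> {
        Ok(input.to_uppercase())
    }
}
```

Keeping the allowlist check outside the sandbox means a misconfigured or compromised sandbox backend never sees a disallowed operation in the first place.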
6. State Store
Persistent storage for:
- Conversation history
- User preferences and settings
- Audit logs
- Tool definitions
Deployment Architecture
Single-Node Deployment
┌────────────────────────────────────┐
│            Host System             │
│   ┌────────────────────────────┐   │
│   │    LLM Harness Service     │   │
│   │  ┌─────────────────────┐   │   │
│   │  │     Core Engine     │   │   │
│   │  └─────────────────────┘   │   │
│   └────────────────────────────┘   │
│   ┌────────────────────────────┐   │
│   │     Container Runtime      │   │
│   │    (isolated sandboxes)    │   │
│   └────────────────────────────┘   │
└────────────────────────────────────┘
Distributed Deployment
┌────────────────────────────────────────────────────────┐
│                     Load Balancer                      │
└──────────────┬───────────────────────┬─────────────────┘
               │                       │
    ┌──────────▼──────────┐   ┌────────▼────────┐
    │     API Gateway     │   │   API Gateway   │
    │    (Instance 1)     │   │  (Instance 2)   │
    └──────────┬──────────┘   └────────┬────────┘
               │                       │
    ┌──────────▼───────────────────────▼────────┐
    │               Message Queue               │
    │           (Redis/RabbitMQ/NATS)           │
    └──────────┬───────────────────────┬────────┘
               │                       │
    ┌──────────▼──────────┐   ┌────────▼────────┐
    │    Worker Node 1    │   │  Worker Node 2  │
    │  (Core + Executor)  │   │  (Core + Exec)  │
    └─────────────────────┘   └─────────────────┘
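The queue-and-workers pattern in the diagram above can be approximated in a single process, which may help when reasoning about it. In this sketch a standard-library channel stands in for the external message queue and threads stand in for worker nodes. `run_workers` and the `handled:` result format are illustrative assumptions.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Runs `jobs` through `workers` threads pulling from one shared queue.
/// Returns (worker_id, result) pairs in completion order.
pub fn run_workers(jobs: Vec<String>, workers: usize) -> Vec<(usize, String)> {
    let (job_tx, job_rx) = mpsc::channel::<String>();
    let (res_tx, res_rx) = mpsc::channel::<(usize, String)>();
    // mpsc receivers are single-consumer, so workers share one behind a mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock is released as soon as this statement finishes.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(prompt) => res_tx.send((id, format!("handled: {}", prompt))).unwrap(),
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // close the queue so workers exit once it is empty
    drop(res_tx); // leave only the workers' sender clones alive

    for handle in handles {
        handle.join().unwrap();
    }
    res_rx.iter().collect()
}
```

As in the distributed layout, any idle worker takes the next job, so which worker handles a given request is not fixed in advance.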
Data Flow
Typical Request Flow
1. Request Received
   - Client sends prompt via API
   - Gateway authenticates and validates
2. Prompt Processing
   - Core Engine loads conversation context
   - Prompt Manager applies templates
3. LLM Interaction
   - Request sent to configured LLM adapter
   - LLM generates response (potentially with tool calls)
4. Tool Execution (if needed)
   - Tool Registry validates tool calls
   - Code Executor runs in sandbox
   - Results returned to LLM
5. Response Delivery
   - Final response formatted
   - Conversation state persisted
   - Response returned to client
6. Audit Logging
   - All operations logged asynchronously
   - Metrics recorded for observability
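The request flow above can be condensed into one function to show how the steps chain together. This is a sketch under several assumptions: `handle_request`, `LlmReply`, and the closure parameters are hypothetical, authentication is reduced to a boolean, and audit entries are pushed synchronously to a vector rather than logged asynchronously.

```rust
/// What the model returns in this sketch: a final answer or a tool request.
pub enum LlmReply {
    Final(String),
    ToolCall { tool: String, input: String },
}

/// Drives one request through the flow. `llm` and `run_tool` are
/// hypothetical stand-ins for the LLM adapter and the Code Executor.
pub fn handle_request(
    api_key_valid: bool,
    prompt: &str,
    history: &mut Vec<String>,
    audit: &mut Vec<String>,
    llm: impl Fn(&[String]) -> LlmReply,
    run_tool: impl Fn(&str, &str) -> String,
    max_tool_rounds: usize,
) -> Result<String, String> {
    // Step 1: authenticate before doing any work.
    if !api_key_valid {
        audit.push("rejected: auth".to_string());
        return Err("unauthorized".to_string());
    }
    // Step 2: add the new turn to the loaded conversation context.
    history.push(format!("user: {}", prompt));
    let mut rounds = 0;
    loop {
        // Step 3: ask the model, giving it the full context so far.
        match llm(history.as_slice()) {
            LlmReply::Final(text) => {
                // Step 5: persist the answer, then return it.
                history.push(format!("assistant: {}", text));
                audit.push("completed".to_string());
                return Ok(text);
            }
            LlmReply::ToolCall { tool, input } => {
                // Step 4: run the tool and feed its result back, up to a limit.
                if rounds == max_tool_rounds {
                    audit.push("aborted: tool round limit".to_string());
                    return Err("tool round limit exceeded".to_string());
                }
                rounds += 1;
                let output = run_tool(&tool, &input);
                audit.push(format!("tool: {}", tool));
                history.push(format!("tool {}: {}", tool, output));
            }
        }
    }
}
```

The `max_tool_rounds` cap is not mentioned in the flow itself. It is included here because a loop that feeds tool results back to the model needs some bound to guarantee termination.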