System Design

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Client Interface                        │
│           (CLI, REST API, WebSocket, Library)               │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│                   API Gateway                               │
│         (Auth, Rate Limiting, Request Routing)              │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│                  Core Engine                                │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Prompt    │  │   Tool      │  │   Conversation      │  │
│  │   Manager   │  │   Registry  │  │   Manager           │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└────────────────────┬────────────────────────────────────────┘
                     │
        ┌────────────┼────────────┐
        │            │            │
┌───────▼───┐  ┌────▼────┐  ┌─────▼──────┐
│   LLM     │  │  Code   │  │   State    │
│ Adapters  │  │ Executor│  │   Store    │
└───────────┘  └────┬────┘  └────────────┘
                    │
        ┌───────────┴───────────┐
        │                       │
┌───────▼────────┐      ┌──────▼───────┐
│  Container     │      │    WASM      │
│  Runtime       │      │   Sandbox    │
│ (Docker/gVisor)│      │  (Wasmtime)  │
└────────────────┘      └──────────────┘

Component Breakdown

1. Client Interface

Multiple entry points for different use cases:

CLI: Interactive terminal interface
REST API: HTTP-based programmatic access
WebSocket: Real-time streaming for interactive applications
Rust Library: Direct embedding in Rust applications

2. API Gateway

Central entry point handling cross-cutting concerns:

Authentication (API keys, JWT, mTLS)
Rate limiting and quota management
Request validation and routing
Load balancing
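
As an illustration of the rate-limiting concern, here is a minimal token-bucket sketch. The document does not say which algorithm the gateway uses, so the token bucket, the `TokenBucket` type, and its parameters are all assumptions made for this example. The caller passes the clock value in, which keeps the logic deterministic and easy to test.

```rust
use std::time::Instant;

/// Token bucket: holds up to `capacity` tokens and refills at `refill_per_sec`.
/// Hypothetical type, not part of the documented system.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts with a full bucket.
    pub fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        TokenBucket {
            capacity: capacity as f64,
            tokens: capacity as f64,
            refill_per_sec,
            last_refill: now,
        }
    }

    /// Returns true if a request arriving at `now` may proceed.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        // Add the tokens earned since the last call, capped at capacity.
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A gateway would typically keep one bucket per API key or client, for example in a map keyed by client ID, and reject a request when `try_acquire` returns false.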

3. Core Engine

Coordinates prompt construction, tool dispatch, and conversation state through three subcomponents:

Prompt Manager

Template management and variable substitution
Prompt versioning and A/B testing support
Multi-turn conversation handling
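
Variable substitution can be sketched with the standard library alone. The `{{name}}` placeholder syntax and the `render_template` function are assumptions for illustration; the document does not specify a template format.

```rust
use std::collections::HashMap;

/// Replaces each `{{name}}` placeholder with its value from `vars`.
/// Returns Err naming the variable if a placeholder has no value.
/// The `{{...}}` syntax is an assumed convention.
pub fn render_template(template: &str, vars: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("}}") {
            Some(end) => {
                let name = after_open[..end].trim();
                match vars.get(name) {
                    Some(value) => out.push_str(value),
                    None => return Err(format!("missing variable: {}", name)),
                }
                rest = &after_open[end + 2..];
            }
            None => {
                // Unterminated placeholder: keep the remaining text verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

Failing on a missing variable, rather than silently emitting an empty string, keeps a half-filled prompt from reaching the model.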

Tool Registry

Dynamic tool registration and discovery
Schema validation (JSON Schema for function calling)
Tool permission enforcement
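
The three responsibilities above could fit together roughly as follows. This is a sketch under stated assumptions: `ToolSpec`, `ToolRegistry`, and `ToolError` are hypothetical names, and a plain list of required argument names stands in for full JSON Schema validation, which would normally be delegated to a schema library.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tool description; `required_args` stands in for a JSON Schema.
pub struct ToolSpec {
    pub name: String,
    pub required_args: Vec<String>,
    pub required_permission: String,
}

#[derive(Debug, PartialEq)]
pub enum ToolError {
    UnknownTool(String),
    MissingArgument(String),
    PermissionDenied(String),
}

#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, ToolSpec>,
}

impl ToolRegistry {
    /// Registers a tool at runtime, replacing any earlier tool with the same name.
    pub fn register(&mut self, spec: ToolSpec) {
        self.tools.insert(spec.name.clone(), spec);
    }

    /// Lists registered tool names in sorted order (discovery).
    pub fn list(&self) -> Vec<&str> {
        let mut names: Vec<&str> = self.tools.keys().map(|k| k.as_str()).collect();
        names.sort();
        names
    }

    /// Checks that the tool exists, the caller holds its permission,
    /// and every required argument is present.
    pub fn validate_call(
        &self,
        tool: &str,
        args: &HashMap<String, String>,
        granted: &HashSet<String>,
    ) -> Result<(), ToolError> {
        let spec = self
            .tools
            .get(tool)
            .ok_or_else(|| ToolError::UnknownTool(tool.to_string()))?;
        if !granted.contains(&spec.required_permission) {
            return Err(ToolError::PermissionDenied(spec.required_permission.clone()));
        }
        for arg in &spec.required_args {
            if !args.contains_key(arg) {
                return Err(ToolError::MissingArgument(arg.clone()));
            }
        }
        Ok(())
    }
}
```

Checking the permission before inspecting arguments means an unauthorized caller learns nothing about a tool's expected inputs.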

Conversation Manager

Session state management
Context window optimization
Conversation persistence
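
One simple form of context window optimization is to keep the system message and drop the oldest turns until the history fits a token budget. The sketch below assumes that strategy; the `Message` type, the `fit_to_window` function, and the word-count token estimate are illustrative stand-ins, and a real deployment would count tokens with the target model's tokenizer.

```rust
/// Hypothetical message shape for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Rough token estimate: one token per whitespace-separated word.
pub fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Keeps a leading system message (if any) plus the most recent
/// messages whose estimated cost fits within `budget`.
pub fn fit_to_window(history: &[Message], budget: usize) -> Vec<Message> {
    let (system, rest) = match history.split_first() {
        Some((first, rest)) if first.role == "system" => (Some(first), rest),
        _ => (None, history),
    };
    let system_cost = system.map_or(0, |m| estimate_tokens(&m.content));
    let mut remaining = budget.saturating_sub(system_cost);
    let mut kept: Vec<Message> = Vec::new();
    // Walk from newest to oldest, stopping at the first message that does not fit.
    for msg in rest.iter().rev() {
        let cost = estimate_tokens(&msg.content);
        if cost > remaining {
            break;
        }
        remaining -= cost;
        kept.push(msg.clone());
    }
    kept.reverse();
    let mut out: Vec<Message> = system.into_iter().cloned().collect();
    out.extend(kept);
    out
}
```

Stopping at the first message that does not fit keeps the retained history contiguous, so the model never sees a reply without the turn that prompted it.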

4. LLM Adapters

Pluggable backends for different providers:

OpenAI GPT-4/GPT-3.5
Anthropic Claude
Local models (via llama.cpp, Ollama)
Custom self-hosted models
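
Pluggable backends map naturally onto a Rust trait. The following is a minimal sketch, assuming a synchronous text-in, text-out interface; `LlmAdapter`, `EchoAdapter`, and `AdapterRegistry` are hypothetical names, and real adapters would wrap each provider's HTTP API or a local inference library behind the same trait.

```rust
use std::collections::HashMap;

/// Minimal provider-agnostic interface (assumed shape).
pub trait LlmAdapter {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in backend that echoes the prompt; useful for tests.
pub struct EchoAdapter;

impl LlmAdapter for EchoAdapter {
    fn name(&self) -> &str {
        "echo"
    }

    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {}", prompt))
    }
}

/// Looks up adapters by name so the backend can be chosen from configuration.
#[derive(Default)]
pub struct AdapterRegistry {
    adapters: HashMap<String, Box<dyn LlmAdapter>>,
}

impl AdapterRegistry {
    pub fn register(&mut self, adapter: Box<dyn LlmAdapter>) {
        self.adapters.insert(adapter.name().to_string(), adapter);
    }

    pub fn complete(&self, backend: &str, prompt: &str) -> Result<String, String> {
        match self.adapters.get(backend) {
            Some(adapter) => adapter.complete(prompt),
            None => Err(format!("unknown backend: {}", backend)),
        }
    }
}
```

Because callers depend only on the trait, swapping a hosted provider for a local model becomes a configuration change rather than a code change.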

5. Code Executor

The security-critical component. For each code or tool call, it:

Receives the call from the LLM
Validates it against the allowed operations
Executes it in a sandboxed environment
Returns the result to the conversation
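
The validate-then-execute order can be sketched as below. Everything here is an assumption for illustration: `ToolCall`, `Sandbox`, `CodeExecutor`, and `ExecError` are hypothetical names, and `FakeSandbox` provides no isolation at all. It only marks the place where a container or WASM runtime would plug in.

```rust
use std::collections::HashSet;

/// A tool call as it might arrive from the LLM (hypothetical shape).
pub struct ToolCall {
    pub operation: String,
    pub payload: String,
}

/// Abstraction over an isolated runtime such as a container or WASM sandbox.
pub trait Sandbox {
    fn run(&self, call: &ToolCall) -> Result<String, String>;
}

#[derive(Debug, PartialEq)]
pub enum ExecError {
    NotAllowed(String),
    SandboxFailure(String),
}

pub struct CodeExecutor<S: Sandbox> {
    allowed: HashSet<String>,
    sandbox: S,
}

impl<S: Sandbox> CodeExecutor<S> {
    pub fn new(allowed: &[&str], sandbox: S) -> Self {
        CodeExecutor {
            allowed: allowed.iter().map(|op| op.to_string()).collect(),
            sandbox,
        }
    }

    /// Rejects any operation outside the allow-list before it reaches the sandbox.
    pub fn execute(&self, call: &ToolCall) -> Result<String, ExecError> {
        if !self.allowed.contains(&call.operation) {
            return Err(ExecError::NotAllowed(call.operation.clone()));
        }
        self.sandbox.run(call).map_err(ExecError::SandboxFailure)
    }
}

/// Placeholder that uppercases the payload. It provides NO isolation.
pub struct FakeSandbox;

impl Sandbox for FakeSandbox {
    fn run(&self, call: &ToolCall) -> Result<String, String> {
        Ok(call.payload.to_uppercase())
    }
}
```

An allow-list is the safer default here: an operation that nobody remembered to register is denied rather than executed.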

6. State Store

Persistent storage for:

Conversation history
User preferences and settings
Audit logs
Tool definitions

Deployment Architecture

Single-Node Deployment

┌────────────────────────────────────┐
│           Host System              │
│  ┌────────────────────────────┐   │
│  │    LLM Harness Service     │   │
│  │  ┌─────────────────────┐   │   │
│  │  │   Core Engine       │   │   │
│  │  └─────────────────────┘   │   │
│  └────────────────────────────┘   │
│  ┌────────────────────────────┐   │
│  │   Container Runtime        │   │
│  │  (isolated sandboxes)      │   │
│  └────────────────────────────┘   │
└────────────────────────────────────┘

Distributed Deployment

┌────────────────────────────────────────────────────────┐
│                     Load Balancer                       │
└──────────────┬───────────────────────┬─────────────────┘
               │                       │
    ┌──────────▼──────────┐  ┌────────▼────────┐
    │  API Gateway        │  │  API Gateway    │
    │  (Instance 1)       │  │  (Instance 2)   │
    └──────────┬──────────┘  └────────┬────────┘
               │                       │
    ┌──────────▼───────────────────────▼────────┐
    │           Message Queue                    │
    │     (Redis/RabbitMQ/NATS)                  │
    └──────────┬───────────────────────┬─────────┘
               │                       │
    ┌──────────▼──────────┐  ┌────────▼────────┐
    │  Worker Node 1      │  │  Worker Node 2  │
    │  (Core + Executor)  │  │  (Core + Exec)  │
    └─────────────────────┘  └─────────────────┘
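
The queue-and-workers pattern in this diagram can be imitated inside a single process. In the sketch below, a standard-library channel stands in for the external message queue (Redis, RabbitMQ, or NATS) and threads stand in for worker nodes. The `Job` type and `process_jobs` function are hypothetical.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// A unit of work that a gateway instance would enqueue (hypothetical shape).
pub struct Job {
    pub id: u32,
    pub prompt: String,
}

/// Spawns `workers` threads that pull jobs from one shared queue and returns
/// every (job id, worker id, output) triple, sorted by job id.
pub fn process_jobs(jobs: Vec<Job>, workers: usize) -> Vec<(u32, usize, String)> {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (result_tx, result_rx) = mpsc::channel();
    // mpsc receivers are single-consumer, so workers share one behind a mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers)
        .map(|worker_id| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement,
                // so it is not held while the job is processed.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok(job) => {
                        let output = format!("processed: {}", job.prompt);
                        result_tx.send((job.id, worker_id, output)).unwrap();
                    }
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the queue lets the workers exit
    drop(result_tx); // only the workers' clones remain

    for handle in handles {
        handle.join().unwrap();
    }
    let mut results: Vec<_> = result_rx.iter().collect();
    results.sort_by_key(|(id, _, _)| *id);
    results
}
```

Which worker handles which job is not deterministic, which mirrors the distributed case: gateways enqueue work without knowing which node will pick it up.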

Data Flow

Typical Request Flow

1. Request Received
   - Client sends prompt via API
   - Gateway authenticates and validates
2. Prompt Processing
   - Core Engine loads conversation context
   - Prompt Manager applies templates
3. LLM Interaction
   - Request sent to configured LLM adapter
   - LLM generates response (potentially with tool calls)
4. Tool Execution (if needed)
   - Tool Registry validates tool calls
   - Code Executor runs in sandbox
   - Results returned to LLM
5. Response Delivery
   - Final response formatted
   - Conversation state persisted
   - Response returned to client
6. Audit Logging
   - All operations logged asynchronously
   - Metrics recorded for observability
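
The stages above can be strung together in a single function to show the control flow, including the loop that feeds tool results back to the LLM. This is a simplified sketch, not the system's actual code: `handle_request`, `LlmReply`, the `DEMO_API_KEY` constant, and the round limit are all hypothetical. Closures stand in for the LLM adapter and the sandboxed executor, and the audit log is written synchronously for brevity.

```rust
/// What the model returns on one turn (hypothetical shape).
pub enum LlmReply {
    Final(String),
    ToolCall { tool: String, input: String },
}

/// Placeholder credential for the sketch; a real gateway would verify keys or tokens.
const DEMO_API_KEY: &str = "demo-key";
/// Upper bound on tool round-trips, so a misbehaving model cannot loop forever.
const MAX_TOOL_ROUNDS: usize = 4;

pub fn handle_request<L, T>(
    api_key: &str,
    prompt: &str,
    history: &mut Vec<String>,
    audit_log: &mut Vec<String>,
    mut llm: L,
    mut run_tool: T,
) -> Result<String, String>
where
    L: FnMut(&[String]) -> LlmReply,
    T: FnMut(&str, &str) -> Result<String, String>,
{
    // Request received: authenticate and validate.
    if api_key != DEMO_API_KEY {
        return Err("unauthorized".to_string());
    }
    if prompt.trim().is_empty() {
        return Err("empty prompt".to_string());
    }
    audit_log.push("request accepted".to_string());

    // Prompt processing: append the new turn to the loaded context.
    history.push(format!("user: {}", prompt.trim()));

    // LLM interaction, with tool execution when the model asks for it.
    for _ in 0..=MAX_TOOL_ROUNDS {
        match llm(history.as_slice()) {
            LlmReply::Final(text) => {
                // Response delivery: persist state, then return to the client.
                history.push(format!("assistant: {}", text));
                audit_log.push("response delivered".to_string());
                return Ok(text);
            }
            LlmReply::ToolCall { tool, input } => {
                let result = run_tool(&tool, &input)?;
                audit_log.push(format!("tool executed: {}", tool));
                // The tool result goes back into the context for the next LLM turn.
                history.push(format!("tool {}: {}", tool, result));
            }
        }
    }
    Err("tool round limit exceeded".to_string())
}
```

A caller would pass an adapter's completion method as `llm` and the executor's entry point as `run_tool`. The round limit is a design choice of this sketch and does not come from the document.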
