
System Design

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Client Interface                        │
│           (CLI, REST API, WebSocket, Library)               │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│                   API Gateway                               │
│         (Auth, Rate Limiting, Request Routing)              │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│                  Core Engine                                │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Prompt    │  │   Tool      │  │   Conversation      │  │
│  │   Manager   │  │   Registry  │  │   Manager           │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└────────────────────┬────────────────────────────────────────┘
                     │
        ┌────────────┼────────────┐
        │            │            │
┌───────▼───┐  ┌────▼────┐  ┌─────▼──────┐
│   LLM     │  │  Code   │  │   State    │
│ Adapters  │  │ Executor│  │   Store    │
└───────────┘  └────┬────┘  └────────────┘
                    │
        ┌───────────┴───────────┐
        │                       │
┌───────▼────────┐      ┌──────▼───────┐
│  Container     │      │    WASM      │
│  Runtime       │      │   Sandbox    │
│ (Docker/gVisor)│      │  (Wasmtime)  │
└────────────────┘      └──────────────┘

Component Breakdown

1. Client Interface

Multiple entry points for different use cases:

  • CLI: Interactive terminal interface
  • REST API: HTTP-based programmatic access
  • WebSocket: Real-time streaming for interactive applications
  • Rust Library: Direct embedding in Rust applications

2. API Gateway

Central entry point handling cross-cutting concerns:

  • Authentication (API keys, JWT, mTLS)
  • Rate limiting and quota management
  • Request validation and routing
  • Load balancing
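
Rate limiting, for example, is commonly implemented as a per-client token bucket. A minimal sketch (the type and method names here are illustrative, not part of the harness API):

```rust
use std::time::Instant;

/// Token-bucket rate limiter: each request spends one token; tokens
/// refill continuously up to a fixed capacity.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last: Instant::now() }
    }

    /// Returns true if the request is admitted, false if rate-limited.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.last = now;
        // Refill based on elapsed time, capped at capacity.
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

In the gateway, one bucket would be kept per API key, so quota exhaustion by one client never affects another.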

3. Core Engine

The heart of the system:

Prompt Manager

  • Template management and variable substitution
  • Prompt versioning and A/B testing support
  • Multi-turn conversation handling
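
Variable substitution can be sketched as a simple placeholder replacement over a template string (a toy version; the function name and `{{name}}` syntax are assumptions, and a real template engine would also handle escaping and missing variables):

```rust
use std::collections::HashMap;

/// Replace `{{key}}` placeholders in a template with values from a map.
fn render_template(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (key, value) in vars {
        // format! escapes `{{`/`}}` to literal braces around the key.
        out = out.replace(&format!("{{{{{}}}}}", key), value);
    }
    out
}
```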

Tool Registry

  • Dynamic tool registration and discovery
  • Schema validation (JSON Schema for function calling)
  • Tool permission enforcement
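
A registry along these lines can be sketched as a map from tool name to its expected parameters, consulted before any call is executed. This toy version only checks for required argument keys; the real registry would validate against full JSON Schema, and all names here are hypothetical:

```rust
use std::collections::HashMap;

/// Tool registry: maps each registered tool name to its required arguments.
struct ToolRegistry {
    tools: HashMap<String, Vec<String>>,
}

impl ToolRegistry {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    fn register(&mut self, name: &str, required_args: &[&str]) {
        let args = required_args.iter().map(|s| s.to_string()).collect();
        self.tools.insert(name.to_string(), args);
    }

    /// Validate a tool call before it reaches the Code Executor:
    /// the tool must exist and every required argument must be present.
    fn validate_call(&self, name: &str, args: &HashMap<String, String>) -> Result<(), String> {
        let required = self
            .tools
            .get(name)
            .ok_or_else(|| format!("unknown tool: {name}"))?;
        for arg in required {
            if !args.contains_key(arg) {
                return Err(format!("missing argument: {arg}"));
            }
        }
        Ok(())
    }
}
```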

Conversation Manager

  • Session state management
  • Context window optimization
  • Conversation persistence
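
Context window optimization often amounts to keeping the most recent messages that fit a token budget. A crude sketch (whitespace-separated word count stands in for real tokenization, and a production version would also always preserve the system prompt):

```rust
/// Keep the most recent messages whose combined approximate token count
/// fits within the budget; older messages are dropped first.
fn trim_to_budget<'a>(messages: &[&'a str], token_budget: usize) -> Vec<&'a str> {
    let mut kept = Vec::new();
    let mut used = 0;
    // Walk from newest to oldest, stopping when the budget is exhausted.
    for msg in messages.iter().rev() {
        let cost = msg.split_whitespace().count();
        if used + cost > token_budget {
            break;
        }
        used += cost;
        kept.push(*msg);
    }
    kept.reverse(); // restore chronological order
    kept
}
```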

4. LLM Adapters

Pluggable backends for different providers:

  • OpenAI GPT-4/GPT-3.5
  • Anthropic Claude
  • Local models (via llama.cpp, Ollama)
  • Custom self-hosted models
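
Pluggability of this kind is naturally expressed in Rust as a trait that every backend implements, so the Core Engine dispatches through a trait object and stays provider-agnostic. A sketch with hypothetical names (the real adapter interface would be async and carry model parameters):

```rust
/// Common interface every provider backend implements.
trait LlmAdapter {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Toy adapter standing in for a real provider client.
struct EchoAdapter;

impl LlmAdapter for EchoAdapter {
    fn name(&self) -> &str {
        "echo"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// The engine holds the configured adapter behind a trait object.
fn run(adapter: &dyn LlmAdapter, prompt: &str) -> Result<String, String> {
    adapter.complete(prompt)
}
```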

5. Code Executor

The security-critical component:

  • Receives code/tool calls from LLM
  • Validates against allowed operations
  • Executes in sandboxed environment
  • Returns results to conversation
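
The validate-then-execute pattern can be sketched as a gate in front of the sandbox: only operations on an explicit allowlist ever run, and everything else is rejected before it reaches a sandbox at all. The function names here are hypothetical, and the sandbox call is simulated:

```rust
/// Reject any operation not on the allowlist; otherwise "execute" it.
fn execute_tool_call(op: &str, allowlist: &[&str]) -> Result<String, String> {
    if !allowlist.contains(&op) {
        return Err(format!("operation '{op}' not permitted"));
    }
    // In the real system this step runs inside the container or WASM
    // sandbox; here we just simulate a successful result.
    Ok(format!("executed '{op}' in sandbox"))
}
```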

6. State Store

Persistent storage for:

  • Conversation history
  • User preferences and settings
  • Audit logs
  • Tool definitions
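
Abstracting the store behind a trait keeps the Core Engine independent of the backing database. A minimal in-memory stand-in (trait and type names are hypothetical; production deployments would back this with a durable database):

```rust
use std::collections::HashMap;

/// Storage interface the rest of the system depends on.
trait StateStore {
    fn put(&mut self, key: &str, value: &str);
    fn get(&self, key: &str) -> Option<String>;
}

/// In-memory implementation, useful for tests and single-node setups.
struct MemoryStore {
    data: HashMap<String, String>,
}

impl StateStore for MemoryStore {
    fn put(&mut self, key: &str, value: &str) {
        self.data.insert(key.to_string(), value.to_string());
    }
    fn get(&self, key: &str) -> Option<String> {
        self.data.get(key).cloned()
    }
}
```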

Deployment Architecture

Single-Node Deployment

┌────────────────────────────────────┐
│           Host System             │
│  ┌────────────────────────────┐   │
│  │    LLM Harness Service     │   │
│  │  ┌─────────────────────┐   │   │
│  │  │   Core Engine       │   │   │
│  │  └─────────────────────┘   │   │
│  └────────────────────────────┘   │
│  ┌────────────────────────────┐   │
│  │   Container Runtime        │   │
│  │  (isolated sandboxes)      │   │
│  └────────────────────────────┘   │
└────────────────────────────────────┘

Distributed Deployment

┌────────────────────────────────────────────────────────┐
│                     Load Balancer                      │
└──────────────┬───────────────────────┬─────────────────┘
               │                       │
    ┌──────────▼──────────┐  ┌────────▼────────┐
    │  API Gateway        │  │  API Gateway    │
    │  (Instance 1)       │  │  (Instance 2)   │
    └──────────┬──────────┘  └────────┬────────┘
               │                       │
    ┌──────────▼───────────────────────▼─────────┐
    │           Message Queue                    │
    │     (Redis/RabbitMQ/NATS)                  │
    └──────────┬───────────────────────┬─────────┘
               │                       │
    ┌──────────▼──────────┐  ┌────────▼────────┐
    │  Worker Node 1      │  │  Worker Node 2  │
    │  (Core + Executor)  │  │  (Core + Exec)  │
    └─────────────────────┘  └─────────────────┘

Data Flow

Typical Request Flow

  1. Request Received

    • Client sends prompt via API
    • Gateway authenticates and validates
  2. Prompt Processing

    • Core Engine loads conversation context
    • Prompt Manager applies templates
  3. LLM Interaction

    • Request sent to configured LLM adapter
    • LLM generates response (potentially with tool calls)
  4. Tool Execution (if needed)

    • Tool Registry validates tool calls
    • Code Executor runs in sandbox
    • Results returned to LLM
  5. Response Delivery

    • Final response formatted
    • Conversation state persisted
    • Response returned to client
  6. Audit Logging

    • All operations logged asynchronously
    • Metrics recorded for observability
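
The steps above can be condensed into a single pipeline sketch. Every function here is a hypothetical stand-in for the component in the corresponding step, not a real API, with trivial stub implementations so the sketch compiles:

```rust
/// One request through the system, step by step.
fn handle_request(prompt: &str) -> Result<String, String> {
    authenticate(prompt)?;                        // 1. gateway auth + validation
    let rendered = apply_templates(prompt);       // 2. prompt processing
    let llm_reply = call_llm(&rendered)?;         // 3. LLM interaction
    let final_reply = if let Some(tool) = requested_tool(&llm_reply) {
        let result = run_in_sandbox(&tool)?;      // 4. tool execution
        call_llm(&format!("{rendered}\n[tool result] {result}"))?
    } else {
        llm_reply
    };
    persist(&final_reply);                        // 5. state persisted
    audit_log(&final_reply);                      // 6. async audit logging
    Ok(final_reply)
}

// Trivial stand-ins for the real components:
fn authenticate(_p: &str) -> Result<(), String> { Ok(()) }
fn apply_templates(p: &str) -> String { format!("User: {p}") }
fn call_llm(p: &str) -> Result<String, String> { Ok(format!("reply to [{p}]")) }
fn requested_tool(reply: &str) -> Option<String> {
    reply.contains("TOOL:").then(|| reply.to_string())
}
fn run_in_sandbox(t: &str) -> Result<String, String> { Ok(format!("ok: {t}")) }
fn persist(_r: &str) {}
fn audit_log(_r: &str) {}
```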