Why LLM Wrappers Fail (And What to Build Instead)

Published on June 12, 2024

Target Audience: Infrastructure engineers, AI developers, technical decision-makers

Goal: Establish technical credibility, explain the fundamental problem Artemis City solves

CTA: Read whitepaper, star GitHub, join Discord

Distribution: HackerNews, Reddit r/MachineLearning, personal blog, LinkedIn


Why LLM Wrappers Fail (And What to Build Instead)

The fundamental architectural problem with current agent frameworks


Every week, a new "AI agent framework" launches on HackerNews. They all promise the same thing: autonomous AI agents that Just Work™.

Most fail within weeks of launch. Here's why.


The Illusion of Autonomy

The typical agent framework architecture looks like this:

User Request → LLM → Tool Call → LLM → Next Action → LLM → ...

Sounds reasonable, right? The LLM decides everything.

This is the fundamental mistake.
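
In code, the wrapper pattern boils down to a loop like this (a minimal sketch; llm_complete and run_tool are hypothetical stand-ins, not any specific framework's API):

from typing import Callable

def agent_loop(user_request: str,
               llm_complete: Callable[[str], str],
               run_tool: Callable[[str], str]) -> str:
    # The wrapper pattern: the LLM picks every next action.
    history = [user_request]
    while True:  # no iteration cap, no budget, no external notion of "done"
        action = llm_complete("\n".join(history))
        if action.startswith("FINISH:"):
            return action.removeprefix("FINISH:")
        history.append(run_tool(action))  # execute whatever the model chose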


Problem 1: Non-Deterministic Routing

Scenario: You ask an agent to "build a REST API."

Run 1:

LLM: "Let me plan the architecture first"
→ Creates design doc
→ Writes code
→ Success

Run 2 (same input):

LLM: "I'll start coding immediately"
→ Writes spaghetti code
→ No architecture
→ Technical debt

Run 3 (same input):

LLM: "Let me research best practices"
→ Web searches for 20 minutes
→ Burns API credits
→ Never writes code

Same input. Three different workflows. Zero predictability.

This isn't a bug. It's the architecture.
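
You can see why without calling a model at all: decoding samples from a probability distribution, so identical prompts diverge across runs. A toy illustration (the plans and weights here are made up):

import random

PLANS = ["plan the architecture first",
         "start coding immediately",
         "research best practices"]
WEIGHTS = [0.5, 0.3, 0.2]  # all three are plausible, so all three happen

for run in range(1, 4):
    # Same "input" every time; the outcome is still a draw from a distribution
    print(f"Run {run}: {random.choices(PLANS, weights=WEIGHTS)[0]}")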


Problem 2: Runaway Loops

LLM-driven agents don't know when to stop.

Real example from Auto-GPT:

1. User: "Research Python best practices"
2. Agent: Searches web
3. Agent: Finds article
4. Agent: Decides it needs more context
5. Agent: Searches again
6. Agent: Finds another article
7. Agent: Decides previous search wasn't good enough
8. Agent: Searches again
...
9. User: Ctrl+C

$47 in API costs. Zero useful output.

Why? Because the LLM has no concept of "done." It just keeps generating next actions until you kill it.
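
The stop condition has to live outside the model. A hard cap the LLM can't talk its way past is only a few lines (a sketch; step_fn and the limits are illustrative):

from typing import Callable, Optional, Tuple

MAX_STEPS = 10
MAX_COST_USD = 2.00

def bounded_loop(step_fn: Callable[[], Tuple[str, bool, float]]) -> Optional[str]:
    # step_fn returns (result, done, cost_of_step). The caps are enforced
    # here, outside the model, so "one more search" can't run forever.
    spent = 0.0
    for _ in range(MAX_STEPS):
        result, done, cost = step_fn()
        spent += cost
        if done:
            return result
        if spent >= MAX_COST_USD:
            break
    return None  # hit a cap: fail loudly instead of spinning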


Problem 3: No Governance

LLMs decide which tools to call. You don't.

What could go wrong?

# Agent decides to "clean up" your codebase
→ Calls filesystem_delete("/")
→ Your production database: gone
→ Your source code: gone
→ Your career: questioning life choices

No permissions system. No audit trail. No safety rails.

You're trusting a probabilistic model to make deterministic infrastructure decisions.
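
The missing rail is small. Compare an unchecked dispatcher with a deny-by-default one (a sketch; the names are illustrative):

def dispatch_unchecked(tools: dict, name: str, args: dict):
    # Typical wrapper: whatever tool name the model emits gets executed.
    return tools[name](**args)

def dispatch_governed(tools: dict, name: str, args: dict,
                      allowed: set, audit: list):
    # Deny by default, record every call.
    if name not in allowed:
        raise PermissionError(f"blocked tool call: {name}")
    audit.append((name, args))  # minimal audit trail
    return tools[name](**args)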


Problem 4: Memory Theater

Most frameworks claim "persistent memory." Here's what they actually do:

Session-based memory:

Session 1: "I built a login system"
Session 2: "What login system?"

Vector database memory:

→ Stores everything as embeddings
→ Retrieves via semantic similarity
→ Hallucinates connections between unrelated memories
→ "I remember building that feature!" (Narrator: It didn't)

The problem: Memory is unstructured, unvalidated, and unreliable.

No trust-decay. No accountability. Just vibes.
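
Even a minimal structured record beats a bag of embeddings: provenance and a validation timestamp give you something to decay and something to audit (a sketch; this schema is hypothetical):

from dataclasses import dataclass
from datetime import date

@dataclass
class MemoryRecord:
    content: str          # the claim itself
    source: str           # where it came from -- auditable provenance
    created: date
    last_validated: date  # trust decays as this goes stale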


Problem 5: Vendor Lock-In

"Use our cloud memory storage!"

Translation: Your agent's knowledge lives in our database. Forever.

  • Can't export in a usable format
  • Can't use with other tools
  • Can't audit what's stored
  • Can't switch providers

Your AI agent is only as smart as their API's uptime.


The Pattern: Backwards Architecture

Current frameworks put the LLM at the center:

       ┌─────────────┐
       │     LLM     │ ← Decides everything
       └──────┬──────┘
              │
  ┌───────────┼───────────┐
  ▼           ▼           ▼
Tools      Memory      Agents

This is like letting each process decide its own CPU scheduling. Chaos.


What to Build Instead: Kernel-Driven Architecture

Invert the model. Put a kernel at the center:

       ┌─────────────┐
       │   KERNEL    │ ← Decides routing, permissions, state
       └──────┬──────┘
              │
  ┌───────────┼───────────┐
  ▼           ▼           ▼
Agents      Memory      Tools
  │           │           │
  └─────────┬─┴───────────┘
            │
          LLMs ← Compute resources, not decision makers

The kernel controls:

  • Routing: YAML config, not LLM guessing
  • Permissions: What tools each agent can access
  • State: Persistent, queryable, auditable
  • Memory: User-owned, structured, with trust-decay
  • Governance: Audit logs, rate limits, cost tracking

LLMs provide intelligence. The kernel provides reliability.
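
Put together, the kernel's job fits on a page. A minimal sketch, assuming precompiled route patterns and agent callables (every name here is illustrative, not the Artemis City API):

class Kernel:
    def __init__(self, routes, permissions, memory):
        self.routes = routes            # [(compiled regex, agent name), ...]
        self.permissions = permissions  # {agent name: set of allowed tools}
        self.memory = memory            # append-only store (a list works)

    def handle(self, request: str, agents: dict) -> str:
        # 1. Routing: a table lookup, not a model call
        agent = next((a for p, a in self.routes if p.search(request)), "default")
        # 2. Permissions: the agent only ever sees its allowed tools
        tools = self.permissions.get(agent, set())
        # 3. Agents (LLM-backed) do the work -- compute, not control
        result = agents[agent](request, tools)
        # 4. State: the kernel records what happened, queryable later
        self.memory.append({"agent": agent, "request": request, "result": result})
        return result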


Concrete Example: Building a REST API

LLM-Wrapper Framework:

User: "Build a REST API"
  ↓
LLM: *flips coin* "I'll start with research"
  ↓
*30 minutes of web searches*
  ↓
LLM: "Let me think about architecture"
  ↓
*Generates 5-page design doc you didn't ask for*
  ↓
LLM: "Now I'll code"
  ↓
*API key rate limit hit*
  ↓
Result: Incomplete, unpredictable, expensive

Kernel-Driven Framework (Artemis City):

User: "Build a REST API"
  ↓
Kernel: *matches pattern "build|create"* → routes to coder agent
  ↓
Coder agent: Generates clean Flask/FastAPI code
  ↓
Kernel: Stores result in memory with metadata
  ↓
Result: Complete, deterministic, predictable

Same input → same output. Every time.


Why This Matters for Production

If you're building demos, LLM wrappers are fine.

If you're building production systems, you need:

  1. Determinism: Same input → same workflow
  2. Governance: Audit trails, permissions, safety
  3. Reliability: Agents that don't go rogue
  4. Observability: Why did agent X do Y?
  5. User ownership: Your memory, your control

You need a kernel, not a wrapper.


The Unix Parallel

Unix doesn't let processes decide their own scheduling priority.

Imagine if it did:

Process: "I'm the most important! Give me all the CPU!"
Kernel: "Okay!"
→ System hangs
→ Nothing else runs
→ You reboot

Instead, Unix has a kernel that manages:

  • Process scheduling
  • Memory allocation
  • I/O operations
  • Permissions

Processes provide compute. The kernel provides governance.

Agentic systems need the same architecture.


What We Built: Artemis City

We took the kernel-driven approach seriously:

Routing: YAML-defined patterns

routes:
  - pattern: "build|create|code"
    agent: coder
    priority: high
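
Turning that config into a match table might look like this (a sketch; assumes PyYAML and the routes block above saved as routes.yaml):

import re
import yaml

with open("routes.yaml") as f:
    ROUTES = yaml.safe_load(f)["routes"]

def route(request: str) -> str:
    for r in ROUTES:
        if re.search(r["pattern"], request, flags=re.IGNORECASE):
            return r["agent"]
    return "default"

route("Build a REST API")  # -> "coder", every time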

Governance: RBAC + audit logs

tool_permissions:
  coder:
    - filesystem_read
    - filesystem_write
  researcher:
    - web_search  # No write access
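
Enforcement against that table is then a dictionary lookup (a sketch; assumes the YAML above is saved as permissions.yaml and loaded with PyYAML):

import yaml

with open("permissions.yaml") as f:
    PERMISSIONS = yaml.safe_load(f)["tool_permissions"]

def check(agent: str, tool: str) -> None:
    if tool not in PERMISSIONS.get(agent, []):
        raise PermissionError(f"{agent} is not permitted to call {tool}")

check("coder", "filesystem_write")       # passes
check("researcher", "filesystem_write")  # raises: researcher has no write access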

Memory: User-owned (Obsidian + Supabase)

codex memory connect obsidian ~/my-vault
# Your memory, your files, your control

Trust-decay: Memory reliability decreases without validation

trust_score = 1.0 - (0.01 * days_old) - (0.05 * days_since_validation)
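
As a small helper (the coefficients are from the formula above; clamping to [0, 1] is an assumption):

def trust_score(days_old: int, days_since_validation: int) -> float:
    # Trust decays slowly with age, faster with time since last validation
    raw = 1.0 - 0.01 * days_old - 0.05 * days_since_validation
    return max(0.0, min(1.0, raw))  # clamp to [0, 1] -- an assumption

trust_score(10, 0)   # 0.9  -- old but freshly validated
trust_score(10, 30)  # 0.0  -- stale and unvalidated: don't rely on it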

The Results

LLM Wrapper Framework:

  • Non-deterministic routing
  • $50+ in burned API credits
  • Unpredictable agent behavior
  • No audit trail
  • Vendor lock-in

Artemis City (Kernel-Driven):

  • Deterministic routing (same input → same result)
  • Predictable costs (no runaway loops)
  • Governed agent execution (RBAC + audit)
  • User-owned memory (Obsidian + Supabase)
  • Production-ready reliability

Try It Yourself

pip install artemis-city
codex init my-agent-system
codex run coder "Build a simple REST API"

See the difference between wrappers and kernels.


The Bottom Line

LLM wrappers fail because they give control to the LLM.

Production systems need governance, not autonomy.

Build a kernel. Let agents be compute resources, not decision-makers.

Your production environment will thank you.


Learn More


— Prinston Palmer

Ex-Morgan Stanley derivatives engineer

Founder, Artemis City

P.S. If you've burned $500 in OpenAI credits watching Auto-GPT spin in circles, you're not alone. There's a better way.