Why LLM Wrappers Fail (And What to Build Instead)
Published on December 10, 2025
The fundamental architectural problem with current agent frameworks
Every week, a new "AI agent framework" launches on Hacker News. They all promise the same thing: autonomous AI agents that Just Work™.
Most fail within weeks of launch. Here's why.
The Illusion of Autonomy
The typical agent framework architecture looks like this:
User Request → LLM → Tool Call → LLM → Next Action → LLM → ...
Sounds reasonable, right? The LLM decides everything.
This is the fundamental mistake.
Problem 1: Non-Deterministic Routing
Scenario: You ask an agent to "build a REST API."
Run 1:
LLM: "Let me plan the architecture first"
→ Creates design doc
→ Writes code
→ Success
Run 2 (same input):
LLM: "I'll start coding immediately"
→ Writes spaghetti code
→ No architecture
→ Technical debt
Run 3 (same input):
LLM: "Let me research best practices"
→ Web searches for 20 minutes
→ Burns API credits
→ Never writes code
Same input. Three different workflows. Zero predictability.
This isn't a bug. It's the architecture.
Problem 2: Runaway Loops
LLM-driven agents don't know when to stop.
Real example from Auto-GPT:
1. User: "Research Python best practices"
2. Agent: Searches web
3. Agent: Finds article
4. Agent: Decides it needs more context
5. Agent: Searches again
6. Agent: Finds another article
7. Agent: Decides previous search wasn't good enough
8. Agent: Searches again
...
9. User: Ctrl+C
$47 in API costs. Zero useful output.
Why? Because the LLM has no concept of "done." It just keeps generating next actions until you kill it.
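One way a kernel can supply the missing concept of "done" is a hard budget enforced outside the model. The sketch below is a minimal illustration, not Artemis City's implementation: `RunBudget`, `BudgetExceeded`, `run_agent`, and the limits are hypothetical names and values chosen for this example, and `next_action` stands in for a single LLM step.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a run goes past its step or cost ceiling."""


@dataclass
class RunBudget:
    max_steps: int = 10        # hypothetical default
    max_cost_usd: float = 1.0  # hypothetical default
    steps: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one agent step; raise once either limit is exceeded."""
        self.steps += 1
        self.cost_usd += cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost limit ${self.max_cost_usd:.2f} exceeded")


def run_agent(next_action, budget: RunBudget) -> list[str]:
    """Call next_action() until it returns None or the budget runs out.

    next_action returns (action_name, cost_usd), or None when the
    agent reports that it is finished.
    """
    history: list[str] = []
    while True:
        step = next_action()
        if step is None:
            return history
        action, cost = step
        budget.charge(cost)
        history.append(action)


# An agent that never decides it is done gets cut off by the budget.
budget = RunBudget(max_steps=5)
try:
    run_agent(lambda: ("web_search", 0.05), budget)
except BudgetExceeded as exc:
    print(f"stopped: {exc}")  # stopped: step limit 5 exceeded
```

The point of the design is that the stop condition lives in ordinary code. The model can keep proposing searches forever; the loop around it cannot.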
Problem 3: No Governance
LLMs decide which tools to call. You don't.
What could go wrong?
# Agent decides to "clean up" your codebase
→ Calls filesystem_delete("/")
→ Your production database: gone
→ Your source code: gone
→ Your career: questioning life choices
No permissions system. No audit trail. No safety rails.
You're trusting a probabilistic model to make deterministic infrastructure decisions.
Problem 4: Memory Theater
Most frameworks claim "persistent memory." Here's what they actually do:
Session-based memory:
Session 1: "I built a login system"
Session 2: "What login system?"
Vector database memory:
→ Stores everything as embeddings
→ Retrieves via semantic similarity
→ Hallucinates connections between unrelated memories
→ "I remember building that feature!" (Narrator: It didn't)
The problem: Memory is unstructured, unvalidated, and unreliable.
No trust-decay. No accountability. Just vibes.
Problem 5: Vendor Lock-In
"Use our cloud memory storage!"
Translation: Your agent's knowledge lives in our database. Forever.
- Can't export in a usable format
- Can't use with other tools
- Can't audit what's stored
- Can't switch providers
Your AI agent is only as smart as their API availability.
The Pattern: Backwards Architecture
Current frameworks put the LLM at the center:
       ┌─────────────┐
       │     LLM     │ ← Decides everything
       └──────┬──────┘
              │
  ┌───────────┼───────────┐
  ▼           ▼           ▼
Tools       Memory      Agents
This is like letting every process decide for itself what runs next. Chaos.
What to Build Instead: Kernel-Driven Architecture
Invert the model. Put a kernel at the center:
       ┌─────────────┐
       │   KERNEL    │ ← Decides routing, permissions, state
       └──────┬──────┘
              │
  ┌───────────┼───────────┐
  ▼           ▼           ▼
Agents      Memory       Tools
  │           │           │
  └───────────┼───────────┘
              │
            LLMs ← Compute resources, not decision makers
The kernel controls:
- Routing: YAML config, not LLM guessing
- Permissions: What tools each agent can access
- State: Persistent, queryable, auditable
- Memory: User-owned, structured, with trust-decay
- Governance: Audit logs, rate limits, cost tracking
LLMs provide intelligence. The kernel provides reliability.
Concrete Example: Building a REST API
LLM-Wrapper Framework:
User: "Build a REST API"
↓
LLM: *flips coin* "I'll start with research"
↓
*30 minutes of web searches*
↓
LLM: "Let me think about architecture"
↓
*Generates 5-page design doc you didn't ask for*
↓
LLM: "Now I'll code"
↓
*API key rate limit hit*
↓
Result: Incomplete, unpredictable, expensive
Kernel-Driven Framework (Artemis City):
User: "Build a REST API"
↓
Kernel: *matches pattern "build|create"* → routes to coder agent
↓
Coder agent: Generates clean Flask/FastAPI code
↓
Kernel: Stores result in memory with metadata
↓
Result: Complete, deterministic, predictable
Same input → same workflow. Every time.
Why This Matters for Production
If you're building demos, LLM wrappers are fine.
If you're building production systems, you need:
- Determinism: Same input → same workflow
- Governance: Audit trails, permissions, safety
- Reliability: Agents that don't go rogue
- Observability: Why did agent X do Y?
- User ownership: Your memory, your control
You need a kernel, not a wrapper.
The Unix Parallel
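Unix doesn't let processes schedule themselves. An unprivileged process can lower its own priority with nice, but it can't raise it, and the kernel alone decides what runs next.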
Imagine if it did:
Process: "I'm the most important! Give me all the CPU!"
Kernel: "Okay!"
→ System hangs
→ Nothing else runs
→ You reboot
Instead, Unix has a kernel that manages:
- Process scheduling
- Memory allocation
- I/O operations
- Permissions
Processes provide compute. The kernel provides governance.
Agentic systems need the same architecture.
What We Built: Artemis City
We took the kernel-driven approach seriously:
Routing: YAML-defined patterns
routes:
  - pattern: "build|create|code"
    agent: coder
    priority: high
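To make "YAML config, not LLM guessing" concrete, here is a rough sketch of what matching a request against that route could look like. It is an assumption-heavy illustration rather than the actual Artemis City router: the `Route` class, the `route` function, first-match-wins ordering, case-insensitive matching, and the `fallback` agent name are all hypothetical, and the route is written inline instead of being loaded from the YAML file.

```python
import re
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Route:
    pattern: str
    agent: str
    priority: str = "normal"


# Mirrors the YAML route above; loading it from a file is left out here.
ROUTES = (Route(pattern="build|create|code", agent="coder", priority="high"),)


def route(request: str, routes: Sequence[Route] = ROUTES, default: str = "fallback") -> str:
    """Return the agent of the first route whose pattern matches the request."""
    for candidate in routes:
        if re.search(candidate.pattern, request, flags=re.IGNORECASE):
            return candidate.agent
    return default


print(route("Build a REST API"))        # coder
print(route("Summarize this article"))  # fallback
```

No model is consulted to pick the agent, so the same request always lands on the same agent. Two caveats on the sketch: `priority` is carried along but not used, and a bare `code` pattern also matches words like "decode", so a real config would probably want word boundaries.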
Governance: RBAC + audit logs
tool_permissions:
  coder:
    - filesystem_read
    - filesystem_write
  researcher:
    - web_search  # No write access
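A permission table like this only helps if every tool call passes through a single checkpoint. The following is a minimal sketch of such a checkpoint, assuming an in-memory audit log: `call_tool`, `audit_log`, and the shape of the log entries are hypothetical and not taken from Artemis City.

```python
from datetime import datetime, timezone

# Same grants as the YAML above.
TOOL_PERMISSIONS = {
    "coder": {"filesystem_read", "filesystem_write"},
    "researcher": {"web_search"},
}

audit_log: list[dict] = []


def call_tool(agent: str, tool: str, tools: dict, *args):
    """Run a tool on behalf of an agent, but only if the agent is granted it.

    Every attempt, allowed or denied, is appended to audit_log first.
    """
    allowed = tool in TOOL_PERMISSIONS.get(agent, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "tool": tool,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{agent} may not call {tool}")
    return tools[tool](*args)


# Stand-in tool implementation for the example.
tools = {"web_search": lambda query: f"results for {query}"}
print(call_tool("researcher", "web_search", tools, "flask routing"))
```

Because no role is granted `filesystem_delete`, the "clean up your codebase" scenario from Problem 3 ends in a `PermissionError` and a log entry instead of an empty disk.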
Memory: User-owned (Obsidian + Supabase)
codex memory connect obsidian ~/my-vault # Your memory, your files, your control
Trust-decay: Memory reliability decreases without validation
trust_score = 1.0 - (0.01 * days_old) - (0.05 * days_since_validation)
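As a worked example, the formula can be wrapped in a small function. One assumption is added here: the raw formula goes negative once a memory is old enough, and the post does not say how that case is handled, so this sketch clamps the score to the range 0.0 to 1.0.

```python
def trust_score(days_old: float, days_since_validation: float) -> float:
    """Linear trust decay from the formula above, clamped to [0.0, 1.0].

    The clamp is an assumption added for this sketch; the source formula
    has no lower bound.
    """
    raw = 1.0 - (0.01 * days_old) - (0.05 * days_since_validation)
    return max(0.0, min(1.0, raw))


fresh = trust_score(days_old=0, days_since_validation=0)
recent = trust_score(days_old=10, days_since_validation=2)
stale = trust_score(days_old=30, days_since_validation=30)
print(fresh, round(recent, 2), stale)
```

A memory that is 10 days old and was validated 2 days ago scores about 0.8. With these coefficients, validation dominates: a memory that has never been revalidated hits zero in under 17 days (1.0 / 0.06 ≈ 16.7), regardless of how useful it once was.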
The Results
LLM Wrapper Framework:
- Non-deterministic routing
- $50+ in burned API credits
- Unpredictable agent behavior
- No audit trail
- Vendor lock-in
Artemis City (Kernel-Driven):
- Deterministic routing (same input → same result)
- Predictable costs (no runaway loops)
- Governed agent execution (RBAC + audit)
- User-owned memory (Obsidian + Supabase)
- Production-ready reliability
Try It Yourself
pip install artemis-city
codex init my-agent-system
codex run coder "Build a simple REST API"
See the difference between wrappers and kernels.
The Bottom Line
LLM wrappers fail because they give control to the LLM.
Production systems need governance, not autonomy.
Build a kernel. Let agents be compute resources, not decision-makers.
Your production environment will thank you.
Learn More
- GitHub: Production-ready kernel + router
Prinston Palmer
Founder, Artemis City
P.S. If you've burned $500 in OpenAI credits watching Auto-GPT spin in circles, you're not alone. There's a better way.