
Abstract

Artemis City is a multi-agent orchestration platform that tightly integrates a network of AI agents with a persistent knowledge base (an Obsidian vault) for memory. Agents read task specifications from Markdown notes, execute them, and write results back to the vault, creating a human-readable persistent memory system for the AI collective. This design enables the agents to share context and learn cumulatively over time, treating the knowledge vault as an extension of their working memory. Artemis City represents a new category of agentic operating infrastructure, combining large language model (LLM) capabilities with a structured memory graph, continuous learning, and robust safety governance. Whitebook Version 3 introduces the most significant architectural advancement since the platform’s inception: the Domain-Locked Hebbian Marketplace. Where v2 formalized the basic Hebbian weight update as a ±1 binary rule, v3 replaces this with a rigorous morphological learning formula — ΔW = tanh(a · x · y) — and proves through simulation that domain-locked agent specialization with Hebbian routing outperforms unconstrained routing by 81.2%, monolithic single-model inference by 79.8%, and k-Nearest Neighbor lookup by 80.8%, while operating at 180× lower computational cost than k-NN.

1. Introduction

Beyond raw performance, v3 introduces five new architectural capabilities validated by quantitative simulation:

Domain-Locked Hebbian Marketplace: Agents are hard-constrained to ATP ActionType domains (Execute, Scaffold, Summarize, Reflect). Each domain maps to a stable generating function. Within each domain, agents compete on Hebbian weight — the best performer monopolizes routing. This eliminates cross-domain pollution and yields an 81.2% MAE improvement over unconstrained baselines.

Active Sentinel & Immune System: The sentinel layer evolves from a passive alert mechanism into an active rerouting system. When oscillation is detected (the sign-change rate exceeds a threshold), the sentinel penalizes the dominant agent’s weight and forces exploration. This yields measurable improvement and concentrates interventions in high-volatility domains.

Hebbian + k-NN Reconciliation Layer: Hebbian routing serves as a cheap elimination layer (O(1) per decision) that escalates to expensive k-NN verification (O(W) per decision) only when the two systems disagree. In disagreements, Hebbian is right 94% of the time. The reconciled system operates at 71.9% of pure k-NN cost.

Distributed Weight Resilience: The Hebbian weight graph provides intrinsic antifragility. When a single agent’s corpus is corrupted, the routing graph automatically deselects it — the poisoned agent receives 0 task assignments while the system absorbs only -1.0% damage (versus +0.8% for a monolithic MLP). Missing agent flows produce a 7.2× failure-rate spike that is immediately detectable, triggering kernel-level expansion workflows. Resilience also shows up as learning velocity: Hebbian agents recover from failure events in 4.1–4.6 steps versus 17–24 steps for monolithic MLPs, demonstrating 4–5× faster adaptation. Every failure teaches — the distributed weight graph IS the resilience.

Human Validators as Weighted Commodity: Humans displaced by AI return as credentialed validators. The LLM with the most real credential-backed reviews is weighted higher than hobbyist feedback. Access to validated reviews becomes a tradeable commodity within the marketplace.

This whitebook details each of these advancements with simulation evidence, formal specifications, and architectural rationale. The result is a trustworthy, efficient, and self-improving AI infrastructure that serves as a blueprint for next-generation multi-agent systems.

2. Memory Bus Architecture and Synchronization

The Memory Bus is the backbone of Artemis City’s knowledge infrastructure, mediating all read/write operations between agents and the persistent memory stores. In v2, we introduced a rigorous Memory Bus Consistency & Synchronization API that guarantees updates propagate reliably to both the graph store (Obsidian vault) and the vector index (Supabase) with minimal latency. This section describes the write-through protocol, read query hierarchy, consistency guarantees, and performance characteristics of the updated memory bus.

2.1 Write-Through Memory Sync Protocol

Write Path: When an agent commits new knowledge (e.g. creating or updating a Markdown note), the memory bus performs an atomic write-through to both the Obsidian vault and the Supabase vector store. The operation is orchestrated as a single transaction: the bus first generates an embedding for the knowledge node and upserts it into the vector index, then writes the note and metadata to the Obsidian vault, and finally confirms the write back to the agent. This ensures that vector search data is always updated before the knowledge becomes visible to agents, maintaining consistency. The entire commit sequence is acknowledged only after both the graph file and vector index have been successfully updated, so agents never see a partial write.

Agent Write Request → Memory Bus → [Parallel: Obsidian Write + Supabase Embed] → Wait for Both → Confirm to Agent

To support durability, the memory bus uses a write-ahead log (WAL) and batching mechanism. All pending writes are logged so that if a crash occurs mid-transaction, the system can recover to a consistent state. Writes may be batched for efficiency — e.g. accumulating writes for up to 100 ms or 50 operations before flushing — and periodic checkpoints are taken (every 1000 operations or 60 seconds) to cap recovery time. These measures improve throughput while guaranteeing that either all parts of a batched update are applied or none are (atomicity).

If any step of the multi-stage write fails (for example, the vector DB is temporarily unreachable), the memory bus rolls back the transaction and retries according to a backoff policy. This prevents divergent states between the vault and index. Together, the write-through protocol and WAL provide strong write consistency: all agents see the same latest knowledge, and no acknowledged write is ever lost.
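The commit sequence above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production implementation: the vault and vector-store adapters (`vault`, `vector_index`) and their method names are hypothetical stand-ins.

```python
import time

class MemoryBus:
    """Minimal write-through sketch: log to the WAL, upsert the vector index
    first, then write the vault file, and acknowledge only after both succeed."""

    def __init__(self, vault, vector_index, max_retries=3):
        self.vault = vault                # hypothetical Obsidian-vault adapter
        self.vector_index = vector_index  # hypothetical Supabase adapter
        self.wal = []                     # write-ahead log of pending commits
        self.max_retries = max_retries

    def commit(self, note_id, content):
        self.wal.append((note_id, content))       # log before touching stores
        for attempt in range(self.max_retries):
            try:
                embedding = self.vector_index.embed(content)
                self.vector_index.upsert(note_id, embedding)  # index first
                self.vault.write(note_id, content)            # then the vault
                self.wal.remove((note_id, content))           # durable: clear WAL
                return True
            except IOError:
                time.sleep(0.01 * 2 ** attempt)   # exponential backoff, then retry
        return False                              # WAL entry kept for crash recovery
```

If either store fails, the WAL entry survives and a recovery pass can replay it, which is the consistency property the text describes.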

2.2 Read Path and Consistency Guarantees

Hierarchical Read Protocol: To serve queries efficiently, the memory bus implements a tiered lookup strategy. A read request (e.g. when an agent searches or references knowledge) progresses through three levels:

1. Exact Lookup (O(1)): Check for an exact note match by unique ID or title hash — a constant-time hash map lookup that returns immediately if found.
2. Structured Search (O(log n)): If there is no exact match, perform a structured keyword search across the knowledge graph’s indices (e.g. tags, titles, links), leveraging sorted indices for logarithmic-time retrieval.
3. Vector Similarity (O(n)): As a fallback, execute a semantic vector similarity search against the Supabase pgvector index to find relevant notes by content embedding. This is the most expensive step and is treated as a last resort when the other methods fail.

By default, the memory bus returns results with staleness metadata indicating the sync status between the Obsidian vault and Supabase index. The system guarantees read-your-writes consistency: an agent will immediately see its own writes on subsequent reads.

To further optimize common reads, Artemis City employs aggressive caching on the memory bus. Exact lookups and recent query results are cached in memory, yielding fast (~50 ms) responses for frequent queries. The cache is invalidated on writes to maintain correctness, and a cache hit rate above 80% is targeted for repeat queries. This layered approach provides both speed and completeness in querying the knowledge base.
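The three-level hierarchy with a read-through cache can be sketched as follows, assuming simplified stand-ins for the indices (a dict for exact lookup, a keyword map, and a vector-search callable):

```python
class TieredReader:
    """Sketch of the tiered read protocol. The three backing indices are
    hypothetical stand-ins for the vault/Supabase structures."""

    def __init__(self, exact_index, keyword_index, vector_search):
        self.exact = exact_index            # id/title-hash -> note (O(1))
        self.keywords = keyword_index       # keyword -> list of note ids
        self.vector_search = vector_search  # callable: query -> notes (O(n))
        self.cache = {}                     # recent query results

    def read(self, query):
        if query in self.cache:                 # cache hit (~50 ms target)
            return self.cache[query]
        result = self.exact.get(query)          # level 1: exact lookup
        if result is None:                      # level 2: structured search
            ids = self.keywords.get(query, [])
            result = [self.exact[i] for i in ids if i in self.exact] or None
        if result is None:                      # level 3: vector similarity
            result = self.vector_search(query)
        self.cache[query] = result
        return result

    def invalidate(self):
        self.cache.clear()                      # called on every write
```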

2.3 Conflict Handling and Consistency

In a concurrent multi-agent environment, it is possible for two agents to attempt overlapping writes (e.g. editing the same note) at nearly the same time. The memory bus includes a concurrent write resolution mechanism to handle such conflicts deterministically. The strategy is a last-write-wins arbitration: each knowledge node update carries a timestamp, and the latest timestamp prevails if write contention is detected. In practice, writes acquire an optimistic lock; if a race is detected (a node was modified since read), the later write is allowed but the conflict is logged for audit. To strengthen this, each knowledge node can maintain a simple version vector or incrementing version number. If an agent’s write is based on an outdated version, the memory bus can detect it and either reject the write or queue it for manual review. For semantic conflicts (where the content changes are non-trivial merges), Artemis City defers to a manual resolution queue — flagging the node for a human or higher-level agent to reconcile the changes. These measures ensure the knowledge graph remains logically consistent and that no update is lost without notice. All conflicts and resolutions are recorded in the governance log for transparency.
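The version-number variant of this arbitration reduces to a single function. The store shape (`node_id -> (content, version)`) and log format here are illustrative assumptions, not the production schema:

```python
def versioned_write(store, node_id, new_content, base_version, conflict_log):
    """Last-write-wins with version detection: if the writer's base version is
    stale, the later write still lands, but the conflict is logged for audit."""
    current = store.get(node_id)
    if current is not None and current[1] != base_version:
        # The node changed since this writer read it: log for the audit trail.
        conflict_log.append({
            "node": node_id,
            "stale_base": base_version,
            "overwrote_version": current[1],
        })
    next_version = (current[1] + 1) if current else 1
    store[node_id] = (new_content, next_version)
    return next_version
```

A stricter policy (reject instead of overwrite, or queue for manual review) would replace the log-and-proceed branch with an early return.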

2.4 Performance and Reliability

The updated memory bus is designed to meet strict performance targets for a smooth real-time experience. Documented SLOs include write latency under 200 ms at p95 (500 ms at p99) and read latency under 50 ms at p95 for cache hits (150 ms for complex vector queries). Typical simple queries return in tens of milliseconds, thanks to the hierarchical lookup and caching, while even vector searches complete well below a quarter second. The sync lag between the vault and vector index is kept below 300 ms on average (500 ms worst-case), which underpins the read-your-writes guarantee. Moreover, the system targets a failure rate below 0.01% for memory operations, achieved via comprehensive error handling, retries, and the WAL safety net. To validate these performance claims, the team has implemented unit tests and load tests on the memory layer. For example, concurrent write tests confirm the system can handle 100+ writes per second with proper locking. A Prometheus metric artemis_memory_write_latency_ms is recorded for every write operation, and a Grafana dashboard tracks these distributions in real time. Empirical results show the p95 write latency holding around 120–150 ms in a typical deployment, leaving headroom within the 200 ms budget. This robust performance of the memory bus under load is a key enabler for the overall scalability of Artemis City.

(The Memory Bus API is further illustrated in the configuration example in Appendix A, including sample code snippets for initialization and error handling.)

3. Hebbian Learning Engine

A cornerstone of Artemis City’s adaptive memory is the Hebbian Learning Engine, which imbues the system with a form of experience-based plasticity. This engine tracks the co-activation of agents and tasks, strengthening or weakening associations based on outcomes, in alignment with the principle that “agents that succeed together, learn to associate”. Whitebook v3 introduces a fundamental upgrade to the Hebbian mechanism: replacing the binary ±1 update rule with a morphological learning formula, introducing domain-locked agent specialization, and presenting comprehensive simulation evidence that validates the architecture’s superiority over unconstrained routing, monolithic models, and k-NN inference.

3.1 Mechanism and Update Rules

v3 Hebbian Update Formula. In v2, the simplest implementation used a binary ±1 update rule: each successful completion adds +1 to the weight, and each failure subtracts 1. Whitebook v3 replaces this with a bounded morphological update:

ΔW = tanh(a · x · y)

Where:
• a = learning rate (default 0.1) — controls sensitivity
• x = input signal magnitude (task complexity or confidence)
• y = outcome signal (+1 for success, -1 for failure)
• tanh = hyperbolic tangent — bounds the update to [-1, +1], preventing runaway weight growth

Anti-Hebbian update on failure: ΔW = -η (η = 0.1)

This is explicitly morphological, not neural backpropagation. The tanh bounding ensures that weights accumulate routing intelligence gradually without exploding. A single catastrophic failure does not destroy accumulated trust; conversely, a single success does not grant a permanent monopoly.

Weight Deviation Signal. The accumulated routing intelligence of an agent is measured by its deviation from cold start:

Intelligence = |W - 1.0|

At cold start, all agents begin at W = 1.0 (equiprobable selection). As the system learns, weights diverge — high-performing agents accumulate W >> 1.0, poor performers decay toward 0. The magnitude of this deviation is the learned routing signal.

Weight Decay. To prevent cementing outdated associations, weights decay per timestep:

W ← 1.0 + (W - 1.0) × α (α = 0.995)

This pulls weights back toward the cold-start baseline at a rate of 0.5% per step, ensuring that agents must continuously prove their value. A connection unused for 30 days loses approximately 5% of its accumulated signal. This keeps the system “plastic and focused” — able to learn new patterns while shedding old habits.
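The update and decay rules transcribe directly into code. This sketch assumes, per the rules above, that the success branch uses the tanh term and failures apply the fixed anti-Hebbian penalty:

```python
import math

def hebbian_update(w, x, y, a=0.1, eta=0.1):
    """Morphological v3 update: on success (y > 0) add tanh(a*x*y), which is
    bounded in [-1, +1]; on failure apply the anti-Hebbian penalty -eta."""
    if y > 0:
        return w + math.tanh(a * x * y)
    return w - eta

def decay(w, alpha=0.995):
    """Per-timestep decay: pull the weight back toward the cold-start
    baseline W = 1.0 at a rate of 0.5% per step."""
    return 1.0 + (w - 1.0) * alpha
```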

3.2 Domain-Locked Agent Architecture

The central architectural innovation of v3 is the recognition that ATP ActionType is not merely a metadata label — it is a domain boundary. Each ActionType maps to a structurally distinct class of computation:
| ActionType | Domain Function            | Structural Character                       |
| ---------- | -------------------------- | ------------------------------------------ |
| Execute    | f(x) = 2x₀ + 3x₁           | Linear — direct computation                |
| Scaffold   | f(x) = -2x₀² + x₁          | Quadratic — structural planning            |
| Summarize  | f(x) = 5·sin(x₂) + x₀      | Sinusoidal — periodic pattern extraction   |
| Reflect    | f(x) = x₀² + sin(x₁) + x₂  | Mixed nonlinear — meta-cognitive analysis  |
This is the key insight: a summarizer does not research, a planner does not execute. In the Artemis City marketplace, “concept drift” means the distribution of task types changes over time (e.g. more Scaffold tasks during a planning phase, more Execute tasks during deployment) — but the generating function within each domain remains stable. An Execute specialist gets better at Execute tasks; it never needs to learn Scaffold.

Domain-Locked Selection Rule:

P(select_i | task_type_t) = 1 if W_i,t = max(W_domain_t)

Agents are HARD-CONSTRAINED to their ActionType domain. When a task arrives with ActionType = Execute, only Execute-domain agents are eligible. Among those, the agent with the highest Hebbian weight is selected. This eliminates cross-domain pollution entirely.

Architecture: 4 domains × N agents per domain. Default configuration: 3 agents per domain = 12 total agents. The ActionType is declared in the ATP payload, not inferred. The parser reads the ActionType field from the structured message and routes directly to the appropriate domain pool. This is O(1) routing — a hash table lookup followed by a max-weight selection within a small pool.
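The selection rule reduces to a hash lookup plus a max over a small pool. The registry shape used here (`domain -> {agent_id: weight}`) is an assumption for illustration:

```python
def route(action_type, weights):
    """Domain-locked selection: only agents registered under the task's
    ActionType are eligible; among them, the highest Hebbian weight wins."""
    pool = weights[action_type]     # hash table lookup, O(1)
    return max(pool, key=pool.get)  # max-weight selection within a small pool
```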

3.3 Marketplace Simulation Results

Whitebook v3 includes comprehensive simulation evidence validating the domain-locked marketplace architecture. All simulations use:
• 1000 tasks with concept drift across 3 phases:
  – Phase 1 (tasks 0–334): Execute=55%, Scaffold=20%, Summarize=15%, Reflect=10%
  – Phase 2 (tasks 335–667): Execute=15%, Scaffold=40%, Summarize=15%, Reflect=30%
  – Phase 3 (tasks 668–1000): Execute=10%, Scaffold=15%, Summarize=35%, Reflect=40%
• Random seed 42 for reproducibility
• A 600-sample pre-training corpus per agent, scoped to its domain function

3.3.1 Performance Comparison (v4 Simulation)

| Condition                 | Total MAE | vs DL Trained |
| ------------------------- | --------- | ------------- |
| DL Trained (3/dom)        | 1,938     | baseline      |
| DL Cold (3/dom)           | 1,967     | +1.5%         |
| Unconstrained Marketplace | 10,289    | +431%         |
| Single MLP (monolithic)   | 9,617     | +396%         |
| k-NN Optimized            | 10,087    | +420%         |
Key findings:
• Domain-locked trained routing achieves 81.2% lower MAE than the unconstrained marketplace
• 79.8% lower MAE than a single monolithic MLP
• 80.8% lower MAE than k-NN optimized lookup
• 180× lower computational cost than k-NN (O(N) vs O(W·N))
• Even DL Cold (untrained weights, domain lock only) outperforms all baselines

3.3.2 Decomposition of Gains

Starting from the unconstrained baseline (10,289 MAE):
• Domain-locking alone (cold start, no pre-training): reduces to 1,967 MAE — the architectural constraint is responsible for the majority of the improvement
• Scoped pre-training (600 samples per agent, domain-specific): reduces further to 1,938 MAE — pre-training provides incremental refinement

This demonstrates that the architecture is the primary driver, not the training data volume. Domain boundaries prevent the cross-domain interference that degrades unconstrained systems.

3.3.3 Boundary Conditions

| Condition               | MAE    | vs Trained Base | Still Beats MLP? |
| ----------------------- | ------ | --------------- | ---------------- |
| DL Trained (base)       | 1,938  | baseline        | —                |
| 20% Mislabel            | ~1,980 | +2.2%           | ✓ (9,617)        |
| 40% Mislabel            | ~2,100 | +8.4%           | ✓ (9,617)        |
| 80% Skewed distribution | ~2,180 | +12.5%          | ✓ (9,617)        |
Mislabel tolerance: Even with 40% of tasks assigned to the wrong ActionType domain, the domain-locked system still outperforms a monolithic MLP. This implies the system needs >60% classification accuracy to retain its advantage over generalist baselines — a practical threshold achievable by any competent ATP parser.

3.3.4 Within-Domain Competition

| Agents/Domain       | MAE   | vs 3/dom |
| ------------------- | ----- | -------- |
| 1/dom (monopoly)    | 1,967 | +1.5%    |
| 3/dom (default)     | 1,938 | baseline |
| 5/dom (competitive) | 1,906 | -1.7%    |
Within each domain, 100% monopoly emerges: one agent captures all routing weight through consistent performance, winning every selection. The 1→3 jump (1.5% improvement) shows that competition provides selection pressure; the 3→5 jump (additional 1.7%) shows diminishing returns — 3 agents per domain is the practical sweet spot.

3.4 Decay, Retention, and Pruning of Knowledge

While weights increase or decrease with each interaction, the Hebbian engine also applies a decay model to ensure the network adapts to changing conditions and does not cement outdated knowledge. If a connection hasn’t been activated in a long time, its weight is automatically decayed by a factor over time. In the current policy, any link unused for 30 days loses ~5% of its strength (α ≈ 0.95 per month). This gradual fading means the system “forgets” associations that are no longer relevant, unless they are reinforced again by new successes.
In addition to continuous decay, Artemis City implements an explicit retention policy for long-term memory management. Knowledge nodes (e.g. a Markdown note in the vault) that have not been accessed or updated in over 180 days are moved to an archival state (read-only but still available) to declutter active memory. Archived nodes can be restored to active status manually (via a UI called the "Visual Cortex") or automatically if accessed by an agent, which gives them a weight boost to signal renewed relevance. Finally, any node whose weight decays below a critical threshold (e.g. weight <0.01 after a year of disuse) may be scheduled for permanent deletion. This ensures the memory footprint remains sustainable and focused.
To prevent losing potentially important knowledge, provenance logs are kept whenever decay or deletion events occur. An example log entry for a decay event is shown below, recording the node ID, old and new weights, timestamp, and reason for change:
{
  "event": "memory_decay",
  "timestamp": "2025-12-08T14:35:00Z",
  "node_id": "regulation_2019_v2",
  "previous_weight": 0.45,
  "new_weight": 0.38,
  "reason": "unused_60_days",
  "last_access": "2025-10-09T08:20:15Z"
}
In practice, the combination of Hebbian learning with decay and retention policies yields a self-curating knowledge base. High-value information and successful agent-task pairings are reinforced and retained, while stale or low-value ones phase out gracefully. Finally, the Hebbian module provides a maintenance function to prune weak connections on demand. For example, an admin or a periodic job can remove all links with weight below a threshold (say 0.5) to simplify the graph. In v2, a one-line utility prune_weak_connections(threshold) is available to perform this cleanup, returning the count of pruned links. This helps in keeping the memory graph performant by eliminating noise.
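A sketch of what the `prune_weak_connections(threshold)` utility might look like over a link-weight map; the graph representation (a dict keyed by `(agent, node)` pairs) is an assumption for illustration:

```python
def prune_weak_connections(graph, threshold=0.5):
    """Remove every link whose Hebbian weight fell below `threshold`,
    returning the number of links pruned, as described above."""
    weak = [link for link, weight in graph.items() if weight < threshold]
    for link in weak:
        del graph[link]
    return len(weak)
```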

3.5 Vectorization and Semantic Inference

Beyond improving task routing through weight-based preferences, the weighted graph produced by the Hebbian learning engine can itself be leveraged for emergent semantic insights. Artemis City can project an agent’s connection profile into a vector space to measure similarities and discover clusters. Concretely, for each agent we construct an “agent embedding” vector where each dimension corresponds to a particular task or node, and the value is the agent’s weight for that connection. By comparing these embedding vectors between agents via cosine similarity, we can quantify which agents have similar skill profiles or find redundancy in capabilities. For instance, if AnalystAgent and ResearchAgent end up with very similar weight vectors, the system may infer that they overlap in function and consolidate tasks between them or share best practices.

Similarly, the pattern of weights can reveal communities of practice or workflow patterns. The system might notice that certain sequences of tasks commonly occur together, with high weights forming a chain. In future work, a pattern mining function find_sequential_patterns(min_frequency) could be run on the activity logs to automatically surface frequent agent collaboration patterns.

Although these advanced analyses go beyond the basic Hebbian weight updates, Whitebook v3 lays the groundwork by ensuring all weight changes are propagated and stored in a way that supports such vector-space operations. All Hebbian updates are also reflected in Supabase as saliency metadata for the corresponding vector embeddings. In effect, the vector index not only stores static embeddings of the content but now also encodes dynamic importance scores (e.g. highly weighted links can boost the ranking of certain nodes in search results). This cross-pollination means that as the system learns which knowledge is most salient, that knowledge becomes easier for agents to retrieve — a positive feedback loop improving semantic recall.
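The agent-embedding comparison described above reduces to cosine similarity over sparse weight profiles. A minimal sketch, treating each profile as a `node_id -> weight` map with missing nodes counted as weight 0:

```python
import math

def agent_similarity(weights_a, weights_b):
    """Cosine similarity between two agents' connection profiles.
    Returns 1.0 for identical skill directions, 0.0 for disjoint skills."""
    nodes = sorted(set(weights_a) | set(weights_b))
    va = [weights_a.get(n, 0.0) for n in nodes]
    vb = [weights_b.get(n, 0.0) for n in nodes]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Profiles pointing in the same direction (one agent simply “stronger” at the same tasks) score near 1.0, which is exactly the redundancy signal the text describes.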

3.6 Integration with Memory Synchronization

It is important to note that the Hebbian learning engine operates in concert with the Memory Bus. Whenever a link weight is updated (for example, Agent X succeeds at Task Y and weight(X→Y) is updated via tanh(a·x·y)), the change is immediately synchronized to the persistent stores. The updated weight is written into the Obsidian vault (e.g. in the frontmatter of the related note or a separate graph metadata file), and the change in saliency is pushed to Supabase to update any affected vector ranking. In v2, we introduced a HebbianSyncService that batches these weight updates for efficiency — e.g. accumulating up to 100 link changes or 100 ms of updates before performing a bulk sync to the database. This prevents thrashing the vector index during bursts of activity, while still keeping knowledge metadata fresh in near real time.

During concurrent operations, the system handles potential conflicts between learning updates and query operations. If an agent query is in progress while weights are being adjusted, those weight changes do not affect the ongoing query’s results (to ensure consistency), but they are applied to subsequent queries. This is analogous to eventual consistency: the learning updates propagate with a very short delay, after which all new queries see the improved saliency rankings. Tests confirm that even under heavy load (e.g. dozens of agents learning and querying simultaneously), the synchronization mechanism maintains at most a few hundred milliseconds of lag and never drops an update (verified via event logs).

In summary, the Hebbian learning engine provides Artemis City with a rudimentary form of memory and skill adaptation. By quantitatively capturing which agent actions are effective, and feeding that back into both the routing logic and the knowledge base, the system moves toward autonomous optimization.
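The batching behavior attributed to HebbianSyncService might be sketched like this. The class internals and flush triggers below are assumptions based only on the figures in the text (100 link changes or 100 ms), not the actual service:

```python
import time

class HebbianSyncService:
    """Sketch: buffer weight-update events and flush in bulk when either
    the batch-size cap or the time budget is exceeded."""

    def __init__(self, flush_fn, max_batch=100, max_age_s=0.1):
        self.flush_fn = flush_fn    # callable receiving the batched updates
        self.max_batch = max_batch  # e.g. 100 link changes
        self.max_age_s = max_age_s  # e.g. 100 ms of accumulation
        self.pending = []
        self.first_at = None

    def record(self, link, delta, now=None):
        now = time.monotonic() if now is None else now
        if not self.pending:
            self.first_at = now
        self.pending.append((link, delta))
        if len(self.pending) >= self.max_batch or now - self.first_at >= self.max_age_s:
            self.flush_fn(self.pending)   # one bulk sync instead of N writes
            self.pending = []
```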

4. Active Sentinel & Immune System

Whitebook v3 promotes the sentinel from a passive monitoring layer to an active immune system that detects routing pathologies and intervenes in real-time. The sentinel does not merely flag anomalies — it learns from oscillation patterns and reroutes traffic to force exploration when the system becomes stuck.

4.1 Oscillation Detection

The sentinel monitors a rolling window of prediction errors within each domain. The key metric is the sign-change rate — the frequency with which the error direction alternates between consecutive tasks:

oscillation_rate = count(sign(e_t) ≠ sign(e_t-1)) / window_size

Parameters:
• Window size: 30 tasks (tunable per domain)
• Oscillation threshold: 0.35 (a 35% sign-change rate triggers intervention)

High oscillation indicates that the currently selected agent is producing inconsistent results — sometimes good, sometimes bad — which suggests it may be operating near the boundary of its competence or receiving adversarial inputs.
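The metric is straightforward to compute over an error window; a minimal sketch:

```python
def oscillation_rate(errors):
    """Fraction of consecutive error pairs whose sign flips, over the
    rolling window `errors` (window_size = len(errors))."""
    sign_changes = sum(
        1 for prev, cur in zip(errors, errors[1:])
        if (prev < 0) != (cur < 0)          # sign(e_t) != sign(e_{t-1})
    )
    return sign_changes / len(errors)
```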

4.2 Active Rerouting Mechanism

When the oscillation rate exceeds the threshold, the sentinel executes a reroute intervention:
if oscillation_rate > threshold:
    dominant_agent = max(W_domain, key=W_domain.get)  # highest-weight agent
    W_domain[dominant_agent] *= reroute_penalty       # penalty = 0.5
    reroutes += 1
This halves the dominant agent’s weight, temporarily equalizing the competitive landscape and forcing the router to explore alternative agents in the domain. The penalty is not permanent — if the dominant agent truly is the best performer, it will re-accumulate weight through subsequent successes. But if it was failing, the system discovers a better alternative.

4.3 Simulation Results (v5 Test 1)

| Metric                | Passive (v4) | Active (v5)             |
| --------------------- | ------------ | ----------------------- |
| Total MAE             | baseline     | -17 (improvement)       |
| Improvement           | —            | +0.9%                   |
| Total reroutes        | 0            | 16                      |
| Reroute concentration | —            | 100% in Scaffold domain |
Key insight: All 16 reroutes occurred in the Scaffold domain, which uses the quadratic generating function (f(x) = -2x₀² + x₁) — the most volatile domain. The sentinel correctly identifies where intervention is needed and leaves stable domains alone. This is emergent behavior, not programmed: the sentinel learns which domains are pathological through the oscillation signal.

4.4 Sentinel as Learning System

The sentinel embodies Apollo’s principle: every failure teaches. The rerouting mechanism creates a feedback loop:
1. Agent fails → error oscillation increases
2. Sentinel detects the oscillation → reroutes to an alternative
3. The alternative succeeds → accumulates weight via Hebbian update
4. The original agent’s weight decays → the system has learned to avoid it
5. If the original agent improves → it can earn weight back

This is an immune-system analogy: the sentinel is the thymus that identifies underperforming “cells” (agents) and triggers exploration of alternatives. The system doesn’t need to be told which agent is failing — it discovers this through the oscillation signal and acts autonomously.

5. Hebbian + k-NN Reconciliation Layer

Whitebook v3 positions the Hebbian marketplace not as a replacement for expensive k-NN or LLM inference, but as a cheap elimination layer — an assistant that filters bad options before the expensive system makes its final judgment. This reconciliation architecture dramatically reduces cost while maintaining accuracy.

5.1 Two-Layer Architecture

Layer 1: Hebbian Domain-Locked Router (cheap, O(1))
→ Selects the best agent in the domain by weight
→ Produces a prediction

Layer 2: k-NN Verification (expensive, O(W))
→ k=5 nearest neighbors in a W=200 step window
→ Produces an independent prediction

Reconciliation Logic:
if |heb_pred - knn_pred| < threshold (3.0):
  AGREE → use cheap Hebbian answer (no k-NN cost)
else:
  DISAGREE → weighted average based on Hebbian confidence
  confidence = max(0.5, min(0.9, W[agent] / 5.0))
  final = confidence × heb_pred + (1 - confidence) × knn_pred
The insight is that most of the time, Hebbian routing produces the same answer as k-NN but at a fraction of the cost. You only invoke the expensive system when the cheap system flags uncertainty (disagreement). This is analogous to a medical triage: the nurse (Hebbian) handles routine cases; the specialist (k-NN) is called only when something unusual is detected.
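The reconciliation logic above is directly executable; a sketch that also reports whether the expensive path was consulted, so cost accounting stays auditable:

```python
def reconcile(heb_pred, knn_pred, agent_weight, threshold=3.0):
    """Two-layer reconciliation: agreement uses the cheap Hebbian answer;
    disagreement blends both by Hebbian confidence, clamped to [0.5, 0.9].
    Returns (final_prediction, used_knn)."""
    if abs(heb_pred - knn_pred) < threshold:
        return heb_pred, False                        # AGREE: no k-NN cost
    confidence = max(0.5, min(0.9, agent_weight / 5.0))
    final = confidence * heb_pred + (1 - confidence) * knn_pred
    return final, True                                # DISAGREE: weighted blend
```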

5.2 Simulation Results (v5 Test 2)

| Metric                          | Value    |
| ------------------------------- | -------- |
| Hebbian alone MAE               | baseline |
| k-NN alone MAE                  | higher   |
| Reconciled MAE                  | 2,605    |
| Agreement rate                  | ~85%     |
| Disagreement rate               | ~15%     |
| In disagreements, Hebbian right | 94%      |
| Reconciled cost vs k-NN         | 71.9%    |
Critical finding: When Hebbian and k-NN disagree, Hebbian is correct 94% of the time. This is because the domain-locked agent has accumulated specialized knowledge through its Hebbian weight history — it “knows” its domain better than a general-purpose nearest-neighbor lookup.

5.3 Cost-Performance Tradeoff

The reconciliation layer achieves the best of both worlds:
• Cost: 71.9% of pure k-NN (saves 28.1% of computation)
• Accuracy: better than either system alone — the weighted disagreement resolution captures cases where each system excels
• Scalability: as the agent population grows, Hebbian routing remains O(1) per domain, while k-NN cost grows linearly with window size

For production deployment in high-stakes domains (healthcare, utilities, regulatory compliance), the reconciliation layer provides an auditable decision trail: “Hebbian routed to Agent X with confidence 0.82; k-NN concurred” or “Hebbian and k-NN disagreed; reconciled at 0.7×Heb + 0.3×kNN.” This transparency is essential for regulatory compliance.

6. Distributed Weight Resilience & System Antifragility

One of the most powerful emergent properties of the Hebbian marketplace is its intrinsic resistance to single-point corruption. Unlike monolithic models where a single poisoned training batch can degrade the entire system, the distributed weight graph routes around damage automatically. Whitebook v3 validates this with four resilience tests.

6.1 Corpus Corruption Resistance (v5 Test 3)

Scenario: At task #300, one Scaffold-domain agent’s corpus is corrupted with 100 garbage samples. The simulation then runs an additional 700 tasks to measure damage propagation.
| Metric                                   | Hebbian Marketplace               | Monolithic MLP                      |
| ---------------------------------------- | --------------------------------- | ----------------------------------- |
| Damage (MAE increase)                    | -1.0%                             | +0.8%                               |
| Corrupted agent selections (post-poison) | 0                                 | N/A (all tasks affected)            |
| System recovery                          | Immediate (automatic deselection) | No recovery (permanent degradation) |
Why it works: The corrupted agent’s predictions immediately become worse. The Hebbian update rule penalizes its weight (anti-Hebbian: ΔW = -η per failure). Within a few tasks, its weight drops below competitors. The routing graph automatically deselects it — it receives 0 further task assignments. The damage is contained to a single agent rather than propagating through the entire system. This validates Apollo’s core thesis: one corpus is easily corruptible, but the Hebbian weight graph learns from each movement. The distributed nature of the routing weights IS the resilience.

6.2 Missing Agent Flow Detection (v5 Test 4)

Scenario: At task #500, a new task type (“Optimize”) begins appearing at 30% frequency. No agent has been trained on this function (f(x) = 3·exp(-0.5·x₀²) + 2·x₁). The system must detect the gap.
| Metric                                | Value              |
| ------------------------------------- | ------------------ |
| Pre-gap failure rate (tasks 0–499)    | 0.049              |
| Post-gap failure rate (tasks 500–999) | 0.353              |
| Signal ratio                          | 7.2× increase      |
| Detection                             | Clearly detectable |
A 7.2× spike in the rolling failure rate is an unmistakable signal that a new, unhandled task type has emerged. In production, this triggers the kernel’s expansion workflow:
1. Detect: the failure rate exceeds 3× baseline for >50 consecutive tasks
2. Diagnose: isolate the failing tasks by ActionType to identify the gap domain
3. Expand: register a new agent flow for the detected domain
4. Train: pre-train the new agents on the accumulated failure examples
5. Route: update the domain routing tables to include the new domain

This is how Artemis City grows organically — not through manual configuration, but through failure-driven expansion. A sustained spike in failures signals a missing agent flow, which the system detects and the kernel workflow fills by registering a new one.
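The Detect step above reduces to a rolling-rate comparison; a minimal sketch, with the window and multiplier taken from the thresholds stated in the workflow:

```python
def detect_missing_flow(failures, baseline_rate, window=50, factor=3.0):
    """Flag a missing agent flow when the failure rate over the last
    `window` tasks exceeds `factor` x the baseline rate.
    `failures` is a list of 0/1 outcomes (1 = failed task)."""
    if len(failures) < window:
        return False                       # not enough evidence yet
    recent_rate = sum(failures[-window:]) / window
    return recent_rate > factor * baseline_rate
```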

6.3 Domain Ceiling and Expansion Triggers (v5 Test 5)

Scenario: Execute-domain tasks progressively increase in complexity (nonlinear contamination factor grows by 0.003 per task after task #400). This tests what happens when a domain’s agents hit the ceiling of their capability.
| Quartile | Execute MAE | Complexity Factor |
| --- | --- | --- |
| Q1 (simplest) | 1.086 | 0.000 |
| Q2 | ~3.5 | ~0.3 |
| Q3 | ~6.5 | ~0.9 |
| Q4 (hardest) | 9.624 | ~1.8 |
Ceiling detected at Execute task #67 — the point where error exceeds 3× the baseline average. This triggers the expansion signal: the domain needs more capable agents or a new sub-domain specialization. Domain ceiling triggers expansion, not system failure. A single agent cannot absorb unbounded complexity: its corpus and working context bloat until they no longer scale, and its quick outputs stop being reliable. The system detects this ceiling and responds by expanding capacity rather than degrading gracefully.
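The ceiling trigger itself is simple: flag the first task whose error exceeds 3× the baseline average. The sketch below is illustrative; the function name and warm-up length are assumptions.

```python
# Sketch of the ceiling trigger: baseline is the mean error over an
# initial warm-up window; the ceiling fires at the first error that
# exceeds ratio * baseline.
def detect_ceiling(errors, warmup: int = 20, ratio: float = 3.0):
    """Return the index of the first error above ratio * baseline, or None."""
    baseline = sum(errors[:warmup]) / warmup
    for i, e in enumerate(errors[warmup:], start=warmup):
        if e > ratio * baseline:
            return i
    return None

# Errors drift upward as complexity grows; the ceiling fires mid-stream.
errs = [1.0] * 20 + [2.0] * 5 + [5.0] * 10
assert detect_ceiling(errs) == 25   # first error above 3 * 1.0
```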

6.4 Learning Velocity (v5 Test 6)

Scenario: Measure how quickly each system recovers after a failure event (error > 5.0). Recovery = 3 consecutive successes below threshold.
| System | Recovery Steps | Events |
| --- | --- | --- |
| Hebbian (early phase) | 4.1 steps | |
| Hebbian (mid phase) | 4.3 steps | |
| Hebbian (late phase) | 4.6 steps | |
| MLP (early phase) | 17 steps | |
| MLP (mid phase) | 20 steps | |
| MLP (late phase) | 24 steps | |
Hebbian recovers 4–5× faster than MLP. This is because the Hebbian system’s failure triggers an immediate routing response: the failing agent loses weight, competitors gain opportunity. The MLP must retrain its entire parameter space — a much slower process. The slight increase in Hebbian recovery time across phases (4.1 → 4.6 steps) reflects the increasing complexity of concept drift, not degradation. The MLP’s increasing recovery time (17 → 24 steps) shows compounding confusion as the monolithic model tries to accommodate all domains simultaneously.
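The recovery metric in this scenario can be made concrete: after an error spikes above the failure threshold (5.0), count steps until 3 consecutive errors fall back below it. The thresholds come from the scenario definition; the function name and shape are illustrative assumptions.

```python
# Sketch of the recovery-time metric: for each failure event (error >
# fail_thresh), count steps until `streak` consecutive errors fall back
# below the threshold.
def recovery_steps(errors, fail_thresh: float = 5.0, streak: int = 3):
    """Yield the number of steps from each failure event to recovery."""
    i = 0
    while i < len(errors):
        if errors[i] > fail_thresh:            # failure event
            ok, steps = 0, 0
            j = i + 1
            while j < len(errors) and ok < streak:
                steps += 1
                ok = ok + 1 if errors[j] < fail_thresh else 0
                j += 1
            if ok == streak:
                yield steps
            i = j
        else:
            i += 1

errs = [1, 2, 9, 1, 1, 1, 2, 8, 6, 1, 1, 1]
assert list(recovery_steps(errs)) == [3, 4]
```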

7. Agent Registry and Task Routing

Coordinating a city of agents requires a clear registry of available agents, their capabilities, and a strategy to assign tasks optimally. Artemis City addresses this with an Agent Registry — a central catalog that knows about each agent’s skill set and trust metrics — combined with a domain-locked router that matches tasks to the best-suited agent. In Whitebook v3, the agent registry has been enhanced with domain-locked Hebbian routing, and the task routing logic now leverages both ATP ActionType domains and Hebbian weight competition to make assignments.

7.1 Agent Profiles and Capabilities

Every agent in Artemis City is represented in the registry by an Agent Profile containing its metadata: name, description, declared capabilities (skills or tools it can handle), and domain assignment. In v3, each agent is explicitly bound to an ActionType domain at registration time. For example, an agent might be registered as:
{
  "id": "executor_01",
  "domain": "Execute",
  "capabilities": ["linear_computation", "data_processing"],
  "sandbox_level": "strict",
  "trust_threshold": 0.75,
  "hebbian_weight": 1.0
}
When a new agent is added to the system, it must be registered in the Orchestrator along with its domain and capabilities. This allows the kernel to filter which agents are eligible for a given task based on both the task’s ActionType domain and specific capability requirements. Beyond static capabilities, the Agent Registry in v3 maintains dynamic status and context for each agent. This includes whether the agent is currently busy or idle, any resource usage quotas, and a history of tasks completed. These details help ensure that tasks are routed not just to an agent who can do it, but one who is available and not overloaded at that moment. The design is analogous to an operating system process scheduler combined with a service directory for available functionalities.

7.2 Trust Scoring and Composite Metrics

To facilitate merit-based task assignment, Artemis City v2 introduced a scoring system for agents. Each agent is evaluated along three key dimensions:

- Alignment Score: Measures how well the agent adheres to policies and instructions (ethical guardrails, no unauthorized actions, etc.). An agent that attempts disallowed actions or produces off-mission outputs will have a lower alignment score.
- Accuracy Score: Measures the quality of the agent’s outputs (e.g. correctness of answers, relevance of information). This can be inferred from task feedback or evaluation by a watchdog agent.
- Efficiency Score: Measures resource usage efficiency — how quickly and cost-effectively the agent completes tasks (latency, token usage, etc.).

Each score is maintained as a normalized value in [0.0, 1.0]. A weighted composite score is then computed to summarize overall performance: in the current scheme, Alignment and Accuracy are weighted more heavily (40% each) and Efficiency contributes 20%. This formula can be tuned, but the default reflects that being correct and on-policy is more important than being fast. The registry updates these scores whenever new evidence comes in. A utility method update_score(agent_id, dimension, delta) adjusts a score by a delta and ensures it stays within [0, 1]. Notably, this update function can also implement decay — for example, gradually reducing scores over time if an agent is inactive or if we want recent performance to count more. Every change in score is logged for audit, so one can trace how an agent’s trust profile evolves.
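The composite formula and the clamped update rule can be sketched directly from the description above (weights 0.4/0.4/0.2, scores clamped to [0, 1]). The data shape and function signatures are illustrative, not the production API.

```python
# Sketch of the composite trust score and the clamped update rule.
def composite(scores: dict) -> float:
    """Weighted composite: Alignment and Accuracy at 40%, Efficiency at 20%."""
    return (0.4 * scores["alignment"]
            + 0.4 * scores["accuracy"]
            + 0.2 * scores["efficiency"])

def update_score(scores: dict, dimension: str, delta: float) -> None:
    """Adjust one dimension by delta, clamped to [0.0, 1.0]."""
    scores[dimension] = min(1.0, max(0.0, scores[dimension] + delta))

agent = {"alignment": 0.9, "accuracy": 0.8, "efficiency": 0.6}
assert abs(composite(agent) - 0.80) < 1e-9   # 0.36 + 0.32 + 0.12
update_score(agent, "alignment", -0.2)       # e.g. after a sandbox violation
assert abs(agent["alignment"] - 0.7) < 1e-9
```

Decay can reuse the same entry point: a maintenance job simply calls `update_score` with a small negative delta for inactive agents.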

7.3 Adaptive Task Assignment Logic (ATP Routing)

When a new task arrives, the kernel (Orchestrator) consults the registry to decide which agent should execute it. In v3, this is a two-stage process combining domain-locked filtering with Hebbian weight competition:

1. Domain Filter: The ATP parser extracts the ActionType from the task payload. Only agents registered in the matching domain are considered; a task labeled “Execute” would only consider agents in the Execute domain pool.
2. Hebbian Weight Selection: Among the domain-filtered candidates, select the agent with the highest Hebbian weight: selected = argmax(W[domain_agents]). If multiple agents tie, additional criteria like composite trust score or random selection can break the tie.
3. Dispatch: Assign the task to the selected agent. The task note’s agent field can be updated to reflect the chosen agent for transparency.

This logic implements meritocratic routing — over time, high-performing agents get more tasks, and low-performing ones get fewer (or only tasks they are uniquely capable of). This creates a positive reinforcement loop aligned with Hebbian learning: an agent that succeeds raises its score and is thus more likely to get future tasks in its wheelhouse.

The term ATP appears in our system (e.g. ATPParser) as shorthand for a structured message protocol used internally for agent task planning. “ATP” stands for Agent Task Protocol. When agents communicate complex plans or results to each other or back to the kernel, they format them with special tags (Mode, Priority, ActionType, TargetZone, etc.) that the ATPParser can read. In the context of routing, if an agent produces an ATP message indicating a new sub-task, the parser extracts the intended target zone or required capability, and the registry’s routing logic can then assign the sub-task to the appropriate agent. The parsing latency is minimal (on the order of a few milliseconds) and is also instrumented for monitoring.
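The two stages can be sketched as a filter followed by an argmax. The registry entries below reuse the profile fields from Section 7.1; the tie-break on composite score and the function name are illustrative assumptions.

```python
# Sketch of two-stage ATP routing: (1) filter the registry by ActionType
# domain, (2) argmax over Hebbian weight, breaking ties on composite
# trust score so selection stays deterministic.
def route_task(registry: list, action_type: str) -> dict:
    pool = [a for a in registry if a["domain"] == action_type]
    if not pool:
        raise LookupError(f"no agent flow registered for {action_type!r}")
    return max(pool, key=lambda a: (a["hebbian_weight"], a["composite"]))

registry = [
    {"id": "executor_01", "domain": "Execute", "hebbian_weight": 1.3, "composite": 0.8},
    {"id": "executor_02", "domain": "Execute", "hebbian_weight": 0.9, "composite": 0.9},
    {"id": "scaffold_01", "domain": "Scaffold", "hebbian_weight": 2.1, "composite": 0.7},
]
assert route_task(registry, "Execute")["id"] == "executor_01"
```

Note that `scaffold_01` never competes for Execute tasks despite its higher weight: that is the domain lock doing its job. The `LookupError` branch is exactly the missing-agent-flow condition of Section 6.2.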

7.4 Routing Efficiency Benchmark

A key advantage of the domain-locked Hebbian routing over both LLM-based approaches and traditional registry-based routing is efficiency. Whitebook v3 includes an updated quantitative comparison:
| Router | Latency | Cost | Accuracy (MAE) |
| --- | --- | --- | --- |
| Hebbian Domain-Locked | ~7 ms | ~$0.00 | 1,938 |
| Registry (v2 composite score) | ~7 ms | ~$0.00 | ~10,289 (unconstrained) |
| LLM-Based Router | ~800 ms | ~$0.05/decision | ~9,617 (single MLP) |
| k-NN Optimized | ~50 ms | ~$0.01/decision | ~10,087 |
The Hebbian domain-locked router achieves the best accuracy at the lowest cost: ~7 ms latency, zero API cost, and an 81.2% MAE improvement over the unconstrained registry baseline. Even more importantly, the approach is deterministic and transparent: one can explain exactly why an agent was chosen (highest Hebbian weight in its ActionType domain), whereas an LLM’s choice might be opaque.

8. Agent Sandboxing and Isolation

While the Agent Registry ensures tasks go to competent agents, Artemis City must also ensure agents operate within safe bounds. To that end, all agents run in a sandboxed execution environment that strictly limits their actions to an approved set. Whitebook v2 added a detailed Sandbox Policy specification and logging of any policy violations. This mechanism prevents agents from performing unauthorized operations (such as arbitrary file access or external network calls) and is crucial for system security and alignment.

8.1 Sandbox Policy and Allowed Actions

By default, each agent in Artemis City is confined to a “default jail” that permits only interactions with the memory bus API and its assigned in-memory workspace. This means an agent can read or write notes via the provided Obsidian integration, and call approved tools or APIs that are explicitly whitelisted for it, but nothing more. Any system calls outside these bounds are intercepted by the sandbox layer. Key aspects of the sandbox policy include:

- Allowed Syscalls: The only filesystem operations allowed are reading/writing within the designated vault directories (or other resource directories given to that agent). Direct file access to other paths, network access, or spawning processes is disallowed unless explicitly configured.
- Tool Whitelisting: If an agent has tools (e.g. a web search module, or a Python execution ability), each tool must be declared in its profile and whitelisted. The sandbox checks an agent’s action (like “invoke tool X”) against its whitelist. Unknown or unapproved tools result in a violation.
- Privilege Levels: No agent by default has root or system-level privileges. If an agent needs elevated access for some reason, a special approval (and likely a watchdog override) is required. Even then, such high-privilege actions are heavily monitored. This prevents agents from escalating their privileges or tampering with core system processes.

In effect, agents operate in a restricted environment where they cannot do things like call external APIs (unless through a provided safe method), read arbitrary files on the host, or modify code at runtime. They are constrained to their role as defined in the registry.

8.2 Violation Detection and Response

The sandbox layer actively monitors agent actions and flags any that fall outside the policy. We formalize several ViolationTypes for categorizing incidents:

- UNAUTHORIZED_FILE_ACCESS — e.g. the agent tries to read or write a file outside its allowed directory.
- UNAUTHORIZED_NETWORK — e.g. the agent attempts an HTTP call when none are permitted.
- TOOL_NOT_WHITELISTED — e.g. the agent invokes a tool or command not on its allowed list.
- PRIVILEGE_ESCALATION — e.g. the agent tries to gain higher permissions or access restricted functions.

When an action violates policy, the sandbox immediately denies the action and logs the details via a warning log entry. The log includes the agent ID, the action attempted, and the reason for denial. Internally, the sandbox also keeps a counter of how many violations an agent has incurred in the past 24 hours. If an agent accumulates 3 violations within 24 hours, the system quarantines that agent automatically. Quarantine means the agent is temporarily suspended from receiving new tasks and its processes may be halted for inspection. The quarantine status can later be cleared manually by an admin or by a governance routine if the agent is deemed safe after review. Importantly, each violation also affects the agent’s trust scores (Section 7.2). In particular, a policy violation reduces the agent’s Alignment score, since it did something against the rules. The registry’s update_score(agent, "alignment", negative_delta) is invoked on each violation, incrementally lowering trust. Thus, an agent that keeps hitting sandbox walls will soon find its composite score too low to be auto-selected for tasks (and might be subject to governance interventions).
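The three-strikes rule can be sketched as a timestamped counter with a 24-hour sliding window. The class and method names below are illustrative assumptions, not the production sandbox API.

```python
import time

# Sketch of the three-strikes quarantine rule: record each violation
# with a timestamp; three violations inside a 24-hour window put the
# agent into quarantine.
WINDOW = 24 * 3600  # 24 hours in seconds

class SandboxMonitor:
    def __init__(self):
        self.violations = {}   # agent_id -> list of violation timestamps
        self.quarantined = set()

    def record(self, agent_id, now=None):
        """Record a violation; return True if the agent is now quarantined."""
        now = time.time() if now is None else now
        # Keep only violations inside the sliding 24-hour window.
        hits = [t for t in self.violations.get(agent_id, []) if now - t < WINDOW]
        hits.append(now)
        self.violations[agent_id] = hits
        if len(hits) >= 3:
            self.quarantined.add(agent_id)
        return agent_id in self.quarantined

mon = SandboxMonitor()
mon.record("agent_x", now=0)
mon.record("agent_x", now=3600)
assert mon.record("agent_x", now=7200) is True   # third strike inside 24 h
```

In the real system each `record` call would also invoke the registry's `update_score(agent, "alignment", negative_delta)` so that trust decays in lockstep with violations.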

8.3 Logging and Example Sandbox Run

All sandbox events are recorded to the central log (mcp_obsidian.log) with a special [SANDBOX_VIOLATION] tag for easy filtering. This log serves both as a realtime alert mechanism for the watchdog agent and as an audit trail for later analysis.

8.4 Alignment and Automated Penalties

The sandbox mechanism acts as a first line of defense for alignment. By constraining the possible actions of agents, it significantly reduces the risk of errant behavior. However, if an agent consistently attempts disallowed actions, it indicates a misalignment either in its objectives or its understanding. Artemis City takes such signals seriously:

- Alignment Penalty: Each violation lowers the agent’s Alignment trust score, directly impacting its chance of being selected for tasks.
- Watchdog Intervention: A specialized governance agent (or “watchdog”) monitors the stream of sandbox logs. If an agent shows a pattern of violations, the watchdog can proactively intervene by pausing the agent or flagging it for review.
- Rehabilitation: The system is designed to allow agents to regain trust over time if they correct their behavior.

9. Governance and Self-Modification Workflow

One of the most forward-looking features of Artemis City is its ability to self-evolve — agents (or the system as a whole) can propose modifications to their own code or behaviors to improve over time. However, this poses obvious risks if not properly governed. Whitebook v2 introduced a formal CI/CD-style governance pipeline for any self-modification, ensuring that safety, reliability, and alignment are preserved. In essence, we treat self-generated updates similarly to how a software team treats pull requests: they must go through testing, review, and approval stages.

9.1 CI/CD Pipeline for Agent Self-Updates

Any change to the system initiated from within goes through a structured Proposal Pipeline:

Self-Modification Proposal → Sandbox Testing → Static Analysis → Performance Regression Check → Trust-Weighted Approval → Staged Rollout → Monitor

This pipeline is executed by the Governance Kernel — a component (or set of watchdog agents) responsible for vetting changes. The stages are:

1. Proposal Creation: An agent or subsystem submits a proposed change along with metadata (who/what is proposing, rationale, timestamp).
2. Sandbox Testing: The proposed change is deployed in an isolated sandbox where a battery of tests runs (e.g. 1000 simulated queries).
3. Static Analysis (Lint/Check): The change is analyzed without execution — e.g., code linting to ensure it follows schemas and contains no known bad patterns.
4. Performance Regression Check: The system measures whether the change would slow things down; if latency increases by more than 20%, it is flagged.
5. Trust-Weighted Approval: Based on the proposing agent’s trust score and test results, the change is auto-approved, conditionally approved, or escalated for human review:
   - Score > 85%: auto-approve minor changes
   - Score 70–85%: monitored auto-approval with expanded monitoring
   - Score < 70%: mandatory human approval
6. Staged Rollout: Any approved change is deployed gradually (e.g. 10% → 50% → 100%).
7. Monitoring: After rollout, the governance system closely monitors key metrics for signs of trouble.
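The trust-weighted approval stage maps the proposer's trust score onto the three tiers above. A minimal sketch, with hypothetical tier labels:

```python
# Sketch of the trust-weighted approval stage. Tier names are
# illustrative; thresholds follow the governance pipeline description.
def approval_tier(trust_score: float) -> str:
    """>85% auto-approve, 70-85% monitored auto-approval, <70% human review."""
    if trust_score > 0.85:
        return "auto_approve"
    if trust_score >= 0.70:
        return "monitored_auto_approval"
    return "human_review"

assert approval_tier(0.90) == "auto_approve"
assert approval_tier(0.75) == "monitored_auto_approval"
assert approval_tier(0.50) == "human_review"
```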

9.2 Rollback Protocol and Audit Trails

No system is perfect, so Artemis City defines a robust rollback protocol to quickly undo a self-update that goes awry. The rollback mechanism has two triggers:

- Automatic rollback on failures: If the error/failure rate jumps above 5% within one hour of a change going live, the system automatically reverts to the last known good state.
- Kill Switch: At any time, a human admin or the highest-privilege watchdog can hit a kill switch to immediately revert the system to a previous checkpoint.

Under the hood, the system keeps versioned agent definitions and configurations (akin to a git version history). Each approved change increments a version, and the prior version is stored so it can be rolled back to. The Quantum Lock mechanism (Section 11) further helps by providing a hash of the last good state, so the system can verify it has indeed returned to that state after rollback. Every proposal and its outcome are recorded in an Audit Log containing the proposing agent ID, a diff of proposed changes, approval decisions, timestamps, and any rollback events.

9.3 Transparency and Plasticity

The governance framework not only prevents bad updates but also encourages accountability. Agents know (or can be designed to know) that their proposals will be scrutinized, which may incentivize them to be more careful. All stakeholders can inspect the logs and see how decisions are made. Whitebook v2 referred to these self-update and learning capabilities as “plastic workflows,” highlighting that the system’s architecture is not rigid — it can reconfigure itself. However, the plasticity is channeled through the above governance process to ensure it results in positive growth (like brain plasticity guided by a prefrontal cortex). In closing, Artemis City’s approach to self-modification can be seen as an early implementation of ethical autonomic computing — the system manages its own evolution in a manner that is auditable and aligned with predefined goals. This is a step toward scalable AGI architectures that can learn and adapt safely in production.

10. Memory Decay and Retention Policy

(This topic was partially covered in Section 3.4, but here we consolidate the formal policy.) In Whitebook v1, “memory decay” was mentioned abstractly as a way to prevent knowledge overload, but specifics were lacking. Version 2 codified a Memory Decay & Retention Policy that governs how knowledge in the Artemis City vault ages, when it gets archived, and when it’s ultimately deleted. By managing memory lifespan, the system avoids stale information cluttering the active context and focuses reasoning on current, relevant knowledge.

10.1 Decay Schedule and Thresholds

- Link Decay: Every association in the knowledge graph carries a weight that can diminish over time if not used. Concretely, if a link hasn’t been activated in 30 days, its weight is reduced by ~5%. This decay is applied by a periodic maintenance job (e.g. daily or weekly). The 5%/30-day figure is a default; more critical links could have slower decay, while less important ones decay faster.
- Stale Node Archival: If a knowledge node sees no accesses or updates in 180 days, the system flags it as stale and moves it to an archive state. Archival can mean moving the Markdown file to an “Archive” folder in Obsidian and marking it read-only. Archived nodes are effectively out of the active loop.
- Deletion Threshold: If a node remains untouched even longer (e.g. 365 days) and its weight falls below a very small threshold (such as 0.01), it may be deemed obsolete and permanently deleted. Deletion is the last resort for memory cleanup, used sparingly.
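The schedule above can be sketched as a single maintenance pass per node, using the stated defaults (5% decay after 30 idle days, archive at 180, delete at 365 when weight < 0.01). The node fields and function name are illustrative assumptions.

```python
# Sketch of the daily maintenance pass implementing the decay schedule.
DAY = 86400.0  # seconds per day

def maintain(node: dict, now: float) -> str:
    """Apply decay/archive/delete rules to one node; return the action taken."""
    idle_days = (now - node["last_access"]) / DAY
    if idle_days > 365 and node["weight"] < 0.01:
        return "delete"                       # last resort, used sparingly
    if idle_days > 180:
        node["archived"] = True               # move out of the active loop
        return "archive"
    if idle_days > 30:
        node["weight"] *= 0.95                # ~5% decay per maintenance period
        return "decay"
    return "keep"

node = {"last_access": 0.0, "weight": 1.0, "archived": False}
assert maintain(node, now=40 * DAY) == "decay"
assert abs(node["weight"] - 0.95) < 1e-9
```

Ordering matters: the delete check runs first so that a long-idle, near-zero-weight node is removed rather than re-archived forever.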

10.2 Archival and Restoration Mechanisms

Archiving is not a one-way street. Artemis City provides ways to resurrect archived knowledge if it becomes relevant again:

- If an agent explicitly queries something that’s in an archived node, that access triggers an immediate weight boost (+10%) and automatic un-archiving.
- The “Visual Cortex” interface allows human operators to browse archived nodes and manually restore them.
- Archived nodes remain visible in a specially tagged way (e.g., marked with [ARCHIVED] in the graph view); agents generally ignore them unless directly told to look.

10.3 Decay Event Logging

All actions taken by the decay/retention subsystem are logged. Specifically, whenever a weight decay is applied, an archive is performed, or a deletion occurs, a structured event is recorded in the system logs. This log serves both as a way to audit memory changes and as a potential trigger for learning. These logs are summarized in a weekly maintenance report which is part of the KPI Dashboard (Section 13). Stakeholders can review how much knowledge was archived or deleted, ensuring memory management aligns with expectations.

11. State Integrity Checkpoints (“Quantum Lock”)

As Artemis City continuously learns and evolves, it is vital to have guarantees about the integrity of its state — particularly the memory graph that underpins all reasoning. Whitebook v2 introduced the concept of “Quantum Lock” checkpoints, a security measure designed to ensure that the system’s knowledge base and agent definitions have not been tampered with. This feature can be seen as analogous to version control checksums or blockchain hashes ensuring data integrity over time.

11.1 Concept and Purpose

The term Quantum Lock in Artemis City is a fanciful name for what is essentially a state integrity checksum. At designated intervals (for example, every midnight, or before and after any self-modification as in Section 9), the system computes a cryptographic hash of the entire memory state — including the content of all knowledge nodes, critical metadata like weights, and the registry of agents and their code versions. This hash acts as a compact fingerprint of the state. The primary purposes of the Quantum Lock are:

- Tamper Evidence: If any unauthorized or accidental change is made to the knowledge graph or agent definitions, the next hash will not match the expected value, indicating a discrepancy.
- Provenance Chain: Each checkpoint hash can be signed digitally and recorded, creating a chain of trust.
- Recovery Assurance: In case of catastrophic failure or a need to roll back, having known-good hashes for previous states allows verification that the system has indeed been restored exactly to a prior state.

11.2 Implementation Approach

The Quantum Lock is implemented using standard cryptographic hashing (e.g., SHA-256) over a canonical serialization of the state. Checkpointing occurs at system startup and after any self-update process. Each checkpoint consists of the hash and a timestamp, and is stored in an integrity log. Optionally, each hash is signed with a private key. Verification: On startup, Artemis City will compute the hash of the current state and compare it to the last recorded hash. If they differ, the system raises an alert. During runtime, after a self-update (post deployment), a new hash is computed and compared to the pre-update hash plus the known delta of the update.
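A minimal sketch of the checkpoint computation, assuming canonical JSON as the serialization (sorted keys and fixed separators make the output deterministic); the state shape and function name are illustrative.

```python
import hashlib
import json

# Sketch of a Quantum Lock checkpoint: SHA-256 over a canonical JSON
# serialization of the state. Sorted keys + fixed separators guarantee
# the same state always serializes to the same bytes.
def quantum_lock(state: dict) -> str:
    """Return the hex-digest fingerprint of a canonical state snapshot."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

state = {"nodes": {"note_a": {"weight": 0.8}}, "agents": {"executor_01": "v3"}}
checkpoint = quantum_lock(state)
assert quantum_lock(state) == checkpoint       # stable across recomputation
state["nodes"]["note_a"]["weight"] = 0.9       # any tamper changes the hash
assert quantum_lock(state) != checkpoint
```

The startup verification described above is then a one-line comparison between the recomputed digest and the last entry in the integrity log; signing the digest is an optional layer on top.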

11.3 Integration in Artemis City Lifecycle

The Quantum Lock ties into the governance workflow: before applying a self-update, the system may take a pre-update hash; after the update, it takes a post-update hash and stores both. If later an investigation is needed, one can inspect the hashes to see exactly which state was in effect at that time. This forms a tamper-evident audit trail.

12. Morphological Computation & Efficiency Gains

One of the driving motivations behind Artemis City is achieving morphological computation benefits — leveraging structure (the knowledge graph and specialized modules) to handle tasks more efficiently than a monolithic LLM approach. Whitebook v3 backs up this claim with both the original graph-traversal benchmarks from v2 and the new Hebbian marketplace simulation data.

12.1 Baseline: LLM-Only Query Processing

Consider a complex analytical query: “Find the causal chain between Regulation X and Audit Failure Y (with about 6 degrees of separation).” In a pure LLM approach, we would need to provide the LLM with a lot of context because it has no prior knowledge organized. Suppose we dump all possibly relevant documents (~50,000 tokens) into the prompt. Baseline metrics: Token consumption: ~52,000 tokens. Latency: ~4.5 seconds. Cost: ~$0.78 per query.

12.2 Artemis City Graph-Assisted Query

With Artemis City’s knowledge graph (morphological structure), the query essentially asks for a path of relationships linking X to Y, which the system can attempt to find via graph traversal algorithms. It retrieves only the 6 relevant nodes (~800 tokens), then asks the LLM to synthesize from that focused context. Metrics with Artemis City:
Token consumption: ~1,200 tokens (<3% of baseline). Latency: ~450 ms. Cost: ~$0.02 per query.

12.3 Comparison and Savings

From the above scenario, Artemis City demonstrates:

~95.7% cost reduction ($0.78 → $0.02)
~90% latency reduction (4.5 s → 0.45 s)
~97.7% fewer tokens used (52k → 1.2k)
12.4 Hebbian Marketplace Efficiency (NEW in v3)

The domain-locked Hebbian marketplace extends morphological computation to agent routing itself. Rather than using an LLM to decide which agent should handle a task (costing ~$0.05 and ~800 ms per routing decision), or even a k-NN lookup (costing ~$0.01 and ~50 ms), the Hebbian router makes decisions in ~7 ms at zero marginal cost.

Routing efficiency comparison (from v4 simulation):

| Approach                | Routing Cost  | Routing Latency | Task Accuracy (MAE) |
| ----------------------- | ------------- | --------------- | ------------------- |
| LLM Router              | $0.05/decision | 800 ms          | ~9,617              |
| k-NN Router             | $0.01/decision | 50 ms           | ~10,087             |
| Hebbian Domain-Locked   | $0.00         | 7 ms            | 1,938               |
Over 1000 routing decisions:

- LLM routing cost: $50.00 → Hebbian: $0.00 (100% cost elimination)
- k-NN routing cost: $10.00 → Hebbian: $0.00 (100% cost elimination)
- LLM routing latency: 800 seconds → Hebbian: 7 seconds (99.1% reduction)

The Hebbian router is not merely cheaper — it is more accurate. This inverts the usual cost-performance tradeoff: the cheapest option is also the best-performing option, because domain-locked specialization accumulates routing intelligence that generalist approaches cannot match.

12.5 Broader Implications

The success of morphological computation in Artemis City suggests a pathway to scaling AI systems: by pre-organizing knowledge and dividing labor between symbolic structures and statistical learners, we achieve performance that neither alone could easily reach. The LLM is only invoked for what it’s truly needed (complex language synthesis, nuanced reasoning), not for brute-force search or remembering facts. One could argue Artemis City is an instance of the “Mixture of Experts” paradigm, where the knowledge graph and agents are experts that handle parts of a query so that the LLM doesn’t act alone. The result is not just efficiency, but also the potential for higher accuracy: by narrowing context, we reduce irrelevant distractions and give the LLM less chance to go off-track.

13. Human Validators & Credential Marketplace

Whitebook v3 introduces a forward-looking component of the Artemis City ecosystem: the role of human validators as weighted participants in the quality assurance loop.

13.1 The Validator Economy

As AI agents handle an increasing share of operational tasks, humans displaced from direct execution roles return to the ecosystem as credentialed validators. Their role: review agent outputs for accuracy, safety, and domain compliance. This creates a new economic layer:

- Credential weighting: A validator with verified professional credentials (e.g. a licensed engineer, certified auditor, or medical professional) has their reviews weighted higher than hobbyist or anonymous feedback. The LLM with the most real credential-backed reviews is trusted more heavily in reconciliation decisions.
- Validated output sharing: The platform shares validated results across the agent ecosystem. Agents train on domain-specific validated outputs without cross-domain pollution — the domain-locked architecture prevents a Scaffold agent from ingesting Execute-domain validations.
- Access as commodity: Access to high-quality validated reviews becomes a tradeable asset within the marketplace. Agents or organizations that invest in credential-backed validation accumulate a reputation advantage.

13.2 Integration with Hebbian Weights

Human validation events feed directly into the Hebbian weight graph:

- Positive validation (a human confirms the agent output is correct): triggers Hebbian reinforcement ΔW = tanh(a · x · 1.0), with an amplification factor reflecting the validator’s credential weight.
- Negative validation (a human identifies an error): triggers the anti-Hebbian update ΔW = -η with the same credential amplification.

This creates a human-in-the-loop feedback cycle that is fully compatible with the autonomous Hebbian learning — human validators accelerate weight convergence in high-stakes domains while the automated system handles routine routing in low-risk domains.
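A sketch of the credential-amplified update, combining the two cases above. The parameter values (`A`, `ETA`) and the function signature are illustrative assumptions; only the ΔW formulas come from the text.

```python
import math

# Sketch of credential-weighted validation feedback. Positive reviews
# apply dW = tanh(a * x * 1.0) (y = 1.0: output confirmed correct);
# negative reviews apply the anti-Hebbian dW = -eta. Both are scaled by
# the validator's credential weight.
A, ETA = 1.0, 0.1  # gain and anti-Hebbian penalty (assumed values)

def validate(weight: float, x: float, positive: bool, credential: float) -> float:
    """Fold one human validation event into an agent's Hebbian weight."""
    delta = math.tanh(A * x * 1.0) if positive else -ETA
    return weight + credential * delta

w = 1.0
w = validate(w, x=0.5, positive=True, credential=2.0)   # e.g. licensed engineer
assert w > 1.0                                          # reinforcement applied
w = validate(w, x=0.5, positive=False, credential=2.0)  # error found on review
assert w < 1.0 + 2.0 * math.tanh(0.5)                   # penalty applied
```

Because the credential weight is a simple multiplier on ΔW, a highly credentialed validator moves the routing graph further per review, which is exactly the accelerated convergence the text describes.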

14. Performance Monitoring and Instrumentation

With the numerous moving parts and ambitious performance goals in Artemis City, it is crucial to continuously monitor the system’s health and efficiency. Whitebook v3 outlines an expanded Performance Metrics & Instrumentation strategy that includes the new Hebbian marketplace KPIs alongside the original system-level metrics.

14.1 Key Performance Targets (KPIs)

From earlier sections, we compile the main system-level KPIs:

- Token Usage Budget: ≤ 10 million tokens per day, per deployment
- Cost Budget: ≤ $500 per day for a standard enterprise deployment
- Task Routing Latency: p95 < 100 ms, p99 < 250 ms
- Memory Sync Lag: p95 < 300 ms, p99 < 500 ms
- Agent Success Rate: > 95% task completion without escalation
- Cache Hit Rate: > 80% for repeated queries

New in v3 — Hebbian Marketplace KPIs:

- Routing Accuracy: domain-locked MAE < 2,000 (total over the 1000-task benchmark)
- Domain Monopoly Rate: within-domain convergence to the single best agent on > 90% of tasks
- Mislabel Tolerance: system outperforms the MLP baseline at up to a 40% mislabel rate
- Sentinel Intervention Rate: < 5% of tasks trigger active rerouting
- Reconciliation Agreement Rate: Hebbian–kNN agreement > 80%
- Corruption Resilience: system damage < 2% from single-agent corpus poisoning
- Learning Velocity: mean recovery < 5 steps from a failure event
- Missing Flow Detection: failure-rate signal ratio > 5× for unhandled task types

14.2 Observability and Metrics Collection

To collect data for the above KPIs and other insights, Artemis City integrates with Prometheus (for metrics scraping and storage) and Grafana (for dashboards). We have instrumented the system with counters, gauges, and histograms for all critical operations:

- Latency Histograms: Every memory bus write logs to a histogram artemis_memory_write_latency_ms with buckets (10 ms, 50 ms, 100 ms, … up to 1000 ms). Similarly, routing decisions log to routing_decision_latency_ms and Hebbian weight updates log to hebbian_update_latency_ms.
- Gauges: Real-time values like artemis_memory_sync_lag_ms, active_agent_count, and domain_monopoly_rate per ActionType.
- Counters: Cumulative counts such as artemis_memory_cache_hits_total, hebbian_reroute_count, sentinel_intervention_total, and reconciliation_disagreement_total.

14.3 Alerting and Continuous Improvement

Artemis City v3 also sets up alerts for when things go out of bounds. For example:

- If memory sync lag p99 exceeds 500 ms for more than 5 minutes, trigger an alert.
- If the domain monopoly rate drops below 70% in any domain, flag it for investigation.
- If the sentinel intervention rate exceeds 10% in any domain, flag the domain for agent retraining.
- If the reconciliation disagreement rate exceeds 25%, flag it for threshold tuning.
- If the agent success rate falls below 95% on any given day, flag it for analysis.

These alerts ensure that any regression in performance or reliability is noticed quickly. All the collected data feeds into weekly review reports covering total tokens used vs. budget, average latency achievements vs. SLO, any incidents, and Hebbian marketplace performance metrics.

15. Version History and Change Log

This section highlights the major changes introduced in each version of the Artemis City Whitebook, serving as a quick reference for readers familiar with previous editions.

15.1 Changes from Whitebook v1 to v2

- Memory Bus Consistency: new detailed specification. V2 defines the memory bus sync process end to end, including atomic dual-writes, the read-your-writes guarantee, conflict resolution, and latency figures. Sequence diagrams illustrating write/read flows are now included (Sections 2.1–2.2).
- Hebbian Learning Integration: expanded. V1 mentioned adaptive learning vaguely; v2 provides exact Hebbian update rules (+1/−1), a decay model for weights, pruning functionality, and how weight changes propagate to the vector index in sync (Section 3).
- Agent Registry & Routing: enhanced. V2 introduces the Agent Registry's scoring system and adaptive routing logic. V1's static agent assignment is replaced with a capability-and-score-based router (Section 4.3), with a comparative benchmark showing an 800 ms → 7 ms routing improvement.
- Sandbox Security: formalized. V1 alluded to sandboxing; v2 defines a Sandbox Policy with allowed actions, violation types, and a three-strikes quarantine rule.
- Governance Workflow: major addition. V2 introduces a CI/CD-style self-update pipeline with trust-based approval tiers, automated testing, and rollback protocols. Section 6 is entirely new.
- Memory Decay & Retention: new section. V2 specifies how and when knowledge "fades" or is archived over time, with quantitative rules (decay at 30 days, archive at 180 days, deletion at 365 days).
- "Quantum Lock" Integrity: new concept. V1 mentioned "quantum lock theory" without definition; v2 retains and clarifies it as a cryptographic state-hash mechanism (Section 8).
- Morphological Computation Proof: added empirical data. V1 claimed efficiency gains but gave no proof; v2 provides a full worked example with token, cost, and latency savings and emphasizes morphological computing in Section 9.
- Performance Metrics & Dashboard: added. V2 defines clear SLOs and KPIs (token, cost, latency, etc.) and describes the monitoring setup (Prometheus/Grafana) in Section 10.

15.2 Changes from Whitebook v2 to v3

- Hebbian Learning Engine — Major Upgrade (Section 3):
  - Update formula: replaced the binary ±1 rule with the bounded morphological update ΔW = tanh(a · x · y). Added an anti-Hebbian penalty ΔW = −η for failures.
  - Domain-locked architecture: agents are hard-constrained to ATP ActionType domains; each domain maps to a stable generating function; cross-domain pollution is eliminated.
  - Weight deviation signal: |W − 1.0| as a measure of accumulated routing intelligence.
  - Simulation evidence: 81.2% MAE improvement over unconstrained routing, 79.8% over a single MLP, 80.8% over k-NN; 180× cheaper than k-NN. Validated across boundary conditions (mislabel tolerance, distribution skew, competition sweep).
  - Within-domain monopoly: 100% convergence to a single best agent per domain — Hebbian weight competition produces natural specialization.
- Active Sentinel & Immune System — New Section (Section 4):
  - Sentinel promoted from passive alerting to active rerouting on oscillation detection.
  - Sign-change rate metric with a configurable threshold (0.35).
  - Reroute penalty (0.5×) forces exploration when the dominant agent is oscillating.
  - Simulation: 16 reroutes concentrated in the highest-volatility domain (Scaffold).
- Hebbian + k-NN Reconciliation Layer — New Section (Section 5):
  - Hebbian routing as a cheap elimination layer (O(1)) feeding expensive k-NN verification (O(W)).
  - Disagreement resolution: Hebbian is right in 94% of disagreements.
  - Cost: 71.9% of pure k-NN — a 28.1% computational saving with equal or better accuracy.
  - Production-ready architecture for high-stakes domains (healthcare, utilities).
- Distributed Weight Resilience — New Section (Section 6):
  - Corpus corruption: the poisoned agent is auto-deselected (0 selections); system damage −1.0% versus +0.8% for the MLP.
  - Missing agent detection: a 7.2× failure-rate spike for unhandled task types — a clearly detectable signal for the kernel expansion workflow.
  - Domain ceiling detection: expansion triggered at a performance plateau. Organic growth, not configured.
  - Learning velocity: 4.1–4.6-step recovery (Hebbian) versus 17–24 steps (MLP) — 4–5× faster adaptation.
- Agent Registry & Routing — Updated (Section 7):
  - Domain-locked Hebbian weight selection replaces generic composite-score routing.
  - Two-stage process: ATP ActionType domain filter → Hebbian weight competition.
  - Updated efficiency benchmark incorporating Hebbian marketplace numbers.
- Human Validators & Credential Marketplace — New Section (Section 13):
  - Human validators return as credentialed reviewers with weighted influence.
  - Credential-backed validation amplifies Hebbian updates in high-stakes domains.
  - Validated output sharing as a tradeable commodity within the marketplace.
- Performance Monitoring — Expanded (Section 14):
  - New Hebbian marketplace KPIs: routing accuracy, domain monopoly rate, mislabel tolerance, sentinel intervention rate, reconciliation agreement, corruption resilience, learning velocity, missing flow detection.
  - New Prometheus metrics and alert thresholds for marketplace health.
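The v3 update rule summarized in the change log can be sketched in a few lines. The constants a and η below are illustrative placeholders (the Whitebook does not fix their values here); only the functional forms — ΔW = tanh(a · x · y) on success, ΔW = −η on failure, and the 0.5× sentinel reroute penalty at the 0.35 sign-change threshold — come from the text.

```python
# Sketch of the v3 weight dynamics. A and ETA are illustrative values;
# the tanh form, the anti-Hebbian penalty, and the 0.5x sentinel
# penalty at threshold 0.35 are taken from the change log above.
import math

A = 0.5     # gain on the correlation term (assumed value)
ETA = 0.2   # anti-Hebbian penalty magnitude (assumed value)

def hebbian_update(w, x, y, success):
    """x: task signal, y: agent response; assumed normalized."""
    if success:
        return w + math.tanh(A * x * y)   # bounded update in (-1, 1)
    return w - ETA                        # anti-Hebbian penalty on failure

def sentinel_penalty(w, sign_change_rate, threshold=0.35):
    """Active sentinel: halve the weight when oscillation is detected."""
    return 0.5 * w if sign_change_rate > threshold else w

w = 1.0
w = hebbian_update(w, x=0.8, y=0.9, success=True)   # success: W grows
w = sentinel_penalty(w, sign_change_rate=0.4)       # oscillating: 0.5x
print(round(w, 4))
```

Because tanh bounds each update to (−1, 1), no single task can swing a weight arbitrarily, which is what makes the weight-deviation signal |W − 1.0| a meaningful measure of accumulated evidence rather than of one outlier event.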

15.3 Future Outlook

Beyond v3, subsequent versions of the Artemis City Whitebook may delve into:

- Multi-instance scaling: distributed Hebbian weight synchronization across multiple Artemis City deployments
- Advanced multi-agent collaboration protocols: beyond the basic ATP, exploring agent negotiation and collaborative problem-solving within domain boundaries
- Reinforcement learning integration: combining Hebbian morphological learning with RL techniques for long-horizon task optimization
- Federated validation: cross-organization credential validation networks for the human validator marketplace
- Real-world deployment learnings: production data from healthcare, utility, and regulatory compliance deployments
- Formal verification: mathematical proofs of Hebbian convergence properties and marketplace stability guarantees

Artemis City continues to evolve as a pioneering agentic infrastructure, and this Whitebook will evolve with it, maintaining the rigorous and visionary standard set by previous versions.
Last modified on March 14, 2026