⚠️ IMPORTANT: All features are experimental and under active development. Use at your own risk; customization to your workflow is required. © 2026 GLG, a.s.
10. Why HALT & Rate Limiting Exist — Token Economics
10.1 The Problem: Multi-Agent Token Waste
Without coordination controls, multi-agent teams waste tokens dramatically:
| Scenario | Without Controls | With HALT + CLAIM | Savings |
|---|---|---|---|
| 2 agents answer same question | 2× tokens | 1× (leader assigns one) | 50% |
| Agent works during strategy change | Wasted work + redo | HALT stops immediately | 100% of wasted |
| 3 agents edit same file | Merge conflicts + fixes | CLAIM prevents overlap | 60-80% |
| Agent loads full context every recall | 50K tokens per query | Focus Engine: 2K tokens | 96% |
| 5 agents on cloud model idle | 5× heartbeat/compaction cost | Local models for idle agents | 80% |
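The savings figures in the table follow directly from the before/after token counts. A quick sketch of that arithmetic (the `savings` helper is illustrative, not part of the coordination API):

```python
def savings(before_tokens: int, after_tokens: int) -> float:
    """Percentage of tokens saved by a control, given before/after counts."""
    return round(100 * (1 - after_tokens / before_tokens), 1)

# Focus Engine row: 50K-token full-context recall vs. 2K-token focused recall
print(savings(50_000, 2_000))  # 96.0

# Duplicate-answer row: two agents answering vs. one assigned by the leader
print(savings(2 * 100, 1 * 100))  # 50.0
```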
Real example from our production team:
- 5 agents, shared Anthropic API key
- Without HALT: agents kept working during Pavel's "moment" → wasted ~$15 in tokens
- With HALT: "STOP" in chat → all agents pause within 30 seconds → $0 waste
10.2 HALT — When to Use
```python
import requests

# COORD is the base URL of the coordination API (defined elsewhere in your setup);
# coord is the matching client object used throughout this guide.

# Owner says "moment" or "počkej" (Czech for "wait") in chat → the bridge detects it automatically.
# Supported keywords: stop, halt, počkej, moment, wait, zastav

# Programmatic HALT of a specific agent:
requests.post(f"{COORD}/halt", json={
    "target": "coder",  # specific agent
    "reason": "Reviewing architecture — don't code yet",
})

# HALT all agents:
requests.post(f"{COORD}/halt", json={
    "target": "*",
    "reason": "Strategy meeting — everyone stop",
})

# Agent checks whether it is halted:
if coord.is_halted("coder"):
    pass  # don't start any work; wait for resolution
```
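To show how the `is_halted` check fits into an agent's work loop, here is a minimal self-contained sketch. The `Coordinator` class below is a toy in-memory stand-in for the real coordination service (an assumption for illustration; the real client talks to the `/halt` endpoint), but the gating logic in `run_one_task` is the pattern agents follow:

```python
class Coordinator:
    """Toy in-memory stand-in for the coordination client (hypothetical)."""
    def __init__(self) -> None:
        self._halted: set[str] = set()

    def halt(self, target: str, reason: str) -> None:
        self._halted.add(target)

    def resume(self, target: str) -> None:
        self._halted.discard(target)

    def is_halted(self, agent: str) -> bool:
        # target "*" halts every agent
        return agent in self._halted or "*" in self._halted

def run_one_task(coord: Coordinator, agent: str, task) -> bool:
    """Run a task only if the agent is not halted; return True if it ran."""
    if coord.is_halted(agent):
        return False  # paused: do nothing, re-check on the next tick
    task()
    return True

coord = Coordinator()
coord.halt("*", reason="Strategy meeting")
ran = run_one_task(coord, "coder", lambda: None)
print(ran)  # False: the global HALT blocks the coder
```

The key design point is that agents check halt status *before* starting work, not mid-task; this is how "all agents pause within 30 seconds" is achieved without forcibly killing in-flight requests.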
10.3 CLAIM — Prevent Duplicate Work
```python
# Before editing a file:
coord.claim(agent="coder", scope="src/api.py", reason="Adding /v2/recall")

# Before starting a task:
coord.claim(agent="researcher", scope="task/market-analysis",
            reason="Researching competitor pricing")

# Check before acting:
status = coord.check_scope("src/api.py")
if status["claimed"] and status["agent"] != "me":
    pass  # someone else is working on it — wait or pick another task

# After finishing:
coord.release(agent="coder", scope="src/api.py")
```
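The claim/check/release semantics can be modeled with a small in-memory registry. This is a sketch under the assumption that the real coordinator behaves like a scope→holder map shared across agents (the `ClaimRegistry` class is hypothetical, not the actual implementation):

```python
class ClaimRegistry:
    """Toy model of CLAIM semantics: one holder per scope at a time."""
    def __init__(self) -> None:
        self._claims: dict[str, dict] = {}

    def claim(self, agent: str, scope: str, reason: str) -> bool:
        holder = self._claims.get(scope)
        if holder and holder["agent"] != agent:
            return False  # scope already claimed by someone else
        self._claims[scope] = {"agent": agent, "reason": reason}
        return True

    def check_scope(self, scope: str) -> dict:
        holder = self._claims.get(scope)
        return {"claimed": holder is not None,
                "agent": holder["agent"] if holder else None}

    def release(self, agent: str, scope: str) -> None:
        # Only the holder may release its own claim.
        if self._claims.get(scope, {}).get("agent") == agent:
            del self._claims[scope]

reg = ClaimRegistry()
assert reg.claim("coder", "src/api.py", "Adding /v2/recall")
assert not reg.claim("reviewer", "src/api.py", "Refactor")  # blocked
reg.release("coder", "src/api.py")
assert reg.claim("reviewer", "src/api.py", "Refactor")      # now free
```

Note that `release` is a no-op unless called by the current holder, which prevents one agent from accidentally dropping another agent's claim.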
10.4 Rate Limiting — Control Token Burn
```python
# Input-filter rate limiting prevents memory flooding:
config = {
    "rate_limit_per_min": 50,   # max 50 entries per minute
    "max_entry_tokens": 1500,   # reject oversized entries
}

# For recall, always set a token budget:
context = uaml.recall("query", budget_tokens=800)  # never unbounded!
```