⚠️ IMPORTANT: All features are experimental and under active development. Use at your own risk; customization to your workflow is required. © 2026 GLG, a.s.

10. Why HALT & Rate Limiting Exist — Token Economics

10.1 The Problem: Multi-Agent Token Waste

Without coordination controls, multi-agent teams waste tokens dramatically:

| Scenario | Without Controls | With HALT + CLAIM | Savings |
|---|---|---|---|
| 2 agents answer the same question | 2× tokens | 1× (leader assigns one) | 50% |
| Agent works during a strategy change | Wasted work + redo | HALT stops it immediately | 100% of wasted work |
| 3 agents edit the same file | Merge conflicts + fixes | CLAIM prevents overlap | 60-80% |
| Agent loads full context on every recall | 50K tokens per query | Focus Engine: 2K tokens | 96% |
| 5 idle agents on a cloud model | 5× heartbeat/compaction cost | Local models for idle agents | 80% |
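The percentages in the table follow directly from the stated token counts. A quick sanity check for two of the rows:

```python
def savings(without_tokens: float, with_tokens: float) -> float:
    """Fraction of tokens saved by the coordinated workflow."""
    return 1 - with_tokens / without_tokens

# Focus Engine row: 50K tokens per query vs. a 2K focused context.
focus_engine = savings(50_000, 2_000)   # 0.96 → the 96% in the table

# Duplicate-answer row: 2 agents answering vs. 1 assigned by the leader.
duplicate_answer = savings(2, 1)        # 0.50 → the 50% in the table
```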

Real example from our production team:
- 5 agents, shared Anthropic API key
- Without HALT: agents kept working during Pavel's "moment" → wasted ~$15 in tokens
- With HALT: "STOP" in chat → all agents pause within 30 seconds → $0 waste

10.2 HALT — When to Use

# Owner types "moment" or "počkej" (Czech for "wait") in chat → the bridge detects it automatically
# Supported keywords: stop, halt, moment, wait, počkej (Czech "wait"), zastav (Czech "stop")

# Programmatic HALT (requires the requests library; COORD is the
# coordination service's base URL, configured elsewhere):
import requests

requests.post(f"{COORD}/halt", json={
    "target": "coder",                    # halt a specific agent
    "reason": "Reviewing architecture — don't code yet"
})

# HALT all agents:
requests.post(f"{COORD}/halt", json={
    "target": "*",
    "reason": "Strategy meeting — everyone stop"
})

# Agent checks if halted:
if coord.is_halted("coder"):
    # Don't start any work, wait for resolution
    pass
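The coordinator's internals are not shown in this guide. As a mental model, the halt/is_halted semantics can be sketched as an in-memory registry (class and method names here are hypothetical, not the real coordinator API):

```python
class HaltRegistry:
    """Minimal sketch of HALT state; the real coordinator persists this
    and exposes it over HTTP (/halt, is_halted)."""

    def __init__(self):
        self._halts = {}  # agent name (or "*") -> reason for the halt

    def halt(self, target: str, reason: str) -> None:
        self._halts[target] = reason

    def resume(self, target: str) -> None:
        self._halts.pop(target, None)

    def is_halted(self, agent: str) -> bool:
        # A wildcard HALT ("*") stops every agent at once.
        return agent in self._halts or "*" in self._halts

reg = HaltRegistry()
reg.halt("*", "Strategy meeting")   # everyone pauses
```

Note that a wildcard halt overrides per-agent state: every `is_halted` check returns True until the wildcard is resumed.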

10.3 CLAIM — Prevent Duplicate Work

# Before editing a file:
coord.claim(agent="coder", scope="src/api.py", reason="Adding /v2/recall")

# Before starting a task:
coord.claim(agent="researcher", scope="task/market-analysis",
            reason="Researching competitor pricing")

# Check before acting:
status = coord.check_scope("src/api.py")
if status["claimed"] and status["agent"] != "me":
    # Someone else is working on it — wait or pick another task
    pass

# After finishing:
coord.release(agent="coder", scope="src/api.py")
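The claim/check_scope/release calls above imply a simple invariant: one agent per scope at a time. A minimal sketch of that invariant (names hypothetical, not the real coordinator API):

```python
class ClaimRegistry:
    """Sketch of CLAIM semantics: a scope (file path or task id) is held
    by at most one agent until released."""

    def __init__(self):
        self._claims = {}  # scope -> {"agent": ..., "reason": ...}

    def claim(self, agent: str, scope: str, reason: str) -> bool:
        """Returns False if another agent already holds the scope."""
        holder = self._claims.get(scope)
        if holder and holder["agent"] != agent:
            return False
        self._claims[scope] = {"agent": agent, "reason": reason}
        return True

    def check_scope(self, scope: str) -> dict:
        holder = self._claims.get(scope)
        return {"claimed": holder is not None,
                "agent": holder["agent"] if holder else None}

    def release(self, agent: str, scope: str) -> None:
        # Only the holder may release its own claim.
        holder = self._claims.get(scope)
        if holder and holder["agent"] == agent:
            del self._claims[scope]
```

Re-claiming a scope you already hold succeeds (it just updates the reason), which keeps retries idempotent.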

10.4 Rate Limiting — Control Token Burn

# Input filter rate limiting prevents memory flooding:
config = {
    "rate_limit_per_min": 50,     # max 50 entries per minute
    "max_entry_tokens": 1500,      # reject oversized entries
}

# For recall — always use token budget:
context = uaml.recall("query", budget_tokens=800)  # never unbounded!