HiveBoard — User Manual Part 5: Layer 2 Integration — Rich Events #
Version: 0.1.0 Last updated: 2026-02-12
LLM costs, plans, escalations, approvals, retries, issues — the full narrative of what your agents are thinking and why.
Table of Contents #
- What Layer 2 Gives You
- LLM Call Tracking
- Plans and Plan Steps
- Escalations
- Approvals
- Retries
- Issue Reporting
- Pipeline Enrichment
- Agent-Level Events (Outside Task Context)
- Cost Estimation
- What to Expect on the Dashboard
- Incremental Adoption Strategy
- Troubleshooting Layer 2
1. What Layer 2 Gives You #
Layer 1 told you what your agents are doing — tasks started, actions tracked, durations measured. Layer 2 tells you why they're doing it, how much it costs, and what went wrong in their reasoning.
Layer 2 is a collection of typed events that add narrative depth to the timeline. Each event type is independent — adopt them in any order, in any combination. There is no "all or nothing."
| Layer 2 event | What it answers | Dashboard impact |
|---|---|---|
task.llm_call() |
How much is each LLM call costing? What model was used? How many tokens? | Cost Explorer, purple timeline nodes, LLM/COST columns in Task Table |
task.plan() |
What plan did the agent create? | Plan progress bar above Timeline |
task.plan_step() |
Which step is it on? Which step failed? | Plan bar step indicators (green/blue/red/gray) |
task.escalate() |
When did the agent ask for help? | Amber nodes in Timeline, "human" filter in Activity Stream |
task.request_approval() |
What's waiting for human approval? | Agent WAITING badge, amber nodes |
task.approval_received() |
Was it approved or rejected? Who decided? | Green/red approval nodes in Timeline |
task.retry() |
How many times did it retry? Why? | Timeline branching, retry pattern visibility |
agent.report_issue() |
What persistent problems has the agent found? | Pipeline tab, issue badges on agent cards |
agent.queue_snapshot() |
How deep is the work queue? | Queue badges on agent cards, Pipeline tab |
agent.todo() |
What work items is the agent tracking? | Pipeline tab |
agent.scheduled() |
What recurring work is configured? | Pipeline tab |
The progression #
Layer 0: "Agent is alive"
Layer 1: "Agent is working on task X, step Y, for Z seconds"
Layer 2: "Agent called claude-sonnet with 1,500 tokens ($0.008), created a 4-step plan,
completed steps 1-3, failed on step 4 with a CRM permission error,
escalated to the sales team, and is now waiting for approval"
That's the difference. Layer 2 turns a timeline into a story.
2. LLM Call Tracking #
Priority: Highest. This is the single most valuable Layer 2 addition.
2.1 What it does #
task.llm_call() records a structured event for each LLM API call: which model, how many tokens, how much it cost, and how long it took. This data feeds the Cost Explorer, adds purple nodes to the Timeline, and fills in the LLM and COST columns in the Task Table.
2.2 Finding WHERE in your code — the catalog-first approach #
Before writing any instrumentation code, build a catalog of every LLM call site. This step saves significant time because — as real-world integration has shown — token extraction varies between call sites, even within the same codebase. You need to know what each site exposes before you can write correct instrumentation.
Step 1: Search for LLM client calls. Common patterns across frameworks:
# OpenAI / Azure OpenAI
response = client.chat.completions.create(model=..., messages=...)
# Anthropic
response = client.messages.create(model=..., messages=...)
# LiteLLM
response = litellm.completion(model=..., messages=...)
# LangChain
response = llm.invoke(prompt)
# Custom wrapper
response = my_llm_client.call(model=..., prompt=...)
Step 2: For each site, inspect the response and client objects. Add a temporary debug line after each call:
response = client.complete(...)
print(f"Response attrs: {[a for a in dir(response) if 'token' in a.lower() or 'usage' in a.lower()]}")
print(f"Client attrs: {[a for a in dir(client) if 'token' in a.lower() or 'usage' in a.lower()]}")
This reveals where each site stores token counts and model information.
Step 3: Build a table. Before writing any task.llm_call() code, document what you found:
| Site | File:line | Client method | Token source | Model source |
|---|---|---|---|---|
| 1 | loop.py:1275 |
client.complete_json() |
client._last_input_tokens |
client.model |
| 2 | loop.py:1432 |
client.complete_with_tools() |
response.usage.input_tokens |
response.model |
| 3 | reflection.py:367 |
client.complete_json() |
client._last_input_tokens |
client.model |
| 4 | planning.py:567 |
client.complete_json() |
client._last_input_tokens |
client.model |
| 5 | agent.py:199 |
haiku_client.complete() |
client._last_input_tokens |
client.model |
| 6 | context.py:319 |
client.complete() |
client._last_input_tokens |
client.model |
Notice the pattern: Sites 1, 3-6 all use the same complete_json() / complete() method family, which stores tokens on the client object (client._last_input_tokens). Site 2 uses complete_with_tools(), which returns tokens on the response object (response.usage.input_tokens). Same codebase, two different extraction patterns.
Step 4: Implement from the table. Now each task.llm_call() addition is mechanical — you know exactly where the tokens and model name come from.
This catalog-first approach typically takes 15-30 minutes and prevents the frustrating cycle of adding instrumentation, discovering it doesn't extract tokens correctly, debugging, fixing, and repeating for each site.
2.3 Adding task.llm_call() #
The structure is the same at every site — but the field extraction varies (see Section 2.6). Here's the general shape:
import time
from myproject.observability import get_current_task
# Measure timing
_llm_start = time.perf_counter()
# Existing LLM call (don't change this):
response = client.messages.create(model="claude-sonnet-4-5-20250929", messages=messages)
# After the call — add this:
_llm_elapsed = (time.perf_counter() - _llm_start) * 1000
_task = get_current_task()
if _task:
try:
_task.llm_call(
"descriptive_name", # what this call does
model=response.model, # or client.model — see Section 2.6
tokens_in=response.usage.input_tokens, # or client._last_input_tokens — see Section 2.6
tokens_out=response.usage.output_tokens, # or client._last_output_tokens — see Section 2.6
cost=0.008, # optional, USD
duration_ms=round(_llm_elapsed), # optional
prompt_preview=str(messages)[:500], # optional, for debugging
response_preview=str(response.content)[:500], # optional
metadata={"temperature": 0.7}, # optional
)
except Exception:
pass # never break agent for observability
Important: The tokens_in, tokens_out, and model fields above show one extraction pattern (response.usage.*). Your call site may use a different pattern (client._last_*, a config variable, etc.). Refer to your catalog (Section 2.2) and the extraction guide (Section 2.6) for the correct source at each site.
2.4 Parameter reference #
| Parameter | Type | Required | What it does |
|---|---|---|---|
name |
str | Yes | Label for the Timeline node. Use descriptive names: "phase1_reasoning", "score_lead", "generate_email". Not the model name. |
model |
str | Yes | Model identifier (e.g. "claude-sonnet-4-5-20250929"). Used for Cost Explorer grouping. |
tokens_in |
int | No | Input token count. Feeds Cost Explorer aggregation. |
tokens_out |
int | No | Output token count. Feeds Cost Explorer aggregation. |
cost |
float | No | Pre-calculated cost in USD. If absent, the call appears in timelines but is excluded from cost totals. |
duration_ms |
int | No | LLM API latency in milliseconds. Shown on the timeline connector. |
prompt_preview |
str | No | First ~500 chars of the prompt. For debugging in the detail panel. |
response_preview |
str | No | First ~500 chars of the response. For debugging in the detail panel. |
metadata |
dict | No | Arbitrary key-value pairs (temperature, top_p, caller info, etc.). |
Start minimal, add fields later. The only required fields are name and model. Everything else is optional and can be added incrementally:
# Minimum viable (still useful):
_task.llm_call("reasoning", model="claude-sonnet-4-5-20250929")
# Add tokens when you figure out the response shape:
_task.llm_call("reasoning", model="...", tokens_in=1500, tokens_out=200)
# Add cost when you build the cost helper:
_task.llm_call("reasoning", model="...", tokens_in=1500, tokens_out=200, cost=0.008)
# Add previews when you need to debug LLM behavior:
_task.llm_call("reasoning", model="...", prompt_preview=prompt[:500], response_preview=resp[:500])
2.5 Naming conventions #
Choose names that describe what the call does, not which model or API it uses:
| ✅ Good name | ❌ Bad name | Why |
|---|---|---|
"score_lead" |
"claude_call" |
Multiple calls may use Claude — the name should distinguish them |
"phase1_reasoning" |
"llm_call_1" |
Numbers don't tell you anything in the timeline |
"generate_email_draft" |
"anthropic_api" |
The model is already in the model field |
"context_compaction" |
"call" |
Too generic |
"reflection" |
"messages.create" |
The API method is implementation detail |
2.6 Extracting token usage — it's messier than you expect #
This is the step where most people lose time. Token extraction varies not just between SDKs, but between methods within the same SDK, and between the SDK and the wrapper your codebase uses on top of it. The catalog-first approach (Section 2.2) exists specifically to prevent this from becoming a per-site debugging session.
Pattern A: Tokens on the response object #
The cleanest pattern. The LLM SDK returns a response with usage data attached:
# Anthropic SDK (direct)
response.usage.input_tokens # int
response.usage.output_tokens # int
response.model # str
# OpenAI SDK (direct)
response.usage.prompt_tokens # int (note: different field name)
response.usage.completion_tokens # int
response.model # str
# LiteLLM
response.usage.prompt_tokens # int (follows OpenAI naming)
response.usage.completion_tokens # int
response.model # str
This works when you're calling the SDK directly. But most production codebases don't — they use a wrapper.
Pattern B: Tokens on the client object #
Many wrapper libraries store usage on the client instance after a call, not on the response:
# Common in custom wrappers:
response = client.complete_json(model="...", messages=...)
tokens_in = client._last_input_tokens # stored after the call
tokens_out = client._last_output_tokens # stored after the call
model_name = client.model # configured at init time
The underscore prefix (_last_input_tokens) means it's a private attribute — the wrapper author didn't consider it part of the public API. This is common. It works fine, but it may change without notice in wrapper updates.
Real-world example: In a 6-site integration, 5 out of 6 sites used client._last_input_tokens (Pattern B), while 1 site used response.usage.input_tokens (Pattern A) — because that particular client method (complete_with_tools()) had a different return type than the others (complete_json()). Same project, same LLM wrapper library, two extraction patterns.
Pattern C: Tokens in a callback or event #
Framework-level integrations (LangChain, CrewAI) often deliver token counts through callbacks rather than return values:
# LangChain
# Token counts come through the callback handler, not the response object.
# Use the HiveLoop framework integration instead of manual task.llm_call().
# CrewAI
# Similar — token tracking happens in the agent execution callback.
If you're using a framework integration, prefer the HiveLoop callback adapter over manual task.llm_call() — it handles extraction automatically.
Pattern D: Tokens not available #
Some wrappers don't expose usage at all. You have three options:
# Option 1: Send without tokens (the call still appears on Timeline)
_task.llm_call("reasoning", model="claude-sonnet-4-5-20250929")
# Option 2: Dig into the wrapper's internals
print(dir(response)) # look for usage, tokens, meta, _raw, etc.
print(dir(client)) # look for _last_*, _usage, _tokens, etc.
# Option 3: Estimate tokens from input/output length
# Rough heuristic: ~4 chars per token for English text
estimated_in = len(prompt_text) // 4
estimated_out = len(response_text) // 4
Option 1 is always safe. Option 2 usually finds something — most wrappers store usage somewhere, even if it's not documented. Option 3 is a last resort and only useful for rough cost estimation.
Finding where model name lives #
The model name also varies by pattern:
# On the response (Pattern A):
model_name = response.model
# On the client (Pattern B):
model_name = client.model # configured at client initialization
# Hardcoded (when you know the model won't change):
model_name = "claude-sonnet-4-5-20250929"
# From a config variable:
model_name = settings.LLM_MODEL
If different call sites use different models (e.g. Sonnet for reasoning, Haiku for summarization), the model name must come from the correct source at each site. Don't assume one model across all sites — this is exactly what the Cost Explorer's "by model" breakdown is designed to reveal.
Quick discovery recipe #
When you're not sure where tokens live for a given call site, run this once:
response = client.some_method(...)
# Check response:
for attr in dir(response):
if any(k in attr.lower() for k in ['token', 'usage', 'cost', 'model']):
print(f"response.{attr} = {getattr(response, attr, '?')}")
# Check client:
for attr in dir(client):
if any(k in attr.lower() for k in ['token', 'usage', 'cost', 'model', 'last']):
print(f"client.{attr} = {getattr(client, attr, '?')}")
Run this for each unique client method in your catalog (Section 2.2). Methods that share the same return type will have the same extraction pattern.
2.7 LLM calls outside a task context #
Some LLM calls happen outside of any task — startup routines, background maintenance, heartbeat summaries. For these, get_current_task() returns None, so task.llm_call() is skipped.
If you want to track these costs anyway, use the agent-level method:
_task = get_current_task()
if _task:
_task.llm_call("heartbeat_summary", model=model_name, ...)
elif hiveloop_agent:
hiveloop_agent.llm_call("heartbeat_summary", model=model_name, ...)
Agent-level LLM calls appear in the Cost Explorer (aggregated under the agent) but not on any task timeline.
3. Plans and Plan Steps #
3.1 What it does #
If your agent creates execution plans (multi-step strategies), task.plan() and task.plan_step() make those plans visible on the dashboard. A progress bar appears above the Timeline showing each step's status.
3.2 Finding WHERE in your code #
Look for code that:
- Creates a list of steps, phases, or stages
- Iterates through a strategy
- Tracks progress through sequential operations
- Uses words like "plan", "strategy", "pipeline", "workflow", "steps"
3.3 Adding plan tracking #
At plan creation:
_task = get_current_task()
if _task:
try:
_task.plan(
"Process and route incoming lead", # goal — what the plan aims to achieve
[ # steps — ordered list of descriptions
"Search CRM for existing record",
"Score lead based on criteria",
"Generate follow-up email",
"Update CRM with outcome",
],
)
except Exception:
pass
As each step progresses:
# When step starts:
if _task:
try:
_task.plan_step(step_index=0, action="started", summary="Searching CRM")
except Exception:
pass
# ... step executes ...
# When step completes:
if _task:
try:
_task.plan_step(
step_index=0,
action="completed",
summary="Found existing CRM record",
turns=2, # optional — LLM turns spent
tokens=3200, # optional — tokens spent
)
except Exception:
pass
# If step fails:
if _task:
try:
_task.plan_step(step_index=2, action="failed", summary="Email API returned 403")
except Exception:
pass
3.4 Parameter reference #
task.plan():
| Parameter | Type | Required | Description |
|---|---|---|---|
goal |
str | Yes | What the plan aims to achieve |
steps |
list[str] | Yes | Ordered step descriptions |
revision |
int | No | Plan revision number. Default: 0. Increment on replan |
task.plan_step():
| Parameter | Type | Required | Description |
|---|---|---|---|
step_index |
int | Yes | Zero-based step position |
action |
str | Yes | "started", "completed", "failed", "skipped" |
summary |
str | Yes | Description or outcome note |
total_steps |
int | No | Auto-inferred from task.plan() if previously called |
turns |
int | No | LLM turns spent (on completion/failure) |
tokens |
int | No | Tokens spent (on completion/failure) |
plan_revision |
int | No | Correlates with task.plan() revision |
3.5 Replanning #
If the agent changes its plan mid-task, call task.plan() again with an incremented revision:
_task.plan("Revised: Route to manual review", ["Notify manager", "Queue for review"], revision=1)
The dashboard shows the latest plan. Previous plan events remain in the timeline for the full history.
4. Escalations #
4.1 What it does #
task.escalate() records when an agent decides it cannot handle something alone and hands it off — to a human, another team, or another agent.
4.2 Finding WHERE in your code #
Look for code that:
- Sends alerts or notifications to humans
- Transfers work to another agent or queue
- Decides "I can't handle this" and routes elsewhere
- Logs a warning that requires human attention
4.3 Adding escalation tracking #
_task = get_current_task()
if _task:
try:
_task.escalate(
reason="Lead score below threshold (0.2) — needs manual review",
assigned_to="sales-team", # optional — who receives the escalation
)
except Exception:
pass
| Parameter | Type | Required | Description |
|---|---|---|---|
reason |
str | Yes | Why the agent escalated |
assigned_to |
str | No | Who or what receives the escalation |
4.4 Dashboard impact #
- Amber node in the Timeline labeled with the reason
escalatedevent in the Activity Stream- Visible under the "human" stream filter
5. Approvals #
5.1 What it does #
task.request_approval() and task.approval_received() track the human-in-the-loop approval workflow. The agent asks for permission, waits, and records the decision.
5.2 Finding WHERE in your code #
Look for code that:
- Pauses execution waiting for human input
- Sends approval requests to a queue, Slack, email, or UI
- Checks for approval status in a loop or callback
- Has "pending", "approved", "rejected" states
5.3 Adding approval tracking #
When requesting approval:
_task = get_current_task()
if _task:
try:
_task.request_approval(
approver="ops-queue", # who should approve
reason="Contract emails require human review", # why
)
except Exception:
pass
# Agent enters waiting state...
When approval is received:
if _task:
try:
_task.approval_received(
approved_by="jane@acme.com", # who approved/rejected
decision="approved", # "approved" or "rejected"
)
except Exception:
pass
# Agent continues (or handles rejection)...
5.4 Parameter reference #
task.request_approval():
| Parameter | Type | Required | Description |
|---|---|---|---|
approver |
str | Yes | Who should approve (person, queue, team) |
reason |
str | No | What needs approval and why |
task.approval_received():
| Parameter | Type | Required | Description |
|---|---|---|---|
approved_by |
str | Yes | Who made the decision |
decision |
str | Yes | "approved" or "rejected" |
5.5 Dashboard impact #
- Agent badge changes to WAITING (amber) after
request_approval() - Agent badge returns to PROCESSING after
approval_received() - Waiting count in Stats Ribbon increments/decrements
- Amber (request) and green/red (decision) nodes in Timeline
- Visible under the "human" stream filter
- If approvals pile up (Waiting count stays high), your review process is a bottleneck
6. Retries #
6.1 What it does #
task.retry() records when an agent retries a failed operation — rate limits, transient errors, timeouts. This makes retry patterns visible on the timeline.
6.2 Finding WHERE in your code #
Look for code that:
- Catches exceptions and retries
- Has
for attempt in range(max_retries):loops - Uses backoff libraries (
tenacity,backoff) - Has retry counters or sleep-between-attempts patterns
6.3 Adding retry tracking #
for attempt in range(max_retries):
try:
result = call_external_api()
break
except RateLimitError as e:
_task = get_current_task()
if _task:
try:
_task.retry(
attempt=attempt + 1,
reason=f"Rate limited: {e}",
backoff_seconds=2 ** attempt, # optional
)
except Exception:
pass
time.sleep(2 ** attempt)
| Parameter | Type | Required | Description |
|---|---|---|---|
attempt |
int | Yes | Attempt number (1-based) |
reason |
str | Yes | Why the retry happened |
backoff_seconds |
float | No | How long before the next attempt |
6.4 Dashboard impact #
- Retry nodes appear in the Timeline, showing how many attempts were needed
- Timeline branching shows the retry path
- Helps identify: Are retries common? Which operations trigger them? How much time is lost to retries?
7. Issue Reporting #
7.1 What it does #
agent.report_issue() lets agents self-report persistent problems — not task failures (those are tracked automatically), but ongoing issues like API permission errors, data quality degradation, or connectivity problems.
7.2 Finding WHERE in your code #
Look for code that:
- Logs warnings about external service problems
- Detects degraded conditions (slow responses, partial failures)
- Catches errors that don't fail the task but indicate a problem
- Has "circuit breaker" or "health check" patterns
7.3 Adding issue reporting #
# When the agent detects a persistent problem:
try:
hiveloop_agent.report_issue(
summary="CRM API returning 403 for workspace queries",
severity="high", # "critical", "high", "medium", "low"
category="permissions", # optional classification
context={ # optional debugging info
"tool": "crm_search",
"error_code": 403,
"last_seen": "2026-02-11T14:30:00Z",
},
issue_id="issue_crm_403", # optional — enables explicit tracking
occurrence_count=3, # optional — how many times so far
)
except Exception:
pass
# When the issue is resolved:
try:
hiveloop_agent.resolve_issue(
summary="CRM API returning 403 for workspace queries",
issue_id="issue_crm_403",
)
except Exception:
pass
7.4 Parameter reference #
agent.report_issue():
| Parameter | Type | Required | Description |
|---|---|---|---|
summary |
str | Yes | Issue title. Used for deduplication if no issue_id |
severity |
str | Yes | "critical", "high", "medium", "low" |
category |
str | No | "permissions", "connectivity", "configuration", "data_quality", "rate_limit", "other" |
context |
dict | No | Arbitrary debugging context |
issue_id |
str | No | Stable identifier for lifecycle tracking. If omitted, server deduplicates by summary hash |
occurrence_count |
int | No | Agent-tracked count of occurrences |
agent.resolve_issue():
| Parameter | Type | Required | Description |
|---|---|---|---|
summary |
str | Yes | Must match the original report (or use issue_id) |
issue_id |
str | No | If provided on report, use the same ID |
7.5 Dashboard impact #
- Red issue badge on the agent card in The Hive (e.g. "● 1 issue")
- Issues table in the Agent Detail → Pipeline tab with severity, category, and occurrence count
issueevents in the Activity Stream- Issues persist until explicitly resolved — they don't auto-clear
7.6 Issues vs. task failures #
| Concept | Example | HiveLoop feature |
|---|---|---|
| Task failure | "This specific task threw an exception" | Automatic — agent.task() catches it |
| Issue | "CRM API has been returning 403 for the last hour" | Manual — agent.report_issue() |
Task failures are per-task and transient. Issues are per-agent and persistent. A task can fail without an issue (transient error), and an issue can exist without any task failing (degraded performance).
8. Pipeline Enrichment #
Pipeline events give the dashboard visibility into the agent's operational context beyond individual tasks — its work queue, tracked work items, and scheduled recurring work.
8.1 Queue snapshots #
Report the current state of the agent's work queue:
hiveloop_agent.queue_snapshot(
depth=4, # items currently queued
oldest_age_seconds=120, # how old the oldest item is
items=[ # optional — individual items
{"id": "evt_001", "priority": "high", "source": "human",
"summary": "Review contract draft", "queued_at": "2026-02-11T14:28:00Z"},
{"id": "evt_002", "priority": "normal", "source": "webhook",
"summary": "Process CRM update", "queued_at": "2026-02-11T14:29:00Z"},
],
processing={"id": "evt_003", "summary": "Sending email", # optional — what's being processed now
"started_at": "2026-02-11T14:29:30Z", "elapsed_ms": 4500},
)
Dashboard impact: Queue depth badge on the agent card (e.g. "Q:4"), Queue section in Pipeline tab.
Best practice: Call queue_snapshot() on a periodic schedule (e.g. every heartbeat) rather than on every queue change. The queue_provider callback on hb.agent() automates this — it's called every heartbeat cycle:
def my_queue_provider():
return {"depth": len(my_queue), "items": [...]}
agent = hb.agent("my-agent", queue_provider=my_queue_provider)
8.2 TODOs #
Track work items the agent is managing:
# Created
hiveloop_agent.todo("todo_001", action="created", summary="Follow up with client", priority="high")
# Completed
hiveloop_agent.todo("todo_001", action="completed", summary="Follow up with client")
# Other actions: "failed", "dismissed", "deferred"
Dashboard impact: Active TODOs table in Pipeline tab.
8.3 Scheduled work #
Report recurring work the agent is configured to perform:
hiveloop_agent.scheduled(items=[
{"name": "CRM sync", "next_run": "2026-02-12T15:00:00Z", "interval": "1h", "status": "active"},
{"name": "Report generation", "next_run": "2026-02-13T09:00:00Z", "interval": "daily", "status": "active"},
])
Dashboard impact: Scheduled work table in Pipeline tab.
9. Agent-Level Events (Outside Task Context) #
Most Layer 2 events happen inside a task (task.llm_call(), task.plan(), etc.). But some events are agent-level — they happen independently of any task.
9.1 Agent-level methods #
| Method | When to use | Requires task context? |
|---|---|---|
agent.llm_call() |
LLM calls during startup, background maintenance, heartbeat generation | No |
agent.report_issue() |
Persistent problems detected outside task execution | No |
agent.resolve_issue() |
Previously reported issue resolved | No |
agent.queue_snapshot() |
Queue state reporting (typically on heartbeat) | No |
agent.todo() |
Work item lifecycle | No |
agent.scheduled() |
Recurring work configuration | No |
agent.event() |
Any custom agent-level event | No |
9.2 The pattern: task-level with agent-level fallback #
For events that might happen either inside or outside a task:
_task = get_current_task()
if _task:
# Preferred — event is attributed to the task and its project
_task.llm_call("reasoning", model=model_name, tokens_in=N, tokens_out=N)
elif hiveloop_agent:
# Fallback — event is agent-level, no task/project attribution
hiveloop_agent.llm_call("reasoning", model=model_name, tokens_in=N, tokens_out=N)
Agent-level events appear in the Cost Explorer and Activity Stream but not on any task timeline.
10. Cost Estimation #
10.1 Why cost matters #
LLM costs can be invisible and explosive. An agent that runs smoothly can quietly spend $40/hour if it's using an expensive model with large prompts. task.llm_call() with cost tracking surfaces this — per call, per task, per agent, per model.
10.2 Three approaches to cost #
Approach A — Don't calculate cost (simplest):
Just send model, tokens_in, tokens_out. The dashboard shows token counts and groups by model. You can calculate cost yourself from the model tables.
_task.llm_call("reasoning", model="claude-sonnet-4-5-20250929", tokens_in=1500, tokens_out=200)
Approach B — Cost helper function:
Build a lookup table and calculate cost locally:
COST_PER_MILLION = {
"claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00},
"claude-haiku-4-5-20251001": {"input": 0.80, "output": 4.00},
"claude-3-haiku-20240307": {"input": 0.25, "output": 1.25},
"gpt-4o": {"input": 2.50, "output": 10.00},
"gpt-4o-mini": {"input": 0.15, "output": 0.60},
}
def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float | None:
rates = COST_PER_MILLION.get(model)
if not rates:
return None
return (tokens_in * rates["input"] / 1_000_000) + (tokens_out * rates["output"] / 1_000_000)
Usage:
_task.llm_call(
"reasoning",
model=model_name,
tokens_in=tokens_in,
tokens_out=tokens_out,
cost=estimate_cost(model_name, tokens_in, tokens_out),
)
Approach C — Use LLM client's cost reporting:
Some LLM clients report cost directly:
# LiteLLM
response._hidden_params.get("response_cost")
# Custom wrappers may expose .cost or .total_cost
If your client reports cost, use it directly — it's more accurate than a lookup table.
10.3 Keeping the cost table updated #
LLM pricing changes. When it does, update your COST_PER_MILLION table. Old events keep their recorded cost (it's stored per-event, not recalculated). New events use the updated rates.
If you don't want to maintain a cost table, Approach A (tokens only, no cost) is perfectly fine. The Cost Explorer still shows token counts and call counts grouped by model — you can calculate spend externally.
11. What to Expect on the Dashboard #
11.1 After adding task.llm_call() #
| Dashboard element | Change |
|---|---|
| Task Table | LLM column shows call count (e.g. "◆ 3"). COST column shows dollar amount |
| Timeline | Purple nodes appear for each LLM call, with model badge above the node |
| Timeline detail | Click an LLM node → shows model, tokens, cost, duration, and previews (if sent) |
| Cost Explorer | Fully functional — Cost by Model table, Cost by Agent table, Cost Ribbon totals |
| Stats Ribbon | Cost (1h) shows dollar amount |
| Mini-Charts | LLM Cost/Task chart populates |
| Activity Stream | "llm" filter shows every LLM call |
11.2 After adding task.plan() + task.plan_step() #
| Dashboard element | Change |
|---|---|
| Timeline | Plan progress bar appears above the timeline track |
| Plan bar | Each step is a segment: gray (not started), blue (in progress), green (completed), red (failed) |
| Plan bar hover | Hover a segment to see the step description |
11.3 After adding escalations and approvals #
| Dashboard element | Change |
|---|---|
| Agent card | WAITING badge when approval is pending |
| Stats Ribbon | Waiting count increments |
| Timeline | Amber escalation nodes, approval request/decision nodes |
| Activity Stream | "human" filter shows escalations and approvals |
11.4 After adding issue reporting #
| Dashboard element | Change |
|---|---|
| Agent card | Red issue badge (e.g. "● 1 issue") |
| Pipeline tab | Active Issues table with severity, category, occurrences |
| Activity Stream | Issue events appear with warning icons |
11.5 After adding pipeline enrichment #
| Dashboard element | Change |
|---|---|
| Agent card | Queue depth badge (e.g. "Q:4", amber if >5) |
| Pipeline tab | Queue, TODOs, and Scheduled sections populate |
12. Incremental Adoption Strategy #
You don't need to implement all of Layer 2 at once. Here's the recommended order, based on value per effort:
Tier 1 — High value, low effort (do these first) #
| Event | Effort | Why it's high value |
|---|---|---|
task.llm_call() |
2-5 lines per LLM call site | Unlocks the entire Cost Explorer. Answers "how much is this costing me?" |
agent.report_issue() |
1-3 lines per detection point | Surfaces persistent problems that silent monitoring would miss |
Tier 2 — High value, medium effort #
| Event | Effort | Why |
|---|---|---|
task.plan() + task.plan_step() |
5-10 lines at plan creation + 2 lines per step transition | Visual progress tracking. Answers "where in the plan did it fail?" |
task.escalate() |
1-2 lines per escalation point | Answers "when does the agent need human help?" |
Tier 3 — Medium value, depends on architecture #
| Event | Effort | Why |
|---|---|---|
task.request_approval() + task.approval_received() |
2-4 lines per approval workflow | Only valuable if you have human-in-the-loop approval flows |
task.retry() |
2-3 lines per retry loop | Only valuable if retries are common and you need to diagnose patterns |
agent.queue_snapshot() |
5-10 lines + queue access | Only valuable if your agent has a work queue |
Tier 4 — Nice to have #
| Event | Effort | Why |
|---|---|---|
agent.todo() |
1-2 lines per todo lifecycle event | Work item tracking — useful for complex agents |
agent.scheduled() |
1-2 lines at configuration time | Scheduled work visibility — useful for cron-based agents |
The practical path #
- Add
task.llm_call()to your 2-3 most important LLM call sites → validate on Cost Explorer - Add
agent.report_issue()where you have error handling → check Pipeline tab - Add plans if your agent creates them → watch the plan progress bar
- Add escalation/approval tracking if you have human-in-the-loop → monitor the Waiting count
- Add pipeline enrichment last → agent cards get richer
At each step, check the dashboard to confirm the new data appears before moving on.
13. Troubleshooting Layer 2 #
13.1 LLM calls not appearing on Timeline #
Symptom: You added task.llm_call() but no purple nodes appear.
Possible causes:
get_current_task()returnsNone— the LLM call is happening outside a task context. Check thatagent.task()wraps the call site andset_current_task()has been called.- The
try/exceptis swallowing an error. Temporarily remove thetry/except, trigger a task, and check for exceptions. - Token fields are wrong type.
tokens_inandtokens_outmust be integers, not strings.costmust be a float.
13.2 Cost Explorer shows zero despite LLM calls appearing #
Symptom: Purple nodes appear in Timeline, but Cost Explorer shows $0.00.
Cause: You're sending model and name but not tokens_in, tokens_out, or cost. The Cost Explorer aggregates cost and token data — if those fields are None, the call appears in the timeline (which only needs name + model) but has nothing to aggregate for cost.
Fix: Add token counts. Even without cost, tokens_in and tokens_out populate the "Tokens In" and "Tokens Out" columns in the Cost Explorer.
13.3 Token counts are always zero or None despite sending them #
Symptom: You're passing tokens_in and tokens_out but the values are always 0 or None on the dashboard.
Cause: You're extracting tokens from the wrong place. This is the most common integration mistake. Different client methods in the same codebase often expose tokens differently:
- Method A might put tokens on the response:
response.usage.input_tokens - Method B might put tokens on the client:
client._last_input_tokens - Method C might not expose tokens at all
If you're using the Pattern A extraction (response.usage.*) on a call site that uses Pattern B (client._last_*), you'll get AttributeError (caught by your try/except, so the call silently sends without tokens) or None.
Fix: Go back to the catalog (Section 2.2) and verify the token source for the specific call site. Use the discovery recipe in Section 2.6 to inspect what each response and client object actually exposes.
13.4 Plan progress bar not visible #
Symptom: You called task.plan() but no progress bar appears above the Timeline.
Possible causes:
- The plan event was emitted but the Timeline is showing a different task. Click the task that has the plan in the Task Table.
- The plan event was emitted outside the task context (
get_current_task()wasNone). stepsparameter was empty. The plan needs at least one step to render.
13.5 Issues not clearing from Pipeline tab #
Symptom: You called agent.resolve_issue() but the issue still appears.
Cause: The summary or issue_id in resolve_issue() doesn't match the original report_issue() call. Deduplication is hash-based — the strings must match exactly.
Fix: Use issue_id for explicit lifecycle tracking. It's more reliable than matching summary strings.
13.6 Agent card shows queue badge but Pipeline tab queue is empty #
Symptom: Agent card shows "Q:4" but the Pipeline → Queue section says empty.
Cause: The queue_snapshot() sent depth=4 but no items array. The badge uses depth (just a number), but the Pipeline table needs items (the actual queue entries).
Fix: Include the items array in your queue_snapshot() call for full Pipeline tab rendering.
13.7 Events not appearing in Activity Stream filters #
Symptom: LLM calls exist but the "llm" filter shows nothing. Or escalations exist but "human" filter is empty.
Cause: The stream filter maps event types to categories. Custom events (which is what task.llm_call() emits internally — event_type: "custom" with payload.kind: "llm_call") need the dashboard to map the kind field to the correct filter.
Fix: Verify the dashboard version supports kind-based filtering. If the "llm" filter expects a specific event structure, check that task.llm_call() is producing the expected payload shape.