acme-corp
Connected
Updated 3s ago
The Glance — Fleet Status at a Glance ~2 seconds
Fleet Status Q1
5 agents total · 2 processing · 1 idle · 1 waiting · 1 stuck
Needs Attention Q2
3
1 stuck · 1 error
2 active issues · 1 waiting approval
Stuck Agents Q3
1 stuck
doc-processor-02 — stuck for 4m12s (threshold: 120s)
Task: task-abc-789 Investigate →
Throughput Q4a
12.4 tasks/hr ↑ 8%
Success Rate Q4b
94.2% ↑ 2.1%
Errors Q4c
7 in 24h ↓ 3
LLM Cost/Task Q4d
$0.08 ↓ 12%
Live Activity LIVE
2s ago support-triage-01: Task task-xyz-456 completed — routed ticket to billing team
18s ago support-triage-01: LLM call classify_intent using gpt-4o (1.2K tokens)
1m ago doc-processor-02: Action extract_tables failed — timeout after 30s
3m ago doc-processor-02: Issue reported — PDF parsing intermittent failure (×4)
The Investigation — Agent & Task Deep Dive ~2–5 min
Current Status Q6
processing
Task: task-xyz-456
Type: support_ticket
Elapsed: 1m 42s
Last 1h: 4 completed, 1 failed
Heartbeat Q7
Healthy
Last beat: 12s ago
Threshold: 120s
12s / 120s
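A minimal sketch of how the Healthy verdict above could be derived, assuming the rule is simply that the time since the last heartbeat must not exceed the threshold. The function name heartbeat_status and the "stale" label are hypothetical; the panel itself only shows the values 12s and 120s.

```python
def heartbeat_status(seconds_since_last_beat: float, threshold_s: float = 120.0) -> str:
    """Return 'healthy' while the last beat is within the threshold, else 'stale'.

    Assumption: the comparison is inclusive (exactly at the threshold is still healthy).
    """
    if seconds_since_last_beat < 0:
        raise ValueError("seconds_since_last_beat must be non-negative")
    return "healthy" if seconds_since_last_beat <= threshold_s else "stale"


# Values from the Heartbeat panel: last beat 12s ago, threshold 120s.
print(heartbeat_status(12))

# A hypothetical agent that has been silent for 4m12s (252s) against the same threshold.
print(heartbeat_status(252))
```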
Queue Q8
2 pending
Pri | Summary | Age
high | Billing dispute #4421 | 2m
normal | Password reset #4422 | 45s
Issues Q9
1 active issue
medium api_reliability: Stripe API intermittent 503 (×6)
Recurring
Recent Tasks (Q10–Q16)
Task ID | Type | Status | Duration | Cost | Errors | Actions
task-xyz-456 | support_ticket | processing | 1m42s | $0.12 | 0 | View Timeline →
task-xyz-455 | support_ticket | completed | 2m15s | $0.09 | 0 | View Timeline →
task-xyz-454 | support_ticket | failed | 45s | $0.04 | 2 | View Errors →
The Optimization — Cost & Reliability ~10–15 min
Total Cost (24h) Q17
$4.82
147 LLM calls · 312K tokens in · 48K tokens out
By Agent
support-triage-01: $2.99 · 92 calls
doc-processor-02: $1.83 · 55 calls
Model Efficiency Q18
Model | Calls | Avg $/Call | $/1K Tok | Verdict
gpt-4o | 89 | $0.042 | $0.018 | Efficient
gpt-4-turbo | 38 | $0.031 | $0.025 | Review
claude-3-haiku | 20 | $0.003 | $0.002 | Efficient
💡 gpt-4-turbo costs about 1.4× as much per token as gpt-4o ($0.025 vs $0.018 per 1K tokens) for the same classify_intent calls
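The 1.4× figure can be reproduced from the $/1K Tok column. The sketch below is only that arithmetic; per_token_cost_ratio is a hypothetical helper name, and how the dashboard assigns the Efficient or Review verdict is not shown in the source, so it is not modeled here.

```python
def per_token_cost_ratio(cost_a_per_1k: float, cost_b_per_1k: float) -> float:
    """How many times more model A costs per token than model B."""
    if cost_b_per_1k <= 0:
        raise ValueError("cost_b_per_1k must be positive")
    return cost_a_per_1k / cost_b_per_1k


# $/1K Tok values from the Model Efficiency table: gpt-4-turbo vs gpt-4o.
ratio = per_token_cost_ratio(0.025, 0.018)
print(f"{ratio:.1f}x")  # 1.4x
```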
Cost Timeline Q19
[Chart: cost over the last 24h; y-axis $0.00 to $0.60, x-axis 00:00 to now]
⚡ Spike at 12:00 — 18 calls, 42K tokens (vs avg 8 calls/bucket)
Agent Cost Comparison Q21
Group: support_agent (2 agents)
support-triage-01: $2.99 · +37%
doc-processor-02: $1.83 · avg
No outliers detected within same agent types
Prompt Bloat Analysis Q20
Fleet avg in/out ratio: 6.5:1 No bloat
0 calls exceeded threshold (ratio >10 AND tokens_in >5,000). Prompts look healthy.
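A minimal sketch of the bloat rule stated above (ratio >10 AND tokens_in >5,000), together with the fleet-wide ratio. The LLMCall type, the function names, and the handling of calls with zero output tokens are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LLMCall:
    tokens_in: int
    tokens_out: int


def is_bloated(call: LLMCall, max_ratio: float = 10.0, min_tokens_in: int = 5000) -> bool:
    """Flag a call only when BOTH conditions hold: in/out ratio > 10 and tokens_in > 5,000."""
    if call.tokens_in <= min_tokens_in:
        return False
    if call.tokens_out == 0:
        # Assumption: a large prompt with no output counts as an unbounded ratio.
        return True
    return call.tokens_in / call.tokens_out > max_ratio


def fleet_ratio(total_in: int, total_out: int) -> float:
    """Fleet-wide input/output token ratio."""
    return total_in / total_out


# Totals from the Total Cost panel: 312K tokens in, 48K tokens out.
print(fleet_ratio(312_000, 48_000))  # 6.5
```

Under this rule a call with 6,000 tokens in and 500 out (ratio 12) would be flagged, while a call with 4,000 in and 100 out (ratio 40) would not, because it stays under the size floor.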
Smart Detectors (Q22–Q28)
Silent Drop Detection Q22
No silent drops detected. All queued items are within normal processing time.
Status/Queue Contradiction Q23
1 contradiction found
Agent | Status | Queue | Severity
doc-processor-02 | idle | 2 items | medium
💡 Agent may need restart or queue processor check
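One way such a contradiction could be detected, assuming the rule is "status is idle while the queue is non-empty". The AgentSnapshot type and find_contradictions name are hypothetical, the sample queue depth for support-triage-01 is illustrative, and the severity assignment is left out because the source does not say how it is chosen.

```python
from dataclasses import dataclass


@dataclass
class AgentSnapshot:
    name: str
    status: str
    queue_depth: int


def find_contradictions(agents: list[AgentSnapshot]) -> list[AgentSnapshot]:
    """Return agents that report 'idle' while still holding queued items."""
    return [a for a in agents if a.status == "idle" and a.queue_depth > 0]


fleet = [
    AgentSnapshot("support-triage-01", "processing", 2),  # busy with a queue: consistent
    AgentSnapshot("doc-processor-02", "idle", 2),         # idle with a queue: contradiction
]
print([a.name for a in find_contradictions(fleet)])  # ['doc-processor-02']
```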
Recurring Failures Q24
1 recurring issue across agents
Category | Agent | ×Count | Severity
api_reliability | doc-processor-02 | 6 | medium
Heartbeat Drift Q25
Last 50 heartbeats analyzed. Payload structure is stable — no drift detected.
Approval Queue Q26
1 agent waiting for approval
Agent | Task | Waiting
support-triage-01 | task-xyz-450 | 12m 30s
Action Failure Patterns Q27
1 action with high failure rate
Action | Fail | OK | Rate | Agents
extract_tables | 8 | 14 | 36% | 1
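The Rate column appears to be failures divided by total attempts. A short sketch of that arithmetic follows; failure_rate is a hypothetical name, and the cutoff the dashboard uses for "high failure rate" is not given in the source, so none is assumed.

```python
def failure_rate(fail: int, ok: int) -> float:
    """Fraction of attempts that failed; 0.0 when there were no attempts."""
    total = fail + ok
    return fail / total if total else 0.0


# extract_tables from the table above: 8 failures, 14 successes.
rate = failure_rate(8, 14)
print(f"{rate:.0%}")  # 36%
```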
Unresolved Issues Q28
1 chronic unresolved issue
medium Chronic api_reliability: Stripe API 503 (×6) — doc-processor-02
The Review — Trends & Fleet Health ~20–30 min
Success Rate Trend Q29
94.2% ↑ 2.1% from 92.1%
[Chart: success rate over time; y-axis 80% to 100%, x-axis 00:00 to now]
Duration Trend Q30
2m 08s ↓ 15s faster
[Chart: task duration over time; y-axis 0m to 4m, x-axis 00:00 to now]
Fastest: 32s · Slowest: 4m 10s
Agent Reliability Ranking Q31
support-triage-01: 96% (48/50)
doc-processor-02: 87% (26/30)
Deploy Comparison Q32
Agent | Version | Success Before | Success After | Δ
support-triage-01 | 1.0 → 1.1 | 91% | 96% | +5%
Based on agent_version changes in registration events
Total Infrastructure Cost Q33
$4.82
24h · 147 calls · 360K total tokens
Reported: $3.91 · Estimated: ~$0.91
Cost per Task Trend Q34
$0.06 avg/task ↓ 18% cheaper
[Chart: cost per task over time; axis labels $0.09 and $0.06]
Most expensive bucket: $0.14 at 02:00
ROI Calculator Q35
Enter your pre-observability baselines to calculate ROI
Cost Savings: $355/mo
Time Saved/Task: 52s
Error Reduction: 9.2%
Projected Annual: $4,260/yr
Fleet Size Q36
5 agents
By environment: production=3, staging=2
By type: support_agent=3, doc_processor=2
+1 new agent in last 24h
Fleet Health Q37
72 / 100
DEGRADED
Stuck: −15 · Error: −10 · Issues: −3
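The 72 / 100 score is consistent with starting at 100 and subtracting the listed penalties. A minimal sketch under that assumption; fleet_health is a hypothetical name, and the score range that maps to DEGRADED is not stated in the source, so no status label is computed.

```python
def fleet_health(penalties: dict[str, int], base: int = 100) -> int:
    """Subtract each penalty from the base score, never going below zero."""
    return max(0, base - sum(penalties.values()))


# Penalties shown in the Fleet Health panel.
score = fleet_health({"stuck": 15, "error": 10, "issues": 3})
print(score)  # 72
```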
Scale Readiness Q38
Success rate ≥ 90%: 94.2%
No stuck agents: 1 stuck
Queue manageable: 2 / 15
Cost stable: ↓ 18%
No critical issues: 2 active
Error rate < 5%: 5.8%
Heartbeats healthy: 1 stale
⚠ Address 3 items before scaling
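The checklist above can be sketched as a set of boolean checks. Everything here is an assumption made for illustration: the FleetMetrics type, the reading of "Queue manageable" as depth not exceeding a limit of 15, "Cost stable" as cost not rising, and the 2 active issues being non-critical (which is what makes the count come out to 3 failing items).

```python
from dataclasses import dataclass


@dataclass
class FleetMetrics:
    success_rate_pct: float
    stuck_agents: int
    queue_depth: int
    queue_limit: int
    cost_change_pct: float   # negative means cost went down
    critical_issues: int
    error_rate_pct: float
    stale_heartbeats: int


def readiness_checks(m: FleetMetrics) -> dict[str, bool]:
    """Map each checklist item to True (passing) or False (needs attention)."""
    return {
        "Success rate >= 90%": m.success_rate_pct >= 90,
        "No stuck agents": m.stuck_agents == 0,
        "Queue manageable": m.queue_depth <= m.queue_limit,
        "Cost stable": m.cost_change_pct <= 0,
        "No critical issues": m.critical_issues == 0,
        "Error rate < 5%": m.error_rate_pct < 5,
        "Heartbeats healthy": m.stale_heartbeats == 0,
    }


# Values read off the Scale Readiness panel; critical_issues=0 is an assumption.
panel = FleetMetrics(94.2, 1, 2, 15, -18.0, 0, 5.8, 1)
failing = [name for name, ok in readiness_checks(panel).items() if not ok]
print(len(failing), failing)
```

With these inputs the failing items are the stuck agent, the error rate, and the stale heartbeat, matching the "Address 3 items before scaling" warning.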