0004 - Taskbase Agent Module
Status
Superseded by ADR-0005 — Taskbase Agent System on 2026-05-02
Date
2026-04-02 (original) · 2026-04-18 (revised) · 2026-05-02 (superseded)
Superseded. ADR-0005 replaces this design with per-organization agent personas in Git, a runtime-agnostic brain, and a pure priority-dispatch model: no agent instances, no session_id resumption, and no brain-managed task lifecycle states. The content below is preserved as the historical 2026-04-18 decision; the live design is in ADR-0005.
Revision Note (2026-04-18)
This ADR was originally accepted on 2026-04-02 with Option B — Local MCP server as the chosen design: a persistent MCP process on the Mac Mini exposing typed tools (get_next_task, log_progress, etc.) to a long-lived Cowork session, authenticating to Anthropic via API key.
After running this design we revised it. The chosen option is now Option D — FastAPI executor with ephemeral claude -p subprocesses, recorded below. The drivers for the change:
- Cost model: moving to the Anthropic Max 20x plan eliminates per-token billing but requires OAuth (not an API key), and the claude -p CLI is the natural way to inherit that OAuth session
- Per-task isolation: runaway tool calls and context pollution were leaking between tasks in a long-lived MCP session; ephemeral subprocesses give clean isolation per task
- Browser automation: Claude-in-Chrome was added to the worker capability set; it works most cleanly when each task gets a fresh tab against a persistent logged-in Chrome
- Dispatch direction: pull-based dispatch (worker asks the brain for work) made retry, prioritization, and cancellation awkward; push-based dispatch (brain tells the executor what to run) puts those concerns in one place
- Observability: stream-json from claude -p carries far richer telemetry (thoughts, tool_use, tool_result, tokens, cost) than the discrete report_tokens / log_progress calls Option B used
- Persistent agent identity: fully fresh sessions per task lose accumulated repo/domain knowledge between tasks; we want named agents that keep a conversation across tasks, while still spawning a fresh subprocess each time for process isolation
The revision splits agents into types (reusable templates: system prompt + skill/tool/MCP allowlist, defined in taskbase/agents/<slug>/) and instances (concrete workers with a name, scope, and a persistent Claude session-id). Tasks assign to instances via the existing assignee_type=2 mechanism. The executor still spawns a fresh claude -p subprocess per task, but uses --resume <instance.session_id> so the worker continues the instance's ongoing conversation.
Options A, B, and C below are kept for historical context. Option D is the current design and is reflected in agent-architecture.md.
Context
ADR 0002 established the taskbase system: a Kubernetes-hosted task management app with a REST API that lets an agent pick up work, report progress, and track token consumption. That ADR defined the server side. This ADR defines the agent side — the module that runs on the agent machine and drives the interaction with the taskbase API autonomously.
The requirements for this module are:
- Autonomous task pickup: the agent should be able to start a new session, call the taskbase API, and receive the next task without a human typing anything
- Token consumption tracking: every Claude API call made while working a task emits a usage object; the module must accumulate these and report them back to taskbase after each interaction step and on task completion
- Task lifecycle signalling: the agent must transition tasks through pending → in_progress → done / paused by calling the taskbase API at each phase boundary
- Context budget awareness: when the remaining token budget approaches a configurable threshold, the agent should checkpoint the current task (log a summary, mark it paused) and pull the next task into a fresh context
- Runs locally on the agent machine: the module lives on the Mac mini alongside Cowork, not in the Kubernetes cluster; it is the agent's interface to the taskbase system
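The token-tracking and context-budget requirements above can be sketched as a small running tally. This is an illustration only: the class name TokenBudget, the checkpoint_fraction parameter, and its 0.9 default are assumptions (the ADR says only that the threshold is configurable). The input_tokens and output_tokens keys follow the usage fields named elsewhere in this ADR.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Running token tally for one task (hypothetical helper, not part of the ADR)."""

    budget: int                        # total tokens allotted to the task
    checkpoint_fraction: float = 0.9   # assumed default; the ADR leaves the threshold configurable
    used: int = 0

    def record(self, usage: dict) -> None:
        # Add the input and output counts from one usage object to the tally.
        self.used += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)

    @property
    def remaining(self) -> int:
        # Never report a negative remainder, even if the task overshoots.
        return max(self.budget - self.used, 0)

    def should_checkpoint(self) -> bool:
        # True once consumption reaches the configured share of the budget.
        return self.used >= self.budget * self.checkpoint_fraction
```

A caller would invoke record() after every response and, once should_checkpoint() returns True, log a summary and mark the task paused before pulling the next one.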
Decision Drivers
- Low coupling — the module should wrap the taskbase REST API without tightly coupling to its internal implementation; if endpoints change, only the module needs updating
- Native tool interface — Claude works best when capabilities are presented as structured tools, not prose instructions; the module should expose named tools that Claude can call directly
- Zero manual steps — once a Cowork session starts, no human input should be required to pick up and begin executing the next queued task
- Auditability — every tool call the agent makes against taskbase (task pickup, log entries, token reports, status transitions) must be traceable in the taskbase management UI
- Token accuracy: token counts must come from the authoritative source, the usage field on Claude API responses, not from estimates or scraping
Considered Options
- Option A: Cowork skill (markdown prompt instructions)
- Option B: Local MCP server exposing structured taskbase tools (originally selected, superseded 2026-04-18)
- Option C: Standalone automation script calling Claude API + taskbase API directly
- Option D: FastAPI executor with ephemeral claude -p subprocesses per task (selected 2026-04-18)
Decision Outcome
Chosen option: Option D — FastAPI executor with ephemeral claude -p subprocesses, because:
- The brain (Taskbase) owns prioritization, retries and dependencies; pushing tasks to a thin executor keeps that logic in one place rather than splitting it between the brain and a stateful agent
- Each task runs in a fresh subprocess, so failures, context pollution, and runaway tools cannot leak between tasks
- claude -p --output-format stream-json provides richer telemetry (thoughts, tool_use, tool_result, tokens, cost, final_message) as a single stream, which is strictly more observable than discrete RPC calls back to the brain
- The Anthropic Max 20x OAuth session is inherited naturally by each claude -p invocation, removing per-token billing
- MCP servers (gitea, openbao, taskbase-mcp, token-tracker) move from being the dispatch protocol to being worker-side integrations, where they better fit their actual role
- Claude-in-Chrome with a persistent profile works cleanly in this model: per-task fresh tab, shared login state across tasks
Option A: Cowork Skill (Markdown prompt instructions)
Architecture:
- A skill file (e.g. agent-skills/taskbase-runner/SKILL.md) that instructs Claude to call the taskbase REST API using fetch or curl via the Bash or JavaScript tools
- Token counting done by instructing Claude to read the usage field from each response and accumulate it manually
Pros:
- Zero new infrastructure: a markdown file is all that is needed
- Works in any Cowork session immediately after the skill is loaded
Cons:
- Claude constructing raw HTTP requests from prose instructions is fragile; URL paths, headers, and JSON payloads are error-prone when authored at inference time
- Token accumulation relies on Claude not losing count across many tool calls in a long session, which is unreliable
- Auth tokens must be embedded in the skill file or passed in plaintext through the conversation
- No persistent state between tool calls; if Claude misses a step, there is no guard-rail
Option B: Local MCP Server (originally selected, superseded 2026-04-18)
Architecture:
- A small Go or Node.js process running on the agent machine, registered with Cowork as a plugin
- Exposes the following tools over the MCP protocol:

| Tool | Description |
|---|---|
| get_next_task | Returns the highest-priority pending task from taskbase, transitions it to in_progress, and returns its id, title, description, and token_budget |
| log_progress | Appends a progress note to the active task's activity log in taskbase |
| report_tokens | Sends an incremental token usage report (input_tokens, output_tokens) for the active task |
| complete_task | Marks the active task done, stores a completion summary, and sends the final token tally |
| pause_task | Marks the active task paused with a checkpoint summary when the context budget is running low |
| get_token_budget | Returns the remaining token budget for the active task so Claude can decide whether to continue or checkpoint |

- The MCP server holds the taskbase API base URL and auth token in its own config; Claude never sees credentials
- On each Claude API response, the MCP server reads the usage object from the response metadata and calls report_tokens automatically, so Claude does not need to handle this manually
Agent session flow:
Session starts
└─ Claude calls get_next_task
   └─ Task returned (id, title, description, token_budget)
      └─ Claude works the task
         ├─ Periodically calls log_progress with updates
         ├─ MCP server auto-reports tokens after each step
         └─ Claude checks get_token_budget before each major step
            ├─ Budget OK → continue
            └─ Budget low → calls pause_task with checkpoint summary
               └─ Fresh context → calls get_next_task again
Pros:
- Structured, typed tool interface — Claude cannot construct a malformed API call
- Auth and URL management are encapsulated in the server config
- Token reporting is automatic and accurate — sourced from Claude API usage metadata
- Persistent process — available across all Cowork sessions without reloading
- Testable independently of Claude: the MCP server can be exercised with any MCP client
Cons:
- Requires building and running a new local process on the agent machine
- Adds a dependency: the MCP server must be running for the agent to interact with taskbase
Option D: FastAPI Executor with Ephemeral claude -p Subprocesses (Selected 2026-04-18)
Architecture:
Three runtime tiers, each with one job:
- Brain (Taskbase API in Kubernetes): engine that prioritizes, schedules, handles dependencies and retries; stores full task state, event trace, and the agent fleet (types + instances)
- Executor (Mac Mini, FastAPI on localhost:8765): thin and stateless; receives POST /tasks from the brain, spawns a claude -p --resume <session_id> subprocess, and relays the subprocess's stream-json output back to the brain via POST /tasks/{id}/events
- Worker (per-task claude -p subprocess): does the actual work; resumes the assigned agent instance's session for persistent memory; has access to skills auto-loaded via CLAUDE.md, MCP servers (gitea, openbao, taskbase-mcp, token-tracker), built-in tools (Read/Edit/Bash/WebFetch), and Claude-in-Chrome for browser automation
Agents are first-class:
- Agent type = template (system prompt + allowed skills/tools/MCP), defined in taskbase/agents/<slug>/agent.yaml + system-prompt.md, synced into the agent_types table
- Agent instance = concrete worker (name + scope + persistent session_id), stored in agent_instances; surfaced in the Taskbase UI as a fleet view (status, current task, lifetime tasks, tokens used)
- Tasks assign to instances via assignee_type=2 + assignee = agent_instance_id
- Process per task is still ephemeral: only the conversation persists (via --resume); the OS process always starts fresh
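The instance-level session persistence described above can be illustrated with an in-memory SQLite table. This is a sketch under stated assumptions: the real store is the brain's database, only a subset of the agent_instances columns listed in the implementation notes is reproduced, and the helper names session_for, record_session, and reset_instance are hypothetical.

```python
import sqlite3
from typing import Optional

# Reduced version of the agent_instances table; column names come from the ADR,
# the types and constraints are assumptions.
SCHEMA = """
CREATE TABLE agent_instances (
    id            INTEGER PRIMARY KEY,
    agent_type_id INTEGER NOT NULL,
    name          TEXT NOT NULL,
    session_id    TEXT,
    status        TEXT NOT NULL DEFAULT 'idle'
)
"""


def session_for(conn: sqlite3.Connection, instance_id: int) -> Optional[str]:
    """Return the stored session id, or None when the next task must start fresh."""
    row = conn.execute(
        "SELECT session_id FROM agent_instances WHERE id = ?", (instance_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown agent instance {instance_id}")
    return row[0]


def record_session(conn: sqlite3.Connection, instance_id: int, session_id: str) -> None:
    """Store the session id captured from a worker's first run."""
    conn.execute(
        "UPDATE agent_instances SET session_id = ? WHERE id = ?",
        (session_id, instance_id),
    )


def reset_instance(conn: sqlite3.Connection, instance_id: int) -> None:
    """Forget the conversation so the next task spawns without --resume."""
    conn.execute(
        "UPDATE agent_instances SET session_id = NULL WHERE id = ?", (instance_id,)
    )
```

Keeping the session id as a nullable column means a reset is a single-row update, and the executor needs no extra state to know whether to resume.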
Executor endpoints:
| Endpoint | Caller | Purpose |
|---|---|---|
| POST /tasks | Brain | Hand off a task: {task_id, prompt, skill_hints, tool_allowlist, permission_mode, timeout, callback_url} |
| GET /tasks/{id}/stream (SSE) | Brain or UI | Live event stream for an in-flight task |
| GET /health | Anyone | OAuth status, current task count, capacity |
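As an illustration of the POST /tasks hand-off, the payload could be validated with a plain dataclass before any subprocess is spawned. The field names come from the table above. The field types, which fields are required, and the default values are assumptions made for this sketch only.

```python
from dataclasses import dataclass, field, fields


@dataclass
class TaskDispatch:
    """Dispatch payload for POST /tasks (types and defaults are assumed, not from the ADR)."""

    task_id: str
    prompt: str
    callback_url: str
    skill_hints: list = field(default_factory=list)
    tool_allowlist: list = field(default_factory=list)
    permission_mode: str = "default"   # assumed default
    timeout: int = 600                 # seconds; assumed default

    @classmethod
    def from_dict(cls, payload: dict) -> "TaskDispatch":
        # Reject misspelled or unexpected keys instead of silently dropping them.
        known = {f.name for f in fields(cls)}
        unknown = set(payload) - known
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        # Treat the three fields without defaults as mandatory.
        missing = {"task_id", "prompt", "callback_url"} - set(payload)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        return cls(**payload)
```

Failing fast on a malformed payload keeps the executor thin: it either accepts a well-formed task or returns an error without touching the worker.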
Spawn command:
claude -p \
--resume <agent_instance.session_id> \
--output-format stream-json \
--allowedTools <from agent_type.allowed_tools> \
--append-system-prompt <agent_type.system_prompt> \
--permission-mode <from agent_type.default_permission_mode> \
--chrome \
<task.prompt>
If agent_instance.session_id is NULL (fresh instance or after a manual reset), --resume is omitted and the executor records the new session-id from the first init event in the stream.
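The spawn command and the NULL session rule can be sketched as a function that builds the argument vector. The function name and parameters are hypothetical, and joining the tool allowlist with commas is an assumption about how the list is passed. The flags themselves are the ones shown in the command above.

```python
from typing import List, Optional


def build_claude_argv(
    prompt: str,
    session_id: Optional[str],
    allowed_tools: List[str],
    system_prompt: str,
    permission_mode: str,
    claude_binary: str = "claude",
    chrome: bool = True,
) -> List[str]:
    """Assemble the worker command line (illustrative helper, not the ADR's code)."""
    argv = [claude_binary, "-p"]
    if session_id:
        # Resume the instance's conversation; omitted for a fresh or reset instance.
        argv += ["--resume", session_id]
    argv += ["--output-format", "stream-json"]
    if allowed_tools:
        argv += ["--allowedTools", ",".join(allowed_tools)]
    argv += ["--append-system-prompt", system_prompt]
    argv += ["--permission-mode", permission_mode]
    if chrome:
        argv.append("--chrome")
    argv.append(prompt)  # the task prompt is the final positional argument
    return argv
```

Passing the result to subprocess.Popen as a list avoids shell quoting problems with prompts that contain spaces or quotes. Some versions of the CLI may also require --verbose for stream-json output in print mode; the command above does not list it, so the sketch leaves it out.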
Auth:
- Worker → Anthropic: Max 20x plan via OAuth (no API credits); session inherited from the executor's environment
- Brain → Executor: shared X-Executor-Token header
- Worker → external services: through MCP servers that hold the credentials (Gitea PAT, OpenBao AppRole, etc.)
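One way to check the shared X-Executor-Token header is a constant-time comparison, sketched below. The function name is hypothetical, and the ADR does not say how the token is compared; hmac.compare_digest is used here as a conventional choice to avoid timing leaks.

```python
import hmac


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Return True only when the request carries the configured executor token."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-executor-token", "")
    if not expected_token:
        # An unset token must never authorize a request.
        return False
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

A request that fails this check would be rejected before the payload is parsed, so an unauthenticated caller cannot trigger a subprocess spawn.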
Pros:
- Single source of truth for scheduling decisions (the brain) — no split-brain between server and agent
- Per-task process isolation is structural — a failed task cannot affect the OS process of the next one
- Per-instance memory persistence: agents accumulate domain knowledge across tasks via --resume, so they don't re-learn the repo every time
- OAuth-based Max plan eliminates per-token billing within the plan's rate limits
- stream-json gives full observability without bespoke telemetry plumbing
- Executor is so thin it can be replaced or moved without touching the brain or workers
- Adding new worker capabilities (skills, MCP servers, tools) requires no executor changes; just edit taskbase/agents/<slug>/
- Agent fleet is visible in the Taskbase UI; non-engineer operators can see who is doing what
Cons:
- Cold-start cost per task (~1–2 s subprocess spawn + Anthropic prompt-cache miss); negligible for tasks that take minutes, noticeable for sub-second tasks
- Persistent sessions can drift — context window fills up over time, requiring an explicit reset (manual via UI, or auto-trigger at a token threshold)
- A poisoned conversation persists across tasks for the same instance until reset; mitigation is per-instance scoping (one bad task only affects that instance) plus easy reset
- The executor is a single point of failure for dispatch; if the Mac Mini is down, no tasks run (acceptable in v1)
- OAuth session expiry requires manual VNC re-auth (Anthropic does not yet expose a refresh API for Max-plan OAuth)
- Concurrency is bounded by Anthropic's rate limits and by the executor's max_concurrent_tasks; there is no horizontal scale-out without a second OAuth session
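The concurrency bound in the last item could be enforced with a simple slot tracker, which also yields the 429 response that the implementation notes describe as backpressure. The class and function names are hypothetical, and returning 202 on acceptance is an assumption; the ADR specifies only the 429 case.

```python
import threading


class TaskSlots:
    """Tracks in-flight task ids up to max_concurrent_tasks (illustrative sketch)."""

    def __init__(self, max_concurrent_tasks: int = 3):
        self._max = max_concurrent_tasks
        self._active = set()
        self._lock = threading.Lock()

    def try_acquire(self, task_id: str) -> bool:
        # Claim a slot without blocking; refuse when the executor is full.
        with self._lock:
            if len(self._active) >= self._max:
                return False
            self._active.add(task_id)
            return True

    def release(self, task_id: str) -> None:
        # Free the slot when the worker subprocess exits, whatever its outcome.
        with self._lock:
            self._active.discard(task_id)


def admit(slots: TaskSlots, task_id: str) -> int:
    """Return the HTTP status for a dispatch: 202 if accepted (assumed), 429 if full."""
    return 202 if slots.try_acquire(task_id) else 429
```

Refusing immediately rather than queueing keeps scheduling decisions in the brain, which can retry or reprioritize on its own terms.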
Option C: Standalone Automation Script
Architecture:
- A script (Python or Go) that runs on a cron schedule on the Mac mini
- Calls Claude API directly with a system prompt instructing it to work the next taskbase task
- Reads usage from Claude API responses and posts them back to taskbase
Pros:
- Fully autonomous: no Cowork session required; runs on a timer
- Token tracking is clean since the script controls the API call loop
Cons:
- Bypasses Cowork entirely: the agent cannot use Cowork skills, computer use, or other MCP tools while working a task, which severely limits what the agent can actually do
- The Claude context is managed by the script, not by Claude itself; context budget logic must be re-implemented in the script
- Harder to observe: progress is only visible in taskbase logs, not in a Cowork session the operator can watch
Implementation Notes
- Agent types live in the Taskbase repo at taskbase/agents/<slug>/ with agent.yaml (name, allowed_skills/tools/mcp, defaults) and system-prompt.md; a startup-time sync upserts them into the agent_types table so the UI and API don't need to read the filesystem at request time
- Agent instances live in the agent_instances table: id, agent_type_id, name, slug, scope_type (organization/project/free), scope_id, session_id, status (idle/working/paused/offline), current_task_id, last_active_at, total_tasks_completed, total_tokens_used
- Resetting an instance is just UPDATE agent_instances SET session_id = NULL on that instance's row; the next task spawns without --resume and captures a new session-id
- The FastAPI executor config lives at ~/.config/taskbase-executor/config.yaml with fields: listen, brain_base_url, executor_token, max_concurrent_tasks, default_timeout_seconds, claude_binary, log_dir
- The executor runs under launchd (~/Library/LaunchAgents/taskbase-executor.plist), for the same reasons as before (native macOS, starts on boot, restarts on crash)
- Per-task stream-json is persisted locally at ~/Library/Logs/taskbase-executor/tasks/<task_id>.jsonl in addition to being streamed back to the brain; this is the local replay buffer if the brain is briefly unreachable
- Token usage is read from the usage events inside stream-json (per-message, plus a final tally) and posted to the brain as part of the event stream; no separate report_tokens call is needed
- v1 supports concurrent tasks up to max_concurrent_tasks (default 3), bounded by Anthropic Max-plan rate limits; the executor enforces the limit, and the brain treats 429 from POST /tasks as backpressure
- Skills auto-load via ~/.claude/CLAUDE.md on the Mac Mini; skill_hints in the dispatch payload bias the worker toward the relevant ones via --append-system-prompt
- MCP servers (gitea, openbao, taskbase-mcp, token-tracker) are registered globally so every spawned worker inherits them
- Claude-in-Chrome uses a persistent headed Chrome profile on the Mac Mini; per-task fresh tab, shared login state across tasks; initial login / CAPTCHA handled once via VNC
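The session capture and token accounting described in these notes can be sketched as a pass over the JSONL stream. The event shapes are assumptions for illustration: an init event with type "system" and subtype "init" carrying session_id, assistant events with usage nested under message, and a final result event with its own usage. The ADR does not spell out the schema, so the real stream should be checked before relying on these keys.

```python
import json
from typing import Dict, Iterable, Optional, Tuple


def summarize_stream(lines: Iterable[str]) -> Tuple[Optional[str], Dict[str, int]]:
    """Extract the session id and a token tally from stream-json lines (assumed schema)."""
    session_id = None
    per_message = {"input_tokens": 0, "output_tokens": 0}
    final = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate a truncated line in the local replay buffer
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        if kind == "system" and event.get("subtype") == "init":
            if session_id is None:
                session_id = event.get("session_id")  # only the first init event counts
        elif kind == "assistant":
            message = event.get("message")
            usage = message.get("usage") if isinstance(message, dict) else None
            if isinstance(usage, dict):
                for key in per_message:
                    per_message[key] += int(usage.get(key) or 0)
        elif kind == "result" and isinstance(event.get("usage"), dict):
            # Prefer the final tally when present; it supersedes the per-message sum.
            final = {key: int(event["usage"].get(key) or 0) for key in per_message}
    return session_id, (final if final is not None else per_message)
```

Because the function accepts any iterable of lines, the same code can run over a live subprocess pipe or over the persisted <task_id>.jsonl file during replay.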