0004 - Taskbase Agent Module

Status

Superseded by ADR-0005 — Taskbase Agent System on 2026-05-02

Date

2026-04-02 (original) · 2026-04-18 (revised) · 2026-05-02 (superseded)

Superseded. ADR-0005 replaces this design with per-organization agent personas in Git, a runtime-agnostic brain, and a pure priority-dispatch model — no agent instances, no session_id resumption, no brain-managed task lifecycle states. The content below is preserved as the historical 2026-04-18 decision; the live design is in ADR-0005.

Revision Note (2026-04-18)

This ADR was originally accepted on 2026-04-02 with Option B — Local MCP server as the chosen design: a persistent MCP process on the Mac Mini exposing typed tools (get_next_task, log_progress, etc.) to a long-lived Cowork session, authenticating to Anthropic via API key.

After running this design we revised it. The chosen option is now Option D — FastAPI executor with ephemeral claude -p subprocesses, recorded below. The drivers for the change:

Cost model — moving to the Anthropic Max 20x plan eliminates per-token billing but requires OAuth (not an API key), and the claude -p CLI is the natural way to inherit that OAuth session
Per-task isolation — runaway tool calls and context pollution were leaking between tasks in a long-lived MCP session; ephemeral subprocesses give clean isolation per task
Browser automation — Claude-in-Chrome was added to the worker capability set; it works most cleanly when each task gets a fresh tab against a persistent logged-in Chrome
Dispatch direction — pull-based dispatch (worker asks the brain for work) made retry, prioritization, and cancellation awkward; push-based dispatch (brain tells the executor what to run) puts those concerns in one place
Observability — stream-json from claude -p carries far richer telemetry (thoughts, tool_use, tool_result, tokens, cost) than the discrete report_tokens / log_progress calls Option B used
Persistent agent identity — fully fresh sessions per task lose accumulated repo/domain knowledge between tasks; we want named agents that keep a conversation across tasks, while still spawning a fresh subprocess each time for process isolation

The revision splits agents into types (reusable templates: system prompt + skill/tool/MCP allowlist, defined in taskbase/agents/<slug>/) and instances (concrete workers with a name, scope, and a persistent Claude session-id). Tasks assign to instances via the existing assignee_type=2 mechanism. The executor still spawns a fresh claude -p subprocess per task, but uses --resume <instance.session_id> so the worker continues the instance's ongoing conversation.

Options A, B, and C below are kept for historical context. Option D is the current design and is reflected in agent-architecture.md.

Context

ADR 0002 established the taskbase system: a Kubernetes-hosted task management app with a REST API that lets an agent pick up work, report progress, and track token consumption. That ADR defined the server side. This ADR defines the agent side — the module that runs on the agent machine and drives the interaction with the taskbase API autonomously.

The requirements for this module are:

Autonomous task pickup — the agent should be able to start a new session, call the taskbase API, and receive the next task without a human typing anything
Token consumption tracking — every Claude API call made while working a task emits a usage object; the module must accumulate these and report them back to taskbase after each interaction step and on task completion
Task lifecycle signalling — the agent must transition tasks through pending → in_progress → done / paused by calling the taskbase API at each phase boundary
Context budget awareness — when remaining token budget approaches a configurable threshold, the agent should checkpoint the current task (log a summary, mark it paused), and pull the next task into a fresh context
Runs locally on the agent machine — the module lives on the Mac mini alongside Cowork, not in the Kubernetes cluster; it is the agent's interface to the taskbase system
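
The lifecycle and budget requirements above can be sketched as a small transition table plus a threshold check. This is only an illustration: the ADR names the states and says the threshold is configurable, while the exact set of allowed edges (in particular resuming a paused task), the `should_checkpoint` helper, and the 0.9 default are assumptions.

```python
from enum import Enum


class TaskStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    PAUSED = "paused"


# Assumed transition table: the ADR lists the states but not every edge.
# Resuming a paused task (paused -> in_progress) is an assumption.
ALLOWED = {
    TaskStatus.PENDING: {TaskStatus.IN_PROGRESS},
    TaskStatus.IN_PROGRESS: {TaskStatus.DONE, TaskStatus.PAUSED},
    TaskStatus.PAUSED: {TaskStatus.IN_PROGRESS},
    TaskStatus.DONE: set(),
}


def can_transition(current: TaskStatus, target: TaskStatus) -> bool:
    """True if the lifecycle allows moving from current to target."""
    return target in ALLOWED[current]


def should_checkpoint(tokens_used: int, token_budget: int, threshold: float = 0.9) -> bool:
    """True once usage reaches the configured fraction of the budget."""
    if token_budget <= 0:
        return True  # nothing left to spend, so checkpoint immediately
    return tokens_used / token_budget >= threshold
```

A caller would check `should_checkpoint` before each major step and, when it returns true, log a summary and move the task to `paused` before pulling the next one.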

Decision Drivers

Low coupling — the module should wrap the taskbase REST API without tightly coupling to its internal implementation; if endpoints change, only the module needs updating
Native tool interface — Claude works best when capabilities are presented as structured tools, not prose instructions; the module should expose named tools that Claude can call directly
Zero manual steps — once a Cowork session starts, no human input should be required to pick up and begin executing the next queued task
Auditability — every tool call the agent makes against taskbase (task pickup, log entries, token reports, status transitions) must be traceable in the taskbase management UI
Token accuracy — token counts must come from the authoritative source: the usage field on Claude API responses, not estimates or scraping

Considered Options

Option A — Cowork skill (markdown prompt instructions)
Option B — Local MCP server exposing structured taskbase tools (originally selected, superseded 2026-04-18)
Option C — Standalone automation script calling Claude API + taskbase API directly
Option D — FastAPI executor with ephemeral claude -p subprocesses per task (selected 2026-04-18)

Decision Outcome

Chosen option: Option D — FastAPI executor with ephemeral claude -p subprocesses, because:

The brain (Taskbase) owns prioritization, retries and dependencies; pushing tasks to a thin executor keeps that logic in one place rather than splitting it between the brain and a stateful agent
Each task runs in a fresh subprocess, so failures, context pollution, and runaway tools cannot leak between tasks
claude -p --output-format stream-json provides richer telemetry (thoughts, tool_use, tool_result, tokens, cost, final_message) as a single stream — strictly more observable than discrete RPC calls back to the brain
The Anthropic Max 20x OAuth session is inherited naturally by each claude -p invocation, removing per-token billing
MCP servers (gitea, openbao, taskbase-mcp, token-tracker) move from being the dispatch protocol to being worker-side integrations, where they better fit their actual role
Claude-in-Chrome with a persistent profile works cleanly in this model: per-task fresh tab, shared login state across tasks

Option A — Cowork Skill (Markdown prompt instructions)

Architecture:

A skill file (e.g. agent-skills/taskbase-runner/SKILL.md) that instructs Claude to call the taskbase REST API using fetch or curl via the Bash or JavaScript tools
Token counting done by instructing Claude to read the usage field from each response and accumulate it manually

Pros:

Zero new infrastructure: a markdown file is all that is needed
Works in any Cowork session immediately after the skill is loaded

Cons:

Claude constructing raw HTTP requests from prose instructions is fragile; URL paths, headers, and JSON payloads are error-prone when authored at inference time
Token accumulation relies on Claude not losing count across many tool calls in a long session, which is unreliable
Auth tokens must be embedded in the skill file or passed in plaintext through the conversation
No persistent state between tool calls; if Claude misses a step, there is no guard-rail

Option B — Local MCP Server (Originally Selected 2026-04-02, Superseded 2026-04-18)

Architecture:

A small Go or Node.js process running on the agent machine, registered with Cowork as a plugin
Exposes the following tools over the MCP protocol:

| Tool | Description |
| --- | --- |
| `get_next_task` | Returns the highest-priority `pending` task from taskbase, transitions it to `in_progress`, and returns its `id`, `title`, `description`, and `token_budget` |
| `log_progress` | Appends a progress note to the active task's activity log in taskbase |
| `report_tokens` | Sends an incremental token usage report (`input_tokens`, `output_tokens`) for the active task |
| `complete_task` | Marks the active task `done`, stores a completion summary, and sends the final token tally |
| `pause_task` | Marks the active task `paused` with a checkpoint summary when the context budget is running low |
| `get_token_budget` | Returns the remaining token budget for the active task so Claude can decide whether to continue or checkpoint |

The MCP server holds the taskbase API base URL and auth token in its own config; Claude never sees credentials
On each Claude API response, the MCP server reads the usage object from the response metadata and calls report_tokens automatically, so Claude does not need to handle this manually

Agent session flow:

Session starts
  └─ Claude calls get_next_task
       └─ Task returned (id, title, description, token_budget)
            └─ Claude works the task
                 ├─ Periodically calls log_progress with updates
                 ├─ MCP server auto-reports tokens after each step
                 └─ Claude checks get_token_budget before each major step
                      ├─ Budget OK → continue
                      └─ Budget low → calls pause_task with checkpoint summary
                           └─ Fresh context → calls get_next_task again
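
The session flow above can be traced with an in-memory stand-in for the MCP tools. The sketch below is hypothetical throughout: `FakeTaskbaseTools`, `run_session`, and the step and token parameters are invented for illustration. Only the tool names and the order of calls follow the flow described in this option.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTaskbaseTools:
    """In-memory stand-in for the Option B MCP tools (hypothetical)."""
    queue: list
    log: list = field(default_factory=list)
    remaining: int = 0

    def get_next_task(self):
        if not self.queue:
            return None
        task = self.queue.pop(0)
        self.remaining = task["token_budget"]
        self.log.append(("in_progress", task["id"]))
        return task

    def log_progress(self, task_id, note):
        self.log.append(("progress", task_id))

    def report_tokens(self, used):
        # In the ADR the MCP server reports usage automatically after each step.
        self.remaining -= used

    def get_token_budget(self):
        return self.remaining

    def pause_task(self, task_id, summary):
        self.log.append(("paused", task_id))

    def complete_task(self, task_id, summary):
        self.log.append(("done", task_id))


def run_session(tools, steps_per_task, tokens_per_step, low_water_mark):
    """Work queued tasks until the queue is empty, pausing when budget runs low."""
    while (task := tools.get_next_task()) is not None:
        for step in range(steps_per_task):
            # Budget check before each major step, as in the flow above.
            if tools.get_token_budget() < low_water_mark:
                tools.pause_task(task["id"], f"stopped before step {step}")
                break
            tools.log_progress(task["id"], f"step {step}")
            tools.report_tokens(tokens_per_step)
        else:
            tools.complete_task(task["id"], "all steps finished")
    return tools.log
```

With a roomy budget the task runs to `done`; with a tight one it is paused at the first budget check that fails, and the loop moves on to the next queued task.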

Pros:

Structured, typed tool interface: Claude cannot construct a malformed API call
Auth and URL management are encapsulated in the server config
Token reporting is automatic and accurate, sourced from Claude API usage metadata
Persistent process: available across all Cowork sessions without reloading
Testable independently of Claude: the MCP server can be exercised with any MCP client

Cons:

Requires building and running a new local process on the agent machine
Adds a dependency: the MCP server must be running for the agent to interact with taskbase

Option D — FastAPI Executor with Ephemeral `claude -p` Subprocesses (Selected 2026-04-18)

Architecture:

Three runtime tiers, each with one job:

Brain (Taskbase API in Kubernetes) — engine that prioritizes, schedules, handles dependencies and retries; stores full task state, event trace, and the agent fleet (types + instances)
Executor (Mac Mini, FastAPI on localhost:8765) — thin and stateless: receives POST /tasks from the brain, spawns a claude -p --resume <session_id> subprocess, relays the subprocess's stream-json output back to the brain via POST /tasks/{id}/events
Worker (per-task claude -p subprocess) — does the actual work; resumes the assigned agent instance's session for persistent memory; has access to skills auto-loaded via CLAUDE.md, MCP servers (gitea, openbao, taskbase-mcp, token-tracker), built-in tools (Read/Edit/Bash/WebFetch), and Claude-in-Chrome for browser automation

Agents are first-class:

Agent type = template (system prompt + allowed skills/tools/MCP), defined in taskbase/agents/<slug>/agent.yaml + system-prompt.md, synced into the agent_types table
Agent instance = concrete worker (name + scope + persistent session_id), stored in agent_instances; surfaced in the Taskbase UI as a fleet view (status, current task, lifetime tasks, tokens used)
Tasks assign to instances via assignee_type=2 + assignee = agent_instance_id
Process per task is still ephemeral — only the conversation persists (via --resume); the OS process always starts fresh
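
The type/instance split above might be modelled roughly as follows. This is a sketch under assumptions: the class shapes, the reduced field set, and the `resolve_instance` helper are illustrative and are not the actual Taskbase schema. Only `assignee_type=2`, `assignee`, and `session_id` come from the text.

```python
from dataclasses import dataclass
from typing import Optional

ASSIGNEE_TYPE_AGENT = 2  # from the ADR: assignee_type=2 marks an agent instance


@dataclass(frozen=True)
class AgentType:
    """Reusable template: system prompt plus allowlists (fields abbreviated)."""
    slug: str
    system_prompt: str
    allowed_tools: tuple
    default_permission_mode: str


@dataclass
class AgentInstance:
    """Concrete worker that carries a persistent conversation."""
    id: int
    agent_type: AgentType
    name: str
    session_id: Optional[str] = None  # None until the first task records one


def resolve_instance(task: dict, instances: dict) -> Optional[AgentInstance]:
    """Return the agent instance a task is assigned to, or None."""
    if task.get("assignee_type") != ASSIGNEE_TYPE_AGENT:
        return None
    return instances.get(task.get("assignee"))
```

Keeping the template immutable and the instance mutable mirrors the split described above: only the instance's `session_id` changes between tasks.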

Executor endpoints:

| Endpoint | Caller | Purpose |
| --- | --- | --- |
| `POST /tasks` | Brain | Hand off a task: `{task_id, prompt, skill_hints, tool_allowlist, permission_mode, timeout, callback_url}` |
| `GET /tasks/{id}/stream` (SSE) | Brain or UI | Live event stream for an in-flight task |
| `GET /health` | Anyone | OAuth status, current task count, capacity |
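
A minimal sketch of validating the `POST /tasks` body, using only the standard library so it stays independent of the web framework. The field names come from the payload listed above. Which fields are required, the default values, and the choice to reject unknown fields are all assumptions.

```python
from dataclasses import dataclass, field, fields

REQUIRED = ("task_id", "prompt", "callback_url")  # assumed required subset


@dataclass
class DispatchRequest:
    task_id: str
    prompt: str
    callback_url: str
    skill_hints: list = field(default_factory=list)
    tool_allowlist: list = field(default_factory=list)
    permission_mode: str = "default"  # assumed default
    timeout: int = 1800  # seconds; assumed default


def parse_dispatch(body: dict) -> DispatchRequest:
    """Validate a decoded JSON body and return a typed request."""
    missing = [name for name in REQUIRED if name not in body]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    known = {f.name for f in fields(DispatchRequest)}
    unknown = sorted(set(body) - known)
    if unknown:
        raise ValueError(f"unknown fields: {', '.join(unknown)}")
    return DispatchRequest(**body)
```

Rejecting unknown fields is a deliberately strict choice for a thin executor: a typo in the brain's payload fails loudly instead of being silently ignored.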

Spawn command:

claude -p \
  --resume <agent_instance.session_id> \
  --output-format stream-json \
  --allowedTools <from agent_type.allowed_tools> \
  --append-system-prompt <agent_type.system_prompt> \
  --permission-mode <from agent_type.default_permission_mode> \
  --chrome \
  <task.prompt>

If agent_instance.session_id is NULL (fresh instance or after a manual reset), --resume is omitted and the executor records the new session-id from the first init event in the stream.
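
The rule above (omit `--resume` when there is no stored session) is easy to get wrong when a command is assembled by string concatenation, so a sketch of building the argument list may help. The function name and signature are hypothetical. The flags are the ones shown in the spawn command, and joining the allowlist with commas is an assumption about how the list is passed.

```python
from typing import List, Optional


def build_claude_argv(
    prompt: str,
    session_id: Optional[str],
    allowed_tools: List[str],
    system_prompt: str,
    permission_mode: str,
    claude_binary: str = "claude",
) -> List[str]:
    """Assemble the worker command line as a list (no shell quoting needed)."""
    argv = [claude_binary, "-p"]
    if session_id is not None:
        # Only resume when the instance already has a conversation.
        argv += ["--resume", session_id]
    argv += ["--output-format", "stream-json"]
    if allowed_tools:
        # Assumption: the allowlist is passed as one comma-separated value.
        argv += ["--allowedTools", ",".join(allowed_tools)]
    argv += [
        "--append-system-prompt", system_prompt,
        "--permission-mode", permission_mode,
        "--chrome",
        prompt,  # the task prompt stays last, as in the spawn command
    ]
    return argv
```

Passing the result to `subprocess.Popen` as a list avoids a shell entirely, so a prompt containing quotes or newlines is handed to the CLI unchanged.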

Auth:

Worker → Anthropic: Max 20x plan via OAuth (no API credits); session inherited from the executor's environment
Brain → Executor: shared X-Executor-Token header
Worker → external services: through MCP servers that hold the credentials (Gitea PAT, OpenBao AppRole, etc.)
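
For the brain-to-executor leg, a shared-secret header check could look like the sketch below. The helper name is hypothetical, and the ADR does not say how the token is compared. The constant-time comparison and the refusal to accept an empty configured secret are assumptions about sensible defaults.

```python
import hmac


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Constant-time check of the shared X-Executor-Token header."""
    if not expected_token:
        return False  # refuse to run with an empty configured secret
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get("x-executor-token", "")
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

`hmac.compare_digest` avoids leaking, through timing, how many leading characters of a guessed token were correct.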

Pros:

Single source of truth for scheduling decisions (the brain) — no split-brain between server and agent
Per-task process isolation is structural — a failed task cannot affect the OS process of the next one
Per-instance memory persistence — agents accumulate domain knowledge across tasks via --resume, so they don't re-learn the repo every time
OAuth-based Max plan eliminates per-token billing within the plan's rate limits
stream-json gives full observability without bespoke telemetry plumbing
Executor is so thin it can be replaced or moved without touching the brain or workers
Adding new worker capabilities (skills, MCP servers, tools) requires no executor changes — just edit taskbase/agents/<slug>/
Agent fleet is visible in the Taskbase UI; non-engineer operators can see who is doing what

Cons:

Cold-start cost per task (~1–2 s subprocess spawn + Anthropic prompt-cache miss); negligible for tasks that take minutes, noticeable for sub-second tasks
Persistent sessions can drift — context window fills up over time, requiring an explicit reset (manual via UI, or auto-trigger at a token threshold)
A poisoned conversation persists across tasks for the same instance until reset; mitigation is per-instance scoping (one bad task only affects that instance) plus easy reset
The executor is a single point of failure for dispatch; if the Mac Mini is down, no tasks run (acceptable in v1)
OAuth session expiry requires manual VNC re-auth (Anthropic does not yet expose a refresh API for Max-plan OAuth)
Concurrency is bounded by Anthropic's rate limits and by the executor's max_concurrent_tasks — no horizontal scale-out without a second OAuth session
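
The concurrency bound can be pictured as a small admission gate in front of the spawn step. This is a sketch, not the executor's actual code: the class, the 202 success code, and the 409 for a duplicate dispatch are assumptions. The 429 response and the default of 3 are taken from the implementation notes in this ADR.

```python
import threading


class CapacityGate:
    """Tracks in-flight tasks against max_concurrent_tasks."""

    def __init__(self, max_concurrent_tasks: int = 3):
        self._max = max_concurrent_tasks
        self._active = set()
        self._lock = threading.Lock()

    def try_admit(self, task_id: str) -> int:
        """Return the HTTP status the executor would answer with."""
        with self._lock:
            if task_id in self._active:
                return 409  # assumed: duplicate dispatch of a running task
            if len(self._active) >= self._max:
                return 429  # backpressure signal named in the ADR
            self._active.add(task_id)
            return 202  # assumed success code; the ADR does not specify one

    def release(self, task_id: str) -> None:
        """Free a slot when the worker subprocess exits."""
        with self._lock:
            self._active.discard(task_id)
```

Because the gate is the only place that counts in-flight tasks, the brain needs no knowledge of executor capacity beyond retrying after a 429.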

Option C — Standalone Automation Script

Architecture:

A script (Python or Go) that runs on a cron schedule on the Mac mini
Calls Claude API directly with a system prompt instructing it to work the next taskbase task
Reads usage from Claude API responses and posts them back to taskbase

Pros:

Fully autonomous: no Cowork session required; runs on a timer
Token tracking is clean since the script controls the API call loop

Cons:

Bypasses Cowork entirely: the agent cannot use Cowork skills, computer use, or other MCP tools while working a task; severely limits what the agent can actually do
The Claude context is managed by the script, not by Claude itself; context budget logic must be re-implemented in the script
Harder to observe: progress is only visible in taskbase logs, not in a Cowork session the operator can watch

Implementation Notes

Agent types live in the Taskbase repo at taskbase/agents/<slug>/ with agent.yaml (name, allowed_skills/tools/mcp, defaults) and system-prompt.md; a startup-time sync upserts them into the agent_types table so the UI and API don't need to read the filesystem at request time
Agent instances live in the agent_instances table: id, agent_type_id, name, slug, scope_type (organization/project/free), scope_id, session_id, status (idle/working/paused/offline), current_task_id, last_active_at, total_tasks_completed, total_tokens_used
Resetting an instance is just UPDATE agent_instances SET session_id = NULL — the next task spawns without --resume and captures a new session-id
The FastAPI executor config lives at ~/.config/taskbase-executor/config.yaml with fields: listen, brain_base_url, executor_token, max_concurrent_tasks, default_timeout_seconds, claude_binary, log_dir
The executor runs under launchd (~/Library/LaunchAgents/taskbase-executor.plist) for the same reasons as before: it is native to macOS, starts automatically (a per-user LaunchAgent loads at login rather than at boot), and restarts on crash
Per-task stream-json is persisted locally at ~/Library/Logs/taskbase-executor/tasks/<task_id>.jsonl in addition to being streamed back to the brain — this is the local replay buffer if the brain is briefly unreachable
Token usage is read from the usage events inside stream-json (per-message, plus a final tally) and posted to the brain as part of the event stream — no separate report_tokens call needed
v1 supports concurrent tasks up to max_concurrent_tasks (default 3) — bounded by Anthropic Max-plan rate limits; the executor enforces it; the brain treats 429 from POST /tasks as backpressure
Skills auto-load via ~/.claude/CLAUDE.md on the Mac Mini; skill_hints in the dispatch payload bias the worker toward the relevant ones via --append-system-prompt
MCP servers (gitea, openbao, taskbase-mcp, token-tracker) are registered globally so every spawned worker inherits them
Claude-in-Chrome uses a persistent headed Chrome profile on the Mac Mini; per-task fresh tab, shared login state across tasks; initial login / CAPTCHA handled once via VNC
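
Several of the notes above (capturing a new session-id, the local replay buffer, and reading token usage from the stream) meet in the loop that consumes the worker's output. The sketch below shows one way that loop might look. The event shapes it relies on (an event with `subtype` equal to `init` carrying `session_id`, and per-message usage under `message.usage`) are assumptions about the stream-json format, and the function name is hypothetical.

```python
import json
from typing import Iterable, Optional, Tuple


def consume_stream(lines: Iterable[str], log_path: Optional[str] = None) -> Tuple[Optional[str], dict]:
    """Return (session_id, token totals) from an iterable of JSON lines."""
    session_id = None
    totals = {"input_tokens": 0, "output_tokens": 0}
    log = open(log_path, "a", encoding="utf-8") if log_path else None
    try:
        for raw in lines:
            raw = raw.strip()
            if not raw:
                continue  # skip blank lines between events
            if log:
                log.write(raw + "\n")  # local replay buffer, one event per line
            event = json.loads(raw)
            # Assumed shape: the init event carries the session id.
            if session_id is None and event.get("subtype") == "init":
                session_id = event.get("session_id")
            # Assumed shape: per-message usage lives under message.usage.
            usage = (event.get("message") or {}).get("usage") or {}
            for key in totals:
                totals[key] += int(usage.get(key, 0))
    finally:
        if log:
            log.close()
    return session_id, totals
```

Summing per-message usage is the simplest policy. If the stream ever repeats usage for the same message, a real executor would more likely trust the final tally mentioned above than this running sum.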