
0004 - Taskbase Agent Module

Status

Superseded by ADR-0005 — Taskbase Agent System on 2026-05-02

Date

2026-04-02 (original) · 2026-04-18 (revised) · 2026-05-02 (superseded)

Superseded. ADR-0005 replaces this design with per-organization agent personas in Git, a runtime-agnostic brain, and a pure priority-dispatch model — no agent instances, no session_id resumption, no brain-managed task lifecycle states. The content below is preserved as the historical 2026-04-18 decision; the live design is in ADR-0005.

Revision Note (2026-04-18)

This ADR was originally accepted on 2026-04-02 with Option B — Local MCP server as the chosen design: a persistent MCP process on the Mac Mini exposing typed tools (get_next_task, log_progress, etc.) to a long-lived Cowork session, authenticating to Anthropic via API key.

After running this design we revised it. The chosen option is now Option D — FastAPI executor with ephemeral claude -p subprocesses, recorded below. The drivers for the change:

  • Cost model — moving to the Anthropic Max 20x plan eliminates per-token billing but requires OAuth (not an API key), and the claude -p CLI is the natural way to inherit that OAuth session
  • Per-task isolation — runaway tool calls and context pollution were leaking between tasks in a long-lived MCP session; ephemeral subprocesses give clean isolation per task
  • Browser automation — Claude-in-Chrome was added to the worker capability set; it works most cleanly when each task gets a fresh tab against a persistent logged-in Chrome
  • Dispatch direction — pull-based dispatch (worker asks the brain for work) made retry, prioritization, and cancellation awkward; push-based dispatch (brain tells the executor what to run) puts those concerns in one place
  • Observability — stream-json from claude -p carries far richer telemetry (thoughts, tool_use, tool_result, tokens, cost) than the discrete report_tokens / log_progress calls Option B used
  • Persistent agent identity — fully fresh sessions per task lose accumulated repo/domain knowledge between tasks; we want named agents that keep a conversation across tasks, while still spawning a fresh subprocess each time for process isolation

The revision splits agents into types (reusable templates: system prompt + skill/tool/MCP allowlist, defined in taskbase/agents/<slug>/) and instances (concrete workers with a name, scope, and a persistent Claude session-id). Tasks assign to instances via the existing assignee_type=2 mechanism. The executor still spawns a fresh claude -p subprocess per task, but uses --resume <instance.session_id> so the worker continues the instance's ongoing conversation.

Options A, B, and C below are kept for historical context. Option D is the current design and is reflected in agent-architecture.md.

Context

ADR 0002 established the taskbase system: a Kubernetes-hosted task management app with a REST API that lets an agent pick up work, report progress, and track token consumption. That ADR defined the server side. This ADR defines the agent side — the module that runs on the agent machine and drives the interaction with the taskbase API autonomously.

The requirements for this module are:

  • Autonomous task pickup — the agent should be able to start a new session, call the taskbase API, and receive the next task without a human typing anything
  • Token consumption tracking — every Claude API call made while working a task emits a usage object; the module must accumulate these and report them back to taskbase after each interaction step and on task completion
  • Task lifecycle signalling — the agent must transition tasks through pending → in_progress → done / paused by calling the taskbase API at each phase boundary
  • Context budget awareness — when remaining token budget approaches a configurable threshold, the agent should checkpoint the current task (log a summary, mark it paused), and pull the next task into a fresh context
  • Runs locally on the agent machine — the module lives on the Mac mini alongside Cowork, not in the Kubernetes cluster; it is the agent's interface to the taskbase system

Decision Drivers

  • Low coupling — the module should wrap the taskbase REST API without tightly coupling to its internal implementation; if endpoints change, only the module needs updating
  • Native tool interface — Claude works best when capabilities are presented as structured tools, not prose instructions; the module should expose named tools that Claude can call directly
  • Zero manual steps — once a Cowork session starts, no human input should be required to pick up and begin executing the next queued task
  • Auditability — every tool call the agent makes against taskbase (task pickup, log entries, token reports, status transitions) must be traceable in the taskbase management UI
  • Token accuracy — token counts must come from the authoritative source: the usage field on Claude API responses, not estimates or scraping

Considered Options

  • Option A — Cowork skill (markdown prompt instructions)
  • Option B — Local MCP server exposing structured taskbase tools (originally selected, superseded 2026-04-18)
  • Option C — Standalone automation script calling Claude API + taskbase API directly
  • Option D — FastAPI executor with ephemeral claude -p subprocesses per task (selected 2026-04-18)

Decision Outcome

Chosen option: Option D — FastAPI executor with ephemeral claude -p subprocesses, because:

  • The brain (Taskbase) owns prioritization, retries and dependencies; pushing tasks to a thin executor keeps that logic in one place rather than splitting it between the brain and a stateful agent
  • Each task runs in a fresh subprocess, so failures, context pollution, and runaway tools cannot leak between tasks
  • claude -p --output-format stream-json provides richer telemetry (thoughts, tool_use, tool_result, tokens, cost, final_message) as a single stream — strictly more observable than discrete RPC calls back to the brain
  • The Anthropic Max 20x OAuth session is inherited naturally by each claude -p invocation, removing per-token billing
  • MCP servers (gitea, openbao, taskbase-mcp, token-tracker) move from being the dispatch protocol to being worker-side integrations, where they better fit their actual role
  • Claude-in-Chrome with a persistent profile works cleanly in this model: per-task fresh tab, shared login state across tasks

Option A — Cowork Skill (Markdown prompt instructions)

Architecture:

  • A skill file (e.g. agent-skills/taskbase-runner/SKILL.md) that instructs Claude to call the taskbase REST API using fetch or curl via the Bash or JavaScript tools
  • Token counting done by instructing Claude to read the usage field from each response and accumulate it manually

Pros:

  • Zero new infrastructure — a markdown file is all that is needed
  • Works in any Cowork session immediately after the skill is loaded

Cons:

  • Claude constructing raw HTTP requests from prose instructions is fragile; URL paths, headers, and JSON payloads are error-prone when authored at inference time
  • Token accumulation relies on Claude not losing count across many tool calls in a long session — unreliable
  • Auth tokens must be embedded in the skill file or passed in plaintext through the conversation
  • No persistent state between tool calls; if Claude misses a step, there is no guard-rail


Option B — Local MCP Server (Originally selected 2026-04-02, superseded 2026-04-18)

Architecture:

  • A small Go or Node.js process running on the agent machine, registered with Cowork as a plugin
  • Exposes the following tools over the MCP protocol:

| Tool | Description |
| --- | --- |
| get_next_task | Returns the highest-priority pending task from taskbase, transitions it to in_progress, and returns its id, title, description, and token_budget |
| log_progress | Appends a progress note to the active task's activity log in taskbase |
| report_tokens | Sends an incremental token usage report (input_tokens, output_tokens) for the active task |
| complete_task | Marks the active task done, stores a completion summary, and sends the final token tally |
| pause_task | Marks the active task paused with a checkpoint summary when the context budget is running low |
| get_token_budget | Returns the remaining token budget for the active task so Claude can decide whether to continue or checkpoint |

  • The MCP server holds the taskbase API base URL and auth token in its own config; Claude never sees credentials
  • On each Claude API response, the MCP server reads the usage object from the response metadata and calls report_tokens automatically, so Claude does not need to handle this manually

Agent session flow:

Session starts
  └─ Claude calls get_next_task
       └─ Task returned (id, title, description, token_budget)
            └─ Claude works the task
                 ├─ Periodically calls log_progress with updates
                 ├─ MCP server auto-reports tokens after each step
                 └─ Claude checks get_token_budget before each major step
                      ├─ Budget OK → continue
                      └─ Budget low → calls pause_task with checkpoint summary
                           └─ Fresh context → calls get_next_task again
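
The flow above can be approximated in a few lines. The following is only a sketch against an in-memory stand-in, not the real MCP server: the class FakeTaskbaseTools, the function run_session, the per-task "steps" field, and the step_cost and low_water values are all hypothetical and exist purely to make the loop runnable.

```python
from collections import deque


class FakeTaskbaseTools:
    """In-memory stand-in for a subset of the Option B tools (hypothetical)."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.active = None
        self.finished = []  # (task_id, final_status) in the order tasks left the session

    def get_next_task(self):
        # Pop the next pending task and mark it in_progress, as the tool table describes.
        if not self.pending:
            return None
        self.active = dict(self.pending.popleft(), status="in_progress", used=0)
        return self.active

    def report_tokens(self, input_tokens, output_tokens):
        self.active["used"] += input_tokens + output_tokens

    def get_token_budget(self):
        return self.active["token_budget"] - self.active["used"]

    def pause_task(self, summary):
        self._close("paused")

    def complete_task(self, summary):
        self._close("done")

    def _close(self, status):
        self.finished.append((self.active["id"], status))
        self.active = None


def run_session(tools, step_cost=100, low_water=150):
    """Drain the queue, checkpointing any task whose remaining budget runs low."""
    while (task := tools.get_next_task()) is not None:
        for _ in range(task["steps"]):
            # Check the budget before each major step.
            if tools.get_token_budget() < low_water:
                tools.pause_task("checkpoint: budget low")
                break
            # One unit of work; in Option B the server reports usage automatically.
            tools.report_tokens(input_tokens=step_cost // 2, output_tokens=step_cost // 2)
        else:
            # The loop finished without a break, so every step ran.
            tools.complete_task("all steps finished")
    return tools.finished
```

With one task that fits its budget and one that does not, the first ends as done and the second as paused, after which the loop asks for the next task again.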

Pros:

  • Structured, typed tool interface — Claude cannot construct a malformed API call
  • Auth and URL management are encapsulated in the server config
  • Token reporting is automatic and accurate — sourced from Claude API usage metadata
  • Persistent process — available across all Cowork sessions without reloading
  • Testable independently of Claude: the MCP server can be exercised with any MCP client

Cons:

  • Requires building and running a new local process on the agent machine
  • Adds a dependency: the MCP server must be running for the agent to interact with taskbase


Option D — FastAPI Executor with Ephemeral claude -p Subprocesses (Selected 2026-04-18)

Architecture:

Three runtime tiers, each with one job:

  • Brain (Taskbase API in Kubernetes) — engine that prioritizes, schedules, handles dependencies and retries; stores full task state, event trace, and the agent fleet (types + instances)
  • Executor (Mac Mini, FastAPI on localhost:8765) — thin and stateless: receives POST /tasks from the brain, spawns a claude -p --resume <session_id> subprocess, relays the subprocess's stream-json output back to the brain via POST /tasks/{id}/events
  • Worker (per-task claude -p subprocess) — does the actual work; resumes the assigned agent instance's session for persistent memory; has access to skills auto-loaded via CLAUDE.md, MCP servers (gitea, openbao, taskbase-mcp, token-tracker), built-in tools (Read/Edit/Bash/WebFetch), and Claude-in-Chrome for browser automation

Agents are first-class:

  • Agent type = template (system prompt + allowed skills/tools/MCP), defined in taskbase/agents/<slug>/agent.yaml + system-prompt.md, synced into the agent_types table
  • Agent instance = concrete worker (name + scope + persistent session_id), stored in agent_instances; surfaced in the Taskbase UI as a fleet view (status, current task, lifetime tasks, tokens used)
  • Tasks assign to instances via assignee_type=2 + assignee = agent_instance_id
  • Process per task is still ephemeral — only the conversation persists (via --resume); the OS process always starts fresh

Executor endpoints:

| Endpoint | Caller | Purpose |
| --- | --- | --- |
| POST /tasks | Brain | Hand off a task: {task_id, prompt, skill_hints, tool_allowlist, permission_mode, timeout, callback_url} |
| GET /tasks/{id}/stream (SSE) | Brain or UI | Live event stream for an in-flight task |
| GET /health | Anyone | OAuth status, current task count, capacity |
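
To make the hand-off concrete, here is a minimal sketch of how the brain might assemble the POST /tasks request. The payload keys and the X-Executor-Token header come from this ADR. The function name build_dispatch_request, the default values for permission_mode and timeout, and the example URLs are assumptions for illustration. The request is only built, never sent.

```python
import json
import urllib.request


def build_dispatch_request(executor_url, executor_token, task):
    """Build (but do not send) the brain's POST /tasks hand-off."""
    payload = {
        "task_id": task["task_id"],
        "prompt": task["prompt"],
        "skill_hints": task.get("skill_hints", []),
        "tool_allowlist": task.get("tool_allowlist", []),
        # The two defaults below are placeholders, not values from the ADR.
        "permission_mode": task.get("permission_mode", "default"),
        "timeout": task.get("timeout", 600),
        "callback_url": task["callback_url"],
    }
    return urllib.request.Request(
        url=f"{executor_url}/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Executor-Token": executor_token,  # shared secret between brain and executor
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would perform the actual call; that step is left out so the sketch has no network dependency.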

Spawn command:

claude -p \
  --resume <agent_instance.session_id> \
  --output-format stream-json \
  --allowedTools <from agent_type.allowed_tools> \
  --append-system-prompt <agent_type.system_prompt> \
  --permission-mode <from agent_type.default_permission_mode> \
  --chrome \
  <task.prompt>

If agent_instance.session_id is NULL (fresh instance or after a manual reset), --resume is omitted and the executor records the new session-id from the first init event in the stream.
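
The spawn rule can be sketched as a small argv builder plus a helper that records the session id. The flags are exactly those in the spawn command above. Everything else is an assumption: the dataclass and function names, joining allowed tools with commas, and the shape of the init event (a JSON object with subtype "init" and a session_id field).

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgentType:
    system_prompt: str
    allowed_tools: List[str]
    default_permission_mode: str


@dataclass
class AgentInstance:
    name: str
    session_id: Optional[str] = None  # None models a NULL session_id


def build_claude_argv(agent_type, instance, prompt, claude_binary="claude"):
    """Return the argument list for one worker subprocess."""
    argv = [claude_binary, "-p"]
    # A fresh or reset instance has no session yet, so --resume is omitted.
    if instance.session_id is not None:
        argv += ["--resume", instance.session_id]
    argv += [
        "--output-format", "stream-json",
        "--allowedTools", ",".join(agent_type.allowed_tools),  # comma-joining is an assumption
        "--append-system-prompt", agent_type.system_prompt,
        "--permission-mode", agent_type.default_permission_mode,
        "--chrome",
        prompt,
    ]
    return argv


def capture_session_id(instance, stream_lines):
    """Record the session id from the first init event if the instance has none."""
    if instance.session_id is not None:
        return instance.session_id
    for line in stream_lines:
        event = json.loads(line)
        # Assumed event shape: {"subtype": "init", "session_id": "..."}
        if event.get("subtype") == "init" and "session_id" in event:
            instance.session_id = event["session_id"]
            break
    return instance.session_id
```

Returning a list rather than a shell string means it can be handed to subprocess.Popen without quoting the prompt. Some CLI versions may need extra flags alongside stream-json output (for example --verbose), so it is worth checking claude --help on the target machine.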

Auth:

  • Worker → Anthropic: Max 20x plan via OAuth (no API credits); session inherited from the executor's environment
  • Brain → Executor: shared X-Executor-Token header
  • Worker → external services: through MCP servers that hold the credentials (Gitea PAT, OpenBao AppRole, etc.)
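
For the Brain → Executor hop, a check along the following lines would be enough to validate the shared header. This is a sketch only: the function name is_authorized and the choice to reject an empty configured token are assumptions, and the ADR does not say how the executor actually performs the comparison.

```python
import hmac


def is_authorized(headers, expected_token):
    """Return True only if X-Executor-Token matches the configured secret."""
    # Refuse to run open if the executor was started without a token.
    if not expected_token:
        return False
    # HTTP header names are case-insensitive, so normalise before the lookup.
    supplied = {k.lower(): v for k, v in headers.items()}.get("x-executor-token", "")
    # compare_digest takes time independent of where the inputs differ.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_token.encode("utf-8"))
```

A constant-time comparison is used so that response timing does not reveal how much of a guessed token was correct.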

Pros:

  • Single source of truth for scheduling decisions (the brain) — no split-brain between server and agent
  • Per-task process isolation is structural — a failed task cannot affect the OS process of the next one
  • Per-instance memory persistence — agents accumulate domain knowledge across tasks via --resume, so they don't re-learn the repo every time
  • OAuth-based Max plan eliminates per-token billing within the plan's rate limits
  • stream-json gives full observability without bespoke telemetry plumbing
  • Executor is so thin it can be replaced or moved without touching the brain or workers
  • Adding new worker capabilities (skills, MCP servers, tools) requires no executor changes — just edit taskbase/agents/<slug>/
  • Agent fleet is visible in the Taskbase UI; non-engineer operators can see who is doing what

Cons:

  • Cold-start cost per task (~1–2 s subprocess spawn + Anthropic prompt-cache miss); negligible for tasks that take minutes, noticeable for sub-second tasks
  • Persistent sessions can drift — context window fills up over time, requiring an explicit reset (manual via UI, or auto-trigger at a token threshold)
  • A poisoned conversation persists across tasks for the same instance until reset; mitigation is per-instance scoping (one bad task only affects that instance) plus easy reset
  • The executor is a single point of failure for dispatch; if the Mac Mini is down, no tasks run (acceptable in v1)
  • OAuth session expiry requires manual VNC re-auth (Anthropic does not yet expose a refresh API for Max-plan OAuth)
  • Concurrency is bounded by Anthropic's rate limits and by the executor's max_concurrent_tasks — no horizontal scale-out without a second OAuth session
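
The concurrency bound in the last point could be enforced with a simple slot counter, sketched below. The 429 response matches the backpressure signal named in the Implementation Notes. The class name TaskSlots, the 202 status for an accepted task, and the method names are assumptions.

```python
import threading


class TaskSlots:
    """Admission control for POST /tasks, bounded by max_concurrent_tasks."""

    def __init__(self, max_concurrent_tasks=3):
        self._max = max_concurrent_tasks
        self._running = set()
        self._lock = threading.Lock()

    def try_admit(self, task_id):
        """Return the HTTP status the executor would answer the dispatch with."""
        with self._lock:
            if len(self._running) >= self._max:
                return 429  # at capacity: the brain should back off and retry
            self._running.add(task_id)
            return 202  # accepted; 202 is an assumed success status

    def release(self, task_id):
        """Free the slot once the worker subprocess has exited."""
        with self._lock:
            self._running.discard(task_id)

    @property
    def running_count(self):
        with self._lock:
            return len(self._running)
```

The lock matters because dispatch requests and subprocess exits can arrive on different threads. The running_count value is the kind of number GET /health could report as the current task count.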

Option C — Standalone Automation Script

Architecture:

  • A script (Python or Go) that runs on a cron schedule on the Mac mini
  • Calls Claude API directly with a system prompt instructing it to work the next taskbase task
  • Reads usage from Claude API responses and posts them back to taskbase

Pros:

  • Fully autonomous — no Cowork session required; runs on a timer
  • Token tracking is clean since the script controls the API call loop

Cons:

  • Bypasses Cowork entirely — the agent cannot use Cowork skills, computer use, or other MCP tools while working a task; severely limits what the agent can actually do
  • The Claude context is managed by the script, not by Claude itself; context budget logic must be re-implemented in the script
  • Harder to observe: progress is only visible in taskbase logs, not in a Cowork session the operator can watch


Implementation Notes

  • Agent types live in the Taskbase repo at taskbase/agents/<slug>/ with agent.yaml (name, allowed_skills/tools/mcp, defaults) and system-prompt.md; a startup-time sync upserts them into the agent_types table so the UI and API don't need to read the filesystem at request time
  • Agent instances live in the agent_instances table: id, agent_type_id, name, slug, scope_type (organization/project/free), scope_id, session_id, status (idle/working/paused/offline), current_task_id, last_active_at, total_tasks_completed, total_tokens_used
  • Resetting an instance is just UPDATE agent_instances SET session_id = NULL — the next task spawns without --resume and captures a new session-id
  • The FastAPI executor config lives at ~/.config/taskbase-executor/config.yaml with fields: listen, brain_base_url, executor_token, max_concurrent_tasks, default_timeout_seconds, claude_binary, log_dir
  • The executor runs under launchd (~/Library/LaunchAgents/taskbase-executor.plist), for the same reasons given for the earlier local process: it is native to macOS, starts on boot, and restarts on crash
  • Per-task stream-json is persisted locally at ~/Library/Logs/taskbase-executor/tasks/<task_id>.jsonl in addition to being streamed back to the brain — this is the local replay buffer if the brain is briefly unreachable
  • Token usage is read from the usage events inside stream-json (per-message, plus a final tally) and posted to the brain as part of the event stream — no separate report_tokens call needed
  • v1 supports concurrent tasks up to max_concurrent_tasks (default 3) — bounded by Anthropic Max-plan rate limits; the executor enforces it; the brain treats 429 from POST /tasks as backpressure
  • Skills auto-load via ~/.claude/CLAUDE.md on the Mac Mini; skill_hints in the dispatch payload bias the worker toward the relevant ones via --append-system-prompt
  • MCP servers (gitea, openbao, taskbase-mcp, token-tracker) are registered globally so every spawned worker inherits them
  • Claude-in-Chrome uses a persistent headed Chrome profile on the Mac Mini; per-task fresh tab, shared login state across tasks; initial login / CAPTCHA handled once via VNC
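
As an illustration of reading token usage back from a persisted per-task .jsonl file, the sketch below sums per-message usage and picks up the final tally. The event shapes are assumptions, not confirmed by this ADR: per-message usage is taken to live at message.usage on events of type "assistant", and the final tally at usage on an event of type "result". The function name tally_usage is likewise hypothetical.

```python
import json


def tally_usage(jsonl_lines):
    """Sum per-message usage and pick up the final tally, if one is present."""
    running = {"input_tokens": 0, "output_tokens": 0}
    final = None
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the replay file
        event = json.loads(line)
        if event.get("type") == "assistant":
            # Assumed location of per-message usage.
            usage = event.get("message", {}).get("usage", {})
            for key in running:
                running[key] += usage.get(key, 0)
        elif event.get("type") == "result":
            # Assumed location of the final tally.
            final = event.get("usage")
    return {"per_message_sum": running, "final": final}
```

Because the function accepts any iterable of lines, it can be pointed at an open file handle for a replay buffer or at a live stream. Comparing per_message_sum with final is a cheap consistency check before posting totals to the brain.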