0003 - Taskbase Agent Module

Status

Accepted

Date

2026-04-02

Context

ADR 0002 established the taskbase system: a Kubernetes-hosted task management app with a REST API that lets an agent pick up work, report progress, and track token consumption. That ADR defined the server side. This ADR defines the agent side — the module that runs on the agent machine and drives the interaction with the taskbase API autonomously.

The requirements for this module are:

  • Autonomous task pickup — the agent should be able to start a new session, call the taskbase API, and receive the next task without a human typing anything
  • Token consumption tracking — every Claude API call made while working a task emits a usage object; the module must accumulate these and report them back to taskbase after each interaction step and on task completion
  • Task lifecycle signalling — the agent must transition tasks through pending → in_progress → done / paused by calling the taskbase API at each phase boundary
  • Context budget awareness — when remaining token budget approaches a configurable threshold, the agent should checkpoint the current task (log a summary, mark it paused), and pull the next task into a fresh context
  • Runs locally on the agent machine — the module lives on the Mac mini alongside Cowork, not in the Kubernetes cluster; it is the agent's interface to the taskbase system
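
The token consumption requirement above amounts to folding the usage object from every Claude API response into a running tally. A minimal sketch (the `input_tokens`/`output_tokens` field names come from the Claude API usage object; the `TokenTally` helper itself is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class TokenTally:
    """Accumulates Claude API usage objects across interaction steps."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage: dict) -> None:
        # Each Claude API response carries a usage object; fold it in.
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

tally = TokenTally()
tally.add({"input_tokens": 1200, "output_tokens": 350})
tally.add({"input_tokens": 900, "output_tokens": 410})
print(tally.total)  # 2860
```

The tally is what gets reported back to taskbase after each interaction step and on task completion.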

Decision Drivers

  • Low coupling — the module should wrap the taskbase REST API without tightly coupling to its internal implementation; if endpoints change, only the module needs updating
  • Native tool interface — Claude works best when capabilities are presented as structured tools, not prose instructions; the module should expose named tools that Claude can call directly
  • Zero manual steps — once a Cowork session starts, no human input should be required to pick up and begin executing the next queued task
  • Auditability — every tool call the agent makes against taskbase (task pickup, log entries, token reports, status transitions) must be traceable in the taskbase management UI
  • Token accuracy — token counts must come from the authoritative source: the usage field on Claude API responses, not estimates or scraping

Considered Options

  • Option A — Cowork skill (markdown prompt instructions)
  • Option B — Local MCP server exposing structured taskbase tools (Selected)
  • Option C — Standalone automation script calling Claude API + taskbase API directly

Decision Outcome

Chosen option: Option B — Local MCP server, because:

  • It exposes the taskbase API as discrete, named tools (get_next_task, log_progress, report_tokens, complete_task, pause_task) that Claude can call with type-safe arguments — much more reliable than asking Claude to construct raw HTTP requests from prose instructions
  • It runs as a persistent process on the agent machine and is available to any Cowork session without per-session setup
  • Token usage can be forwarded to taskbase directly from inside the MCP server by intercepting the Claude API usage object before returning results to the session
  • It keeps the taskbase REST API internal to the server; Claude never constructs API URLs or manages auth tokens directly

Option A — Cowork Skill (Markdown prompt instructions)

Architecture:

  • A skill file (e.g. agent-skills/taskbase-runner/SKILL.md) that instructs Claude to call the taskbase REST API using fetch or curl via the Bash or JavaScript tools
  • Token counting done by instructing Claude to read the usage field from each response and accumulate it manually

Pros:

  • Zero new infrastructure — a markdown file is all that is needed
  • Works in any Cowork session immediately after the skill is loaded

Cons:

  • Claude constructing raw HTTP requests from prose instructions is fragile; URL paths, headers, and JSON payloads are error-prone when authored at inference time
  • Token accumulation relies on Claude not losing count across many tool calls in a long session — unreliable
  • Auth tokens must be embedded in the skill file or passed in plaintext through the conversation
  • No persistent state between tool calls; if Claude misses a step, there is no guard-rail


Option B — Local MCP Server (Selected)

Architecture:

  • A small Go or Node.js process running on the agent machine, registered with Cowork as a plugin
  • Exposes the following tools over the MCP protocol:

  • get_next_task — Returns the highest-priority pending task from taskbase, transitions it to in_progress, and returns its id, title, description, and token_budget
  • log_progress — Appends a progress note to the active task's activity log in taskbase
  • report_tokens — Sends an incremental token usage report (input_tokens, output_tokens) for the active task
  • complete_task — Marks the active task done, stores a completion summary, and sends the final token tally
  • pause_task — Marks the active task paused with a checkpoint summary when the context budget is running low
  • get_token_budget — Returns the remaining token budget for the active task so Claude can decide whether to continue or checkpoint
  • The MCP server holds the taskbase API base URL and auth token in its own config; Claude never sees credentials
  • On each Claude API response, the MCP server reads the usage object from the response metadata and calls report_tokens automatically, so Claude does not need to handle this manually
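
To illustrate the encapsulation, a get_next_task handler might wrap two underlying REST calls like this. The `TaskbaseClient` class and the endpoint paths (`/tasks/next`, `/tasks/{id}/status`) are illustrative assumptions, not the actual taskbase API:

```python
import json
import urllib.request

class TaskbaseClient:
    """Hypothetical tool layer: Claude sees named tools, never raw HTTP."""

    def __init__(self, base_url: str, token: str, http=None):
        self.base_url = base_url.rstrip("/")
        self.token = token
        # http is injectable so the client can be tested without a live server
        self.http = http or self._request

    def _request(self, method, path, body=None):
        req = urllib.request.Request(
            self.base_url + path,
            data=json.dumps(body).encode() if body is not None else None,
            method=method,
            headers={
                "Authorization": f"Bearer {self.token}",  # Claude never sees this
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def get_next_task(self) -> dict:
        # Fetch the highest-priority pending task, then mark it in_progress
        task = self.http("GET", "/tasks/next")
        self.http("PATCH", f"/tasks/{task['id']}/status", {"status": "in_progress"})
        return task
```

Because the HTTP function is injectable, the handler can be exercised with any MCP client or a plain test double, which supports the "testable independently of Claude" point below.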

Agent session flow:

Session starts
  └─ Claude calls get_next_task
       └─ Task returned (id, title, description, token_budget)
            └─ Claude works the task
                 ├─ Periodically calls log_progress with updates
                 ├─ MCP server auto-reports tokens after each step
                 └─ Claude checks get_token_budget before each major step
                      ├─ Budget OK → continue
                      └─ Budget low → calls pause_task with checkpoint summary
                           └─ Fresh context → calls get_next_task again
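
The "Budget OK / Budget low" branch in the flow above reduces to one comparison against the configurable threshold. A minimal sketch, assuming the threshold is expressed as a fraction of the task's token_budget (the `should_checkpoint` helper is a hypothetical name):

```python
def should_checkpoint(tokens_used: int, token_budget: int,
                      warning_threshold: float = 0.15) -> bool:
    # Checkpoint when the remaining budget falls to the warning
    # threshold or below (default: 15% of the task's budget).
    remaining = token_budget - tokens_used
    return remaining <= token_budget * warning_threshold

print(should_checkpoint(40_000, 50_000))  # False: 20% of budget remains
print(should_checkpoint(44_000, 50_000))  # True: only 12% remains
```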

Pros:

  • Structured, typed tool interface — Claude cannot construct a malformed API call
  • Auth and URL management are encapsulated in the server config
  • Token reporting is automatic and accurate — sourced from Claude API usage metadata
  • Persistent process — available across all Cowork sessions without reloading
  • Testable independently of Claude: the MCP server can be exercised with any MCP client

Cons:

  • Requires building and running a new local process on the agent machine
  • Adds a dependency: the MCP server must be running for the agent to interact with taskbase


Option C — Standalone Automation Script

Architecture:

  • A script (Python or Go) that runs on a cron schedule on the Mac mini
  • Calls Claude API directly with a system prompt instructing it to work the next taskbase task
  • Reads usage from Claude API responses and posts them back to taskbase
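
A sketch of that control loop, with the Claude and taskbase clients passed in as stand-ins; `run_once` and all four callables are hypothetical names, not real SDK calls:

```python
def run_once(fetch_next_task, claude_call, post_usage, post_status):
    """One cron tick: pick up a task, work it via Claude, report back."""
    task = fetch_next_task()
    if task is None:
        return None  # queue empty; cron retries on the next tick
    post_status(task["id"], "in_progress")
    result = claude_call(f"Work this task: {task['title']}\n{task['description']}")
    # usage comes straight from the Claude API response, per the
    # token-accuracy decision driver
    post_usage(task["id"], result["usage"])
    post_status(task["id"], "done")
    return task["id"]
```

Note that the script, not Claude, owns this loop, which is exactly why the context budget logic would have to be re-implemented here.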

Pros:

  • Fully autonomous — no Cowork session required; runs on a timer
  • Token tracking is clean since the script controls the API call loop

Cons:

  • Bypasses Cowork entirely — the agent cannot use Cowork skills, computer use, or other MCP tools while working a task; severely limits what the agent can actually do
  • The Claude context is managed by the script, not by Claude itself; context budget logic must be re-implemented in the script
  • Harder to observe: progress is only visible in taskbase logs, not in a Cowork session the operator can watch


Implementation Notes

  • The MCP server config file should live at ~/.config/taskbase-agent/config.yaml with fields: api_base_url, api_token, default_token_budget, budget_warning_threshold
  • The server should be registered in the Cowork plugins directory so it starts automatically when Cowork launches
  • Token budget is enforced on the agent side (via get_token_budget) and recorded on the server side (via report_tokens); both are necessary — the server-side record is the audit trail, the agent-side check is the guard-rail
  • v1 does not handle parallel tasks; the server enforces that only one task can be in_progress per agent session at a time
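
For reference, the config file described in the first note might look like the following; the field names come from this ADR, while the values are illustrative placeholders only:

```yaml
# ~/.config/taskbase-agent/config.yaml (illustrative values)
api_base_url: https://taskbase.example.internal/api/v1
api_token: "<service-account-token>"
default_token_budget: 100000      # tokens per task unless the task specifies one
budget_warning_threshold: 0.15    # checkpoint when 15% or less of the budget remains
```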