Skip to content

Recipes & best practices

Corpus is simple on purpose. The leverage is in how you structure collection. These are patterns that work in practice.

Design collections around an audience, not a topic

Section titled “Design collections around an audience, not a topic”

A collection is “what this kind of agent should know,” not “everything about X.” Name them for the reader:

  • backend-agent — conventions, error model, testing rules.
  • release-reviewer — the checklist a PR-review agent applies.
  • support-bot — product facts and tone for a customer-facing agent.

A document (say error-handling) can live in several of these. Write it once, reuse it everywhere — that’s the whole point. Don’t fork a doc per audience; attach it to each collection.

Each collection has its own always-include budget (default 8,000 tokens; raise it per collection if you’re feeding a larger context window). Past the budget the meter warns — a bloated always-include set dilutes the agent’s attention and burns the window before it does any work. On-demand documents stay in collection://<slug>/outline and are pulled by path only when relevant.

When a collection gets heavy:

  • Split it by task (backend-agent-writing vs backend-agent-reviewing).
  • Move rarely-needed detail to on-demand (toggle Always include off).
  • Lead with the most important framing first — it’s what the agent sees before it decides what else to pull.

“Always-include these few, plus a big on-demand library” is the common shape: toggle Always include on for the small set the agent must internalize, leave the rest on-demand. Both live in the same Collection and the agent sees the outline of all of them.

  • One clear # H1 per document; explicit imperative rules (“Always validate input with Zod at the boundary”) beat prose.
  • State the rule and the rationale — agents follow rules they understand better than bare edicts.
  • Avoid implicit collection (“as discussed last sprint”). The agent has only what’s in the corpus.
  • Link related documents by relative path; the editor flags broken links and agents can traverse the resolved link graph via the collection outline.

Put status, owner, audience in frontmatter. It never changes the file the agent reads, but an agent can call read_document_meta and skip or downweight a doc marked status: draft. Good for staging changes before they’re authoritative.

Don’t rely on remembering to say “read the collection.” Put it in the agent’s standing instructions:

  • Claude Code — add a line to CLAUDE.md: “At the start of any task, work from the corpus collection you’re connected to: read its always-included guidance, browse its outline, and read docs/error-handling.md when the task touches errors.”
  • Cursor / others — the equivalent rules file.

You can name a specific document by Corpus path — “read docs/error-handling.md from Corpus” — and the agent will pull just that one within the Connection’s bound Collection. Naming on-demand documents in standing instructions is how you keep prompts precise without bloating the always-include set.

Now every session starts from your canonical collection, and updating it in Corpus updates every agent on the next read — no redeploy, no re-paste.

Nothing is lost: every save is a new version, concurrent edits get a resolution screen, and you can verify the chain. Non-engineers should edit directly — that’s the workflow Corpus exists to enable. Treat the changes feed as the “what moved” check before a consequential agent run.

Mix-and-match collections let you keep a small, rarely-changing core (engineering-principles) attached to many collections, while volatile specifics (q3-migration-plan) live in their own document you attach only where and when it’s relevant — and detach when it’s done, without deleting the history.