Migration: Active-tense chunks

This migration brings every ACTIVE chunk in your project into the present-tense, intent-owning standard described in Chunks. Older chunks often open retrospectively ("Currently, X") or chunk-centrically ("This chunk adds Y"). Both fail the principle that a chunk's GOAL.md should read true at every status it passes through; the migration fixes them.

Run it after upgrading ve to a version that ships the four principles and the five-status taxonomy (FUTURE, IMPLEMENTING, ACTIVE, COMPOSITE, HISTORICAL).

What it does

The agent walks every ACTIVE chunk and applies one of six actions per chunk:

  • Rewrite retrospective framing inline (preserving intent, only fixing tense).
  • Log over-claims to docs/trunk/INCONSISTENCIES/ when the goal asserts behaviors the code doesn't realize.
  • Fix broken code_paths / code_references in place when the correct target is unambiguous (typically post-refactor drift).
  • Historicalize chunks with no enduring intent (bug-fix-only or fully superseded).
  • Log cross-artifact inconsistencies discovered along the way (template-vs-template, doc-vs-code mismatches).
  • Leave clean chunks untouched.
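One way to picture the per-chunk dispatch is a priority-ordered classifier: the over-claim check comes first because over-claims must be logged, never rewritten. This is a minimal sketch, not ve's actual implementation; the field names (`unmet_criteria`, `enduring_intent`, and so on) are illustrative assumptions.

```python
from enum import Enum, auto

class Action(Enum):
    """The six per-chunk actions the migration can take."""
    REWRITE = auto()         # fix retrospective framing inline
    LOG_OVERCLAIM = auto()   # goal asserts behavior the code doesn't realize
    FIX_REFS = auto()        # repair unambiguous code_path drift in place
    HISTORICALIZE = auto()   # no enduring intent remains
    LOG_MISMATCH = auto()    # cross-artifact inconsistency found along the way
    CLEAN = auto()           # leave untouched

def classify(chunk: dict) -> Action:
    """Pick one action for an ACTIVE chunk. Field names are illustrative."""
    if chunk["unmet_criteria"]:
        return Action.LOG_OVERCLAIM      # over-claims are logged, never rewritten
    if not chunk["enduring_intent"]:
        return Action.HISTORICALIZE
    if chunk["broken_refs"] and chunk["refs_unambiguous"]:
        return Action.FIX_REFS
    if chunk["cross_artifact_mismatch"]:
        return Action.LOG_MISMATCH
    if chunk["retrospective_framing"]:
        return Action.REWRITE
    return Action.CLEAN
```

The ordering is the point: a chunk with unmet criteria short-circuits to the log, no matter what else the detectors found.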

Prerequisites

You need a ve recent enough to ship the four principles and the new status taxonomy. Quick sanity check: open docs/trunk/CHUNKS.md and confirm it declares the four principles. If the file doesn't exist or doesn't list them, upgrade and re-run ve init:

$ uv tool upgrade vibe-engineer
$ ve init

(pip install --upgrade vibe-engineer works too.) The upgrade gets you the new runtime; ve init refreshes the workflow files. Your chunks and other artifacts are preserved.

The migration assumes your working tree is clean (no uncommitted changes outside the audit's scope). It commits per wave; an interrupted run is safe to resume because the operations are idempotent.

The three-phase rollout

The migration scales operator effort with confidence. You start small and hands-on, scale to medium parallel waves once you trust the pattern, then hand off to full auto once you've seen the rules behave.

  Phase           Scale                                Operator role
  1. Kickoff      1 sub-agent × 5 chunks               Walk every diff. Read every inconsistency entry. Verify the rules behave.
  2. Confidence   10 sub-agents × 5 chunks (50/wave)   Spot-check a few diffs instead of reviewing them all. Watch counts and anchors per wave.
  3. Full auto    All remaining chunks, wave-paced     Authorize the run. Review the final report and the inconsistency log.

Phase 1: Kickoff

Pick five chunks for the first batch. Include at least one chunk you suspect is problematic: an old foundational chunk, or one whose name suggests a bug fix. The first batch's job isn't speed; it's letting you see the rules fire on cases you have an opinion about.

Run a single sub-agent against those five chunks. The agent will produce a structured summary per chunk and either rewrite the prose, log an inconsistency entry, or leave the chunk clean.

What to check

  • Read every diff. Each rewrite should change only narrative prose: no edits to success_criteria, code_paths, code_references, or the chunk's architectural claims. If a rewrite touches structured fields, something went wrong; reset and refine the prompt.
  • Read every inconsistency entry. The format should give you what you need for later triage: claim, reality, workaround, fix paths. If an entry feels vague, the detection caught something real but the agent couldn't articulate it. Resolve it manually before scaling up.
  • Verify the veto rule fired correctly. If a chunk had over-claimed scope (some success criteria not implemented in code), the agent should have logged rather than rewritten. If it rewrote anyway, the veto rule isn't being enforced, and that's a blocker.
  • Check any historicalizations. Pattern A (bug-fix-only) and Pattern B (intent fully superseded) have a high bar. If a chunk flipped to HISTORICAL, confirm it really has no enduring intent. The agent leaves the goal text alone for HISTORICAL chunks; that preserves the original record as archaeology.
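The first two checks are mechanical enough to automate against the per-chunk summaries. A sketch, assuming summaries carry the fields shown (`action`, `touched_fields`, `unmet_criteria` are hypothetical names, not ve's schema):

```python
# Structured fields a prose rewrite must never touch.
STRUCTURED_FIELDS = {"success_criteria", "code_paths", "code_references"}

def check_kickoff_batch(results: list[dict]) -> list[tuple[str, str]]:
    """Flag phase-1 blockers in one batch of per-chunk agent summaries."""
    problems = []
    for r in results:
        touched = set(r.get("touched_fields", []))
        if r["action"] == "rewrite" and touched & STRUCTURED_FIELDS:
            problems.append((r["chunk"], "rewrite touched structured fields"))
        if r["action"] == "rewrite" and r.get("unmet_criteria"):
            problems.append((r["chunk"], "veto rule did not fire on an over-claim"))
    return problems
```

An empty return doesn't replace reading the diffs in phase 1; it just catches the two hard blockers early.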

Commit the batch when satisfied. If you found issues, refine the sub-agent prompt before the next batch. Do not scale up until kickoff is clean.

Phase 2: Confidence-building waves

Once kickoff is clean, scale up. Each wave runs 10 sub-agents in parallel, each handling 5 chunks. A wave covers 50 chunks and typically takes 5–10 minutes of wall-clock time. Each wave produces 1–3 commits: one for prose rewrites, one for inconsistency entries, sometimes a third for historicalizations.
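The wave arithmetic is simple enough to sketch: peel 10 × 5 = 50 chunks off the pool and split them into per-agent assignments. Function and parameter names here are illustrative:

```python
def plan_wave(pool: list[str], agents: int = 10, per_agent: int = 5):
    """Split the next wave off the audit pool into per-agent assignments."""
    wave_size = agents * per_agent
    wave, rest = pool[:wave_size], pool[wave_size:]
    assignments = [wave[i:i + per_agent] for i in range(0, len(wave), per_agent)]
    return assignments, rest
```

A short final wave simply yields fewer (or shorter) assignments rather than padding to 50.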

What to check per wave

  • Aggregate counts. A healthy wave on a typical codebase reports something like 50–70% rewrites, 20–30% clean, and 10–20% logged or historicalized. If you see 100% rewrites or 100% clean, the detector is mis-calibrated.
  • Anchor cases. If you have known-problematic chunks, drop them into the first wave. They should fire the detector you expect (retrospective framing, over-claimed scope, etc.). If they don't, you've found a detection gap.
  • Spot-check a few diffs. Read 2–3 rewrites from each wave to confirm quality. You don't need to review every diff; the rules are doing the work.
  • Watch the inconsistency log grow. Each wave adds entries. The patterns you'll see most: stale references after a large refactor (file moved, package split), half-shipped chunks (some success criteria met, others unmet), quantitative criteria that drifted (file size targets that grew past the chunk's claim).
  • Watch for systemic signals. If a single wave hits 100% veto rate, all the chunks in that wave share a root cause (often a refactor that invalidated criteria across a family). That signals "fix the root cause first" rather than "audit one chunk at a time."
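The count-based checks above reduce to two calibration signals. A minimal sketch, assuming per-wave counts keyed by action name (the keys are assumptions, not ve's report format):

```python
def wave_signals(counts: dict) -> list[str]:
    """Turn one wave's aggregate action counts into calibration warnings."""
    total = sum(counts.values()) or 1
    pct = {k: v / total for k, v in counts.items()}
    signals = []
    if pct.get("rewrite", 0) == 1.0 or pct.get("clean", 0) == 1.0:
        signals.append("detector mis-calibrated: one action took the whole wave")
    if pct.get("vetoed", 0) == 1.0:
        signals.append("shared root cause: fix it before auditing chunk by chunk")
    return signals
```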

Two or three waves is usually enough to build confidence. The behaviors you see in waves 2–4 are the same behaviors you'll see in waves 8–10.

Phase 3: Full auto

Once you trust the pattern, hand off the rest. Tell the agent: "run the remaining waves required to migrate all remaining chunks." The agent will:

  • Build the audit pool from the ACTIVE chunks that earlier phases haven't already touched.
  • Run wave after wave (10 sub-agents × 5 chunks) until the pool is empty.
  • Commit per wave with consistent messages so you can audit history later.
  • Produce a final cumulative report when the corpus is exhausted.
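The loop the agent runs can be sketched in a few lines: drain the pool in fixed-size waves, record one report entry per wave, commit after each. `run_wave` stands in for the 10-agent fan-out; the function names are illustrative:

```python
def run_full_auto(pool: list[str], run_wave, wave_size: int = 50) -> list[dict]:
    """Drain the remaining audit pool wave by wave; one report entry per wave."""
    reports = []
    wave_no = 0
    while pool:
        wave, pool = pool[:wave_size], pool[wave_size:]
        wave_no += 1
        results = run_wave(wave)  # 10 sub-agents x 5 chunks inside
        reports.append({"wave": wave_no, "chunks": len(wave), "results": results})
        # commit here with a consistent message, e.g. "audit: wave {wave_no}"
    return reports
```

The final report in the doc's reference run is just the concatenation of these per-wave entries.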

What to check after full auto

  • The cumulative report. Counts per action across all waves; top patterns (refactor drift, half-shipped chunks, quantitative slips); follow-up recommendations.
  • The inconsistency log. Each open entry is a piece of operator triage. Group them by type:
    • Mechanical drift (stale references after refactor): usually a single sweep chunk fixes the whole class.
    • Half-shipped behavior: per-chunk decision. Finish the implementation, narrow the goal, or historicalize.
    • Quantitative criteria slipped: relax the criteria or finish the decomposition.
    • Cross-artifact mismatches: small targeted fixes (template-vs-template, README-vs-code).
  • Spot-check a few historicalizations. The bar is high, but the cost of a wrong historicalization is real: a chunk that still owned intent gets silently demoted. Read the chunk's goal and confirm the agent's reasoning before merging.
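Grouping the log by entry type, as the triage step suggests, maps each bucket to its follow-up. A sketch with assumed type names (the log format in your project may label these differently):

```python
from collections import defaultdict

# Follow-up per entry type, mirroring the triage guidance above.
FOLLOW_UP = {
    "mechanical_drift": "one sweep chunk fixes the whole class",
    "half_shipped": "per-chunk decision: finish, narrow the goal, or historicalize",
    "quantitative_slip": "relax the criteria or finish the decomposition",
    "cross_artifact": "small targeted fix",
}

def group_for_triage(entries: list[dict]) -> dict:
    """Bucket open inconsistency entries by type with the suggested follow-up."""
    buckets = defaultdict(list)
    for e in entries:
        buckets[e["type"]].append(e["chunk"])
    return {t: {"follow_up": FOLLOW_UP.get(t, "manual review"), "chunks": cs}
            for t, cs in buckets.items()}
```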

Reference numbers

The migration was developed against the vibe-engineer project itself. For calibration, that run looked like this:

  • 341 ACTIVE chunks at the start.
  • 11 waves: 1 kickoff, 3 confidence-building waves, then 7 full-auto waves (50 chunks each, 14 in the final wave).
  • ~180 prose rewrites (the dominant action; older chunks tended to open with "Currently, …" framing).
  • ~25 codepath fixes, mostly post-refactor drift (src/ve.py → src/cli/…, src/models.py → src/models/, src/orchestrator/api.py → src/orchestrator/api/).
  • 7 historicalizations: six Pattern B (intent fully superseded) and one Pattern A (a deleted command's chunk).
  • 62 inconsistency entries logged for triage, of which 6 self-resolved during the run.
  • ~85 chunks audited clean.

Your numbers will differ. Younger projects with cleaner authorship will see more clean chunks and fewer rewrites; projects with multiple major refactors in their history will see more logged entries and historicalizations.

Triaging the inconsistency log

The inconsistency log is the durable record of what the migration couldn't auto-fix. It's designed to outlive the migration: future audits add to it, and operator-led triage closes entries by referencing the resolving chunk or commit.

A reasonable triage cadence: review the log weekly until it's drained to a stable backlog, then revisit on demand. For each entry:

  • If the fix is mechanical and unambiguous, land it directly and mark the entry resolved.
  • If the fix needs design judgment (which way to narrow a goal, whether to finish unimplemented criteria), open a chunk for it.
  • If the entry turns out to be a false positive on closer inspection, mark it resolved with a note explaining why.

SUPERSEDED is deprecated, not removed

Older chunk corpora often contain chunks with status: SUPERSEDED, a value the workflow has retired in favor of distinguishing HISTORICAL (no longer owns intent) from COMPOSITE (shares intent with other chunks). The runtime continues to parse status: SUPERSEDED for backward compatibility, so projects can upgrade ve without first migrating their corpus.

What changes is the on-ramp. The state machine no longer accepts the transition ACTIVE → SUPERSEDED, so no new chunk can become SUPERSEDED. The off-ramp (SUPERSEDED → HISTORICAL) stays open, so the existing SUPERSEDED set can only shrink as the migration drains chunks into HISTORICAL or COMPOSITE. Parsing a SUPERSEDED chunk emits a DeprecationWarning pointing here, so unmigrated projects know what to run.
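The deprecation shape reduces to three moves: keep parsing the value (with a warning), drop ACTIVE → SUPERSEDED from the transition table, keep the SUPERSEDED off-ramps. A sketch; the transition set shown is an illustrative subset, not ve's full state machine:

```python
import warnings

# Illustrative subset of the transition table.
ALLOWED = {
    ("FUTURE", "IMPLEMENTING"),
    ("IMPLEMENTING", "ACTIVE"),
    ("ACTIVE", "HISTORICAL"),
    ("ACTIVE", "COMPOSITE"),
    ("SUPERSEDED", "HISTORICAL"),   # off-ramp stays open
    ("SUPERSEDED", "COMPOSITE"),
    # ("ACTIVE", "SUPERSEDED") is absent: the on-ramp is closed
}

def parse_status(raw: str) -> str:
    """Parse a chunk status; SUPERSEDED still parses but warns."""
    if raw == "SUPERSEDED":
        warnings.warn("status: SUPERSEDED is deprecated; run the active-tense migration",
                      DeprecationWarning)
    return raw

def transition(current: str, target: str) -> str:
    if (current, target) not in ALLOWED:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

This is exactly the cycle-breaking shape: upgraded runtimes read old corpora unchanged, but the SUPERSEDED set can only shrink.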

This shape avoids an upgrade-cycle trap. Removing SUPERSEDED outright would block projects from upgrading ve until they migrated their corpus, but they couldn't migrate without upgrading first. Deprecation breaks the cycle: keep parsing, close the on-ramp, let the set drain over time.

Practically: when you run /audit-intent on a corpus with SUPERSEDED chunks, one of the spawned phases walks each one and decides whether it should be HISTORICAL (intent no longer in force) or COMPOSITE (intent shared with peers), then transitions the chunk forward. SUPERSEDED chunks the operator chooses to leave alone keep working; they just keep firing the deprecation warning until handled.

When to re-run

The migration is idempotent: clean chunks stay clean, rewritten chunks don't trigger again. Re-run after:

  • A large refactor that invalidates references across many chunks.
  • A chunk-template change that introduces new conventions.
  • An extended period where chunks were authored without the present-tense gate at /chunk-create.

Re-running on a project where most chunks are already migrated is cheap: most waves will be mostly clean, with a small number of new entries reflecting recent drift.