Forge Platform

Documentation hydration and rehydration design

Canonical design for the Forge documentation hydration pipeline: how Markdown content seeds become reviewable multi-surface proposals, how humans promote approved material into owning repositories, and how teams…

Operator handbook: Documentation hydration agent

Contracts: schemas/doc_hydration_request.v1.schema.json, schemas/doc_hydration_result.v1.schema.json

Sample seed: samples/doc_hydration_seed.sample.md


Executive capsule

Forge hydration turns one well-written idea into governed documentation across the whole ecosystem — blog, architecture handbook, methodology, product guides — without letting generation decide what is true. Humans promote; the pipeline proposes, traces, and diffs. In v2 the same machinery becomes a documentation intelligence layer: registries know where every page came from, claims carry maturity and ownership, and drift between sources and published pages drives a reviewable rehydration backlog.

The problem this solves

Documentation across a multi-repo ecosystem decays in two directions at once: good ideas never reach all the surfaces where readers need them, and published pages silently drift from the sources that justified them. Hydration solves the first (one seed, many governed surfaces); rehydration and the claim/drift machinery solve the second (diff-driven refresh instead of rot).

How it fits the ecosystem

Forge Platform owns the schemas, registries, routing vocabulary, and this design. Execution belongs elsewhere: LCDL runs governed classification/drafting tasks, forge-workcells hosts the doc_hydration_worker runtime, Fleet provides the approved execution rail for build/link/scorecard jobs, and Lenses is where humans review packs and record decisions. Kitchensink's forge-autodoc emits docs_source_map.json at build time so every site exposes its lineage.


Vocabulary

| Term | Meaning |
| --- | --- |
| Content seed | A Markdown file with YAML frontmatter that captures one idea, narrative, or field note before it is split across Forge surfaces. |
| Hydration | First pass: parse seed → route to surfaces → write review artifacts under forge-platform/docs/hydration-runs/. |
| Rehydration | Repeat pass on the same or updated seed: regenerate review artifacts, diff against prior run and promoted canonical docs, decide what to re-promote. |
| Surface | A destination family (blog, platform architecture, methodology, product handbook) with a canonical repo root. |
| Promotion | Human-approved copy from hydration output into a canonical repository, followed by that repo's normal build and deploy. |
| Trust boundary | Hydration never writes directly to destination repos in v1; all generated material is review-only until promoted. |

Hydration is proposal generation. Rehydration is controlled refresh when the source of truth (seed) or the ecosystem (routing tables, surface ownership) evolves.


Design goals

  1. Split one idea across the Forge ecosystem without collapsing Platform, SDLC, Blueprints, Lenses, LCDL, and Fleet into one monolithic doc.
  2. Keep human judgment in the loop — routing is scored, not proved; promotion is explicit.
  3. Preserve traceability — every promoted page should link back to hydration_source and seed metadata where practical.
  4. Support batch packs — many related seeds (e.g. Agentic SDLC standout series) hydrate, review, and promote as a set.
  5. Enable safe rehydration — updated seeds can refresh drafts without silently overwriting canonical narrative.

Pipeline overview

Markdown seed (.md)
       |
       v
+---------------------------+
| Normalize request         |  forge.doc_hydration_request.v1
| (frontmatter + excerpt)   |  doc-hydration-request.json
+---------------------------+
       |
       v
+---------------------------+
| Route to surfaces         |  deterministic scorer (tags, uses, title, body)
| (destination candidates)  |
+---------------------------+
       |
       v
+---------------------------+
| Generate review artifacts |  forge.doc_hydration_result.v1
|                           |  hydration-brief.md
|                           |  drafts/<surface>/<slug>.md
|                           |  promotion-checklist.md
+---------------------------+
       |
       v  (human review)
+---------------------------+
| Promote to canonical repo |  one commit per owning repo
| Build + deploy handbooks  |
+---------------------------+

v1 CLI: python3 scripts/doc_hydration_agent.py <seed.md>

v1 promotion helper (batch packs): python3 scripts/promote_hydration_pack.py

Default output directory: docs/hydration-runs/<seed-slug>/

Batch review hub example: docs/hydration-runs/forge-agentic-sdlc-standout-pack/REVIEW-INDEX.md


Content seed Markdown design

Seeds are the authoritative narrative input for hydration and rehydration. Canonical promoted pages are downstream; when in doubt, edit the seed and rehydrate.

Required shape

---
title: "Human-readable title"
status: "content-seed"
intended_uses:
  - article creation
  - blog post drafting
  - architecture documentation hydration
tags: [agentic-sdlc, governance, control-plane]
---

# Same or shortened title

Body sections...

| Field | Purpose |
| --- | --- |
| title | Primary routing signal and draft headline. |
| status | content-seed for new material; content-seed-revised when rehydrating after edits. |
| standout_area | Optional series index (e.g. 01–14) for batch packs. |
| intended_uses | Declares audience intent: blog, architecture, methodology, knowledge, etc. |
| tags | Ecosystem vocabulary for surface scoring. |
| source_note | Provenance (chat export, zip bundle, certification prep). |
| hydration_pack | Optional pack id (e.g. forge-agentic-sdlc-standout-pack) for batch runs. |
| prior_hydration_run | Optional path to previous hydration-result.json when rehydrating. |

Structured sections make promotion predictable and surface-specific trimming easier:

| Section | Typical surfaces |
| --- | --- |
| Core thesis | All surfaces |
| Condensed thought | Blog, LCDL guides |
| Why it stands out | Blog, Lenses, Fleet |
| Forge ecosystem hooks | Platform architecture, Blueprints |
| Architecture implications | Platform, product handbooks |
| Blog post seed paragraph | Blog only |
| Risks and counterarguments | Platform, methodology |
| Idea evolution questions | Blueprints methodology |

The promotion script (promote_hydration_pack.py) maps these sections per surface. Custom seeds should follow the same headings when targeting multiple repos.

Parser support (v1)

The CLI accepts:

  • Quoted and unquoted YAML-like frontmatter strings
  • Inline tag lists: [a, b, c]
  • Indented string lists under intended_uses: and similar keys
  • Optional standout_area, source_note, and extension fields (passed through to JSON)
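The accepted shapes above can be illustrated with a small hand-rolled parser. This is a minimal sketch, not the code in scripts/doc_hydration_agent.py: the function names `parse_seed` and `_unquote` are hypothetical, and the sketch assumes that unknown keys are simply kept in the returned dict, which mirrors the "passed through to JSON" behavior described above.

```python
def _unquote(value: str) -> str:
    """Strip one pair of matching single or double quotes, if present."""
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


def parse_seed(text: str):
    """Return (frontmatter dict, body) for a seed. Hypothetical helper."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing '---' delimiter of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text

    meta = {}
    list_key = None  # key currently collecting indented '- item' lines
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if line[0] in " \t" and stripped.startswith("- ") and list_key:
            meta[list_key].append(_unquote(stripped[2:]))
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line; ignored in this sketch
        key, value = key.strip(), value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list: [a, b, c]
            meta[key] = [_unquote(v) for v in value[1:-1].split(",") if v.strip()]
            list_key = None
        elif value == "":
            # Start of an indented list such as intended_uses:
            meta[key] = []
            list_key = key
        else:
            meta[key] = _unquote(value)
            list_key = None
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


SAMPLE = """---
title: "Human-readable title"
status: content-seed
intended_uses:
  - article creation
  - blog post drafting
tags: [agentic-sdlc, governance, control-plane]
---

# Same or shortened title
"""

meta, body = parse_seed(SAMPLE)
```

A real parser would also need to decide how to report malformed frontmatter, which per the status table presumably surfaces as a failed run.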

Surface routing design

Routing follows Forge source-of-truth ownership. One seed may propose multiple surfaces; each surface gets its own draft with audience-appropriate framing.

| Surface key | Canonical root | When to route |
| --- | --- | --- |
| forgesdlc_blog | forgesdlc/blog/ | Public product narrative, positioning, adoption story |
| forge_platform_architecture | forge-platform/docs/ | Control plane, workcells, contracts, boundaries |
| blueprints_methodology | blueprints/sdlc/ | Reusable process and framework guidance |
| forgesdlc_knowledge | forgesdlc/knowledge/ | Encyclopedia-style conceptual depth |
| forge_lcdl_docs | forge-lcdl/docs/ | Governed LLM tasks, structured output, evals |
| forge_lenses_docs | forge-lenses/docs/ | Workspace visibility, evidence, approvals |
| forge_fleet_docs | forge-fleet/docs/ | Controlled job execution, operator runbooks |

Scoring inputs (deterministic v1)

  1. tags — keyword overlap with surface vocabularies
  2. intended_uses — explicit hydration / blog / architecture hints
  3. title — product vs process vs operator language
  4. body_excerpt — first ~500 words for mechanism terms (Fleet, LCDL, Versona, etc.)
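To make the four scoring inputs concrete, here is a minimal sketch of a deterministic keyword-overlap scorer. The surface vocabularies and the weights are invented for illustration (the design does not list them), and `score_surfaces` is a hypothetical name, so treat this as one possible shape rather than the shipped scorer.

```python
import re

# Hypothetical vocabularies: the real per-surface vocabularies are not given here.
SURFACE_VOCAB = {
    "forgesdlc_blog": {"blog", "adoption", "positioning", "narrative"},
    "forge_platform_architecture": {"control-plane", "workcells", "contracts", "architecture"},
    "forge_fleet_docs": {"fleet", "runbook", "operator"},
}

# Assumed weights: tags count most, body excerpt least.
WEIGHTS = {"tags": 3, "intended_uses": 2, "title": 2, "body_excerpt": 1}


def tokens(text: str) -> set:
    """Lowercase word tokens; hyphenated terms such as control-plane stay whole."""
    return set(re.findall(r"[a-z0-9][a-z0-9-]*", text.lower()))


def score_surfaces(request: dict) -> dict:
    """Score every surface; same request always yields the same ordering."""
    excerpt = " ".join(request.get("body_excerpt", "").split()[:500])
    signals = {
        "tags": tokens(" ".join(request.get("tags", []))),
        "intended_uses": tokens(" ".join(request.get("intended_uses", []))),
        "title": tokens(request.get("title", "")),
        "body_excerpt": tokens(excerpt),
    }
    scores = {
        surface: sum(WEIGHTS[name] * len(found & vocab) for name, found in signals.items())
        for surface, vocab in SURFACE_VOCAB.items()
    }
    # Highest score first; ties broken by surface key so output is stable.
    return dict(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))


example_scores = score_surfaces({
    "title": "Control plane contracts",
    "tags": ["control-plane", "governance"],
    "intended_uses": ["architecture documentation hydration"],
    "body_excerpt": "Fleet runs jobs.",
})
```

Because the score is a plain sum of set overlaps, a reviewer can reproduce any routing decision by hand, which fits the goal that routing is scored rather than proved.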

Status outcomes

| hydration-result.json status | Meaning |
| --- | --- |
| drafted | One or more destination candidates with drafts |
| needs_review | Seed too meta/broad (index, README, compendium); human decides manually |
| partial | Some surfaces drafted, others blocked |
| failed | Parse or contract error |

Meta documents (pack index, README, compendium) often land in needs_review with zero candidates — by design, not failure.


Human-oriented draft pattern

Every hydration-brief.md and generated draft uses this order so material stays readable before exposing architecture depth:

  1. User problem — what the reader is struggling to explain or decide
  2. Outcome — what clarity or decision the doc should enable
  3. Forge mechanism — which Platform / SDLC / product pieces apply
  4. Trust boundary — what automation must not claim; where humans approve
  5. Next action — CTA: read, assess, adopt pattern, open Lenses, run Fleet job, etc.

Artifacts per hydration run

| File | Schema / type | Role |
| --- | --- | --- |
| doc-hydration-request.json | forge.doc_hydration_request.v1 | Normalized seed metadata |
| hydration-result.json | forge.doc_hydration_result.v1 | Routing, status, draft paths |
| hydration-brief.md | Prose | Human summary, rationale, risks |
| drafts/<surface>/<slug>.md | Markdown | Review-only surface drafts |
| promotion-checklist.md | Prose | Pre-promotion gate |

For batch packs, add:

| File | Role |
| --- | --- |
| REVIEW-INDEX.md | Reading order and links to all seed runs |
| _batch-summary.jsonl | Machine-readable batch status |
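A reviewer or CI step could confirm that a run folder contains the artifacts listed above before anyone starts reading. The sketch below is an illustration only: `missing_artifacts` and the `expect_drafts` flag are hypothetical, and the flag exists because a needs_review run can legitimately have no drafts.

```python
import tempfile
from pathlib import Path

# File names taken from the artifact table above.
REQUIRED_FILES = (
    "doc-hydration-request.json",
    "hydration-result.json",
    "hydration-brief.md",
    "promotion-checklist.md",
)


def missing_artifacts(run_dir, expect_drafts=True):
    """Return the artifact names that are absent from a hydration run folder."""
    run_dir = Path(run_dir)
    missing = [name for name in REQUIRED_FILES if not (run_dir / name).is_file()]
    # Drafts live under drafts/<surface>/<slug>.md; one is enough for this check.
    if expect_drafts and not any(run_dir.glob("drafts/*/*.md")):
        missing.append("drafts/<surface>/<slug>.md")
    return missing


# Usage against a throwaway directory (no real run folder is touched).
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp)
    for name in REQUIRED_FILES:
        (run / name).write_text("{}", encoding="utf-8")
    missing_before = missing_artifacts(run)
    draft = run / "drafts" / "forgesdlc_blog" / "example.md"
    draft.parent.mkdir(parents=True)
    draft.write_text("# Example\n", encoding="utf-8")
    missing_after = missing_artifacts(run)
```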

Promotion design

Promotion is always manual in v1 (optionally scripted after human approval).

Steps

  1. Read hydration-brief.md and promotion-checklist.md.
  2. Pick the draft that matches surface ownership (do not promote blog tone into Blueprints verbatim).
  3. Copy or merge into the canonical path in the owning repo (not forge-platform hydration-runs).
  4. Add frontmatter traceability when useful:
       • hydration_source: <pack-or-run-id>
       • status: promoted (or leave as handbook default)
  5. Update surface index (blog/index.md, docs/standout/index.md, etc.).
  6. Run the repo build (build-site.py, build-handbook.py, …).
  7. Bump website submodule pointers and deploy.

One commit per repo

Never stage promotion changes across forgesdlc, blueprints, forge-platform, and product repos in a single commit.

| Surface | Promoted path pattern |
| --- | --- |
| Blog | forgesdlc/blog/<slug>.md |
| Platform | forge-platform/docs/standout/<slug>.md |
| Blueprints | blueprints/sdlc/methodologies/forge/standout/<slug>.md |
| LCDL | forge-lcdl/docs/guides/standout/<slug>.md |
| Lenses | forge-lenses/docs/forge/standout/<slug>.md |
| Fleet | forge-fleet/docs/learn-101/standout/<slug>.md |
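The path patterns and the one-commit-per-repo rule can be combined into a small planning helper. This is a sketch under assumptions: `promoted_path` and `commits_by_repo` are hypothetical names, and the repo is taken to be the first path segment of each pattern. It only computes a plan; it does not copy files or run git.

```python
# Patterns copied from the table above, with <slug> written as {slug}.
PROMOTED_PATTERNS = {
    "Blog": "forgesdlc/blog/{slug}.md",
    "Platform": "forge-platform/docs/standout/{slug}.md",
    "Blueprints": "blueprints/sdlc/methodologies/forge/standout/{slug}.md",
    "LCDL": "forge-lcdl/docs/guides/standout/{slug}.md",
    "Lenses": "forge-lenses/docs/forge/standout/{slug}.md",
    "Fleet": "forge-fleet/docs/learn-101/standout/{slug}.md",
}


def promoted_path(surface: str, slug: str) -> str:
    """Resolve the canonical destination for one approved draft."""
    if not slug or "/" in slug or slug.startswith("."):
        raise ValueError(f"unsafe slug: {slug!r}")
    return PROMOTED_PATTERNS[surface].format(slug=slug)


def commits_by_repo(promotions):
    """Group (surface, slug) pairs by owning repo: one group per commit."""
    grouped = {}
    for surface, slug in promotions:
        path = promoted_path(surface, slug)
        repo = path.split("/", 1)[0]  # assumption: first segment names the repo
        grouped.setdefault(repo, []).append(path)
    return grouped


plan = commits_by_repo([
    ("Blog", "governed-docs"),
    ("Platform", "governed-docs"),
    ("Blog", "second-post"),
])
```

Each key in `plan` corresponds to a separate commit in a separate repository, so a single approval session never produces a cross-repo commit.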

Rehydration workflow

Rehydration applies when:

  • The seed Markdown changed (new claims, revised honesty, updated hooks)
  • Routing policy changed (new surface, renamed canonical root)
  • Canonical docs drifted from seed and you want a fresh diff baseline
  • A batch pack gets a v2 drop (e.g. new standout areas 15–16)

Rehydration procedure

  1. Edit the seed (or add a new seed file); set status: content-seed-revised and optional prior_hydration_run.
  2. Re-run hydration into a new or same run folder:
       python3 scripts/doc_hydration_agent.py path/to/seed.md \
         --out-dir docs/hydration-runs/<pack>/<NN-seed-slug>
  3. Diff new hydration-result.json and drafts against:
       • Previous run in docs/hydration-runs/…
       • Promoted files in canonical repos (git diff, or compare hydration_source blocks)
  4. Review hydration-brief.md for routing changes (new surfaces, dropped candidates).
  5. Selective promote: only changed sections or full replace per surface; do not blindly overwrite human edits in canonical repos without review.
  6. Rebuild and deploy affected handbooks only.
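The diff step of the procedure above can be sketched with the standard library. The helper below compares the drafts of two run folders and returns unified diffs per draft; `diff_drafts` is a hypothetical name and the sketch assumes both runs use the drafts/<surface>/<slug>.md layout. Comparing against promoted files in canonical repos would still go through git diff as described.

```python
import difflib
import tempfile
from pathlib import Path


def diff_drafts(prior_run, new_run):
    """Map each draft's relative path to its unified diff (changed drafts only)."""
    prior_run, new_run = Path(prior_run), Path(new_run)
    rel_paths = sorted({
        p.relative_to(root).as_posix()
        for root in (prior_run, new_run)
        for p in root.glob("drafts/*/*.md")
    })
    report = {}
    for rel in rel_paths:
        old, new = prior_run / rel, new_run / rel
        # A draft missing on one side is treated as empty, so adds and drops show up.
        old_lines = old.read_text(encoding="utf-8").splitlines() if old.is_file() else []
        new_lines = new.read_text(encoding="utf-8").splitlines() if new.is_file() else []
        diff = list(difflib.unified_diff(
            old_lines, new_lines,
            fromfile=f"prior/{rel}", tofile=f"new/{rel}", lineterm="",
        ))
        if diff:
            report[rel] = diff
    return report


def _write(root, rel, text):
    path = Path(root) / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")


# Usage with two throwaway run folders.
with tempfile.TemporaryDirectory() as prior, tempfile.TemporaryDirectory() as new:
    _write(prior, "drafts/blog/a.md", "one\ntwo\n")
    _write(new, "drafts/blog/a.md", "one\nthree\n")
    _write(new, "drafts/fleet/b.md", "added surface\n")
    draft_report = diff_drafts(prior, new)
```

A draft that appears only in the new run signals a routing change, which is the same thing step "Review hydration-brief.md for routing changes" asks the reviewer to look for.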

Rehydration rules

| Rule | Rationale |
| --- | --- |
| Seeds win for narrative intent | Canonical docs are published views; seeds hold the evolving story. |
| Canonical docs win for repo-specific facts | URLs, commands, schema ids, and build steps must match the owning repo. |
| Never auto-promote on rehydrate in v1 | Same trust boundary as initial hydration. |
| Keep hydration_source stable across promotes | Enables tracing which pack/run produced a page. |
| Batch rehydrate via REVIEW-INDEX.md | Review series coherence before piecemeal promotion. |

When not to rehydrate

  • Typo fixes in one canonical page only → edit canonical repo directly.
  • Pure handbook template/nav churn → kitchensink / generator fix, not seed rehydration.
  • Emergency factual correction → patch canonical doc; backport to seed if the seed is still active.

Claim lifecycle (v2)

Hydration v2 treats individual claims — capability facts, architecture directions, positioning statements, process guidance — as governed objects in docs-governance/claim_registry.jsonl (forge.doc_claim.v1).

States

| State | Meaning | Who moves it |
| --- | --- | --- |
| candidate | Extracted by scripts/extract_claims.py or an LCDL task; not yet trusted | extractor |
| needs_owner_review | Flagged for a named owner (e.g. overclaim risk detected) | planner / reviewer |
| accepted | Owner agrees the claim is directionally true | owner |
| canonical_fact | Backed by T0 evidence; may support published pages | owner |
| deprecated | No longer true; pages citing it must rehydrate | owner |
| conflicted | Two sources disagree; publication blocked pending a conflict report | planner |
| blocked | Explicitly barred from publication | owner |

Rules

  1. Claim extraction is deterministic first (sentence heuristics over capability verbs and maturity words); LCDL classification may refine claim_type and maturity_signal but never sets canonical_fact.
  2. Every hydration run emits claim-inventory.json (forge.claim_inventory.v1) and route-confidence.json alongside the v1 artifacts.
  3. A page's frontmatter claim_ids ties it to registry claims; deprecating a claim puts every citing page on the rehydration backlog.
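The state table and rules above amount to a small permission check plus a lookup from claims to pages. The following is a minimal sketch, assuming hypothetical names (`is_allowed`, `pages_to_rehydrate`, the `evidence_tier` argument, and the `path` key); only the state names, the actor roles, claim_ids, and the T0 requirement come from this page.

```python
# Who may move a claim INTO each state, per the table above.
ALLOWED_MOVERS = {
    "candidate": {"extractor"},
    "needs_owner_review": {"planner", "reviewer"},
    "accepted": {"owner"},
    "canonical_fact": {"owner"},
    "deprecated": {"owner"},
    "conflicted": {"planner"},
    "blocked": {"owner"},
}


def is_allowed(new_state: str, actor: str, evidence_tier=None) -> bool:
    """True if `actor` may move a claim into `new_state`."""
    if actor not in ALLOWED_MOVERS.get(new_state, set()):
        return False
    # canonical_fact additionally requires T0 evidence; extractors and LLM
    # tasks are never in its mover set, so they cannot reach this branch.
    if new_state == "canonical_fact" and evidence_tier != "T0":
        return False
    return True


def pages_to_rehydrate(claim_id: str, pages) -> list:
    """Pages whose frontmatter claim_ids cite the given (e.g. deprecated) claim."""
    return [page["path"] for page in pages if claim_id in page.get("claim_ids", [])]


example_pages = [
    {"path": "docs/standout/a.md", "claim_ids": ["c1", "c2"]},
    {"path": "docs/standout/b.md", "claim_ids": ["c3"]},
]
backlog_paths = pages_to_rehydrate("c1", example_pages)
```

Keeping the mover table as data makes the "never sets canonical_fact" rule checkable in one place instead of being scattered across extractors.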

Drift triggers (v2)

scripts/detect_drift.py emits docs-governance/drift_report.json (forge.doc_rehydration_diff.v1) and appends non-info findings to docs-governance/rehydration_backlog.jsonl. Triggers:

| Finding | Comparison | Severity |
| --- | --- | --- |
| seed_page_drift | Seed body hash + section headings vs promoted page | info |
| maturity_claim_mismatch | Registry maturity vs page maturity wording; deprecated/conflicted claims still on pages | warning–critical |
| stale_review | Registry last_reviewed vs source file mtime | warning |
| missing_source_file | Registry entry vs disk | critical |
| unregistered_page / orphan_registry_entry | scripts/build_content_registry.py --check | warning |
| external_claim_expired | T3/T4 claims past review_by | warning |

Rehydration in v2 is diff-driven: a rehydration run consumes backlog entries and emits claim_delta.json plus diff_against_canonical.md, so the reviewer sees exactly what changed and why before any promotion.
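As an illustration of the seed_page_drift comparison (seed body hash plus section headings versus the promoted page), here is a minimal sketch. It is not the logic of scripts/detect_drift.py: it assumes the hash of the seed body at promotion time was recorded somewhere (`recorded_seed_hash`), and the function names and the keys of the returned finding are hypothetical.

```python
import hashlib
import re


def body_fingerprint(markdown_body: str):
    """Return (sha256 of the whitespace-normalized body, list of heading texts)."""
    normalized = "\n".join(line.rstrip() for line in markdown_body.strip().splitlines())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    headings = re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", normalized, flags=re.MULTILINE)
    return digest, headings


def seed_page_drift(seed_body: str, recorded_seed_hash: str, page_body: str):
    """Return an info-level finding, or None when seed and page still agree."""
    digest, seed_headings = body_fingerprint(seed_body)
    _, page_headings = body_fingerprint(page_body)
    missing = [h for h in seed_headings if h not in page_headings]
    if digest == recorded_seed_hash and not missing:
        return None
    return {
        "finding": "seed_page_drift",
        "severity": "info",
        "seed_changed": digest != recorded_seed_hash,
        "headings_missing_from_page": missing,
    }


seed_text = "# Core thesis\n\nText.\n\n## Risks and counterarguments\n\nMore."
seed_hash, seed_heads = body_fingerprint(seed_text)
no_drift = seed_page_drift(seed_text, seed_hash, seed_text)
trimmed_page = seed_page_drift(seed_text, seed_hash, "# Core thesis\n\nText.")
edited_seed = seed_page_drift(seed_text + "\nNew claim.", seed_hash, seed_text)
```

Since promoted pages are trimmed per surface, a missing heading is often intentional, which is consistent with this finding being info rather than warning.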


Trust boundary and risks

Generated material may:

  • Overstate implementation readiness (defined vs demonstrated vs vision)
  • Route process guidance into product blog surfaces
  • Route product claims into Blueprints methodology
  • Omit required cross-links or build steps
  • Invent claims not present in the seed

promotion-checklist.md exists to force explicit checks before any promote or re-promote.


Knowledge events and planning (v2)

Change signals from registered sources land in an append-only ledger (docs-governance/events/*.jsonl, forge.knowledge_event.v1). Three shipped adapters feed it: adapter_markdown_repo.py (git-diff watcher over T0 repos), adapter_design_doc.py (design-wiki status transitions: accepted → rehydrate, superseded/rejected → deprecate + rehydrate), and adapter_external_note.py (manual T3/T4 capture with observation date and auto-expiry). In addition, adapter_search_log.py imports search misses. The quarantine gate is absolute: events from unregistered or paused sources are recorded as QUARANTINED and never planned.
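The quarantine gate can be pictured as a single admission function in front of the ledger. This is a sketch only: the field names (`source_id`, `paused`, `plannable`) and the non-quarantined status value `RECORDED` are assumptions, while QUARANTINED and the unregistered-or-paused rule come from the paragraph above.

```python
def admit_event(event: dict, source_registry: dict) -> dict:
    """Tag an incoming event; quarantined events are kept but never planned."""
    source = source_registry.get(event.get("source_id"))
    if source is None or source.get("paused", False):
        # Still recorded in the ledger for audit, but excluded from planning.
        return {**event, "status": "QUARANTINED", "plannable": False}
    return {**event, "status": "RECORDED", "plannable": True}  # status name assumed


# Hypothetical registry keyed by source id.
registry = {
    "forge-platform": {"paused": False},
    "design-wiki": {"paused": True},
}

from_registered = admit_event({"source_id": "forge-platform", "kind": "git_diff"}, registry)
from_paused = admit_event({"source_id": "design-wiki", "kind": "status_change"}, registry)
from_unknown = admit_event({"source_id": "random-blog", "kind": "note"}, registry)
```

Returning a tagged copy rather than dropping the event keeps the ledger append-only while still making the planner's filter trivial.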

scripts/hydration_planner.py turns un-planned events plus the claim and content registries into reviewable forge.doc_hydration_plan.v1 records with an 11-decision enum (no_action … block_due_to_conflict). When registered sources disagree, the planner emits block_due_to_conflict and writes docs-governance/conflicts/<id>.md; nothing drafts while a conflict is open.

Workcell path (v2, shipped)

The doc_hydration_worker workcell is implemented in forge-workcells:

  1. It consumes a forge.workcell_request.v1 (launch_pack.seed_path, optional target_surface, persona, use_llm).
  2. Deterministic claim extraction runs by default; with use_llm the governed forge-lcdl tasks doc_claim_extraction / doc_risk_classification (and doc_draft_expansion for drafts) run instead — the LLM classifies and drafts, never decides truth; all claims stay candidate.
  3. It emits hydration-brief.md, claim-inventory.json (forge.claim_inventory.v1), and a forge.workcell_result.v1 carrying the caller's frun_* id with status: needs_approval.
  4. forge-lenses presents review packs read-only at GET /api/doc-hydration/review-packs; the reviewer decision manifest is the approval record (writeback is a later increment — vision).
  5. Docs build / link-check / scorecard runs execute on forge-fleet as template-only docker_argv jobs (docs/examples/doc-hydration/ in the fleet repo).

See the workcell catalog for the envelope contract and promotion gates.

Telemetry and the adaptive loop (v2)

Per-site artifacts (link_report.json, orphan_pages.json, stale_claims.json, persona_coverage.json) come from scripts/site_docs_artifacts.py; scripts/docs_scorecard.py aggregates them into a weighted 8-dimension scorecard with build-to-build deltas (printed in the deploy summary). Scorecard regressions, expired claims, link rot, drift findings, and agent_retrieval_eval.py misses all append to docs-governance/rehydration_backlog.jsonl automatically.
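A weighted scorecard with build-to-build deltas reduces to a weighted mean and a comparison against the previous build. The sketch below assumes each dimension is already normalized to the range 0 to 1; the three dimension names and their weights are placeholders (the eight real dimensions are not listed on this page), and the function names are hypothetical rather than taken from scripts/docs_scorecard.py.

```python
def weighted_score(dimensions: dict, weights: dict) -> float:
    """Weighted mean of dimension scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(dimensions[name] * weight for name, weight in weights.items()) / total_weight


def scorecard_delta(previous: dict, current: dict, weights: dict) -> dict:
    """Overall score, change since the previous build, and regressed dimensions."""
    now = weighted_score(current, weights)
    before = weighted_score(previous, weights)
    return {
        "score": round(now, 4),
        "delta": round(now - before, 4),
        "regressions": sorted(n for n in weights if current[n] < previous[n]),
    }


# Placeholder dimensions and weights, for illustration only.
weights = {"links": 2, "orphans": 1, "claims": 1}
previous_build = {"links": 1.0, "orphans": 0.5, "claims": 0.5}
current_build = {"links": 0.5, "orphans": 1.0, "claims": 0.5}
summary = scorecard_delta(previous_build, current_build, weights)
```

Reporting regressions per dimension matters because an overall score can stay flat while one dimension degrades, and it is the per-dimension regression that would feed the rehydration backlog.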

scripts/hydration_cycle.py is the one-command PDCA loop: ingest → resolve → plan (including archive proposals for pages unreviewed > 365 days) → report (quality gates G1–G8, scorecard, retrieval eval). The cycle never auto-promotes, drafts, or archives. Operating rhythm: docs-governance/OPERATING-CADENCE.md.
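The archive-proposal rule (pages unreviewed for more than 365 days) can be sketched as a pure function over registry entries. The names `archive_proposals` and `propose_archive` are hypothetical, and the sketch assumes last_reviewed is stored as an ISO date string. In keeping with the cycle's contract, it only returns proposals and changes nothing.

```python
from datetime import date, timedelta


def archive_proposals(entries, today: date, max_age_days: int = 365) -> list:
    """Propose (never perform) archiving for pages unreviewed > max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    proposals = []
    for entry in entries:
        last_reviewed = date.fromisoformat(entry["last_reviewed"])
        if last_reviewed < cutoff:  # strictly older than the threshold
            proposals.append({
                "content_id": entry["content_id"],
                "decision": "propose_archive",  # hypothetical decision label
                "days_unreviewed": (today - last_reviewed).days,
            })
    return proposals


example_entries = [
    {"content_id": "a", "last_reviewed": "2024-05-01"},
    {"content_id": "b", "last_reviewed": "2024-06-01"},  # exactly 365 days: kept
    {"content_id": "c", "last_reviewed": "2025-01-01"},
]
proposed = archive_proposals(example_entries, today=date(2025, 6, 1))
```

Passing `today` explicitly keeps the function deterministic, so the same registry snapshot always yields the same plan for review.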


Who this is for

Architects and platform engineers evaluating Forge (evaluate stage). Executives can stop after the core thesis; agents should honor the agent contract in the page frontmatter.

Evidence and maturity

Maturity: defined — this page records design intent captured from the Agentic SDLC standout analysis and reviewed for the platform handbook. Where behavior is shipped and observable it is called out explicitly; everything else should be read as direction, not commitment. Evidence trail: the hydration pack seed, the content registry entry for this content_id, and the claim registry entries linked via claim_ids.

How to use this page

  • Evaluating Forge? Read the core thesis and ecosystem hooks, then continue with the standout index.
  • Designing against the platform? Verify boundaries in the platform reference architecture before implementation.
  • Automating? Use the frontmatter agent_contract and cite content_id in generated output.

Agent contract

Allowed actions
  • summarize this page for evaluation questions
  • cross-reference claims via claim_ids and source_refs
Safe to infer
  • the architectural boundary described here is the intended design
Do not infer
  • production readiness beyond the stated maturity level
  • specific vendor, customer, or benchmark commitments
Key artifacts
  • forge-platform/docs-governance/content_registry.yaml

Machine-readable guidance from this page's frontmatter: what automated consumers may do, infer, and must not infer.