# AI Agent Continuity Guide
This project is intentionally designed for multi-session, multi-agent development. Use this guide to continue work safely and efficiently across handoffs.
## Primary Goal
Preserve correctness of AINL language/runtime behavior while improving model quality on strict canonical AINL generation.
## First-Read Checklist (Every New Session)
- If you will do implementation work: read `docs/BOT_ONBOARDING.md` and complete the steps in `docs/OPENCLAW_IMPLEMENTATION_PREFLIGHT.md` before coding (see `tooling/bot_bootstrap.json`).
- Read `README.md` and `docs/DOCS_INDEX.md`.
- Read `docs/AINL_SPEC.md` and `SEMANTICS.md`.
- Read `docs/RUNTIME_COMPILER_CONTRACT.md` (compiler/runtime + grammar ownership contract).
- Read `docs/TRAINING_ALIGNMENT_RUNBOOK.md` before touching train/eval scripts.
- Read `docs/DOCS_MAINTENANCE.md` before broad documentation edits.
- Read the latest reports: `corpus/curated/alignment_run_health.json` and `corpus/curated/model_eval_trends.json`.
- Inspect the current generation quality report: `corpus/curated/model_eval_report_v5_aligned.json` (or the latest variant).
- If planning an integration or major change, review the consultant reports:
  - `AI_CONSULTANT_REPORT_APOLLO.md` — OpenClaw `ocl` adapter integration strategy
  - `docs/ZEROCLAW_INTEGRATION.md` — ZeroClaw skill + MCP bootstrap (parallel integration path)
- Operator / field narratives (what shipped in live stacks): `agent_reports/README.md` — indexed OpenClaw agent field reports (e.g. Day 2 AINL King, 2026-03-19).
- Intelligence monitors (memory / bootstrap / summarizer AINL): `docs/INTELLIGENCE_PROGRAMS.md` and `scripts/run_intelligence.py`.
## Ground Rules for Safe Progress
- Do not weaken strict AINL validation just to improve score.
- Do not patch runtime to hide compiler-analysis gaps; fix compiler-owned contract logic.
- Prefer additive changes (new flags/diagnostics) over breaking behavior.
- Keep constrained decoding deterministic for eval comparability.
- Preserve machine-readable artifacts in `corpus/curated/`.
- If adding optimization knobs, wire them through:
  - script CLI
  - cycle script
  - output diagnostics
  - docs
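The "wire it through" rule can be sketched in a few lines. Everything below is hypothetical: the `--repair-budget` flag and the diagnostics shape are invented for illustration and are not part of the real scripts. The point is that any new knob should be a CLI argument that is also echoed into the machine-readable diagnostics, so the cycle script and trend analysis can attribute metric shifts to it.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical eval-script parser; --repair-budget is an illustrative knob.
    parser = argparse.ArgumentParser(description="eval sketch")
    parser.add_argument(
        "--repair-budget", type=int, default=2,
        help="max repair passes per sample (illustrative)",
    )
    return parser


def run(argv: list) -> dict:
    args = build_parser().parse_args(argv)
    # Echo every knob into the output diagnostics so downstream
    # consumers (cycle script, trend analysis) can see its value.
    diagnostics = {"config": {"repair_budget": args.repair_budget}}
    return diagnostics


if __name__ == "__main__":
    print(json.dumps(run(["--repair-budget", "3"])))
```

Because the knob lands in the diagnostics payload, the docs and cycle script only need to reference one name for it end to end.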
## Pipeline Components and Ownership
- `scripts/finetune_ainl.py`: model training.
- `scripts/sweep_checkpoints.py`: objective-aligned checkpoint selection.
- `scripts/eval_finetuned_model.py`: constrained generation + compile/repair + diagnostics.
- `scripts/analyze_eval_trends.py`: cross-run trend and regression-gate logic.
- `scripts/run_alignment_cycle.sh`: orchestrates all of the above and emits the final run-health verdict.
## Canonical Quality Metrics
Use these as the top-line quality signals:
- `strict_ainl_rate`
- `runtime_compile_rate`
- `nonempty_rate`

Treat `eval_loss` as secondary for model selection in this project.
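A minimal sketch of how these rates could be computed from per-sample eval records. The record fields (`strict_ainl_ok`, `compiled`, `nonempty`) are assumptions for illustration; the real eval report schema may name them differently.

```python
def summarize(records: list) -> dict:
    """Compute top-line rates from per-sample eval records (hypothetical schema)."""
    n = len(records)

    def rate(key: str) -> float:
        # Fraction of records where the boolean flag is set.
        return sum(1 for r in records if r.get(key)) / n if n else 0.0

    return {
        "strict_ainl_rate": rate("strict_ainl_ok"),
        "runtime_compile_rate": rate("compiled"),
        "nonempty_rate": rate("nonempty"),
    }


records = [
    {"strict_ainl_ok": True, "compiled": True, "nonempty": True},
    {"strict_ainl_ok": False, "compiled": True, "nonempty": True},
]
print(summarize(records))  # strict 0.5, compile 1.0, nonempty 1.0
```

Ranking checkpoints by these rates rather than `eval_loss` keeps selection aligned with the project's actual objective.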
## Runtime/Compiler Contract Reminder
- Canonical runtime semantics live in `runtime/engine.py` (`RuntimeEngine`). `ExecutionEngine` is compatibility-only (`runtime/compat.py`, re-exported via `runtime.py` / `runtime/__init__.py`).
- Runtime normalization/shape helpers are compiler-owned (`compiler_v2.py` runtime helper functions).
- In strict mode, bare identifier-like tokens in read positions are treated as vars; quote string literals explicitly.
## Required Diagnostics to Keep
- Constraint diagnostics (`fallback_used_steps`, `eos_blocked`, strict rejects)
- Failure-family counters
- Timing breakdown (prepare, generate, compile, repair, total)
- Length-bucket breakdown for quality and speed
- Quantization diagnostics when enabled
## Fast Triage Procedure
When quality drops:
- Check `alignment_run_health.json` status and gate failures.
- Check `model_eval_trends.json` deltas and the worst bucket.
- Check failure families in the final eval report.
- Inspect constraint alerts for over-pruning symptoms.
- Decide between:
  - data boost (target failing families)
  - constraint tuning
  - decoding/token-budget adjustments
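The first two triage steps can be scripted. This is a sketch under assumed schemas: the keys `status`, `gate_failures`, and `worst_bucket` are illustrative guesses at what the health and trends reports contain, not a documented contract.

```python
def triage(health: dict, trends: dict) -> list:
    """Return triage pointers from the run-health and trends reports.
    All report keys used here are assumptions, shown for illustration."""
    notes = []
    if health.get("status") != "pass":
        # Surface each failed regression gate explicitly.
        for gate in health.get("gate_failures", []):
            notes.append(f"gate failed: {gate}")
    worst = trends.get("worst_bucket")
    if worst:
        notes.append(f"inspect length bucket: {worst}")
    return notes


health = {"status": "fail", "gate_failures": ["strict_ainl_rate"]}
trends = {"worst_bucket": "len_128_256"}
for note in triage(health, trends):
    print(note)
```

Starting from the gate failures and worst bucket narrows the choice between a data boost, constraint tuning, and decoding adjustments before any code changes.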
## Handoff Template
When ending a session, leave a concise note with:
- What changed (files + behavior)
- What was validated (commands + pass/fail)
- Current bottleneck
- Recommended next command
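A filled-in example of such a note (the file, date, and results below are invented for illustration):

```markdown
## Handoff note (session end)
- Changed: `scripts/eval_finetuned_model.py`: added a per-bucket timing breakdown; behavior otherwise unchanged.
- Validated: `bash scripts/run_alignment_cycle.sh` → run-health PASS; `strict_ainl_rate` unchanged.
- Bottleneck: the longest length bucket is still the weakest on strict generation.
- Next: boost training data for the failing failure families, then re-run the cycle.
```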
This keeps both humans and smaller agents effective with minimal context loss.
