Training Alignment Runbook
This runbook documents the end-to-end AINL model alignment pipeline currently used in this repository.
Purpose
Run one command that:
- builds supervision datasets,
- trains LoRA,
- sweeps checkpoints by task metrics,
- runs final constrained eval gate,
- computes trends + pass/fail gate,
- writes machine-readable run health.
Entrypoint: scripts/run_alignment_cycle.sh
Command Shape
bash scripts/run_alignment_cycle.sh \
[ADAPTER_OUT] \
[VARIANTS_PER_PROMPT] \
[EPOCHS] \
[MAX_NEW_TOKENS] \
[DISTILL_MODE] \
[SAMPLES_PER_PROMPT] \
[TOP_K] \
[BOOST_EVAL_REPORT] \
[MIN_STRICT_RATE] \
[MIN_RUNTIME_RATE] \
[MIN_NONEMPTY_RATE] \
[MAX_REGRESSION_STRICT] \
[MAX_REGRESSION_RUNTIME] \
[MAX_REGRESSION_NONEMPTY] \
[QUANTIZATION_MODE] \
[CANONICALIZE_CHUNK_LINES] \
[CANONICALIZE_MAX_LINES]
Argument Reference
- ADAPTER_OUT (default models/ainl-phi3-lora-v5-aligned): output adapter root
- VARIANTS_PER_PROMPT (default 24): prompt variant count for dataset builders
- EPOCHS (default 2): fine-tune epochs
- MAX_NEW_TOKENS (default 40): generation budget for sweep/final gate
- DISTILL_MODE (default 1): include teacher-distill dataset stage (1 = on)
- SAMPLES_PER_PROMPT (default 40): distill mix target size
- TOP_K (default 3): checkpoints to retain after sweep
- BOOST_EVAL_REPORT (default empty): optional eval report path for failure-focused boost dataset
- MIN_STRICT_RATE (default 0.60): trend gate minimum strict AINL rate
- MIN_RUNTIME_RATE (default 0.75): trend gate minimum runtime compile rate
- MIN_NONEMPTY_RATE (default 0.70): trend gate minimum nonempty output rate
- MAX_REGRESSION_STRICT (default 0.05): allowed strict rate drop vs previous report
- MAX_REGRESSION_RUNTIME (default 0.05): allowed runtime compile drop vs previous report
- MAX_REGRESSION_NONEMPTY (default 0.05): allowed nonempty drop vs previous report
- QUANTIZATION_MODE (none|dynamic-int8, default none): eval/infer quantization mode
- CANONICALIZE_CHUNK_LINES (default 256): chunk size for host-side canonicalization
- CANONICALIZE_MAX_LINES (default 512): cap retained canonicalized lines
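Because the arguments are positional, overriding a late one (for example QUANTIZATION_MODE) means spelling out every argument before it. The sketch below assembles the full argument list from the defaults documented above. The helper name build_cycle_command is hypothetical, and passing an empty string for BOOST_EVAL_REPORT is assumed to be equivalent to leaving it unset.

```python
import shlex

# Positional arguments in runbook order, with the documented defaults.
DEFAULTS = [
    ("ADAPTER_OUT", "models/ainl-phi3-lora-v5-aligned"),
    ("VARIANTS_PER_PROMPT", "24"),
    ("EPOCHS", "2"),
    ("MAX_NEW_TOKENS", "40"),
    ("DISTILL_MODE", "1"),
    ("SAMPLES_PER_PROMPT", "40"),
    ("TOP_K", "3"),
    ("BOOST_EVAL_REPORT", ""),  # assumption: empty string means "not provided"
    ("MIN_STRICT_RATE", "0.60"),
    ("MIN_RUNTIME_RATE", "0.75"),
    ("MIN_NONEMPTY_RATE", "0.70"),
    ("MAX_REGRESSION_STRICT", "0.05"),
    ("MAX_REGRESSION_RUNTIME", "0.05"),
    ("MAX_REGRESSION_NONEMPTY", "0.05"),
    ("QUANTIZATION_MODE", "none"),
    ("CANONICALIZE_CHUNK_LINES", "256"),
    ("CANONICALIZE_MAX_LINES", "512"),
]


def build_cycle_command(**overrides):
    """Return the argv list for one cycle run (hypothetical helper)."""
    known = {name for name, _ in DEFAULTS}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    # Fill every position so later arguments land in the right slot.
    args = [str(overrides.get(name, default)) for name, default in DEFAULTS]
    return ["bash", "scripts/run_alignment_cycle.sh", *args]


cmd = build_cycle_command(EPOCHS=3, QUANTIZATION_MODE="dynamic-int8")
print(shlex.join(cmd))  # shell-quoted command line, ready to paste
```

The returned list can be passed directly to subprocess.run(cmd, check=True) from the repository root.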
Stage-by-Stage Flow
Stage 1: Regression supervision build
Script: scripts/build_regression_supervision.py
Key behavior:
- generates canonical paired supervision
- enforces minimum complexity (--min-lines 3)
Stage 2: Teacher distillation build (optional)
Script: scripts/teacher_distill_dataset.py
Mix used by cycle:
- 50% gold
- 35% repair
- 15% check-rewrite
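As a worked example of the mix, the sketch below splits a target size across the three categories. How scripts/teacher_distill_dataset.py actually rounds is not stated in this runbook; largest-remainder rounding is an assumption chosen so the counts always sum to the target, and split_by_mix is a hypothetical helper.

```python
def split_by_mix(total, mix):
    """Split `total` samples across categories using largest-remainder rounding."""
    raw = {name: total * pct / 100 for name, pct in mix.items()}
    counts = {name: int(value) for name, value in raw.items()}
    leftover = total - sum(counts.values())
    # Give the remaining samples to the categories with the largest fractional part.
    by_fraction = sorted(raw, key=lambda name: raw[name] - counts[name], reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts


# Percentages from the cycle mix; 40 is the default SAMPLES_PER_PROMPT.
MIX = {"gold": 50, "repair": 35, "check-rewrite": 15}
print(split_by_mix(40, MIX))  # {'gold': 20, 'repair': 14, 'check-rewrite': 6}
```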
Stage 3: Optional failure-family boost build
Script: scripts/build_failure_boost_dataset.py
If BOOST_EVAL_REPORT is provided, failing prompt IDs from that report get targeted
extra supervision.
Stage 4: Fine-tuning
Script: scripts/finetune_ainl.py
Cycle default profile:
- balanced profile
- MPS device
- frequent save/eval checkpoints for sweep selection
Stage 5: Checkpoint sweep
Script: scripts/sweep_checkpoints.py
Ranking priority:
1. strict_ainl_rate
2. runtime_compile_rate
3. nonempty_rate
Configured with:
- constrained decoding
- repair attempts
- canonicalization
- prompt length bucketing
- diagnostics emission
- optional quantization mode forwarding
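The Stage 5 ranking priority and TOP_K retention can be sketched as a lexicographic sort. This assumes ties on strict_ainl_rate are broken by runtime_compile_rate, then nonempty_rate, which is one reading of "ranking priority"; the record layout and the checkpoint names below are made up for illustration.

```python
def rank_checkpoints(results, top_k=3):
    """Sort by strict, then runtime, then nonempty rate (all descending); keep top_k."""
    def key(record):
        return (
            record["strict_ainl_rate"],
            record["runtime_compile_rate"],
            record["nonempty_rate"],
        )
    return sorted(results, key=key, reverse=True)[:top_k]


# Illustrative metrics only, not real sweep output.
results = [
    {"checkpoint": "ckpt-a", "strict_ainl_rate": 0.60, "runtime_compile_rate": 0.80, "nonempty_rate": 0.90},
    {"checkpoint": "ckpt-b", "strict_ainl_rate": 0.65, "runtime_compile_rate": 0.70, "nonempty_rate": 0.85},
    {"checkpoint": "ckpt-c", "strict_ainl_rate": 0.65, "runtime_compile_rate": 0.75, "nonempty_rate": 0.80},
    {"checkpoint": "ckpt-d", "strict_ainl_rate": 0.50, "runtime_compile_rate": 0.90, "nonempty_rate": 0.95},
]

kept = [r["checkpoint"] for r in rank_checkpoints(results, top_k=3)]
print(kept)  # ['ckpt-c', 'ckpt-b', 'ckpt-a']
```

Note that ckpt-d is dropped despite having the best runtime and nonempty rates, because the strict rate dominates the ordering.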
Stage 6: Final eval gate (selected checkpoint)
Script: scripts/eval_finetuned_model.py
Configured with:
- constrained decoding
- repair loop
- canonicalization with chunk bounds
- prompt-length bucketing
- timing + constraint diagnostics
- optional quantization mode
Stage 7: Trend analysis + quality gate
Script: scripts/analyze_eval_trends.py
Gate behavior:
- enforces absolute minimum quality thresholds
- enforces regression limits vs previous report
- exits non-zero on gate failure
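The gate behavior above can be sketched as two checks per metric, using the default thresholds from the argument reference. This is not the code of scripts/analyze_eval_trends.py: the metric key names are borrowed from the sweep ranking, and the small tolerance for floating-point noise is an assumption.

```python
# Default thresholds from the argument reference.
MINIMUMS = {"strict_ainl_rate": 0.60, "runtime_compile_rate": 0.75, "nonempty_rate": 0.70}
MAX_REGRESSION = {"strict_ainl_rate": 0.05, "runtime_compile_rate": 0.05, "nonempty_rate": 0.05}
EPS = 1e-9  # assumption: tolerate floating-point noise at the boundary


def evaluate_gate(current, previous=None):
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    for metric, minimum in MINIMUMS.items():
        if current[metric] < minimum - EPS:
            failures.append(f"{metric} {current[metric]:.2f} below minimum {minimum:.2f}")
    if previous is not None:
        for metric, allowed in MAX_REGRESSION.items():
            drop = previous[metric] - current[metric]
            if drop > allowed + EPS:
                failures.append(f"{metric} dropped {drop:.2f}, allowed {allowed:.2f}")
    return failures


# Illustrative numbers only.
previous = {"strict_ainl_rate": 0.70, "runtime_compile_rate": 0.72, "nonempty_rate": 0.90}
current = {"strict_ainl_rate": 0.62, "runtime_compile_rate": 0.70, "nonempty_rate": 0.90}

failures = evaluate_gate(current, previous)
exit_code = 1 if failures else 0  # a real script would pass this to sys.exit()
for message in failures:
    print("GATE FAIL:", message)
```

In this example the strict rate clears its absolute minimum but still fails the gate, because it fell by more than the allowed regression.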
Stage 8: Run health summary
Writes: corpus/curated/alignment_run_health.json
Contains:
- pass/fail status
- selected adapter
- top metrics
- gate details
- artifact pointers
Artifacts to Inspect After a Run
- corpus/curated/checkpoint_sweep_report_v5_aligned.json
- corpus/curated/model_eval_report_v5_aligned.json
- corpus/curated/model_eval_trends.json
- corpus/curated/alignment_run_health.json
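A quick way to confirm a run produced all four artifacts is to check that each file exists and parses as JSON. The sketch below assumes only the paths listed above, not anything about the files' internal schema; check_artifacts is a hypothetical helper.

```python
import json
from pathlib import Path

ARTIFACTS = [
    "corpus/curated/checkpoint_sweep_report_v5_aligned.json",
    "corpus/curated/model_eval_report_v5_aligned.json",
    "corpus/curated/model_eval_trends.json",
    "corpus/curated/alignment_run_health.json",
]


def check_artifacts(root="."):
    """Map each artifact path to 'ok', 'missing', or 'invalid json'."""
    status = {}
    for rel in ARTIFACTS:
        path = Path(root) / rel
        if not path.is_file():
            status[rel] = "missing"
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
            status[rel] = "ok"
        except json.JSONDecodeError:
            status[rel] = "invalid json"
    return status


# Run from the repository root.
for rel, state in check_artifacts().items():
    print(f"{state:12} {rel}")
```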
Quick Validation Commands
python scripts/analyze_eval_trends.py --help
python scripts/eval_finetuned_model.py --help
python scripts/sweep_checkpoints.py --help
Troubleshooting
- High fallback rate / EOS never allowed
  - inspect constraint_health alerts in eval diagnostics
  - adjust grammar/strict-prefix rules or EOS minimum-structure gating
- Gate fails despite low eval loss
  - this is expected if structure/compile metrics regress
  - select checkpoints by strict/runtime metrics (already sweep default)
- Long eval runtimes
  - reduce prompt set size for sweeps (--limit)
  - reduce MAX_NEW_TOKENS
  - keep prompt-length bucketing enabled
- Quantization instability
  - use QUANTIZATION_MODE=none
  - dynamic-int8 is safe only on CPU path; scripts fall back automatically
