**Audience:** Engineering teams, SREs, DevOps, platform teams
## The challenge
Every engineering team eventually builds a "watchdog" — something that notices when Caddy, Cloudflare Tunnel, Maddy, or a CRM service has gone down and does something about it. The usual path: a bash script, then a Python script, then a Python script with retry logic, then one with cooldown, then one with history, then you realize you've got 400 lines of glue that nobody fully understands.
## The AINL approach
`examples/autonomous_ops/infrastructure_watchdog.lang` compiles the full watchdog into a single auditable graph:
```text
# infrastructure_watchdog.lang (core pattern)
S svc cron
Cr L_tick "*/5 * * * *"

include "modules/common/token_cost_memory.ainl" as tokenmem
include "modules/common/ops_memory.ainl" as opsmem

L_tick:
R svc caddy ->caddy_status
R svc cloudflared ->cloudflared_status
R svc maddy ->maddy_status
R svc crm ->crm_status

# Cooldown-gated alert: only alert if down AND cooldown expired
X caddy_alert (core.and caddy_down
  (core.or (core.eq last_caddy 0)
           (core.gt (- now caddy_ts) cooldown_seconds)))
If caddy_alert ->L_caddy_alert ->L_continue

L_caddy_alert:
X restart_ok (svc.restart "caddy")
R queue Put "notify" { "service": "caddy", "restart_ok": restart_ok }
Call opsmem/WRITE ->_   # persist restart event for 7-day history
J L_continue
```
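The `caddy_alert` expression is the entire cooldown policy. Translated into an equivalent Python predicate (an illustrative translation of the expression above, not generated code):

```python
def should_alert(down: bool, last_ts: float, now: float,
                 cooldown_seconds: float) -> bool:
    """Alert only if the service is down AND the cooldown has expired.

    last_ts == 0 means we have never alerted for this service.
    """
    return down and (last_ts == 0 or (now - last_ts) > cooldown_seconds)
```

The difference is that in AINL the predicate is a compiled graph node, so the same logic is visible in `ainl visualize` output rather than buried in a script.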
## What engineering teams get
| Capability | Raw script | AINL watchdog |
|---|---|---|
| Cooldown gating | Manual timers | Compiled into graph |
| 7-day restart history | Custom DB table | Memory adapter, TTL-managed |
| Structured health envelopes | Ad-hoc JSON | Standardized envelope format |
| Audit trail | Log files | JSONL execution tape |
| Extend without risk | Rewrite risk | Add a node, re-compile |
| MCP-accessible | No | Yes — ainl-mcp server |
## Structured health envelopes
Every alert emitted follows the standard AINL health envelope:
```json
{
  "envelope": { "version": "1.0", "generated_at": "<timestamp>" },
  "module": "infrastructure_watchdog",
  "status": "alert",
  "metrics": {
    "service": "caddy",
    "status": "down",
    "restart_attempted": true,
    "restart_ok": true
  },
  "history_24h": { "restart_count": 2 }
}
```
Downstream alerting, PagerDuty, or your own queue consumer gets a consistent shape — no per-service schema drift.
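Because the shape is fixed, a consumer needs no per-service logic. A hypothetical Python handler routing on the envelope fields shown above (`route_alert` and its verdict strings are illustrative, not part of AINL):

```python
import json

def route_alert(raw: str) -> str:
    """Decide a routing verdict from a standard health envelope."""
    env = json.loads(raw)
    if env["status"] != "alert":
        return "ignore"
    m = env["metrics"]
    # a successful auto-restart downgrades the alert to informational
    if m.get("restart_attempted") and m.get("restart_ok"):
        return f"info:{m['service']}"
    return f"page:{m['service']}"
```

The same handler serves caddy, cloudflared, maddy, and the CRM service, because every watchdog module emits the same envelope.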
## Try it
```bash
pip install ainativelang
git clone https://github.com/sbhooley/ainativelang.git
ainl check examples/autonomous_ops/infrastructure_watchdog.lang --strict
ainl visualize examples/autonomous_ops/infrastructure_watchdog.lang --output watchdog.mmd
```
Related: Autonomous Ops with AINL · Built with AINL: monitoring 7.2× cheaper
