
Showcase: Self-Healing Infrastructure Watchdog for Engineering Teams

A compiled AINL graph polls four services every 5 minutes, auto-restarts downed processes with cooldown gating, persists a 7-day restart history, and emits structured health envelopes to your alerting queue — deterministically, with a full JSONL execution tape.

March 28, 2026
#showcase #enterprise #infrastructure #SRE #monitoring #self-healing #compile-once-run-many #mcp

Audience: Engineering teams, SREs, DevOps, platform teams

The challenge

Every engineering team eventually builds a "watchdog" — something that notices when Caddy, Cloudflare Tunnel, Maddy, or a CRM service has gone down and does something about it. The usual path: a bash script, then a Python script, then a Python script with retry logic, then one with cooldown, then one with history, then you realize you've got 400 lines of glue that nobody fully understands.

The AINL approach

examples/autonomous_ops/infrastructure_watchdog.lang compiles the full watchdog into a single auditable graph:

# infrastructure_watchdog.lang (core pattern)
S svc cron
Cr L_tick "*/5 * * * *"
include "modules/common/token_cost_memory.ainl" as tokenmem
include "modules/common/ops_memory.ainl" as opsmem

L_tick:
  R svc caddy ->caddy_status
  R svc cloudflared ->cloudflared_status
  R svc maddy ->maddy_status
  R svc crm ->crm_status

  # Cooldown-gated alert: only alert if down AND cooldown expired
  X caddy_alert (core.and caddy_down
    (core.or (core.eq last_caddy 0)
             (core.gt (- now caddy_ts) cooldown_seconds)))

  If caddy_alert ->L_caddy_alert ->L_continue

L_caddy_alert:
  X restart_ok (svc.restart "caddy")
  R queue Put "notify" { "service": "caddy", "restart_ok": restart_ok }
  Call opsmem/WRITE ->_   # persist restart event for 7-day history
  J L_continue
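The cooldown gate in `L_tick` reads as ordinary boolean logic. As a rough mental model only (the function name and signature below are illustrative, not part of AINL), the compiled expression behaves like this Python sketch:

```python
def should_alert(is_down: bool, last_restart_ts: float,
                 cooldown_seconds: float, now: float) -> bool:
    """Alert only if the service is down AND either it has never been
    restarted (timestamp 0) or the cooldown window has elapsed."""
    cooldown_expired = (last_restart_ts == 0
                        or (now - last_restart_ts) > cooldown_seconds)
    return is_down and cooldown_expired
```

This is why a downed service triggers at most one restart per cooldown window: the gate stays closed until `now - last_restart_ts` exceeds `cooldown_seconds`.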

What engineering teams get

| Capability | Raw script | AINL watchdog |
|---|---|---|
| Cooldown gating | Manual timers | Compiled into graph |
| 7-day restart history | Custom DB table | Memory adapter, TTL-managed |
| Structured health envelopes | Ad-hoc JSON | Standardized envelope format |
| Audit trail | Log files | JSONL execution tape |
| Extend without risk | Rewrite risk | Add a node, re-compile |
| MCP-accessible | No | Yes — ainl-mcp server |

Structured health envelopes

Every alert emitted follows the standard AINL health envelope:

{
  "envelope": { "version": "1.0", "generated_at": "<timestamp>" },
  "module": "infrastructure_watchdog",
  "status": "alert",
  "metrics": {
    "service": "caddy",
    "status": "down",
    "restart_attempted": true,
    "restart_ok": true
  },
  "history_24h": { "restart_count": 2 }
}

Downstream alerting, PagerDuty, or your own queue consumer gets a consistent shape — no per-service schema drift.
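Because the shape is fixed, a queue consumer can validate each JSONL line before routing it. A minimal sketch (the validator and required-key set below are hypothetical, inferred from the example envelope above):

```python
import json

REQUIRED_TOP_LEVEL = {"envelope", "module", "status", "metrics"}

def parse_health_envelope(raw: str) -> dict:
    """Parse one JSONL line and check the standard envelope shape."""
    doc = json.loads(raw)
    missing = REQUIRED_TOP_LEVEL - doc.keys()
    if missing:
        raise ValueError(f"malformed health envelope, missing: {sorted(missing)}")
    return doc

# One line from the alerting queue, matching the example above.
line = json.dumps({
    "envelope": {"version": "1.0", "generated_at": "2026-03-28T00:00:00Z"},
    "module": "infrastructure_watchdog",
    "status": "alert",
    "metrics": {"service": "caddy", "status": "down",
                "restart_attempted": True, "restart_ok": True},
    "history_24h": {"restart_count": 2},
})
doc = parse_health_envelope(line)
```

Rejecting malformed envelopes at the consumer keeps schema drift from silently propagating into PagerDuty routing rules.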

Try it

pip install ainativelang
git clone https://github.com/sbhooley/ainativelang.git
ainl check examples/autonomous_ops/infrastructure_watchdog.lang --strict
ainl visualize examples/autonomous_ops/infrastructure_watchdog.lang --output watchdog.mmd

Related: Autonomous Ops with AINL · Built with AINL: monitoring 7.2× cheaper


AI Native Lang Team

The team behind AI Native Lang — building deterministic AI workflow infrastructure.
