LLM Adapter Usage Guide
This guide covers how to configure and use the expanded LLM adapter ecosystem in AINL, including fallback chains, JSON mode, and cost monitoring.
Migrating from llm_query
The legacy llm_query runtime adapter remains available behind --enable-adapter llm_query or AINL_ENABLE_LLM_QUERY=true, but instantiating it emits a DeprecationWarning. Prefer the unified pipeline: register llm via register_llm_adapters, call it from your graph with R llm completion ... (or your graph’s equivalent), and define the providers and fallback chain under llm: in config.yaml as described below. See adapters/llm_runtime.py and adapters/__init__.py for registration details.
Configuration
Create a config.yaml in your project root:
llm:
  default_provider: openrouter
  fallback_chain: [openrouter, anthropic, ollama]
  circuit_breaker:
    failure_threshold: 5
    recovery_timeout_s: 300
  providers:
    openrouter:
      api_key: ${OPENROUTER_API_KEY}
      base_url: https://openrouter.ai/api/v1
      json_mode: true
    anthropic:
      api_key: ${ANTHROPIC_API_KEY}
      json_mode: true
    ollama:
      base_url: http://localhost:11434
      json_mode: false
Environment variables are expanded automatically (e.g., ${OPENROUTER_API_KEY}).
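To make the expansion step concrete, here is a minimal sketch of how ${NAME} placeholders could be resolved after the YAML has been loaded. The helper name expand_env and the choice to leave unknown variables untouched are assumptions for illustration; the actual AINL loader may behave differently.

```python
import os
import re

# Matches ${NAME} placeholders. The exact syntax AINL accepts is an assumption.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} in strings nested inside dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Unknown variables are left as-is so a missing key is easy to spot.
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    return value  # numbers, booleans, None pass through unchanged


cfg = {
    "llm": {
        "providers": {
            "openrouter": {"api_key": "${OPENROUTER_API_KEY}", "json_mode": True}
        }
    }
}
expanded = expand_env(cfg, env={"OPENROUTER_API_KEY": "sk-test"})
print(expanded["llm"]["providers"]["openrouter"]["api_key"])  # sk-test
```

Passing an explicit env mapping, as above, keeps the helper easy to test without touching the real process environment.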
Tell AINL to load this configuration:
- CLI flag: --config path/to/config.yaml
- Environment variable: AINL_CONFIG=path/to/config.yaml
As an alternative, you can enable specific adapters without a config file using:
- CLI flag: --enable-adapter llm.openrouter (repeatable)
- Environment variable: AINL_ENABLED_ADAPTERS=llm.openrouter,llm.anthropic
The --config approach is recommended for LLM setup because it also defines the fallback chain, circuit breaker settings, and provider-specific options.
Bootstrap Integration
In your application startup code (CLI or server):
from runtime.adapters.base import AdapterRegistry
from adapters import register_llm_adapters
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

runtime_reg = AdapterRegistry(allowed=["core", "llm", ...])
register_llm_adapters(runtime_reg, cfg)
# Then pass runtime_reg to RuntimeEngine
The register_llm_adapters helper constructs a fallback chain from the config and registers a runtime adapter named llm (and optionally individual provider adapters).
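The interaction between fallback_chain and circuit_breaker can be sketched as follows. This is an illustrative stand-in, not the code in adapters/: the Breaker class, the complete_with_fallback function, and the local CircuitOpenError class are hypothetical, and only the two settings failure_threshold and recovery_timeout_s are taken from the config above.

```python
import time


class CircuitOpenError(Exception):
    """Stand-in error raised here when no provider in the chain can be used."""


class Breaker:
    """Opens after N consecutive failures, allows a trial call after a timeout."""

    def __init__(self, failure_threshold=5, recovery_timeout_s=300, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_s = recovery_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: once the recovery timeout has elapsed, permit a trial call.
        return self.clock() - self.opened_at >= self.recovery_timeout_s

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()


def complete_with_fallback(chain, breakers, prompt, max_tokens):
    """Try each (name, callable) in order, skipping providers whose breaker is open."""
    for name, call in chain:
        breaker = breakers[name]
        if not breaker.allow():
            continue
        try:
            result = call(prompt, max_tokens)
        except Exception:
            breaker.record(False)
            continue
        breaker.record(True)
        return name, result
    raise CircuitOpenError("no provider in the fallback chain is available")


def failing(prompt, max_tokens):
    raise RuntimeError("provider down")


def echo(prompt, max_tokens):
    return prompt[:max_tokens]


chain = [("openrouter", failing), ("ollama", echo)]
breakers = {name: Breaker(failure_threshold=2) for name, _ in chain}
print(complete_with_fallback(chain, breakers, "Hello", 100))  # ('ollama', 'Hello')
```

In this sketch a failed trial call in the half-open state simply re-opens the breaker and restarts the recovery timer, which is one common policy among several.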
Graph Usage
AINL graphs can now use the llm adapter for text completion:
R adapter=llm target=completion args=["Translate to French: Hello", 100] out=translated
To force a specific provider, you can either:
- Use adapter=openrouter if you registered each provider individually.
- Or adjust fallback_chain order to prioritize one provider.
JSON Mode
When json_mode: true is set for a provider in the config, the adapter requests JSON-formatted responses (where supported). For example, to extract structured data:
R adapter=llm target=completion args=[
"Extract name and age from: John is 30 years old.",
50
] out=json_str
X fn=json_parse args=[json_str] out=data
The json_parse core function parses the string into a dictionary.
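For readers who post-process completions outside a graph, the parse step can be approximated in plain Python. This sketch assumes json_parse behaves roughly like json.loads on a JSON object; the parse_llm_json helper and its required_keys check are hypothetical additions for illustration.

```python
import json


def parse_llm_json(text, required_keys=()):
    """Parse a completion that should be a JSON object and check expected keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Models can return prose even in JSON mode, so fail with a clear message.
        raise ValueError(f"completion is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


data = parse_llm_json('{"name": "John", "age": 30}', required_keys=("name", "age"))
print(data["name"], data["age"])  # John 30
```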
Cost Estimation & Validation
Each adapter estimates cost based on token usage and built‑in pricing tables. The CostValidator service periodically fetches live prices (OpenRouter) and compares them to the estimates. If the drift exceeds 10%, it:
- Logs a warning.
- Increments Prometheus counter cost_estimate_drift_total{provider, model}.
- Sets gauge cost_estimate_drift_pct{provider, model}.
To enable cost validation, ensure OPENROUTER_API_KEY is set; optionally set COST_VALIDATOR_INTERVAL_HOURS (default 6). The validator starts automatically on first use.
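The estimate and the 10% drift check can be illustrated with a small sketch. The pricing table layout, the per-million-token unit, the model name, and the choice to measure drift relative to the live price are all assumptions; the prices are placeholders, not real provider pricing.

```python
# Hypothetical per-million-token prices in USD (placeholder values).
MODEL_PRICING = {"example-model": {"prompt": 3.0, "completion": 15.0}}


def estimate_cost(model, prompt_tokens, completion_tokens, pricing=MODEL_PRICING):
    """Estimate the USD cost of one call from token counts and a pricing table."""
    price = pricing[model]
    total = prompt_tokens * price["prompt"] + completion_tokens * price["completion"]
    return total / 1_000_000


def drift_pct(estimated, live):
    """Absolute difference between estimate and live price, as a percent of live."""
    if live == 0:
        raise ValueError("live price must be non-zero")
    return abs(estimated - live) / live * 100.0


def exceeds_threshold(estimated, live, threshold_pct=10.0):
    """True when the drift is strictly greater than the threshold."""
    return drift_pct(estimated, live) > threshold_pct


print(estimate_cost("example-model", 1000, 500))
print(exceeds_threshold(3.0, 3.6))  # True: about 16.7% drift
print(exceeds_threshold(3.0, 3.1))  # False: about 3.2% drift
```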
Integration with OpenClaw Token‑Budget Alerts
The AINL adapters report token usage to the existing OpenClaw cost tracking system by calling CostTracker.add_cost(...) (if available in your environment). This integrates with the token‑budget alerts you already have configured (Telegram, email). Ensure the intelligence/monitor/cost_tracker.py module is importable and the database path ~/.ainl/costs.db is writable.
If you wish to manually connect:
from intelligence.monitor.cost_tracker import CostTracker

tracker = CostTracker()
tracker.add_cost(
    run_id=run_id,
    provider="anthropic",
    model="claude-3-5-sonnet-20241022",
    prompt_tokens=usage.prompt_tokens,
    completion_tokens=usage.completion_tokens,
    cost_usd=adapter.estimate_cost(usage.prompt_tokens, usage.completion_tokens),
)
The BudgetPolicy can then enforce thresholds.
Observability
Circuit breaker state is emitted as a gauge: circuit_breaker_state{provider,state="closed|open|half_open"} via runtime/observability.py. You can scrape these with Prometheus or view them in the dashboard (if running the monitoring stack).
Security Notes
- LLM adapters are considered network_facing and may be restricted in sandboxed execution.
- API keys must come from environment or config; never hardcoded.
- When running untrusted graphs, ensure the adapter’s privilege tier aligns with your security policy.
Troubleshooting
- No LLM calls succeed: check API keys and network connectivity; the circuit breaker may be open. Inspect logs for CircuitOpenError.
- Cost drift warnings: review your adapter’s MODEL_PRICING constants; OpenRouter prices may have changed. Consider updating your config to use a different model or adjust expected costs.
- JSON mode not working: not all providers support JSON mode; ensure json_mode: true is set and the model is capable (e.g., GPT‑4, Claude). If the response is not JSON, check provider documentation and adapter version.
Future Extensions
- Tool use passthrough for Anthropic/Cohere (out of scope for Phase 1).
- Optional SDK‑based adapters via ainativelang[anthropic].
Monitoring & Alerting
The AINL ecosystem includes a lightweight monitoring stack:
- Runtime metrics: Adapters and circuit breakers emit metrics via RuntimeObservability.emit. These are:
  - Printed as JSON to stderr when AINL_OBSERVABILITY=1
  - Optionally written to a JSONL file via AINL_OBSERVABILITY_JSONL=/path/metrics.jsonl
  - Always forwarded to the in‑memory collector for Prometheus export
- Prometheus endpoint: Run python scripts/serve_dashboard.py and scrape http://localhost:8080/api/metrics for:
  - circuit_breaker_state{provider="...",state="open|closed|half_open"}
  - cost_estimate_drift_total, cost_estimate_drift_pct{provider="...",model="..."}
- Health checks:
  - GET /health/live returns 200 OK if the process is alive.
  - GET /health/ready returns 200 when collector and cost DB are ready.
- Cost validation service: services/cost_validator.py runs in the background (if started) and compares adapter estimates against live provider prices (OpenRouter) every 6 hours by default. On drift >10%, it emits cost_estimate_drift_total and cost_estimate_drift_pct.
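When checking the metrics endpoint by hand, a small parser for the scraped text can help. The sketch below assumes the endpoint returns plain Prometheus text exposition lines and that an active state is reported with value 1; the SAMPLE payload, the function names, and the model label value are made up for illustration. It handles only simple name{labels} value lines, without timestamps or escaped quotes.

```python
import re

# Simplified matcher for lines of the form: name{label="value",...} number
_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')

# Made-up payload for illustration; real output may differ.
SAMPLE = """\
# TYPE circuit_breaker_state gauge
circuit_breaker_state{provider="openrouter",state="open"} 1
circuit_breaker_state{provider="anthropic",state="closed"} 1
cost_estimate_drift_pct{provider="openrouter",model="example-model"} 12.5
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def open_providers(samples):
    """Names of providers whose circuit breaker is reported as open."""
    return sorted(
        labels.get("provider", "?")
        for name, labels, value in samples
        if name == "circuit_breaker_state"
        and labels.get("state") == "open"
        and value == 1.0
    )


print(open_providers(parse_metrics(SAMPLE)))  # ['openrouter']
```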
See docs/AGENT_GUIDE_INDEX.md for a full feature map.
Using LLM Adapters with MCP (Hermes‑Agent / OpenClaw Skill)
When running the AINL MCP server (scripts/ainl_mcp_server.py), LLM adapters are not enabled by default to preserve a minimal attack surface. To enable them:
- Set AINL_CONFIG to point to your YAML config file or set AINL_MCP_LLM_ENABLED=1.
- Start the server: python scripts/ainl_mcp_server.py.
- The server will:
  - Load the config (with env‑var expansion).
  - Call register_llm_adapters() to register the llm composite and the individual provider adapters.
  - Relax the network privilege tier restriction so LLM adapters can make outbound requests.
- Connect your MCP client (Hermes‑Agent, OpenClaw skill) to the server as usual.
If AINL_CONFIG is set, the server automatically registers the LLM adapters defined under llm.providers and uses llm.fallback_chain. If AINL_MCP_LLM_ENABLED=1 but no config is provided, the server will attempt to use default (no‑op) adapters; you should provide a config.
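The precedence described above can be summarized as a small decision function. This is only a sketch of the documented behavior: the llm_mode name and its return values are hypothetical, and the real server may check these variables differently.

```python
def llm_mode(env):
    """Return 'config', 'defaults', or 'disabled' for an environment mapping."""
    if env.get("AINL_CONFIG"):
        # A config path wins: providers and fallback chain come from the file.
        return "config"
    if env.get("AINL_MCP_LLM_ENABLED") == "1":
        # Enabled without a config: only default (no-op) adapters are available.
        return "defaults"
    return "disabled"


print(llm_mode({"AINL_CONFIG": "config.yaml"}))   # config
print(llm_mode({"AINL_MCP_LLM_ENABLED": "1"}))    # defaults
print(llm_mode({}))                               # disabled
```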
See docs/AGENT_GUIDE_INDEX.md for a complete agent‑oriented reference.
