Testing AINL Graphs
How to write unit tests, integration tests, and CI/CD pipelines for AINL programs.
🧪 Why Test AINL Graphs?
Even with compile-time validation, runtime can bring surprises:
- Adapter failures (API down, auth errors)
- LLM output format changes (despite prompts)
- Data format mismatches from external systems
- Performance regressions (latency, token cost)
Testing ensures your graphs behave correctly under various conditions.
🛠️ Test Types
| Type | Scope | Tooling |
|------|-------|---------|
| Unit | Single node in isolation | `ainl test --unit` |
| Integration | Whole graph with mocked adapters | `ainl test --integration` |
| Property | Graph invariants (always terminates, no cycles) | `ainl validate --strict` |
| Performance | Latency, token usage budgets | `ainl benchmark`, `ainl run --trace-jsonl` |
| Contract | Graph output matches schema for all valid inputs | JSON Schema validation |
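A contract test can be sketched without any schema library: validate the graph's output dict against a minimal hand-rolled schema. The field names (`severity`, `action_taken`) are illustrative assumptions, not part of AINL's API.

```python
# Minimal hand-rolled contract check for a graph's output.
# Field names are illustrative assumptions, not AINL-defined keys.

OUTPUT_SCHEMA = {
    "severity": str,
    "action_taken": str,
}

def check_contract(output: dict, schema: dict) -> list[str]:
    """Return a list of contract violations (empty list means the output conforms)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(output[field]).__name__}")
    return errors

# A conforming output passes; a malformed one is reported.
assert check_contract({"severity": "CRITICAL", "action_taken": "send_slack"}, OUTPUT_SCHEMA) == []
assert check_contract({"severity": 3}, OUTPUT_SCHEMA) == [
    "severity: expected str, got int",
    "missing field: action_taken",
]
```

For real schemas, a JSON Schema validator gives the same guarantee with richer constraints; the hand-rolled version keeps the test dependency-free.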
📁 Test Project Structure
```
my-ainl-project/
├── graphs/
│   ├── monitor.ainl
│   └── alert.ainl
├── tests/
│   ├── unit/
│   │   └── test_classify_node.py
│   ├── integration/
│   │   ├── test_monitor_graph.py
│   │   └── fixtures/
│   │       └── sample_log.json
│   └── conftest.py        # shared fixtures
├── ainl.yaml              # config for tests (mock adapters)
├── pyproject.toml
└── README.md
```
🧪 Unit Testing Nodes
Mock adapters to test node logic in isolation.
Example: Test that classify node correctly parses LLM response.
```python
# tests/unit/test_classify_node.py
import pytest
from ainl import LLMNode
from ainl.testing import MockAdapter

def test_classify_node_parses_critical():
    """Classification returns CRITICAL for severity keywords."""
    # Arrange
    mock = MockAdapter()
    mock.set_response("classify-error", "CRITICAL")
    node = LLMNode(
        id="classify",
        adapter="mock",
        model="mock-model",
        prompt="Classify: {{input.message}}"
    )
    node.adapter = mock

    # Act
    result = node.run({
        "input": {
            "message": "Database timeout after 30s",
            "level": "error"
        }
    })

    # Assert
    assert result == "CRITICAL"
```
Run unit tests:

```shell
ainl test --unit tests/
```
🔗 Integration Testing Graphs
Test entire graph with sample inputs and mocked adapters.
```python
# tests/integration/test_monitor_graph.py
import json
import pytest
from ainl import Graph
from ainl.testing import MockAdapter

@pytest.fixture
def mock_adapters():
    # Configure mock responses for all LLM nodes
    return {
        "classify": MockAdapter(response="CRITICAL"),
        "alert": MockAdapter(response="🚨 Database timeout detected")
    }

def test_monitor_critical_path(mock_adapters):
    """A critical error should trigger a Slack alert."""
    # Load graph
    graph = Graph.from_file("graphs/monitor.ainl")

    # Inject mocks
    for node_id, adapter in mock_adapters.items():
        graph.get_node(node_id).adapter = adapter

    # Run with sample input
    with open("tests/integration/fixtures/sample_log.json") as f:
        input_data = json.load(f)
    output = graph.run(input_data)

    # Assert output
    assert output["severity"] == "CRITICAL"
    assert output["action_taken"] == "send_slack"

    # Verify the Slack node was called (MockAdapter tracks calls)
    slack_node = graph.get_node("send_slack")
    assert slack_node.called is True
```
Fixtures store sample inputs/outputs:
```json
// tests/integration/fixtures/sample_log.json
{
  "timestamp": "2025-03-30T14:22:15Z",
  "level": "error",
  "message": "Database connection timeout",
  "service": "payment-processor"
}
```
🎯 Property-Based Testing
Test invariants across many random inputs using Hypothesis.
```python
# tests/property/test_graph_invariants.py
from hypothesis import given, strategies as st
from ainl import Graph

graph = Graph.from_file("graphs/monitor.ainl")

@given(
    level=st.sampled_from(["info", "warning", "error", "critical"]),
    service=st.text(min_size=1, max_size=50),
    message=st.text(min_size=1, max_size=200)
)
def test_graph_never_crashes(level, service, message):
    """Graph should not raise exceptions for any valid input."""
    input_data = {
        "timestamp": "2025-03-30T14:22:15Z",
        "level": level,
        "message": message,
        "service": service
    }
    # Graph should complete without error
    output = graph.run(input_data, mock_adapters=True)

    # Output should always have severity and action_taken
    assert "severity" in output
    assert "action_taken" in output
    assert output["severity"] in ["CRITICAL", "WARNING", "INFO"]
```
Run with:

```shell
pytest tests/property/ -v
```
📊 Performance Regression Testing
Ensure graphs don't get slower or more expensive over time.
```python
# tests/performance/test_monitor_performance.py
import json
import time
from ainl import Graph
from ainl.testing import MockAdapter

def load_test_input():
    """Reuse the integration fixture as a representative input."""
    with open("tests/integration/fixtures/sample_log.json") as f:
        return json.load(f)

def test_monitor_performance_budget():
    """Monitor graph should complete within its latency budget."""
    graph = Graph.from_file("graphs/monitor.ainl")

    # Mock all external calls for consistent timing
    for node in graph.nodes:
        if node.type in ["llm", "http"]:
            node.adapter = MockAdapter(latency_ms=100)

    input_data = load_test_input()
    start = time.time()
    output = graph.run(input_data)
    elapsed = time.time() - start

    # Assertions
    assert elapsed < 2.0, f"Graph took {elapsed:.2f}s, budget 2s"
    assert output["severity"] is not None
```
Track token usage from trace JSONL and assert within budget.
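The token-budget check can be sketched in plain Python: sum the token counts recorded in the trace JSONL and assert the total stays under budget. The `tokens_used` field name is an assumption about the trace format, not a documented AINL key.

```python
import io
import json

def total_tokens(trace_jsonl: str) -> int:
    """Sum token usage across all events in a trace JSONL stream.

    Assumes each line is a JSON object with an optional "tokens_used"
    field; the field name is illustrative, not a documented trace key.
    """
    total = 0
    for line in io.StringIO(trace_jsonl):
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        total += event.get("tokens_used", 0)
    return total

# Example trace: two LLM node events plus one adapter event without tokens.
trace = """\
{"node": "classify", "tokens_used": 412}
{"node": "send_slack"}
{"node": "alert", "tokens_used": 880}
"""
assert total_tokens(trace) == 1292
assert total_tokens(trace) < 2000, "token budget exceeded"
```

In a real suite, read the file produced by `ainl run --trace-jsonl` instead of an inline string.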
🔁 CI/CD Integration
Add to GitHub Actions:
```yaml
name: AINL CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install AINL
        run: pip install ainl ainl-adapter-mock pytest
      - name: Validate graphs
        run: |
          ainl validate --strict graphs/*.ainl
      - name: Unit tests
        run: |
          ainl test --unit tests/unit/
      - name: Integration tests
        run: |
          ainl test --integration tests/integration/
      - name: Performance baseline
        run: |
          ainl benchmark graphs/monitor.ainl --output benchmarks.json
          python scripts/check_regression.py benchmarks.json
```
The benchmark script checks that runtime hasn't increased by more than 10% since the last baseline.
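The regression check can be sketched as a comparison between the current benchmark results and a stored baseline. The JSON layout with a `mean_ms` field per graph is an assumption about `ainl benchmark` output, not its documented format.

```python
def check_regression(current: dict, baseline: dict, threshold: float = 0.10) -> list[str]:
    """Return names of graphs whose mean latency grew more than `threshold` vs. baseline."""
    regressions = []
    for name, stats in current.items():
        base = baseline.get(name)
        if base is None:
            continue  # new graph, nothing to compare against yet
        if stats["mean_ms"] > base["mean_ms"] * (1 + threshold):
            regressions.append(name)
    return regressions

baseline = {"monitor": {"mean_ms": 120.0}}
current_ok = {"monitor": {"mean_ms": 125.0}}    # +4%: within budget
current_bad = {"monitor": {"mean_ms": 140.0}}   # +17%: regression

assert check_regression(current_ok, baseline) == []
assert check_regression(current_bad, baseline) == ["monitor"]
```

A CI script would load both dicts from `benchmarks.json` files and exit non-zero when the returned list is non-empty.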
🧩 Mock Adapters
AINL includes a MockAdapter for testing:
```python
from ainl.testing import MockAdapter

# Return fixed response
mock = MockAdapter(response="CRITICAL")

# Simulate latency
mock = MockAdapter(latency_ms=500, response="OK")

# Simulate failure
mock = MockAdapter(error=RuntimeError("Service unavailable"))

# Expect specific prompt (fails if prompt doesn't match)
mock = MockAdapter(expected_prompt="Classify: {{input}}")
```
Use in tests to isolate graph logic from external dependencies.
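To see how an adapter mock in this style works, here is an illustrative plain-Python re-implementation, not AINL's actual `MockAdapter`: fixed response, optional latency, injected errors, prompt expectation, and call tracking.

```python
import time

class FakeAdapter:
    """Illustrative stand-in for a mock adapter (not AINL's MockAdapter)."""

    def __init__(self, response=None, latency_ms=0, error=None, expected_prompt=None):
        self.response = response
        self.latency_ms = latency_ms
        self.error = error
        self.expected_prompt = expected_prompt
        self.call_count = 0

    @property
    def called(self):
        return self.call_count > 0

    def call(self, prompt):
        self.call_count += 1
        if self.latency_ms:
            time.sleep(self.latency_ms / 1000)  # simulate network latency
        if self.error is not None:
            raise self.error                    # simulate adapter failure
        if self.expected_prompt is not None and prompt != self.expected_prompt:
            raise AssertionError(f"unexpected prompt: {prompt!r}")
        return self.response

mock = FakeAdapter(response="CRITICAL")
assert mock.call("Classify: DB timeout") == "CRITICAL"
assert mock.called and mock.call_count == 1
```

Tracking `call_count` is what makes assertions like `slack_node.called` possible in the integration tests above.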
📈 Test Coverage
AINL's `ainl test` reports coverage:

```
tests/
  unit/          100% nodes tested
  integration/    85% edge coverage (2/4 branches untested)
Overall: 92%
```
Improve coverage by:
- Testing all `switch` branches
- Testing error paths (adapter failures)
- Testing edge cases (empty inputs, null values)
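Error paths can be exercised the same way as happy paths: inject a failing adapter and assert the fallback behavior. The sketch below uses a plain-Python stand-in for an adapter and node rather than AINL's API.

```python
class FailingAdapter:
    """Stand-in adapter that always raises, like MockAdapter(error=...)."""
    def call(self, prompt):
        raise RuntimeError("Service unavailable")

def classify_with_fallback(adapter, message):
    """Node-style logic under test: fall back to UNKNOWN if the adapter fails."""
    try:
        return adapter.call(f"Classify: {message}")
    except RuntimeError:
        return "UNKNOWN"

# The error path yields the fallback rather than crashing the graph.
assert classify_with_fallback(FailingAdapter(), "DB timeout") == "UNKNOWN"
```

The same pattern covers malformed LLM output: return a response the parser cannot handle and assert the node degrades gracefully.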
🐛 Debugging Failed Tests
Test hangs
Likely a deadlock or infinite retry. Set an adapter timeout:

```python
mock = MockAdapter(response="OK", timeout=1.0)  # seconds
```

Assertion fails: node not called
Remember that `when` conditions skip nodes. Ensure the test input actually triggers the branch.

Mock not used
The node may be using a cached result. Clear the cache with `graph.clear_cache()`.
📚 Example: Full Test Suite
```python
# tests/conftest.py
import pytest
from ainl import Graph
from ainl.testing import MockAdapter

@pytest.fixture
def monitor_graph():
    graph = Graph.from_file("graphs/monitor.ainl")
    # Configure mocks
    for node in graph.nodes:
        if node.type == "llm":
            node.adapter = MockAdapter(response="CRITICAL")
        elif node.type == "http":
            node.adapter = MockAdapter(success_code=200)
    return graph
```
```python
# tests/integration/test_monitor.py
def test_critical_triggers_slack(monitor_graph):
    input_data = {"log": {"level": "error", "msg": "DB down"}}
    output = monitor_graph.run(input_data)
    assert output["action"] == "slack_alert"
    assert monitor_graph.get_node("send_slack").called

def test_info_skips_alert(monitor_graph):
    input_data = {"log": {"level": "info", "msg": "Started"}}
    output = monitor_graph.run(input_data)
    assert output["action"] == "log_file"
    assert not monitor_graph.get_node("send_slack").called
```
🎯 Best Practices
- Mock all external calls – never hit real APIs in unit/integration tests
- Test all branches – cover every `switch` case
- Test error scenarios – adapter failures, malformed LLM output
- Keep fixtures small – one JSON file per test scenario
- Use property testing for random-input resilience
- Benchmark in CI – catch performance regressions early
🔗 Related
- Monitoring Guide – Production observability
- Graphs & IR – Understand IR for debugging
- CLI Reference – `ainl test` options
Write tests, ship with confidence! →
